The purpose of a matched-workload pilot is narrow: compare the candidate engine against your current baseline on the same universe, constraints, timing protocol, and quality thresholds. Anything beyond that quickly turns into sales theater.

Define the representative workloads first

Pick a small set of cases that reflect production: a normal rebalance, a transition, a tax-aware scenario, and a stress case. Include the data preparation, solve, verification, and output formatting steps in the timing window, so the comparison reflects end-to-end time rather than solve time alone.
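One way to keep the timing window honest is to time every stage through the same harness for both engines. The following is a minimal sketch, not a prescribed implementation: the function name `time_workload`, the stage names, and the stand-in lambdas are all assumptions made for illustration, and real stages would call into whichever engine is under test.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# Stage names mirror the steps listed above; the callables are placeholders
# for whatever each engine actually runs.
Stage = Tuple[str, Callable[[Any], Any]]


def time_workload(stages: List[Stage], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order, feeding each result to the next, and time each one."""
    timings: Dict[str, float] = {}
    window_start = time.perf_counter()
    for name, stage in stages:
        stage_start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - stage_start
    # The total covers the whole window, not just the solve.
    timings["total"] = time.perf_counter() - window_start
    return payload, timings


# Hypothetical stand-in stages so the sketch runs on its own.
example_stages: List[Stage] = [
    ("prepare", lambda raw: sorted(raw)),
    ("solve", lambda data: [x * 2 for x in data]),
    ("verify", lambda result: result if all(x >= 0 for x in result) else None),
    ("format", lambda result: ",".join(str(x) for x in result)),
]

output, timings = time_workload(example_stages, [3, 1, 2])
```

Running the same stage list against the baseline and the candidate gives per-stage and total timings that can be compared directly, which makes it harder for a fast solve to hide slow preparation or formatting.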

Measure operational behavior

Runtime matters, but so do failure modes. The pilot should record solve failures, infeasible requests, constraint violations, quality gaps, audit artifacts, and replay behavior. A silent failure, where the engine reports success but the result violates a constraint, is more dangerous than a solve that fails loudly.
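The distinction between loud and silent failures can be made explicit in how each run is recorded. The sketch below assumes a hypothetical status vocabulary ("optimal", "infeasible", anything else treated as a failed solve) and assumes constraint violations come from an independent verification step rather than from the engine's own report. The names `RunRecord`, `Outcome`, and `classify` are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Outcome(Enum):
    OK = "ok"
    SOLVE_FAILED = "solve_failed"
    INFEASIBLE = "infeasible"
    SILENT_VIOLATION = "silent_violation"
    QUALITY_GAP = "quality_gap"


@dataclass
class RunRecord:
    workload: str
    solver_status: str  # assumed vocabulary: "optimal", "infeasible", other = failure
    # Filled in by an independent check, not by the engine itself.
    violated_constraints: List[str] = field(default_factory=list)
    quality_gap: float = 0.0  # relative gap versus the baseline objective


def classify(record: RunRecord, max_quality_gap: float) -> Outcome:
    """Map one pilot run to a single outcome, loud failures first."""
    if record.solver_status == "infeasible":
        return Outcome.INFEASIBLE
    if record.solver_status != "optimal":
        return Outcome.SOLVE_FAILED
    # The engine claimed success, but verification found violations.
    if record.violated_constraints:
        return Outcome.SILENT_VIOLATION
    if record.quality_gap > max_quality_gap:
        return Outcome.QUALITY_GAP
    return Outcome.OK
```

Counting outcomes per workload then shows at a glance whether an engine fails loudly or quietly. Audit artifacts and replay behavior are not modeled here and would need their own checks.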

Decide before seeing results

Before the pilot starts, write down the acceptance criteria. For example: maximum runtime, acceptable quality gap, required constraint pass rate, integration requirements, and deployment boundaries. This prevents the evaluation from drifting after the numbers arrive.
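Writing the criteria down can be as literal as committing them to code before the first run. The following is a sketch under stated assumptions: the field names and the threshold values are placeholders, not recommendations, and only the numeric criteria are modeled. Integration requirements and deployment boundaries would more naturally live in a written checklist.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)  # frozen so the thresholds cannot be mutated after creation
class AcceptanceCriteria:
    max_runtime_s: float
    max_quality_gap: float
    min_constraint_pass_rate: float


def evaluate(
    criteria: AcceptanceCriteria,
    runtime_s: float,
    quality_gap: float,
    constraint_pass_rate: float,
) -> Dict[str, bool]:
    """Return a pass/fail flag per criterion so a failure is attributable."""
    return {
        "runtime": runtime_s <= criteria.max_runtime_s,
        "quality_gap": quality_gap <= criteria.max_quality_gap,
        "constraint_pass_rate": constraint_pass_rate >= criteria.min_constraint_pass_rate,
    }


# Placeholder thresholds and measurements, chosen only to exercise the sketch.
criteria = AcceptanceCriteria(
    max_runtime_s=60.0, max_quality_gap=0.01, min_constraint_pass_rate=1.0
)
checks = evaluate(criteria, runtime_s=42.0, quality_gap=0.02, constraint_pass_rate=1.0)
accepted = all(checks.values())
```

In this example the run is fast enough and passes every constraint but misses the quality threshold, so `accepted` is false. Keeping the per-criterion flags makes it clear which agreed threshold was missed.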

Practical next step: turn one existing production rebalance into a pilot fixture with anonymized inputs, expected outputs, and baseline timings.
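A fixture of this kind might look like the sketch below. Everything in it is an assumption for illustration: the `build_fixture` name, the `ASSET_0001` label scheme, the weight dictionaries, and the timing figures. Relabeling identifiers is only the simplest form of anonymization, and position sizes alone can sometimes identify a portfolio, so a real fixture may need further treatment.

```python
import json
from typing import Dict


def build_fixture(
    name: str,
    current_weights: Dict[str, float],
    expected_weights: Dict[str, float],
    baseline_timings_s: Dict[str, float],
) -> dict:
    """Bundle anonymized inputs, expected outputs, and baseline timings."""
    # One mapping over the union of identifiers keeps inputs and expected
    # outputs aligned, including assets that only appear after the rebalance.
    all_ids = sorted(set(current_weights) | set(expected_weights))
    mapping = {real: f"ASSET_{i:04d}" for i, real in enumerate(all_ids, start=1)}
    return {
        "name": name,
        "inputs": {mapping[k]: w for k, w in current_weights.items()},
        "expected_outputs": {mapping[k]: w for k, w in expected_weights.items()},
        "baseline_timings_s": dict(baseline_timings_s),
    }


# Made-up identifiers, weights, and timings, purely to exercise the sketch.
fixture = build_fixture(
    "normal_rebalance",
    current_weights={"AAA": 0.6, "BBB": 0.4},
    expected_weights={"AAA": 0.5, "BBB": 0.3, "CCC": 0.2},
    baseline_timings_s={"solve": 1.8, "total": 2.5},
)
fixture_json = json.dumps(fixture, indent=2, sort_keys=True)
```

Serializing with sorted keys keeps the fixture stable under version control, so any change to inputs or expected outputs shows up as a reviewable diff.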