Optimization benchmarks in finance often collapse into one number: speedup. That number is easy to remember and easy to misuse. A solver can be fast on a toy problem, fast after expensive preparation, or fast while producing a portfolio that misses the business constraints.

Separate cold-start evidence from repeated-run evidence

A cold-start run answers one question: how long does the system take when nothing useful is already warm? A repeated-run benchmark answers another: what happens when the workflow is already in production shape and the same structure is solved many times? Both are valid, but their timings should be reported separately, never blended into a single figure.
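One way to keep the two kinds of evidence apart is to time them in separate steps and report them as separate numbers. The sketch below is illustrative only: `time_cold_and_warm` and `make_toy_solver` are hypothetical names, and the toy solver with its cache stands in for whatever preparation a real optimizer does. It is not PRISM's API or benchmark harness.

```python
import time
from statistics import median


def time_cold_and_warm(make_solver, problem, repeats=5):
    """Return (cold_seconds, warm_median_seconds) as two separate numbers."""
    # Cold start: construction and first solve are timed together,
    # because nothing useful is warm yet.
    start = time.perf_counter()
    solver = make_solver()
    solver(problem)
    cold = time.perf_counter() - start

    # Repeated runs: same solver instance, same problem structure.
    warm = []
    for _ in range(repeats):
        start = time.perf_counter()
        solver(problem)
        warm.append(time.perf_counter() - start)
    return cold, median(warm)


def make_toy_solver():
    """Hypothetical stand-in: expensive on first sight of a problem, cheap after."""
    cache = {}

    def solve(problem):
        key = tuple(problem)
        if key not in cache:
            time.sleep(0.02)  # placeholder for factorization, compilation, etc.
            cache[key] = sum(problem)
        return cache[key]

    return solve


cold, warm = time_cold_and_warm(make_toy_solver, [0.2, 0.3, 0.5])
print(f"cold start: {cold:.4f}s | warm median: {warm:.6f}s")
```

Reporting the pair, rather than a mean over all runs, keeps a large cold-start cost from being hidden by many cheap repeats, and keeps cheap repeats from being credited to a cold start.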

For PRISM, this is why the public evidence separates corrected cold-start timing on a smaller real-data universe from repeated-run transition workflows at larger scale. The buyer should ask which case resembles their production path.

Quality gates matter as much as runtime

In portfolio construction, a fast answer with unacceptable tracking error, constraint drift, or tax behavior is not a win. Benchmark tables should report the quality gap against the reference method and state the pass/fail threshold before the result is interpreted.
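A quality gate of this kind can be written down before any timing is looked at. The following is a minimal sketch under stated assumptions: the names `QualityGate` and `evaluate` are hypothetical, the gap is taken as the absolute relative difference in objective value against the reference method, and the threshold values are placeholders rather than recommended tolerances. Tracking error or tax metrics would be added as further fields in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGate:
    """Pass/fail thresholds, fixed before results are inspected."""
    max_objective_gap: float         # relative gap against the reference method
    max_constraint_violation: float  # worst violation across all constraints


def evaluate(candidate_obj, reference_obj, worst_violation, gate):
    """Compare a candidate solution with the reference and apply the gate."""
    # Absolute relative gap; the small floor avoids division by zero.
    gap = abs(candidate_obj - reference_obj) / max(abs(reference_obj), 1e-12)
    passed = (
        gap <= gate.max_objective_gap
        and worst_violation <= gate.max_constraint_violation
    )
    return {"gap": gap, "worst_violation": worst_violation, "passed": passed}


# Placeholder thresholds, declared up front (assumed values, not a standard).
GATE = QualityGate(max_objective_gap=1e-3, max_constraint_violation=1e-6)

result = evaluate(candidate_obj=1.0004, reference_obj=1.0,
                  worst_violation=0.0, gate=GATE)
print(result)
```

Freezing the dataclass is a small guard against adjusting the threshold after the fact: the gate is defined once, then every run is judged against the same numbers.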

Use matched workloads for final judgment

Public benchmarks establish credibility. They do not replace your own workload. The strongest evaluation uses your universe, your constraints, your risk model, your trading rules, and your current baseline. That is the point of a matched-workload pilot.

Practical next step: define one representative daily rebalance, one transition scenario, and one stress case. Measure runtime, quality gap, failure rate, and operational artifacts for each.
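The measurements listed above can be collected into one small report per scenario. This is a sketch, not a prescribed format: `summarize` and the record keys (`seconds`, `gap`, `ok`) are hypothetical, and the numbers in `scenarios` are made-up placeholders that only show the shape of the data. Operational artifacts such as logs or audit files are not modeled here.

```python
from statistics import median


def summarize(runs):
    """Summarize a list of run records with keys 'seconds', 'gap', 'ok'."""
    succeeded = [r for r in runs if r["ok"]]
    return {
        "runs": len(runs),
        "failure_rate": 1 - len(succeeded) / len(runs),
        # Runtime and quality are reported only over runs that succeeded.
        "median_seconds": median(r["seconds"] for r in succeeded) if succeeded else None,
        "worst_gap": max((r["gap"] for r in succeeded), default=None),
    }


# Placeholder records, NOT measured results.
scenarios = {
    "daily_rebalance": [
        {"seconds": 1.0, "gap": 0.001, "ok": True},
        {"seconds": 2.0, "gap": 0.002, "ok": True},
        {"seconds": 3.0, "gap": 0.001, "ok": True},
        {"seconds": 9.0, "gap": None, "ok": False},
    ],
    "transition": [
        {"seconds": 4.0, "gap": 0.003, "ok": True},
        {"seconds": 6.0, "gap": 0.002, "ok": True},
    ],
    "stress_case": [
        {"seconds": 8.0, "gap": None, "ok": False},
    ],
}

report = {name: summarize(runs) for name, runs in scenarios.items()}
for name, row in report.items():
    print(name, row)
```

Keeping failed runs in the denominator of the failure rate, while excluding them from runtime and gap statistics, prevents a solver from looking fast because its slowest cases simply did not finish.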