Quality Threshold is the minimum acceptable output standard for a given task class — defined before routing decisions are made and used to bound Intelligence Arbitrage such that cost optimisation never routes a task to a model incapable of meeting the standard the revenue loop requires. The framing of model routing as a cost reduction tool is accurate but structurally incomplete. Routing to a cheaper model reduces cost. Routing to a cheaper model without a defined quality bound may also reduce output quality — silently, without triggering an error signal, while the Escalation Rate rises for task classes where the cheaper model is insufficient and the system does not know it.

The Inference Floor argument established that frontier model capability has converged on most operational task classes — making model selection procurement rather than strategy. This convergence is what makes Intelligence Arbitrage economically significant: if multiple models can produce equivalent output on a given task class, the routing decision is straightforward — use the cheapest capable one. The engineering discipline that makes this decision safe is the Quality Threshold: the specification of what “equivalent output” means for each task class, defined precisely enough that it can be evaluated by logic rather than by impression.

Two task classes, two routing strategies

Structured output tasks and open-ended reasoning tasks require different routing strategies. For structured output tasks — extracting fields from a document, classifying an input into a predefined taxonomy, generating a formatted record from raw data — the Quality Threshold is a schema specification: the model must produce output that conforms to the defined structure, within the defined field constraints, without hallucinated values. This threshold can typically be met by small, fast, inexpensive models. The Inference Floor has already reached most structured extraction tasks. Routing these to a frontier model at frontier pricing is Operational Arbitrage surrendered.
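A schema-defined Quality Threshold of this kind can be sketched as a plain conformance check. This is a minimal illustration, not a reference implementation: the invoice task class, the field names, the `INVOICE_SCHEMA` table, and the `ALLOWED_CURRENCIES` constraint are all hypothetical, chosen only to show that the check is pure logic.

```python
from typing import Any

# Hypothetical schema for an invoice-extraction task class:
# field name -> (expected type, is the field required?)
INVOICE_SCHEMA: dict[str, tuple[type, bool]] = {
    "invoice_id": (str, True),
    "total": (float, True),
    "currency": (str, True),
    "notes": (str, False),
}

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # illustrative field constraint


def schema_violations(record: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    problems = []
    for name, (expected_type, required) in INVOICE_SCHEMA.items():
        if name not in record:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    # Fields outside the schema are treated as invented content.
    for name in record:
        if name not in INVOICE_SCHEMA:
            problems.append(f"unexpected field: {name}")
    currency = record.get("currency")
    if isinstance(currency, str) and currency not in ALLOWED_CURRENCIES:
        problems.append(f"currency not in allowed set: {currency}")
    return problems


good = {"invoice_id": "A1", "total": 10.5, "currency": "USD"}
bad = {"invoice_id": "A1", "total": "10.5", "currency": "XYZ", "vendor": "x"}
print(schema_violations(good))
print(schema_violations(bad))
```

A check like this catches structural failures and out-of-range values only. A well-formed but incorrect value (a plausible total that is not in the source document, for example) would still pass, so detecting hallucinated values needs an additional comparison against the input.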

For reasoning tasks, such as generating a sales coaching recommendation based on a call transcript, identifying the root cause of an escalation from an operational log, or producing a novel exception resolution, the Quality Threshold is harder to specify because the correct output is not a schema. For these task classes, the Quality Threshold must specify an accuracy benchmark against a sample dataset: the model must produce outputs that a domain expert rates as acceptable on at least X% of test cases in the defined task domain. This threshold requires pre-validation before the routing decision is made. If X is set at, say, 90, a model that passes on 92% of test cases is eligible for routing; one that passes on 76% is not, regardless of how much cheaper it is. The Quality Threshold is the instrument that makes this pre-validation systematic rather than anecdotal.
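The pre-validation step can be sketched as a pass-rate comparison over expert ratings. Everything named here is an assumption for illustration: the `Candidate` type, the model labels, the prices, and the 0.90 threshold. Only the 92% and 76% pass rates come from the example in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical model label
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    ratings: tuple             # expert verdict per test case: True = acceptable


def pass_rate(candidate: Candidate) -> float:
    """Share of test cases the domain expert rated as acceptable."""
    if not candidate.ratings:
        raise ValueError("pre-validation needs at least one rated test case")
    return sum(candidate.ratings) / len(candidate.ratings)


def cheapest_eligible(candidates: Sequence[Candidate], threshold: float) -> Optional[Candidate]:
    """Cheapest model whose pass rate meets the threshold, or None if none does."""
    eligible = [c for c in candidates if pass_rate(c) >= threshold]
    return min(eligible, key=lambda c: c.cost_per_1k_tokens, default=None)


small = Candidate("small", 0.10, (True,) * 76 + (False,) * 24)
mid = Candidate("mid", 0.50, (True,) * 92 + (False,) * 8)
frontier = Candidate("frontier", 3.00, (True,) * 97 + (False,) * 3)

choice = cheapest_eligible([small, mid, frontier], threshold=0.90)
print(choice.name if choice else "no eligible model")
```

The cheapest candidate is rejected despite its price because eligibility is decided before cost is considered. Cost only breaks ties among models that already meet the threshold.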

Structured output and routing compose directly

The most productive routing insight in practice is that structured output specification and model routing are architecturally complementary. A task class with a schema-defined Quality Threshold can be routed to small, cheap models because schema compliance is evaluable by logic — the routing decision is provably safe for any model that reliably produces schema-conformant output. This includes the majority of T1 task classes in most revenue loops: document processing, field extraction, classification, formatting, and data normalisation. For these tasks, the Quality Threshold is a one-time design decision that makes Intelligence Arbitrage available indefinitely: as new models emerge with equivalent schema-compliance capability at lower cost, the routing decision updates automatically within the defined threshold, and the cost advantage compounds without requiring architectural changes.
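One way to picture "the routing decision updates automatically" is a small registry that admits a model only after it clears the threshold check, then always routes to the cheapest admitted model. This is a sketch under stated assumptions: the `Router` class, the stub models, the label taxonomy, and the rule that every sample must conform are all hypothetical, and the stubs stand in for real model calls.

```python
from typing import Callable

Model = Callable[[str], dict]  # stand-in for a model call: input text -> parsed output

ALLOWED_LABELS = {"billing", "support", "sales"}  # hypothetical taxonomy


def meets_threshold(output: dict) -> bool:
    """Schema-style check: exactly one field, 'label', drawn from the taxonomy."""
    label = output.get("label")
    return set(output) == {"label"} and isinstance(label, str) and label in ALLOWED_LABELS


class Router:
    def __init__(self, check: Callable[[dict], bool], samples: list):
        self._check = check
        self._samples = samples
        self._qualified = {}  # model name -> (cost, model)

    def register(self, name: str, cost: float, model: Model) -> bool:
        """Admit a model only if every sample output passes the check."""
        ok = all(self._check(model(sample)) for sample in self._samples)
        if ok:
            self._qualified[name] = (cost, model)
        return ok

    def route(self) -> str:
        """Name of the cheapest model that has cleared the threshold."""
        if not self._qualified:
            raise LookupError("no model has met the Quality Threshold")
        return min(self._qualified, key=lambda name: self._qualified[name][0])


# Stub models with fixed outputs, for illustration only.
def frontier(text: str) -> dict:
    return {"label": "billing"}

def tiny(text: str) -> dict:
    return {"label": "other"}  # outside the taxonomy, so it never qualifies

def small(text: str) -> dict:
    return {"label": "support"}


router = Router(meets_threshold, samples=["Where is my invoice?", "App crashes on login"])
router.register("frontier", 3.00, frontier)
router.register("tiny", 0.05, tiny)
router.register("small", 0.20, small)
print(router.route())
```

The threshold check is written once and never changes. Registering a new, cheaper model that passes it is enough to move the route, which is the sense in which the design decision is made one time.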

AI Gateway implements this mechanism in production: a provider-neutral routing layer that connects to different LLMs and picks up new models as they are released. “You can try the latest model when it gets updated on a Sunday without touching your application code.” This is Architectural Decoupling at the routing layer: the application code defines the Quality Threshold; the gateway handles the routing decision against that threshold as the model landscape evolves. The threshold is stable. The routing is dynamic. The cost advantage compounds.

What happens without the Quality Threshold

Without a defined Quality Threshold, routing becomes a cost tool that operates by trial and error. The team routes a task class to a cheaper model, observes whether the output “seems” acceptable, and makes the routing decision on impression. This approach has two failure modes. First, gradual output degradation that does not trigger visible errors: the cheaper model produces outputs that are structurally correct but subtly wrong in ways that compound in downstream task classes before the error becomes visible. The Escalation Rate for downstream task classes rises; the cause is the routing decision upstream. Second, Execution Divergence that appears as a routing failure but is actually a threshold failure: the cheaper model handles 92% of inputs correctly but fails on the 8% that require reasoning the model cannot perform. Without a defined threshold, the 8% failure rate is a surprise. With a threshold, it is a pre-validated exclusion.
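The "pre-validated exclusion" in the second failure mode can be sketched by breaking the pre-validation results down by input segment rather than reading only the overall pass rate. This assumes inputs can be tagged with a segment before routing; the segment names, the model labels, and the 0.90 threshold are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def excluded_segments(results: Iterable[Tuple[str, bool]], threshold: float) -> set:
    """results: (segment, passed) pairs from a pre-validation run.

    Returns the segments whose pass rate falls below the threshold.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for segment, passed in results:
        totals[segment] += 1
        passes[segment] += int(passed)
    return {s for s in totals if passes[s] / totals[s] < threshold}


def choose_model(segment: str, excluded: set) -> str:
    """Keep excluded segments off the cheaper model."""
    return "stronger-model" if segment in excluded else "cheaper-model"


# 92 routine cases pass and 8 multi-step cases fail: 92% overall,
# but the failures are concentrated in one segment.
results = [("standard", True)] * 92 + [("multi_step", False)] * 8

excluded = excluded_segments(results, threshold=0.90)
print(excluded)
print(choose_model("standard", excluded), choose_model("multi_step", excluded))
```

Read as a single number, 92% clears a 90% threshold and the failing 8% surfaces later in production. Read per segment, the same data identifies which inputs to keep off the cheaper model before any routing happens.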

The Operator’s Verdict

The Quality Threshold is a design decision, not an optimisation. Made at design time, it makes Intelligence Arbitrage available for the lifetime of the system, compounding the cost advantage every quarter as the Inference Floor advances to new task classes. Made after the fact, it is a debugging exercise that costs more than the routing savings it was designed to capture.

Technology changes what models cost. The Quality Threshold determines what routing safely captures.

KEY TAKEAWAY

What is the Quality Threshold and why is it the precondition for safe model routing?

The Quality Threshold is the minimum acceptable output standard for a given task class. It is defined before routing decisions are made and bounds Intelligence Arbitrage so that cost optimisation never routes a task to a model incapable of meeting the standard the revenue loop requires. For structured output tasks, the threshold is a schema specification: the model must produce schema-conformant output without hallucinated values. For reasoning tasks, the threshold is an accuracy benchmark that candidate models must pass on a sample dataset before routing. Without a Quality Threshold, routing to cheaper models degrades output quality silently: the model produces plausible results that violate business constraints without triggering error signals, raising the Escalation Rate for downstream task classes before the cause is visible. With a Quality Threshold, routing is provably safe for any model that meets it, and the cost advantage of routing to cheaper models compounds automatically as the Inference Floor advances to new task classes. Key point: structured output tasks, which make up the majority of T1 task classes, can be routed to small, inexpensive models when the Quality Threshold is schema-defined, because schema compliance is evaluable by logic. The Inference Floor has already reached most structured extraction tasks.