Quality Threshold

The minimum acceptable output standard for a given task class, defined before routing decisions are made. It bounds Intelligence Arbitrage so that cost optimisation never routes a task to a model incapable of meeting the standard the revenue loop requires: routing reduces cost without degrading the outputs that agents and Stewards depend on.

Intelligence Arbitrage is the practice of routing each task class to the cheapest model capable of executing it at the required quality level. The phrase “required quality level” does structural work that most implementations leave undefined. Without a Quality Threshold (a precise, per-task-class specification of what constitutes acceptable output), routing becomes a cost tool that sometimes, and accidentally, degrades output. The agent produces a result. The result is cheaper. It may or may not meet the standard the business depends on for that task class. The degradation often stays invisible until the Escalation Rate for that class rises and the Steward cannot identify why.
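As a minimal sketch of routing bounded by a Quality Threshold, assume each candidate model has an offline-measured quality score per task class on a hypothetical 0.0 to 1.0 scale. The router below picks the cheapest model that meets the class's threshold and refuses to route a class that has no threshold at all. `Model`, `route`, `NoEligibleModel`, the cost figures, and the task-class names are illustrative, not part of any published API.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical unit, e.g. USD per call
    # Offline-measured quality score per task class (hypothetical 0.0 to 1.0 scale).
    quality: dict[str, float] = field(default_factory=dict)


class NoEligibleModel(Exception):
    """Raised when no candidate meets the Quality Threshold for a task class."""


def route(task_class: str, candidates: list[Model], thresholds: dict[str, float]) -> Model:
    """Return the cheapest candidate whose measured quality meets the threshold."""
    # Refuse to route when no threshold exists, instead of silently
    # degrading into cost-only routing.
    if task_class not in thresholds:
        raise KeyError(f"no Quality Threshold defined for task class {task_class!r}")
    minimum = thresholds[task_class]
    # Models never evaluated on this task class are treated as ineligible.
    eligible = [
        m for m in candidates
        if task_class in m.quality and m.quality[task_class] >= minimum
    ]
    if not eligible:
        raise NoEligibleModel(f"no candidate meets {minimum:.2f} for {task_class!r}")
    return min(eligible, key=lambda m: m.cost_per_call)


if __name__ == "__main__":
    models = [
        Model("small", 0.001, {"invoice_extraction": 0.97, "refund_decision": 0.81}),
        Model("large", 0.020, {"invoice_extraction": 0.99, "refund_decision": 0.95}),
    ]
    thresholds = {"invoice_extraction": 0.95, "refund_decision": 0.92}
    print(route("invoice_extraction", models, thresholds).name)  # small
    print(route("refund_decision", models, thresholds).name)     # large
```

Raising on a missing threshold is the design choice that matters here. A fallback to "cheapest available" would reintroduce exactly the silent degradation the Quality Threshold exists to prevent.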

The Quality Threshold resolves this by making the bound explicit before any routing decision is made. For T1 task classes (fully deterministic, high-volume, schema-driven), the Quality Threshold is typically a structured output specification: the agent must produce a result in a defined format, within defined field constraints, with no hallucinated values. Small, cheap, fast models can meet this threshold; the Inference Floor has already reached most T1 tasks. For T2 task classes, the Quality Threshold must also specify the minimum accuracy rate the agent's autonomous judgment calls must meet; falling below it requires escalation. For T3 task classes, the Quality Threshold governs the agent’s supporting work (data preparation, context assembly, output formatting), not the judgment itself, which belongs to the Steward.
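One way to make the tier rules checkable is to represent each threshold as data and validate it at definition time, which is before any routing decision. The sketch below assumes a simple schema format (field name to Python type) and tracks T2 accuracy as a count of correct autonomous calls out of those evaluated. `Tier`, `QualityThreshold`, and `may_judge_autonomously` are hypothetical names. Treating a T2 class with no evaluated calls as not yet autonomous is a design assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Tier(Enum):
    T1 = "T1"  # fully deterministic, high-volume, schema-driven
    T2 = "T2"  # bounded autonomous judgment
    T3 = "T3"  # judgment belongs to the Steward


@dataclass
class QualityThreshold:
    task_class: str
    tier: Tier
    # Hypothetical output specification: field name -> expected type.
    schema: dict[str, type] = field(default_factory=dict)
    # T2 only: minimum accuracy the agent's autonomous judgment calls must meet.
    min_autonomous_accuracy: Optional[float] = None
    # T3 only: the supporting work the threshold governs.
    supporting_work: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # T1 and T2 both start from a structured output specification.
        if self.tier in (Tier.T1, Tier.T2) and not self.schema:
            raise ValueError(f"{self.tier.value} thresholds need an output schema")
        if self.tier is Tier.T2:
            acc = self.min_autonomous_accuracy
            if acc is None or not 0.0 < acc <= 1.0:
                raise ValueError("T2 thresholds must specify an accuracy rate in (0, 1]")
        elif self.min_autonomous_accuracy is not None:
            raise ValueError("only T2 task classes permit autonomous judgment")
        if self.tier is Tier.T3 and not self.supporting_work:
            raise ValueError("T3 thresholds must name the supporting work they govern")

    def may_judge_autonomously(self, correct: int, total: int) -> bool:
        """True if observed accuracy on judgment calls meets the T2 bound.

        T1 classes make no judgment calls and T3 judgment belongs to the
        Steward, so both return False. A T2 class with no evaluated calls
        also returns False until there is evidence.
        """
        if self.tier is not Tier.T2 or total == 0:
            return False
        return correct / total >= self.min_autonomous_accuracy
```

For example, `QualityThreshold("refund_decision", Tier.T2, {"decision": str}, min_autonomous_accuracy=0.9)` permits autonomous calls at 95 correct out of 100 and requires escalation at 85 out of 100.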

Structured output and model routing compose directly. Routing structured extraction tasks to a cheaper model is safe when the Quality Threshold is a schema, because schema compliance can be checked deterministically, in code, before the output is used. It is unsafe when the Quality Threshold is undefined, because the model may return plausible-looking values that violate business constraints, and nothing catches them.

This term is machine-readable

Any MCP-compatible AI assistant can retrieve the canonical definition of Quality Threshold at inference time, rather than relying on an approximation from its training data.



In the Log

First used: May 2026
