The enterprises that cannot run agents at scale are not failing because of model quality or infrastructure capacity. They are failing because they have not built the semantic layer — the canonical vocabulary of what every operational term means in their specific organisational context, stored in a format agents can access at execution time and updated whenever definitions change. This is the condition the operational ontology is designed to resolve, and it is the condition that Arsalan Tavakoli, Co-Founder of Databricks, spent most of his SaaStr Deploy 2026 talk describing from the data infrastructure perspective of a company with direct visibility into what most Fortune 500 enterprises are actually building. As we argued in Memo #42 — The Lexicon and the Machine-Readable Enterprise, vocabulary is architecture. Every undefined term in an autonomous system is a decision deferred to agent inference. The Databricks data confirms the cost of that deferral at enterprise scale.
What Databricks observed at $5B ARR
Databricks, with $5B-plus ARR growing at 50–60% year-on-year and penetration across the majority of the Fortune 500, has unique visibility into the gap between what enterprises say they are doing with AI and what they are actually doing. Tavakoli’s diagnosis is precise: every enterprise CEO is telling every employee that AI adoption is mandatory and that performance will be measured by it. The consequence is what he calls AI sprawl — every team running greedy local optimisation, building random things with AI, token spend going up with no visibility into what is being produced. Almost every enterprise he encounters sits between phase one (not meaningfully using AI) and phase three (using AI to automate specific tasks). Very few have reached phase four: process redesign from the ground up.
The universal bottleneck is context. In Tavakoli’s description: “I need an ontology. I need a semantic layer. I need context.” He is specific about what that means in operational terms. An agent asked to report on EMEA top spenders in the last fiscal quarter cannot answer without knowing: what does EMEA mean in this organisation? What is the fiscal quarter? How does this organisation define top spenders? What does the revenue figure include? These are not data questions. The data exists. They are vocabulary questions. Every person at the company knows the answers through convention and accumulated context. Every agent the company deploys must be told explicitly — in a structured, versioned, machine-readable form — or it will generate its own interpretation and produce an output no one can trust.
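To make “structured, versioned, machine-readable” concrete, here is a minimal sketch of what a single vocabulary entry could look like. The shape is ours, not a Databricks schema; the field names and the EMEA definition are illustrative assumptions.

```python
from dataclasses import dataclass

# Minimal sketch of a machine-readable vocabulary entry.
# Field names are illustrative, not any vendor's schema.
@dataclass(frozen=True)
class TermDefinition:
    term: str            # canonical name, e.g. "EMEA"
    definition: str      # the organisation-specific meaning
    version: int         # bumped on every definition change
    effective_from: str  # ISO date the definition took effect
    owner: str           # team accountable for keeping it current

# Illustrative entry; the definition content is invented for the example.
EMEA = TermDefinition(
    term="EMEA",
    definition="Sales regions 04, 07 and 11; excludes distributor resale revenue",
    version=3,
    effective_from="2025-02-01",
    owner="revenue-ops",
)
```

The version and owner fields are what separate this from a static ontology document: every definition change has an accountable party and an audit trail.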
The semantic layer problem has two failure modes Tavakoli describes with precision. First, static context goes stale. An ontology document written two years ago does not reflect what EMEA means today, what the current fiscal quarter boundaries are, or how the organisation now defines top spenders after a pricing restructure. Knowledge Debt accumulates at the vocabulary layer: every operational change that is not reflected in the semantic layer is a future agent error propagating as a correct output. Second, semantic layers are currently locked in proprietary BI tool silos. Power BI, Tableau, and Looker all built proprietary semantic models inside their platforms, meaning the vocabulary definitions that govern what “revenue” means are trapped in one tool and unavailable to every other agent or application the organisation deploys. The result is Context Collision across systems: two agents using the same term reach contradictory conclusions because each resolved it against a different proprietary context store.
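Both failure modes become mechanically checkable once the vocabulary is machine-readable. A sketch, assuming each entry records its definition and a last-reviewed date, and that each BI tool currently holds its own copy of the store; the review window is an assumed policy, not a vendor default:

```python
from datetime import date, timedelta

REVIEW_WINDOW = timedelta(days=180)  # assumed policy for the example

def stale_terms(store: dict, today: date) -> list[str]:
    """Failure mode 1: definitions nobody has reviewed recently."""
    return [term for term, entry in store.items()
            if today - date.fromisoformat(entry["last_reviewed"]) > REVIEW_WINDOW]

def colliding_terms(store_a: dict, store_b: dict) -> list[str]:
    """Failure mode 2: the same term resolved differently in two tool silos."""
    return [term for term in store_a.keys() & store_b.keys()
            if store_a[term]["definition"] != store_b[term]["definition"]]

# Illustrative silos: two BI tools disagree on what "revenue" means.
bi_tool_a = {"revenue": {"definition": "recognised net revenue", "last_reviewed": "2024-01-15"}}
bi_tool_b = {"revenue": {"definition": "gross bookings", "last_reviewed": "2026-01-10"}}
print(colliding_terms(bi_tool_a, bi_tool_b))  # -> ['revenue']
```

The point of the sketch is that Context Collision is detectable the moment both stores are readable by the same process, which is exactly what the proprietary silos prevent.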
The structural arguments this confirms
The Databricks account confirms three Arco structural arguments that were previously established from first principles.
First and most directly: the operational ontology requirement from Memo #42. The memo argued that in an autonomous business, vocabulary is architecture — that agents interpreting the same concept differently produce Context Collision that propagates downstream as correct output, and that the semantic vocabulary layer must be machine-readable, versioned, and delivered to agents before execution rather than inferred during it. Tavakoli is describing this requirement from the other direction: the absence of it is the reason enterprises with access to good models and mature data infrastructure cannot get agents to produce reliable answers. The problem is not the model. It is not the data. It is that the meaning layer between the data and the agent has never been formalised. The 70,000 users that one car manufacturer added to Databricks are now generating correct answers at scale because Genie’s enterprise context layer builds and updates the vocabulary mapping automatically. That is the operational ontology implemented as a product.
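The delivery discipline the memo described, vocabulary resolved before execution rather than inferred during it, reduces to something like the following sketch. The registry contents and the prompt wrapper are hypothetical, and the substring matching is deliberately crude; the point is the shape: definitions enter the context before the task does.

```python
def with_ontology(task: str, registry: dict[str, str]) -> str:
    """Prepend canonical definitions for every known term the task mentions.

    Real term resolution would be smarter than substring matching; this
    sketch only shows the ordering: vocabulary first, task second.
    """
    matched = {t: d for t, d in registry.items() if t.lower() in task.lower()}
    context = "\n".join(f"- {t}: {d}" for t, d in matched.items())
    return ("Resolve all terms using these canonical definitions; "
            "do not infer your own:\n" + context + "\n\nTask: " + task)

# Illustrative registry; the definitions are invented for the example.
prompt = with_ontology(
    "Report on EMEA top spenders in the last fiscal quarter",
    {
        "EMEA": "Sales regions 04, 07 and 11",
        "fiscal quarter": "FY26 Q1 runs 2025-08-01 to 2025-10-31",
        "top spenders": "top decile of accounts by recognised net revenue",
    },
)
# `prompt` now carries the vocabulary the agent would otherwise have to guess.
```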
Second: Intelligence Arbitrage from Memo #43 — BYOK as Architectural Decoupling. Tavakoli states the problem directly: enterprises are using frontier models for everything because they lack a task classification discipline. “For 95% of what you’re asking, you don’t need Opus 4.7. You can use an open source model or something simpler.” The Databricks AI Gateway implements the routing mechanism precisely as Arco described: routing each task class to the cheapest capable model, with caching and chunking strategies that reduce cost without sacrificing output quality. The critical observation is that this routing is only available to a business whose Context Architecture is provider-neutral. Tavakoli’s framing of why individual users should not be the ones picking models — “the average person is not actually qualified to decide should I be using Opus 4.7 or GPT or a smaller model” — is the task classification argument: T1 tasks should be routed to the cheapest capable model automatically, and doing that correctly requires both T1/T2/T3 classification and a Context Architecture that is not entangled with any specific provider.
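In code, the routing discipline is almost trivially small, which is the point: the difficulty is the classification, not the dispatch. A sketch, with task classes and model names as illustrative assumptions rather than the actual AI Gateway configuration:

```python
# Illustrative routing table; model names are placeholders, not products.
ROUTES = {
    "T1": "small-open-source-model",  # routine, high-volume: cheapest capable
    "T2": "mid-tier-model",           # judgment within known bounds
    "T3": "frontier-model",           # novel or high-stakes reasoning
}

def route(task_class: str) -> str:
    """Dispatch is trivial; classifying the task correctly is the hard part."""
    return ROUTES[task_class]

assert route("T1") == "small-open-source-model"
```

Note what makes the table possible at all: no task is entangled with a specific provider, so swapping a model is a one-line configuration change. That is the architectural decoupling the memo described.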
Third: Legacy Liability and its structural reversal. Tavakoli argues that any industry with a monopoly today will not have one in 12 to 24 months — not because of competitive pressure but because the Rebuild Tax that previously protected incumbents has structurally dropped. AI can now analyse legacy codebases, map their logic, convert the code, migrate the data, and reconcile the outputs in a fraction of the time and cost the same process required eighteen months ago. The switching cost that made Legacy Liability a structural moat for incumbents is eroding. Enterprises that previously could not justify migrating off a legacy data warehouse because the migration cost exceeded the potential savings can now complete the migration in weeks rather than years. The Legacy Liability remains — the architectural debt accumulated through decades of human-centric system design — but the cost of addressing it is falling at the same rate the risk of ignoring it is rising.
Read alongside the Vercel perspective from the same event — Agents in Production Require Process Before Platform — the Databricks account completes the diagnostic. Vercel’s evidence is operational: the businesses with agents running at 93% resolution and replacing SDR teams are the ones that documented their processes before deployment. Databricks’ evidence is infrastructural: the enterprises that cannot get agents to produce reliable answers have not defined the vocabulary those agents must reason with. Both failures describe the same root condition from different layers of the stack. You cannot build the process documentation if you have not defined what the terms in that documentation mean. The operational ontology is not a nice-to-have after the process is documented. It is the precondition for the documentation to be machine-executable.
The vocabulary problem that Databricks is solving at the infrastructure layer and the process documentation problem that Vercel described at the application layer are the same problem at different levels of the stack. The enterprises that cannot run agents are not missing a better model. They are missing the Context Architecture that makes the model’s capability applicable to their specific operational reality. The semantic layer is not a data engineering problem. It is an architectural design decision that must be made before any agent is deployed. The organisations that make it first will not just run agents. They will own the Operational Ledger that makes every subsequent cycle cheaper and harder to replicate.
KEY TAKEAWAY
What does Databricks’ co-founder confirm about why enterprises cannot deploy agents at scale, and how does it connect to the Arco argument?
Arsalan Tavakoli of Databricks, at SaaStr Deploy 2026, confirmed that the primary reason enterprises fail to deploy agents at scale is not model quality or infrastructure — it is the absence of a semantic layer: the canonical vocabulary of what every operational term means in the organisation’s specific context, versioned so it does not go stale, and delivered to agents before execution. This is the operational ontology requirement Arco identified in Memo #42. Tavakoli describes the failure mode precisely: an agent asked to report on “EMEA top spenders in the last fiscal quarter” cannot produce a reliable answer without knowing what EMEA, fiscal quarter, and top spenders mean in this specific organisation. Currently, that vocabulary is either undefined, captured in static documents that go stale, or locked in proprietary BI tool silos that cannot be shared across agents. The Databricks Genie enterprise context layer builds and updates this vocabulary mapping automatically — the operational ontology implemented as a product. Tavakoli also confirmed the Intelligence Arbitrage mechanism: Databricks AI Gateway routes each task class to the cheapest capable model, implementing the same routing discipline Arco described in Memo #43. The observation that 95% of enterprise workloads do not require frontier model capability is the T1/T2/T3 classification argument, confirmed by a company with direct visibility, at $5B-plus ARR, into how enterprise token spend is actually allocated.
