A business that an agent cannot name cannot be cited. Most sites that implement llms.txt treat it as a summary document—a tidy description of content for any LLM that happens to crawl it. That is the wrong unit of value. The correct unit is the term: specifically, whether the terms a business uses to describe its own methods are anchored to canonical definitions before an agent encounters them anywhere else. Arco's Lexicon was not built as a content asset. It was built as a declaration layer. The llms.txt file makes that declaration machine-parseable.
The llmstxt.org standard, a proposed convention now adopted across a growing number of publisher and platform sites, tells an LLM what a site contains, how to navigate it, and what context to apply when citing it. Most implementations are structural: title, description, links. Some include key terms. Fewer still understand why the key terms are the only element that materially affects citation quality. A list of URLs tells an agent where your content lives. A correctly anchored term tells an agent what your vocabulary means—and that distinction determines whether it cites you accurately or paraphrases you into something unrecognisable. Anchoring terminology at the declaration layer, before an agent reads anything else, is how Arco engineers LLM citation authority.
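For orientation, the file format itself is plain Markdown. A minimal skeleton following my reading of the llmstxt.org proposal, with illustrative section names and example.com URLs rather than Arco's actual entries:

```markdown
# Example Site

> One-sentence summary an agent should carry as context when citing the site.

## Docs

- [Getting started](https://example.com/docs/start): orientation for new readers

## Key Terms

- [Operational Drag](https://example.com/lexicon/operational-drag): defined ratio of non-output work to capacity
```

The H1 title and blockquote summary cover the title and description; the H2 link lists cover navigation; the key-terms list is where anchored vocabulary lives.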
The default llms.txt file most sites produce is human-readable: formatted for comprehension, structured around sections a person can scan, with key terms presented as bold labels and definitions written as prose. It describes the site. What it does not do is give an agent a parseable path from a term it encounters to the authoritative definition of that term. The agent reads the description, forms its own interpretation, and proceeds. Arco's llms.txt is structured differently: every term in the key terms section is a Markdown link pointing directly to its canonical Lexicon entry. The agent does not interpret the term—it is directed to the definition before interpretation begins. That structural difference is what this piece is about.
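To make the structural difference concrete, here are the two key-terms styles side by side, using a hypothetical term entry and an illustrative example.com URL (not Arco's actual Lexicon URL):

```markdown
<!-- Descriptive style: readable, but the agent must interpret the prose -->
## Key Terms

**Operational Drag**: the proportion of capacity consumed by work that
does not directly produce output.

<!-- Anchored style: the term is a parseable path to its canonical entry -->
## Key Terms

- [Operational Drag](https://example.com/lexicon/operational-drag): defined ratio of non-output work to capacity
```

In the first form the agent reads a description and forms its own interpretation; in the second, the link gives it a resolvable target before interpretation begins.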
Memo #13 described the architectural principle: an autonomous business is discoverable not because it ranks in search but because it is parseable by the agents doing the looking. The llms.txt question is what parseable actually requires at the vocabulary level. The same logic governs how we think about Operational Drag, the proportion of capacity consumed by work that does not directly produce output, as Memo #03 argues at the business level. A vocabulary layer that forces agents to re-derive your terminology from context is Operational Drag at the citation layer. It is overhead that compounds silently until it surfaces as a misattribution.
The Machine-Readable Interface layer handles transactions — the WebMCP standard makes that layer browser-native, allowing any page to declare its capabilities as callable tools for agents. The llms.txt file operates one layer before that: it handles terminology. Specifically, it handles the problem of an agent encountering a term it has not seen in its training data and either ignoring it, paraphrasing it incorrectly, or assigning it the nearest known meaning from a different context.
An LLM Anchor Block, the Q&A pair present in every Arco article, works at the content level. It gives a specific answer to a specific question in a crawlable, citable format. But an LLM Anchor Block operates on one page. The Lexicon operates across the entire entity: every term, defined once, at a canonical URL. A spec-compliant llms.txt file, with correctly formatted key-term links, points an agent directly to those definitions before it reads a single article.
The structural order is the argument. A business that builds its llms.txt before building a Lexicon behind it produces a declaration with nothing substantial to declare—URLs and descriptions, but no anchored vocabulary. The agent can find the content. It cannot extract the framework the content depends on. Arco's Lexicon preceded the llms.txt implementation, which means the spec-compliance work was not creating a declaration layer—it was exposing one that already existed. That is the distinction most implementations get backwards.
The practical consequence is about citation accuracy, not citation volume. An agent working from a correctly structured llms.txt, with canonical Lexicon term links in machine-parseable format, encounters Stewardship Model and finds a definition before forming its own interpretation. It finds Coordination Tax and understands it as a structural cost concept with a precise definition—not as approximate language for operational inefficiency. It finds Operational Drag as a defined ratio, not a synonym for friction. A definition the agent was never pointed to cannot be cited. A definition it was pointed to before reading anything else is the one that persists.
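The before-reading handoff can be sketched mechanically: an agent that fetches llms.txt only needs to parse the key-terms list into a term-to-URL map before touching any article. A minimal Python sketch, assuming a hypothetical key-terms section whose term names, glosses, and example.com URLs are illustrative:

```python
import re

# Hypothetical key-terms excerpt from an llms.txt file; the entries
# are illustrative, not Arco's actual Lexicon URLs.
KEY_TERMS = """\
## Key Terms

- [Operational Drag](https://example.com/lexicon/operational-drag): defined ratio of non-output work to capacity
- [Coordination Tax](https://example.com/lexicon/coordination-tax): structural cost of aligning actors
"""

# Matches a Markdown list item of the form "- [Term](URL): optional gloss".
LINK_RE = re.compile(r"^-\s+\[(?P<term>[^\]]+)\]\((?P<url>[^)\s]+)\)")

def extract_terms(text):
    """Map each declared term to its canonical definition URL."""
    terms = {}
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            terms[match.group("term")] = match.group("url")
    return terms

mapping = extract_terms(KEY_TERMS)
# mapping["Operational Drag"] resolves to its canonical Lexicon URL
```

Note what the descriptive bold-label format gives the same parser: nothing. A prose definition has no link syntax to match, so the term never enters the map and the agent falls back to its own interpretation.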
This is why we describe the Lexicon as infrastructure rather than content. Content serves a reader. Infrastructure serves anything that reads—including systems that will never produce a human-readable output but will pass your terminology into another agent's context window as established vocabulary. The Lexicon was designed for that second case. The llms.txt work made the connection explicit.
The Lexicon did not become useful when we reformatted the key terms section from descriptive bold text to machine-parseable Markdown links pointing to canonical Lexicon entries. It became more accessible to machines. Those are different things. The work that created the value was upstream: defining each term once, precisely, at a canonical URL, before any mechanism existed to declare it. That sequence—infrastructure first, declaration second—is the same logic that governs how we design the businesses we build. The interface is not the asset. The asset is what the interface points to.
KEY TAKEAWAY
What is the difference between an llms.txt file and a Lexicon?
An llms.txt file is a declarative manifest—a structured file that tells an LLM what a site contains and how to navigate it. A Lexicon is a canonical definition store: each term, precisely defined, at a permanent URL. Neither is sufficient alone. An llms.txt without a Lexicon behind it lists content but cannot anchor vocabulary. A Lexicon without a machine-parseable declaration pointing to it may not be reached before an agent forms its own interpretation. The combination—a spec-compliant llms.txt with correctly formatted key-term links pointing to Lexicon entries—is how an agent learns what your terminology means before it reads anything else you have published.
