RFD 0050 — Documentation architecture: three books, correctness by construction, and a verified authoring pipeline

State: discussion
Depends on: the reference manual (Part I of the book) and its crash course; RFD 0049 (error-tolerant diagnostics — the What/Where/Why/Fix model and ox explain corpus the diagnostic docs draw on); RFD 0044 (packages and the registry — the example corpus is a set of real packages); the @[language_interface] drift gate (spec/lean/Argon/Interface.lean), whose discipline the documentation-freshness gate mirrors; the editor-experience work (the LSP shares the diagnostic corpus these renderers feed).
Prior art: Rust’s three-artifact split — The Rust Programming Language, The Rust Reference, Rust by Example — plus the rustc error index (rustc --explain); the Diátaxis documentation framework (tutorial / how-to / reference / explanation as four distinct reader-needs); mdbook’s {{#include}} transclusion; the Oxide Computer RFD process this series already follows; literate-specification precedents that cite a mechanization alongside prose.

Question

How should Argon’s documentation be structured, authored, and kept correct as the language moves — given that the reference manual systematically drifts from the implementation today, that the substrate is mechanized in Lean while the reasoner is implemented Rust-first, and that “what runs today” cannot be read reliably from prose or even from code comments?

Context

The drift is structural, not a matter of diligence. The reference manual asks one artifact, in one voice, to do three different jobs — teach a newcomer, specify the language normatively, and record design history — and to track two different timelines at once: the language as designed (which changes by deliberate decision, rarely) and the language as implemented (which changes every merge). An inline status badge or a “refuses today” sentence in normative prose is therefore a time-bomb with a one-merge fuse.

This is measurable, not hypothetical. A fact-check of the prose against the tree at main found shipped capabilities described as unbuilt: the temporal value library (real, jiff-backed) and the in-language test declaration (real, with its own ox test runner) were both still called “future” in committed prose. And reading the implementation’s own code comments and reserved markers *under-*reported the language: well-founded semantics for cyclic negation and the defeasible-rule surviving extent both run today, yet are easy to read as “not built.” The lesson is sharp and load-bearing for everything below: implementation status is unreliable read from prose, and unreliable read from code comments. The only trustworthy signals are an example that compiles and runs in CI, the deliberately-honest “what runs today” sections of the crash course, and commit history.

Three further facts shape the design:

The layers move at different speeds. For the surface and substrate semantics, the book runs ahead of the Lean and the Lean ahead of the Rust. The one inversion is the reasoner: well-founded semantics, the join/optimizer engine, incremental maintenance, and defeasible evaluation are implemented in Rust ahead of their Lean mechanization. Documentation must make this legible without misleading a reader about what is proven versus what runs.
The crash course already demonstrates the target. It teaches the meta-calculus before any vocabulary, frames a foundational ontology as an ordinary package rather than a language feature, and states plainly what is specified versus what runs. The drift lives in the numbered reference chapters, not here. The crash course is the seed, not a thing to replace.
The language is unfamiliar and dual-purpose. A declared (not built-in) classifying vocabulary, four-valued Truth4, per-concept world assumptions, a seven-tier cost ladder, defeasibility, and standpoints are not what a reader arrives expecting from OWL, SQL, or Prolog. Argon is also both a language and a database, with a modeler audience and a data-systems audience. The documentation has to install a correct mental model, not just list features.

Decision

One principle is load-bearing, and the rest follows from it:

The artifact most at risk of being wrong carries the fewest independently-falsifiable claims. Narrative teaches by pointing — at examples that compile in CI, and at a reference whose claims are mechanically anchored — rather than by restating facts it could get wrong. Risk is inverted on purpose: the highest-variance prose is made the lowest-risk by construction.

D1 — Three user-facing books, one information architecture

Not three silos. One system, layered by who owns truth, each book answering one reader-question (the Diátaxis split, adapted):

Argon by Example — the verified substrate (“show me it working”). Real .ar packages, compiled and run in CI. This is the only place code lives. Both other books transclude from it, so a broken example is a failed build, not a stale snippet. Indexed two ways: by concept and by task (“how do I model a role / a temporal fact / a defeasible exception”).
The Argon Reference — the normative surface (“what is the exact rule”). Terse and complete. Each section carries a Lean-provenance link (existence-checked in CI); grammar sections are generated from grammar.toml; code is transcluded, never pasted; the diagnostic appendix is the error index, single-sourced with ox explain <CODE>. It describes the language as it is — no inline status badges (see D2).
The Argon Book — the teaching narrative (“how do I think about this”). It grows from the crash course, teaches substrate-first, and makes minimal original factual claims: it motivates, sequences, and explains, but every code sample is a transclusion and every precise rule is a link to the Reference. It is correct by construction.

D2 — One glossary, one cross-link grammar, no published status surface

No published implementation-status surface. The language is done enough that “what is implemented today” is no longer a question the documentation must answer — so the published status surfaces (inline chapter badges, the feature-status page, and a generated coverage grid) are retired. The three books describe the language as it is. The anti-drift value those surfaces carried — catching when the reference falls behind the language — is preserved as an internal CI check (cargo xtask check-coverage) over a feature registry (coverage-features.toml): every feature names its CI-checkable signals (a runnable example, a diagnostic code in the generated catalog, a Lean file) and the check fails the build if any cited signal no longer exists. That is the anti-drift mechanism, not a reader-facing grid. The live, CI-verified evidence of what runs is Argon by Example — every package there compiles and runs in CI.
A single glossary: one definition per term (metaxis, metatype, metarel, refinement, standpoint, tier, defeasible, the value/ontology boundary), authored once and transcluded into every book.
One cross-link grammar, so the books read as one system: learn it → Book §; exact rule → Reference §; see it run → By Example #; why this design → RFD; is it proven → Lean.

RFDs and the Lean are referenced archives, not part of any reading path. The AGENTS.md intent nodes remain contributor-facing and separate from the three user books.

D3 — The Book spine

The expanded crash-course arc — a modeler’s workflow, not an enumeration of atoms (the atom-by-atom organization belongs to the Reference):

Orientation — what Argon is and why (a typed knowledge graph, a rule engine, and a bitemporal store in one); install; a fast end-to-end taste.
Modeling a domain — the meta-calculus, substrate-first (declare your own metatype/metaxis/metarel, plus the reflective intrinsics); the value/ontology boundary; data versus concepts; <: (specialization) versus : (instance-of); first-class relations; refinement (iff versus where). Examples are neutral; a foundational ontology appears only as one bounded worked example.
Reasoning — the five rule modes; derivation and recursion; the stratified fixpoint and well-founded semantics; the write path (mutate); queries.
Truth under incompleteness — Truth4; world assumptions (closed by default, the per-concept open-world opt-in); defeasibility; standpoints and federation.
Confidence — the decidability ladder (why a model terminates and what it costs) and checks-and-diagnostics as the trust surface (ox explain, the justification “why”).
Building real systems — packages and the registry; traits; macros; tests; the runtime and serving; the bitemporal store; generated SDKs.

D4 — The anti-drift machinery

A feature registry (coverage-features.toml) checked by cargo xtask check-coverage: every feature’s cited signals (example, diagnostic, Lean file) must exist, so the reference cannot silently fall behind the language (D2). The check is internal — it does not publish a status surface.
Example transclusion (mdbook {{#include}} with named anchors), so no book contains a code snippet that is not a region of a compiling package.
A source-commit drift-check: each generated/anchored claim pins the commit it was verified against; a check flags cited sources that moved.
A documentation-freshness gate that mirrors the @[language_interface] drift gate: a feature PR that flips an RFD state to shipped, changes a drift-checked carrier, adds a diagnostic code, or adds an example prompts the matching doc touch. Soft warning first, hardenable.
Grammar generation of the Reference’s syntax sections from grammar.toml.
The diagnostic corpus single-sourced into both ox explain and the Reference error index.

D5 — The verified authoring pipeline

Per section, a pipeline (not a single pass): structure (charter: what it covers, which examples it owns, dependencies) → content ledger (every claim paired with a source — a Lean module/theorem, an oxc location, an RFD, a diagnostic code, or an example id; every code sample a real package or a flagged gap) → adversarial verify (the compiler for code claims, an existence-check for Lean links, an agent told to refute for prose semantics) → cross-section reconcile (resolve contradictions, build the glossary, confirm coverage) → constrained authoring (prose written only from the verified ledger; code only by transclusion) → review (voice, then a technical re-check that prose still matches the ledger and examples still pass).

Three properties make it sound rather than merely orderly:

Ground-truth order and a blocklist. Authority runs Lean (substrate semantics) → oxc source (surface, runtime, diagnostics) → reference prose. Explicitly not authoritative: stale editor grammars, superseded surface versions, and any external note. Where reference prose and code disagree, code wins; where the Lean covers the substrate, the Lean wins.
Verification is mechanical wherever possible. Code claims are checked by the compiler in CI; grammar is generated; status is a CI signal; Lean links are existence-checked. The agent’s judgment is the fallback for prose semantics only — the smallest surface that must rely on it.
The drift/gap ledger is a first-class output. A claim that cannot be verified becomes one of three things: a documentation fix, a spec-or-code drift ticket, or a language gap that needs a decision. The documentation build doubles as a correctness audit of the language.

The pipeline is layered by book (By Example green first, then the Reference, then the Book) and is designed for steady state: the freshness gate, drift-check, and CI examples run for the life of the project; the initial authoring is run number one, not a finished project.

D6 — The plan

Wave 0 — the anti-drift tooling (coverage registry, transclusion, drift-check, freshness gate, shared-glossary infrastructure, Lean-link check) plus the small, already-identified prose drift-fixes.
Wave 0.5 — a tracer bullet: one concept taken through the entire system end-to-end (a CI-verified example → a Reference section with provenance → a Book passage transcluding it → its feature-registry entry → its freshness hook), to validate every seam before fan-out.
Wave 1 — Argon by Example (the verified substrate).
Wave 2 — the Reference (normative, status-free, literate).
Wave 3 — the Argon Book (narrative, transcluding and linking).
Steady state — the gates run continuously; new features ship with their doc touch, example, and feature-registry entry by construction.

Rationale

The correctness-by-construction inversion is the whole design. Documentation rots because prose asserts facts that later change; the fix is to let prose assert as little as possible and point at things that are checked. Examples are checked by compilation; the Reference is anchored to the Lean and to generated grammar; status is a CI signal. The narrative book — the artifact a model is most likely to hallucinate — ends up carrying almost no standalone claims, so it cannot drift in the ways that matter.

Three books rather than one because the three reader-questions are genuinely different and have different cadences: a learner wants a motivated path, a practitioner wants a precise rule, and either wants to see code run. Folding them into one artifact is exactly the conflation that produced today’s drift. Splitting them lets each take the drift-control that fits — transcluded-and-compiled for examples, anchored-and-generated for the reference, point-don’t-restate for the book.

No published status surface, and that is forced. The Context shows status is wrong when read from prose and wrong when read from code comments; any hand-maintained surface is a smaller copy that drifts the same way. With the language done enough that “what is implemented today” is no longer a question the documentation must answer, the right move is to publish no status surface at all and let the books describe the language as it is. The anti-drift value is kept where it belongs — as an internal CI check (check-coverage) over signals that are already checked: a green example, a present diagnostic, an existing Lean symbol. The check fails loudly if a cited signal vanishes, so the reference cannot quietly fall behind, without any reader-facing grid to drift.

Lean-provenance in the Reference earns its place twice: it is a trust asset unique to a language with a mechanized substrate (“this rule is proven sound; here is the theorem”), and it is a drift anchor (a CI check that the cited symbol exists). It must be honest about the reasoner inversion: where Rust leads the Lean, the provenance says so rather than implying a proof that is still owed.

The plan is sequenced by dependency and by risk. Tooling first because everything stands on it. A tracer bullet before fan-out because the integration seams — transclusion anchors, the provenance link format, status derivation — are where a docs system of this size will actually break, and proving them on one concept is far cheaper than discovering them across the whole corpus.

Alternatives considered

One mdbook with clearer parts. Lower effort, but the audience-and-timeline conflation that causes the drift survives. Rejected.
Keep inline status badges, auto-generated from the registry. Preserves at-a-glance reading, but it re-introduces status into the normative prose surface. Rejected: keeping the spec timeless means publishing no status at all (the decision retired even the generated grid).
A hand-maintained coverage ledger. This is a smaller version of the old hand-maintained STATUS.md feature matrix and drifts the same way. Rejected in favor of CI-derived status.
A single big-bang reference rewrite. The reference is largely sound; the drift engine is the problem, not the prose. A rewrite would re-create the drift the day it shipped. Rejected in favor of decoupling status, anchoring claims, and fixing the localized offenders.
RFDs and Lean inline in the reading path. Rejected: they are referenced archives. Inlining design history and proof into a learner’s or practitioner’s path is the conflation again.

Consequences

New build tooling and CI gates (registry, transclusion preprocessor, drift-check, freshness gate, Lean-link check).
The Reference loses inline status badges and publishes no status surface; the anti-drift signal lives in the internal check-coverage check instead; RFDs leave the Reference reading path; the diagnostic corpus becomes a shared single source feeding both ox explain and the error index.
A standing obligation: a feature PR touches its example, its registry signal, and (when relevant) its doc section — enforced softly at first.
A positive side effect: the documentation build surfaces language drift as tickets, making the docs a continuous correctness audit.
Migration is evolution, not rewrite — the crash course seeds the Book, the numbered chapters become the Reference once decoupled and anchored, and the existing example packages become the By Example substrate once re-verified against main.

Open questions

The tracer-bullet concept for Wave 0.5 — first-class relations (exercises concepts, relations, and Lean-provenance together, and is the framing most in need of being publicly precise) versus refinement (iff/where).
Whether the freshness gate ships soft-warning or hard-fail, and on which triggers.
The Book’s per-chapter granularity, and whether a task/idiom cookbook is part of Argon by Example or a separate surface.
Hosting layout: three books under one site with a shared theme and a portal landing page.
How the value/ontology boundary is taught — as its own early chapter, or woven through Modeling.

Keyboard shortcuts

Argon Design Records (RFDs)