RFD 0048 — The `test` atom: in-language unit tests, and why a test is substrate

State: accepted — partially implemented (v1 shipped in #804; deferred assertion forms remain)
Depends on: RFD 0015 (mutate body surface — a test body is a mutate body plus assert), RFD 0025 (check discharge — the closest relative, and the contrast that defines a test), RFD 0020 (runtime engine — Engine::evaluate, the one read path a deductive-plane assert uses), RFD 0042 (self-validating .oxbin — the trust argument for carrying tests in the artifact), the scenario harness (#764 — the integration-test sibling).
Prior art: the §19 walking-example test "…" { … } sketch (a pre-implementation surface, superseded here); the scenario-harness expect vocabulary (oxc-driver/src/harness.rs); Rust’s #[test] / #[cfg(test)] (the source-discovery model this RFD weighs and rejects for Argon).

Question

Argon needs in-language tests: a way to write, inside a package, a unit test of what its rules, fns, and mutations actually do. Two questions follow. (1) Surface: what does a test look like and what does it assert? (2) Carrier: is a test a substrate concern — an elaborated declaration persisted in the .oxbin axiom-event log and mechanized in the Lean carrier taxonomy, like mutate/query/check — or a tooling concern, discovered from source by ox test and never persisted? The carrier question is the load-bearing one, because it decides whether the Lean mechanizes anything about tests at all.

Context

The test keyword has been reserved since the early grammar, parsing-but-silently-swallowing its body — the last entry on the v0.2.1 loudness burn-down (#342). The scenario harness (#764, ox run-scenario over scenarios/*.toml) shipped as the integration-test surface and deliberately reserved the tests/ directory for the in-language test atom, the unit-test surface. The §19 walking-example sketched a test form (imperative let/insert + assert) that never parsed.

The construct shipped in #804. During review one architectural question was deferred to this RFD: putting a test in the substrate means adding AxiomKind::TestDecl, which — because AxiomKind is @[language_interface]-mechanized — forces a testDecl variant into the Lean. AGENTS.md says “the Lean does not cover … tooling.” So either tests are not tooling, or the carrier choice is wrong. This RFD settles it.

Decision

A test is substrate, at the carrier / data-shape layer. The runner is tooling. Precisely:

The test declaration is an elaborated Core IR artifact, carried in the axiom-event log as AxiomKind::TestDecl with body TestDeclBody { name: String, operations: Vec<Operation> } — structurally identical to MutationDeclBody. The body is the same Operation Core IR a mutate lowers to, plus the test-only assertion ops: Operation::Assert { condition, rendered } (value/boolean), Operation::AssertDerivable { predicate, args, negated, rendered } (derivability), and Operation::AssertRejects { ops, expected_code, rendered } (negative enforcement — the block’s own lowered Operation stream, run isolated). The Lean carries testDecl in the AxiomKind taxonomy (Argon/Storage/AxiomKind.lean).
The runner is tooling, in Rust and prose only: ox test, the per-test fresh-store orchestration, the PASS/FAIL/ERROR classification and reporting, --filter, the non-zero exit. The Lean mechanizes none of this, and Operation::Assert’s arity is not in the drift gate (Operation is not a @[language_interface] inductive — same as every other mutate op).

The surface (shipped):

test "active leases are returned for tenant" {
    let alice = insert Person { name: "Alice", age: 30 };
    let unit  = insert Property { address: "1 Main", sqft: 700 };
    let l = sign_lease(alice, unit, 2500, 365);
    assert alice.active_leases() == [l];
}

A test "<string-name>" { … } is a named imperative block run top-to-bottom against a fresh store. The body is the mutate-body statement set (RFD 0015) interleaved with assert <bool-expr>; — no separate fixture block. assert mirrors require, with one difference: a failed require aborts the body; a failed assert records a pass/fail outcome and execution continues. An assert whose condition names a derived predicate / pub query / nav-method is evaluated against the reasoner’s materialized extent (via Store::query_derive, the one read path), so a test asserts what the rules derive, not just what was stored. assert is test-only: a stray assert in a mutate/fn body refuses at build time with OE1318.

The assertion vocabulary completes its negative-enforcement half (shipped): alongside the positive assert <bool-expr> and assert [not] derivable F(args), the rejection form assert rejects [( Pkg::Code )] { <mutate-body-stmts> } asserts that a write block is refused by a write-path guard. Argon is a constraint language — checks, where/iff invariants, group axioms — and without this form the test atom could assert what a model accepts but not what it refuses, which is the more important property of a constraint. The block runs against an isolated copy of the test world (committing nothing back) and the result is classified: a genuine write-path guard rejection (a where-invariant OE0668, a check delta-guard, a group axiom, an endpoint refusal) PASSes — matching a pinned Pkg::Code when given, a different code FAILing as the wrong reason; an accepted write FAILs; and a non-guard error (a typo, an unbound reference, a type error) ERRORs loudly. The guard-vs-non-guard line is load-bearing: only a genuine constraint refusal satisfies rejects, so a broken test can never masquerade as a passing rejection test. rejects, like derivable, is a contextual keyword (leading position after assert).

Rationale

The carrier ruling is validatable on four independent grounds; each was checked against the code and the Lean.

AxiomKind is the event-log taxonomy, not the five-atom set — and tests join an existing precedent. The five atoms (meta-calculus, constructs, rule, trait, macro) are the vocabulary-introduction primitives and are fixed. AxiomKind is broader: it already carries ruleDecl, queryDecl, mutationDecl, computeDecl, and bridgeDecl — none of which is one of the five atoms. testDecl sits in exactly that company. So calling it “the test atom” is colloquial naming of a declaration form, not a claim of a sixth substrate atom. (Verifiable: Argon/Storage/AxiomKind.lean, oxc-protocol/src/storage.rs.)
A test body is Core IR, which the Lean scope already covers. TestDeclBody.operations is Vec<Operation> — the identical elaborated IR a mutate lowers to. AGENTS.md’s mechanization scope lists “Core IR: elaborated intermediate representation; surface-to-IR lowering preservation” and the Storage event log. A test, once elaborated, is a member of both. Carrying it is consistent with what is already mechanized; excluding it would be the special case requiring justification.
The carrier/runner split is exactly what “the Lean does not cover tooling” means. The Lean testDecl is a tag in the carrier enum; it carries no test semantics — there is no mechanized pass/fail relation, no ox test model, no isolation theorem. The tooling (the runner and its reporting) lives in Rust and prose, untouched by the Lean. So the scope line holds verbatim: the event-log taxonomy (Storage) is substrate and mechanized; the runner is tooling and is not. Adding testDecl to the taxonomy no more “mechanizes tooling” than mutationDecl does.
Trust and reproducibility (RFD 0042). The .oxbin is the package’s complete elaborated form, and it is self-validating: it re-checks its own invariants at load. A test carried in the artifact runs against the exact elaborated state that ox check/ox build validated — there is no second elaboration that could drift from the checked one. Source-discovery would re-parse and re-elaborate test bodies independently of the artifact, opening precisely that drift. Carrying tests is the choice that preserves “the artifact is the package.”

Alternatives

Source-discovery (the Rust #[cfg(test)] model): rejected. ox test would parse tests/*.ar, elaborate test bodies on demand, and run them, persisting nothing. It is appealing on the intuition that “tests aren’t knowledge,” but it (a) reintroduces a second elaboration path that can drift from the checked artifact (against RFD 0042), (b) makes a test the only package-level declaration not carried in the artifact, splitting the declaration model for no semantic gain, and (c) buys nothing the carrier model lacks — tests are already inert at query/serve time, so persisting them costs nothing at runtime.
A sixth atom: rejected, and a category error. A test introduces no vocabulary and no new substrate primitive; it is a declaration form whose body is existing Core IR. The five atoms are fixed (see ground 1).
Stripping tests from every artifact: not now. Whether a release artifact should omit its tests (as a Rust release binary omits #[cfg(test)]) is a build-profile question, not a carrier question — it does not bear on whether tests are substrate. Deferred (see open questions).

Consequences

AxiomKind::TestDecl is in the .oxbin event log and the Lean carrier taxonomy (the variant count is now 26). Tests are inert at query/serve time — only ox test enumerates them.
A test is not a check. A check is a standing universal obligation that fires a diagnostic over whatever world is loaded and is observer-only (it populates no IDB); a test is an existential example that constructs a known world and asserts a specific expected outcome. Check-firing is one thing a test can assert — directly, via assert rejects [( Code )] { <write> }, which asserts the write is refused by a guard (and on the exact code when pinned) — but the two are duals, not the same construct.
Isolation is fresh-store-per-test (the scenario-harness discipline), so tests are order-independent; within a test, writes are read-your-writes over the committed + deductive state.
A published artifact currently carries its tests. This is acceptable (they are inert) and is the trust-preserving default; a future release profile may strip them.

Open questions

Shared / parameterized fixtures across tests — the one deferred assertion-surface item (the fixture / expect block forms; refuses loudly until built, never silently accepted). The dedicated derivability forms assert derivable F / assert not derivable F and the three-valued outcomes (an INCONCLUSIVE result distinct from FAIL when a derivability assert is evaluated over an open-world relation, where absence is can, not not — the same fail-closed discipline RFD 0045 forced for the write side) are now built (§17.14): present ⇒ PASS/FAIL, absent under CWA ⇒ FAIL/PASS, absent under OWA ⇒ a loud INCONCLUSIVE that exits ox test non-zero. The negative-enforcement form assert rejects [( Pkg::Code )] { … } is also now built (§17.14): a write block is run isolated and a write-path guard rejection PASSes (on the pinned code when given), an accepted write FAILs, and a non-guard error ERRORs loudly — so the assertion vocabulary covers both what a model accepts and what it refuses.
Release-profile stripping of tests from a distributed artifact (a build-profile lever, orthogonal to the carrier ruling).
An assertion-soundness theorem in Lean (optional research): that a passing deductive-plane assert implies the asserted derivation holds at the fixpoint. The runner stays tooling regardless; this would only mechanize the meaning of the carried Assert op, adjacent to Reasoning/Checks.lean.

Keyboard shortcuts

Argon Design Records (RFDs)