RFD 0037 — The macro atom: a phase-separated, hygienic, declarative-first expander over surface syntax
- State: discussion
- Opened: 2026-06-16
- Decides: how Argon realizes its macro atom — the last unbuilt of the five substrate
atoms (meta-calculus, constructs, rule, trait, macro). Settles: (1) expansion is a real
pipeline phase between parse and resolve, not a rewrite smuggled into elaboration; (2) a macro
expands to surface syntax, which is re-parsed and re-elaborated through the one existing
path — never to events directly; (3) hygiene is scope-sets + Racket-style binding spaces,
white-box, over Argon’s existing binder namespaces; (4) rule variables are
alpha-canonicalized at AST→event lowering — a content-addressing correctness fix that also
dissolves the macro determinism hazard; (5) v1 is purely declarative (
pub macro, pattern→template) — verified sufficient for every current client including MLT; the procedural / analytic layer is deferred with its end-state shape committed (total structural recursion over reflected syntax, Lean-mechanizable); (6) migration re-homes the relation-property family to a library declarative macro byte-identically, and the MLT decorators via a library-surface-stub / privileged-native-expander split (RFD 0009 RP-003 GAP-3); (7) collisions resolve by uniqueness-at-registration (loudOE0705-class, not silent priority); (8) the Lean line holds at the typed AST (rung a) with a re-check obligation. Two acceptance tests gate v1 (§8). This RFD leads the arc per the workflow table — language surface: RFD + reference draft → Lean → code. Hard prerequisite for the directive surface (RFD 0028) and the std theory libraries (std::temporal,std::lifecycle,std::mlt).
Prior state (verified at 2c2854959):
- The surface parses; nothing expands.
pub macro Name { … }parses toMACRO_DECLwith the body eaten as opaque balanced tokens — no pattern/template structure, no fragment specifiers recognized (oxc-parser/src/grammar.rs:2134,eat_balanced).#[procmacro]is a reserved directive (OE0706,directives.rs:468); there is noTokenStreamtype.name!(…)bang-invocation does not parse at all — noMACRO_INVOCATIONnode, theBANGtoken is unused in expression position (expr.rs:238). Lowering intentionally skips macro decls (oxc-instantiate/src/lower.rs:5). - A working proto-macro already exists.
synthesize_relation_property_rules(lower.rs:5095) is a string-template macro in all but name: itformat!s Argon source —"pub derive {rel}(x, z) :- {rel}(x, y), {rel}(y, z)"(lower.rs:5172) — thenparse_files it, extracts theItem, and lowers it with the livectxso the body resolves to the right qualified paths (lower.rs:5135). It works only because it runs inside instantiate on self-contained synthesis whose introduced names nothing else resolves against. - The pipeline is hard-gated, parse-frozen.
parse → [error gate] → resolve/check → [error gate] → instantiate → tier-classify → discharge → write(oxc-driver/src/lib.rs:1520– 1647).parse_file(db, SourceFile)is a memoized salsa query (oxc-db/src/lib.rs). Each phase assumes the prior AST is immutable; there is no phase that rewrites the AST before resolution. - Rule variables are name-carried and un-canonicalized.
Term::Var { name: Ident { text: String } }(core_ir.rs:240,142);content_id = BLAKE3(encode_rule_decl(body))(lower.rs:3969); no alpha-canonicalization anywhere (grep acrosscompiler/is empty). So alpha-variant rules get differentAxiomKeys — contradicting the stated intent “same proposition asserted twice has the sameAxiomKey” (ids.rs:393).#[forall]lowering already mints counter-named$fa_Nvars (atom_lower.rs:226) — a latent reproducibility hazard. - The §3.4 gate and head resolution are concrete and reusable.
resolve_metatype_introducer/resolve_metarel_introducer(lower.rs:1171/1209): three arms (localpub metatype→ ambientstd::core→ workspace-unique, elseOE0605/0606). Rule-atom resolution is local-first → unique-workspace → ambiguous-refuse (“never a silent union or an arbitrary pick”,lower.rs:1528). Same-module same-head derives union; cross-module same-name refuses. - MLT is declarative templating, not computed expansion.
#[categorizes(T)]reads only T’s name and emits aMetaPropertyevent (lower.rs:4331); it does not inspect T’s structure. The Lean model is a five-kind enumeration with a round-trip theorem (MLTKinds.lean,classify_declOfKind).
Question
Macros are the last unbuilt atom and the extensibility substrate the ontology-neutral doctrine
depends on: every hard-coded compiler substitute is a macro waiting to exist — the #[…]
directive registry, the elaborator-native MLT decorators (RFD 0009 commits to re-homing them
byte-identically), the just-built relation-property directives (#434, a string-template
proto-macro), std::temporal’s ten DatalogMTL operators, std::lifecycle, and derive. Without
macros, “the core stays small; theories are libraries” cannot hold — every vocabulary a user wants
needs a compiler patch.
The campaign that precedes this RFD (vault: Efforts/On/Argon/research/macro-system/) converged on
a design and then code verification overturned two of its premises, which this RFD encodes:
- The research assumed expansion could sit “between parse and elaborate” on the existing pipeline.
It cannot — the pipeline is parse-frozen and hard-gated (prior state). v1 requires a real
expansion phase, a driver restructuring. This is the correct solution; hacking expansion into
instantiate (as the proto-macro does) only works for self-contained synthesis and breaks the
moment a macro introduces a name another declaration resolves against, or rewrites a rule body
the resolver would otherwise reject as unknown atoms (
since/ever). - The research treated the procedural fork (declarative-first “C” vs bootstrapped-Argon “B”) as the
central decision. Verification collapses it: every current client — relation-property, the
ten temporal operators,
std::lifecyclesugar, navigation, and even MLT — is declarative (pattern→template ± name resolution). The only workload needing analytic computation (inspect a concept’s fields) is genuinederive-class, which has no current client. So v1 is declarative, full stop; the analytic layer is deferred, not staged-around.
A worked motivating case (a real user’s tax-code rule):
pub derive realizes_gain(pcr: PCR) :-
Performs(pcr, perf), HasContent(perf, content),
HasObjectConstraint(content, oc), ConstrainsObject(oc, obj),
compute_1001b(pcr) > obj.basis_for_gain;
The long relational-navigation chain is exactly what a declarative navigation macro should desugar;
the transitive-closure sugar x.Role+(y) is already a loud refusal telling users to hand-write
the recursion (rule_atom.rs:338, #297). These are the macro engine’s first clients, and they are
all pattern→template.
Decision
D1 — Expansion is a phase between parse and resolve
Insert a new compilation phase. The pipeline becomes:
lex → parse → EXPAND (to a fixed point) → resolve → check → instantiate → tier-classify → discharge → write
EXPAND consumes the parsed module set and produces an expanded module set (fresh green trees
/ synthetic SourceFiles) that resolve/check/instantiate consume as if hand-written. It is a
memoized salsa query keyed on the parsed input + the in-scope macro definitions. Expansion runs to a
fixed point (macros producing macro invocations re-expand) with a fuel cap that errors on
exhaustion (OE-coded). The relation-property and MLT synthesis currently living inside instantiate
move to this phase (or, for MLT, to the stub/expander split of D8).
Why a phase, not a rewrite in instantiate (the correctness call): a macro that introduces a concept/relation other code references, or body sugar the resolver would reject pre-expansion, must expand before name resolution runs. The proto-macro’s instantiate-time re-parse only works because its output references already-declared names and introduces nothing referenced elsewhere. Generalizing requires the phase. The error gates re-order accordingly: a parse gate on the original source, then expansion (which may itself emit diagnostics), then the resolve/check gate on the expanded tree.
D2 — A macro expands to surface syntax, re-parsed and re-elaborated
A declarative macro is Syntax → Syntax: it matches a token pattern and produces surface tokens,
which are spliced into the module and re-parsed, then flow through the one existing
parse→resolve→check→instantiate→lower path. This is the model the proto-macro already validates
(emit source text → parse_file → lower). Consequences, all verified to hold:
- The drift contract (S2) is preserved for free — the only thing that ever produces events is the unchanged lowering, now running over post-expansion AST.
- The §3.4 gate runs unchanged — a macro-introduced concept flows through
resolve_metatype_introducerlike any other (lower.rs:1171). - Tier classification is correct by construction — the classifier already runs last, after
instantiate, on lowered events (
oxc-driver/lib.rs:1571); it sees post-expansion reality (S4). - Head composition is sound — a macro emits into its invocation module, where its rules union
with the user’s (same-module same-head) and structurally cannot pollute another module’s heads
(cross-module is ambiguous-refuse,
lower.rs:1528).
Direct event emission is rejected for user macros (it forks the drift contract and bypasses the gate and classifier). The one exception is the privileged MLT-style axis-event emitter (D8), which has no surface form and stays a compiler builtin behind a library surface.
D3 — Hygiene: scope-sets + binding spaces, white-box
Adopt Flatt-2016 scope-sets with Racket-style binding spaces — each of Argon’s existing binder
namespaces (rule vars, concept/type, rel/metarel, metatype/metaxis + axis values,
individuals, trait members) is an interned scope in one scope-set; the maximal-subset resolution
rule is unchanged. Hygiene is white-box (Lean-style effectful quotation): the expander does
not stamp scopes globally — the quotation applies a fresh macro scope to the identifiers it
introduces. (Black-box mark-and-invert is quadratic on Argon’s per-axiom-event structure.)
Integration is an extension of existing resolution: the resolver already does local-first →
candidate-set → ambiguous-refuse (lower.rs:1528) and tracks bound_vars: BTreeSet<String>. Hygiene
adds a scope dimension to candidate filtering; bound_vars becomes scope-tagged. The §3.4 gate
rides candidate-set disambiguation as a commuting pass — it rejects an ill-formed introducer but
never re-points a reference (candidate Lean theorem #1, deferred per D7).
Hygiene splits cleanly by what reaches identity (see D4):
- Rule variables (bound, clause-local): hygiene need only guarantee non-capture — alpha-canonicalization (D4) makes their names irrelevant to identity and the content hash.
- Introduced vocabulary (concepts/rels/metatypes a macro declares): names are semantic and
referenceable, so hygiene must produce a content-derived, stable name (derived from
macro-identity ⊕ argument-content ⊕ expansion-path, content not source span —.oxbinmust survive non-semantic edits). For macro-producing macros this is a content-derived path.
unhygienic! is not shipped in v1: a $crate-style targeted self-reference plus
syntax-parameter keywords cover the real needs; any future raw escape must be namespace-indexed
(name which binding space).
Implementation (realization Y). The white-box scope-set model above is realized over the
expand-to-surface carrier (D2) by rewriting each transcriber-literal identifier as expansion runs,
classified against the macro’s definition module: a reference to a global is qualified to its
package-anchored canonical path (reference hygiene); a macro-introduced variable is freshened to a
content-derived name in a reserved namespace (variable hygiene, marker · / U+00B7, forbidden in user
source — OE0725); metavariable substitutions (use-site syntax) are left untouched. The classification
reuses the resolver’s own local-first → workspace-candidate → ambiguous-refuse discipline at the
definition site, so the embedded §3.4 gate rides along unchanged. The carrier-level rewrite (oxc_parser::hygiene
oxc_workspace::hygiene) was validated before implementation: it is proven to induce the same binding as the scope-set specification —resolves_equivinspec/lean/Scratch/MacroHygiene.lean(branchresearch/0037-macro-hygiene; design memospec/rfd/0037-macro-atom-hygiene-design.md), with non-capture resting on the reserved namespace’s disjointness — and that proof depends on no project-specific axioms. v1 covers single-level expansion and the variable/reference partition; reference hygiene of a metaequality metatype target (c :: T) is the one documented carrier-level gap (the variablecis still freshened), pending the nested-scope generalization.
D4 — Alpha-canonicalize rule variables at AST→event lowering
Before computing content_id, rename every bound rule variable to a positional canonical form
(_0, _1, … by first-occurrence traversal), consistently across all ~25–30 binder sites: head
args, body predicate args, Comparison/Compute/Aggregate/TypeTest operands and outputs,
Comprehension binders (respecting shadowing), and the #[forall] $fa_N vars. Surface names are
preserved in a side table for diagnostics (errors keep the user’s names; the hash sees the
canonical form).
This is a content-addressing correctness fix in its own right, independent of macros: it makes
rule identity up-to-alpha (honoring the ids.rs:393 intent), fixes the latent $fa_N counter
non-determinism, and lets alpha-equivalent rules dedup and share lineage. For macros it is the lever
that dissolves the gensym→hash hazard: hygiene’s fresh variable names never reach the hash, so
hygiene reduces to non-capture and reproducibility is automatic. Verified sound — variables are
clause-local, string-identity-only, and carry no semantic weight beyond binding (no reflection,
no match-by-name, defeat-edge resolution is by-name but maps through the side table).
D5 — Fragment specifiers: a closed v1 set, each a binding-space-targeted parse
v1 specifiers: concept, rel, metatype, rule, plus the syntactic Rust-lineage set (expr,
ident, ty, literal, path, tt, and $( … )*/+/? repetition). standpoint is cut
from v1 (no forcing client; admit later if needed). Each ontological specifier is “parse this
syntax category, bind/reference in the corresponding binding space”: $r:rel binds r in the rel
space and re-resolves use-site; $c:concept clears the §3.4 gate at disambiguation. S7
(ontology-neutrality) and S4 (tier-honesty) hold by construction — the gate runs on emitted
references, and no specifier carries a tier (the classifier assigns it post-expansion).
$x:rule is the subtle one: it is a quotation of rule-plane syntax carrying its own sub-bindings
(the rule variables inside the matched fragment must keep their use-site identity and not be
captured by the macro’s introduced vars). It is the specifier the v1 prototype must exercise hardest.
This requires real grammar for the macro body (today an opaque token blob): a
(pattern) => { template } form with $name:spec metavariables — net-new parser work.
D6 — Invariants: classifier-last, strong normalization, denied sources
- Tier-honesty (S4): structural, by D1+D2 — the classifier runs last on lowered post-expansion events. “Nothing produces events after the classifier” is a checked pipeline invariant. No tier monotonicity rule — the classifier measures the true post-expansion tier; there is nothing to police (a tier cap, if a package declares one, is enforced on the measured tier with the breadcrumb pointing back to the invocation).
- Termination (S5): the declarative layer is strongly normalizing by construction (finite templates; macro-calls-macro bounded by a structural measure), with a fuel cap as a backstop against bugs, not the primary mechanism.
- Determinism (S5): the declarative layer has no I/O sources to deny; emitted collections are
emitted in canonical (sorted-by-content) order; and D4 removes the fresh-variable hazard. Result:
byte-identical
.oxbinacross builds with no author discipline required.
D7 — Lean line: hold at the typed AST (rung a) + re-check obligation
The expander is untrusted by design. Assurance comes from re-checking its output: the tier
classifier runs last (D6), the .oxbin content hash is re-derivable, and the drift gate covers the
@[language_interface] shape of MacroAtom = declarative MacroDecl | procedural ProcMacroDecl.
Mechanize the re-checker, not the producer — the CompCert pole, which §13.7 already follows, and
which the headline ITP precedent (Ullrich & de Moura) actually uses (it mechanized nothing; trust
= kernel re-check). Do not mechanize expansion in v1. The tier-honesty commutation theorem
(classify ∘ expand commutes with the true decidability class) is the one worth pursuing later — it
is Argon’s distinctive invariant — but it is statable only once expand is a Lean object, i.e. only
under the analytic layer (D9). Hygiene-algebra mechanization (POPLmark-scale) and full
expansion-preservation (likely ill-defined per Leroy — a macro’s meaning is its expansion) are out.
D8 — Migration: re-home by emit-target, surface-stable at every step
- Relation-property family (
#[transitive]/#[irreflexive]/#[asymmetric]/#[functional]): re-homes tostd::rel, but splits by what the declarative layer can express (design revised during implementation — the family is not uniformly declarative; PR #549/#556/#560):#[transitive]is a genuine declarativepub macro: it emits the closure rule (which has surface) into the invocation module, and because the library template is byte-identical to the proto-macro’s string ({rel}(x, z) :- {rel}(x, y), {rel}(y, z)), the lowered events are byte-identical — the v1 differential test (§8).#[irreflexive]/#[asymmetric]/#[functional]are builtin-backed macros — the same library-stub / privileged-expander split as the MLT decorators below. Their synthesizedcheckhas a computed head (__{rel}_{prop}), which the splice-only declarative layer cannot construct (noconcat_idents), and#[functional]-on-relis a cardinality cap, not a surface rule. So astd::rellibrary stub marked#[builtin](a new directive on apub macrodecl — Argon’s analogue of Rust’s#[rustc_builtin_macro]) owns the importable surface, the EXPAND phase leaves the attribute in place, and the unchanged privileged synthesis stays the implementation (byte-identical). They become genuine declarative macros once the procedural layer (D9) lands. All four requireuse std::rel::{…}(nothing is ambient — RFD 0038); a bare one is refused (transitive: OE0705; the checks: OE1362).
- MLT decorators (
#[categorizes(T)]…): re-home the surface tostd::mltvia the Rust#[rustc_builtin_macro]pattern — a library stub owns the surface (name, stability, visibility, importability, docs; byte-identical per RFD 0009 RP-003 GAP-3) while the expander stays a privileged compiler builtin that emits theMetaPropertyevent (which has no user-writable surface). The expander flips to library Argon only when the analytic layer (D9) lands and an axis-assertion surface exists — both phases surface-stable. std::temporal/ navigation /std::lifecycle: declarative library macros; a native-or-library scoping choice per operator once the engine exists.
D9 — Collisions, identity, and the deferred analytic layer
- Collision resolution: uniqueness-at-registration (Lean model) — one implementation per
#[name]; a user macro colliding with a builtin is anOE0705-class error, not a silent shadow (matches Argon’s loud-refusal posture and the existing ambiguous-refuse resolver).use mod::foois the module-qualified escape for distinct cross-package names. Identity is keyed to a stable diagnostic-item-style handle, not the path, giving resolution-invariance under re-homing. Implemented for duplicate macro definitions — twomacro namedeclarations in one module are refused with OE0726, never silently merged into one invocation name; builtin-name collision in the shared#[…]attribute namespace follows with the attribute-macro surface. - The procedural / analytic layer is deferred, with its end-state shape committed: a total,
structurally-recursive meta-language over reflected syntax — not general-purpose Argon. This is
the correct end-state because totality is what the substrate’s own doctrine demands (deterministic
content-addressed builds, decidability tiers, Lean-canonical): it is strongly-normalizing and
deterministic by construction, and — being a total function over an inductive
Syntaxtype — it is the natural object for the D7 tier-honesty theorem. A Turing-complete compile-time Argon would be more powerful and less correct. The layer’s trigger is the first genuinederive-class client (inspect a concept’s fields), and its acceptance test is the MLT expander flip.
v1 scope and the two acceptance tests
In v1: the EXPAND phase (D1); the declarative pub macro engine (structured pattern→template
grammar, fragment specifiers per D5, expand-to-surface→reparse per D2); scope-set/binding-space
hygiene (D3); alpha-canonicalization (D4); invocation surface (name!(…) parser + MACRO_INVOCATION
node; #[name(args)] argument plumbing, lifting OE0709 for macro-bearing attributes); the
relation-property re-home (D8); the MLT surface stub (D8).
Deferred: the analytic/procedural layer and derive-class (D9); type-directed/elab-class
expansion (no forcing client; the real ergonomic need — legal scoping — is context-directed, which
binding-space resolution already serves); standpoint specifier; unhygienic!.
v1 ships only when both tests pass:
- Migration proof (byte-identical):
#[transitive](and the relation-property family), re-homed from the hard-coded synthesis to a librarypub macro, produces byte-identical lowered events vs the current path — a differential test, trivially satisfiable because the template is the same and D4 makes variable naming hash-irrelevant. - Grow-a-language proof: at least one genuinely user-defined
pub macroexpands end-to-end through the full pipeline (parse → EXPAND → resolve → check → instantiate → classify), clearing the §3.4 gate and landing on its true tier.
What this RFD does not leave implicit
- Expansion is a phase, not an instantiate-time rewrite (D1) — the verified architectural correction.
- Alpha-canonicalization (D4) is a committed correctness fix, prerequisite to the clean hygiene story and valuable independently.
- The analytic layer’s end-state shape (total recursion over reflected syntax) and its trigger
(first
derive-class client; MLT-expander-flip acceptance test) (D9). - The
originbreadcrumb for diagnostics (mapping a lowered event back to its macro invocation) is metadata excluded from the semantic content hash, carried in a separate diagnostic index — decoupling reproducibility (D6), re-homing identity (D9), and diagnostics. No full unexpander in v1. - The
$x:rulefragment’s sub-binding hygiene (D5) — the prototype’s hardest case.
Open items for ratification
- The exact content-derivation function for introduced-vocabulary scopes (D3).
- The fragment-specifier grammar for the macro body (D5) — the parser work that replaces the opaque-token-blob body.
- Whether
std::temporal/std::lifecycleoperators land native or library in the first cut (D8) — a scoping call, not a design blocker. - The driver/salsa shape of the
EXPANDphase (D1) — incremental re-expansion granularity.