Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

RFD 0015 — mutate body surface: EdgeQL-shaped, set-semantic

  • State: committed
  • Opened: 2026-06-03
  • Decides: the surface and semantics of the imperative mutate body that §7.5 promises but the v0.1 implementation never shipped (issue #15) — require guards, let bindings, insert-as-expression with named-field entity construction, update … set { = / += / -= }, insert … into …, for iteration, and return; the collection model these operate over (first-class, stored, generic) and its determinism contract (adopts RP-006 Track A / D-133); the subset that ships first to unblock the overlay vs. the full surface; and the commitment to mechanize mutation execution semantics in Lean. Rejects the legacy orca do { } / retract { } / emit { } clause structure. Relates to RFD 0001 (identity), RFD 0006 (field mutability), RFD 0011 (aggregates in guards).

Question

§7.5 documents a full imperative mutate body — preconditions, local bindings, typed-literal entity construction, collection inserts, control flow, and return. §7.5.1 admits the v0.1 implementation ships only a datalog-style subset (insert iof, relation-tuple insert, simple-target update … set, delete, forget) and that everything else “is designed but not yet implemented in the parser, elaborator, or runtime.” Two production mutations in packages/overlay (Ayush & Aryan’s materializeExpectedSatisfaction / recognizeSatisfaction) cannot be expressed without the missing forms; the parser reports OE0001 unexpected token … at module level the moment it meets let/for/return after a require block.

What is the mutate body’s surface and semantics, what does it operate over, and what ships first?

Context

The blocker. The overlay needs, per mutation: a precondition (one with an aggregate — occurrence.value == sum(r.value for r in records)), construction of fresh entities with named fields whose handles are reused downstream (let record = insert ExpectedSatisfactionRecord { … } then timeInterval: period), appending to a collection at a nested path, per-element posting under a for, and a returned result.

Why it doesn’t parse. mutation_stmt (compiler/crates/oxc-parser/src/grammar.rs) recognizes only insert/delete/forget/update; any other leading token falls through recovery, the require { } block’s } is mistaken for the body’s, and the remaining statements are re-parsed at module level.

Three layers, only one of them easy.

  1. AST — the Lean already mechanizes the whole imperative body (MutateDecl { require, body : List Stmt }; Stmt = letStmt | insertStmt | updateStmt | forLoop | ifStmt | returnStmt | exprStmt | …; InsertForm = typedLiteral | namedTyped | intoCollection | relation), all @[language_interface]-tagged, so the Rust AST is generated. The grammar exists; the parser just doesn’t build it.
  2. Execution — today a body lowers to a flat Vec<Operation> (a datalog DML batch: InsertIof/InsertTuple/Update{set}/Delete/Forget). There are no bindings, no data-flow between statements, no entity minting, no control flow, no return. Individuals arrive as parameters and get classified/updated; nothing is created. This is the real gap.
  3. Substrate (Lean)mutate is deliberately not a substrate rule (lowerRule returns none; it is a “host-runtime entry point”), and its execution semantics are unspecified.

The legacy precedent we reject. The orca-era vault design (D-064) gave mutations a five-clause COBOL-like structure: require { } / do { } / retract { } / emit { } / return. The new Argon already dropped the divisions (the Lean has a flat body : List Stmt, not separate do/retract clauses). This RFD finishes that move in the opposite stylistic direction from imperative paragraphs: SQL/EdgeQL-shaped statements, not COBOL divisions.

The settled direction (this RFD’s design discussion, 2026-06-02/03). The mutate body is imperative-looking but set-semantic, rhyming with EdgeQL: named-field insert as a value, update … set with collection ops, for as set-mapped iteration, guards as preconditions. Collections are first-class and stored (not derived-from-edges) — Set/Map/List are already declared generics in §6 — and governed by a determinism contract (below). Rust-like return types and error handling (Option/Result/?) are explicitly out of scope here.

Decision

1. The body is a sequence of statements; insert is an expression

mutate-decl   ::= attribute* 'pub'? 'mutate' Ident generic-params? '(' param-list ')'
                  ('->' TypeExpr)? mutate-body
mutate-body   ::= '{' stmt* tail-expr? '}'
stmt          ::= require-stmt | let-stmt | update-stmt | insert-stmt
                | delete-stmt | for-stmt | expr-stmt | return-stmt
require-stmt  ::= 'require' (expr | '{' expr (',' expr)* ','? '}') ';'
let-stmt      ::= 'let' Ident (':' TypeExpr)? '=' expr ';'
update-stmt   ::= 'update' expr (':' TypeExpr)? 'set' '{' field-assign (',' …)* '}' ('where' expr)? ';'
insert-stmt   ::= 'insert' insert-form ('since'|'during'|'at' expr)? ';'
insert-form   ::= TypeExpr '{' field-init (',' …)* '}'        // typed-literal: construct + return entity
                | Ident ':' TypeExpr '{' field-init (',' …)* '}'
                | Ident 'into' expr                            // collection add (sugar — see §4)
                | Path '(' arg-list ')'                        // relation tuple / iof — existing
delete-stmt   ::= 'delete' delete-form ('at' expr)? ';'
for-stmt      ::= 'for' Ident 'in' expr '{' stmt* '}'
return-stmt   ::= 'return' expr ';'
field-assign  ::= Ident ('=' | '+=' | '-=') expr               // set / collection-add / collection-remove

insert TypeExpr { … } is an expression that evaluates to the freshly-constructed entity (EdgeQL / SQL RETURNING). This is forced by the binding case (let record = insert … { … } then referencing record), and it makes entity construction compose. Constructing a fresh entity mints a fresh identity per RFD 0001.

2. Return: tail expression, or explicit return (which stays first-class)

  • A mutate with -> T must produce a T. The body’s tail expression (Rust’s rule — no trailing ;) is that value; since insert is an expression, ending the body with an insert T { … } returns the new entity with no keyword.
  • A value bound with let is returned by a bare tail (sr) or by explicit return sr;.
  • return expr; is first-class, not merely an early-exit affordance — it is what makes returning collections, tuples, and (later) Option/Result readable.
  • No -> T ⇒ the mutate returns unit; the body is pure effects.
  • “Return everything inserted” is rejected as a default (fragile under reordering, ambiguous for multiple inserts). To return several things, the tail is an explicit tuple or collection.

3. Collections are first-class, stored, and governed by determinism-by-observability

Adopts RP-006 Track A / D-133:

  • The invariant: no language-observable result (return values, serialization, the mutation event-log order) may depend on hash-derived order. This is stronger than “the implementation is internally deterministic” and is the correct knob — a randomly-seeded hash table is admissible iff order never leaks; a fixed-seed one still couples program meaning to the hasher the moment order is observed, breaking reproducibility across Argon’s in-process / Postgres / DBSP backends.
  • Order-sensitivity is a type property. Vec/List are ordered (order is the contract). Set/Map are unordered: the contract forbids observing order; where an observable sequence is unavoidable the language yields a canonical key-sorted order, never hash order.
  • Default backing: BTreeSet/BTreeMap (canonical iteration for free; no rehash spikes). A SwissTable/hashbrown Map is the opt-in fast path for lookup-heavy, order-insensitive use. Elastic/funnel hashing (#[dense]) is deferred pending a measured high-density win (RP-006 §5, benchmark harness pending).

4. insert … into … is kept as sugar

insert x into <coll-path> is retained (users coming from SQL INSERT INTO expect it) and desugars to a collection add: insert x into a.b.recordsupdate a.b set { records += x }. The canonical/primitive form is update … set { coll += / -= … }; insert into is surface sugar over it. (Reverses this RFD’s draft decision to drop it.)

5. for is set-mapped, and the event-order leak is closed

for x in coll { … } applies its body per element. To keep the mutation event-log order deterministic (replay, provenance), iteration over an unordered collection proceeds in canonical key-sorted order by default (RP-006 §A.3). When the analyzer can prove the body’s per-iteration effects are commutative (independent updates to disjoint targets — e.g. the overlay’s for r in records { update r.account … } when the r.account targets are distinct), the canonical sort may be elided. DBSP’s Z-set deltas may discharge this automatically for IVM-backed mutations (open question).

6. Atomicity

A mutate body commits as one transaction: all of its effects, or none. A failed require aborts with no events emitted. (Consistent with the append-only event-log substrate.)

7. What ships first

The immediate subset — exactly what compiles residential_lease.ar:

FormIn first push
require expr; and require { e, … };, incl. sum / count aggregate (RFD 0011)
let x = insert Type { … }; (insert-as-expression + binding + identity minting)
update <nested-path> set { f = e, coll += e }
insert x into coll.path; (sugar → +=)
for x in coll { … } (effect; canonical iteration)
tail-expression / return expr;; atomic commit; [x] literals, indexing, field paths
delete/upsert set-patterns; if/match in bodies; for as a returning comprehension✗ later
full Rust-like collection stdlib API; #[dense]/elastic backing✗ later
emit / detach delete / forget integration; bitemporal qualifiers on the new forms✗ later
Rust-like return types + error handling (Option/Result/?)✗ later

Ayush & Aryan’s mutations in the shipped form:

pub mutate materializeExpectedSatisfaction(
    pair: CorrelativePositionPair, perPeriodValue: Real,
) -> MaterializedExpectedSatisfaction {
    require perPeriodValue > 0;
    let period = insert TimeInterval {
        start: pair.propositionalContent.recurrence.startsOn,
        end:   pair.propositionalContent.recurrence.endsOn,
    };
    let record = insert ExpectedSatisfactionRecord {
        account: pair.book.expectedSatisfactionAccount, value: perPeriodValue,
        timeInterval: period, allenRelator: AllenRelationType::Before,
    };
    insert record into pair.book.expectedSatisfactionAccount.records;   // sugar → update … set { records += record }
    insert MaterializedExpectedSatisfaction {                           // tail = return value
        records: [record], expectedAccount: pair.book.expectedSatisfactionAccount, postedCount: 1,
    }
}

pub mutate recognizeSatisfaction(
    occurrence: OccurrenceEvent, records: [SatisfactionRecord],
) -> SatisfactionRecognition {
    require occurrence.value == sum(r.value for r in records);
    for r in records { insert r into r.account.records; }              // per-element, canonical-ordered
    insert SatisfactionRecognition {
        recordedValue: occurrence.value, satisfactionAccount: records[0].account,
    }
}

8. Lean is a committed deliverable, not a vague “later”

To unblock the overlay we ship the parser + elaborator + runtime for the immediate subset ahead of the mechanized semantics. But the substrate work is a firm obligation, tracked, not optional: (a) refine the Lean AST (insert as expression; +=/-= field-assign ops; for set-semantics); (b) mechanize mutation execution semantics (entity construction & identity, statement sequencing with bindings, collection ops, atomic event emission); (c) the soundness obligation flow-typing already assumes (FlowTyping.lean states, without proof, that mutate writes respect the monotone fixpoint discipline). Per AGENTS.md this is Lean-first substrate work; the only concession is that code may land first to unblock, with the Lean following.

Rationale

  • EdgeQL/set-semantic over imperative-procedural. Argon’s working substrate is already relational (relation-tuple and iof edges); a set-at-a-time, expression-valued mutation surface is continuous with it and with the language’s declarative core, where a COBOL-style statement machine would not be. Crucially, the choice keeps ~90% of the syntax §7.5 already documents while changing its meaning — far less throwaway work than a ground-up redesign, and the Lean AST is already close.
  • insert as expression is not a stylistic preference; it is forced by let x = insert … + downstream reuse. Once forced, the return design (tail expression) falls out, and “what if multiple inserts” stops being ambiguous because only the tail is returned.
  • Determinism-by-observability (RP-006 Track A) lets the language be deterministic and free to use any map backend, by legislating observability rather than implementation. B-tree default because canonical iteration — on the hot path for a KR language — is free, while a hash table pays an O(n log n) sort to honor the same contract.
  • Keep insert into because the cost is one desugaring rule and the benefit is meeting SQL-trained expectations; the canonical += keeps the core small.

Alternatives

  • Build §7.5 imperative as literally documented (document/OOP model). Rejected: it bakes “entities own arrays you imperatively append to” into the language, away from the relational/EdgeQL direction, and still requires the entire new execution model — so it is not actually cheaper.
  • Pure-graph: make collections derived from edges (no stored collections). Rejected by decision: collections are first-class and stored (Set/Map/List), and the stdlib provides them. insert into/for are legitimate operations on stored collection fields, not smells.
  • Drop insert into, canonicalize only on +=. Considered and reversed — kept as sugar for ergonomics.
  • Auto-return the last/all inserts. Rejected: fragile and ambiguous.
  • EdgeQL with … select with no return keyword. Rejected: return must stay first-class for Option/collection/tuple returns and future error handling.
  • Elastic/funnel hashing as default backing. Deferred: real result (arXiv:2501.02305) but the win is at high load factors and no hardened implementation exists; gated on a measured Argon-workload win (RP-006).

Consequences

  • Parser: mutation_stmt grows from four verbs to the full statement set; insert becomes expression-valued; require/let/for/return/tail-expression parsing; struct-literal-in-tail disambiguation (the Rust if x { S {} } wrinkle).
  • AST/elaborator: consume the (already-generated) Stmt/InsertForm nodes; thread an environment for let bindings and insert handles; lower the new forms — the flat Vec<Operation> model grows into a statement/expression evaluation with data-flow.
  • Runtime: entity construction with identity minting (RFD 0001); collection field +=/-=; for evaluation with canonical ordering; atomic multi-effect commit; tail/return value.
  • Types: the ordered/unordered collection contract (§3) governs all collection use, not just mutations; it should also be reflected in §6 and the type-system Lean. sum/count in guards interacts with RFD 0011 under OWA.
  • Stdlib: Set/Map/List implementations + a Rust-like API (parallel workstream; the first push needs only += and list literals).
  • Reference: §7.5 / §7.5.1 rewritten from “designed, not implemented” to the shipped surface + the explicit later-list.
  • Lean: the obligations in §8 are now tracked work, including closing the flow-typing assumption.
  • Research: RP-006 (collection-backend benchmark; Bayesian; topology) proceeds in parallel; none gates this.

Open questions

  1. Identity minting details for insert Type { … } — resolved by RFD 0001; confirm the runtime hook and how a minted id appears in the event log / provenance.
  2. Are relation/extent tables insert-only-then-read-mostly? (The #[dense]/elastic precondition — RP-006 Q1.)
  3. Does iteration dominate the reasoning workload? Validate with a real op-trace before finalizing the B-tree-vs-hashbrown default (RP-006 Q2).
  4. Do DBSP Z-set commutative-monoid deltas discharge the for event-order mitigation automatically for IVM-backed mutations? (RP-006 Q3.)
  5. Postgres bridge order — can canonical iteration order be pushed into SQL ORDER BY? (RP-006 Q4 — the riskiest leak surface.)
  6. if/match in bodies and for-as-returning-comprehension — surface + value semantics, deferred past the first push. Resolved by Amendment 1.
  7. Error handlingOption/Result/? and Rust-like return types; their own RFD.

Amendment 1 (2026-06-05) — control flow is expression-valued; bodies and branches are blocks (resolves OQ #6)

OQ #6 left the surface and value semantics of if/match in mutate bodies open. While that gap stood, the two representations drifted: the Lean surface AST split the construct into a value form (Expr.ifExpr / Expr.matchExpr, with bare-Expr branches) and an effect form (Stmt.ifStmt / Stmt.matchStmt, with List Stmt branches), whereas the Rust grammar modelled a single IF_EXPR / MATCH_EXPR over BLOCK_EXPR branches. This amendment settles it.

Decision. if and match are expressions, and their branches/arms are blocks. A block { s₁; … ; sₙ tail? } is itself an expression: it executes its statements for effect and evaluates to its optional trailing expression, or to unit () when there is none (Rust’s block rule, continuous with §1–§2’s tail-expression return and insert-as-expression). The mutate body is a block. There is exactly one if and one match form — a value use (let x = if c { a } else { b }) and an effect use (if c { insert X } else { insert Y }) are the same construct, distinguished only by whether the branch blocks have a non-unit tail.

Why (not the split).

  • Rust/Cargo aesthetic — Rust has one if, an expression; Argon defaults to that idiom.
  • Consistency — §1/§2 already chose expression-orientation (tail-expression return, insert as an expression). A statement-only if contradicts it.
  • Expressiveness — the split cannot express let x = if c { let y = f(); g(y) } else { h() }: a bare-Expr branch admits no statements, and a statement-if yields no value. Block branches give both. The split is a strict expressiveness loss, for no gain.
  • No redundancy — one construct, not a value/effect pair that must be kept in sync.

AST / Lean consequence. Add Expr.block (stmts : List Stmt) (tail : Option Expr). Expr.ifExpr / Expr.matchExpr keep Expr-typed branches/arm-bodies (which now admit block), so their signatures are unchanged. Remove Stmt.ifStmt and Stmt.matchStmt — an effectful if/match is Stmt.exprStmt (ifExpr …) over block branches. (for remains a unit-valued effect statement for now; promoting it to an expression is uniform but not required, since it never produces a value — tracked, not blocking.) Because a block-expression runs statements, expression- and statement-evaluation become a single mutual clique; Argon.Runtime.MutationSemantics and its theorems (runMutation_error_noop atomicity, the fresh-monotonicity family) are re-mechanized against the merged induction. The @[language_interface] surface mirror and CoreIR lowering update accordingly.

Code consequence. The parser already yields IF_EXPR / MATCH_EXPR / BLOCK_EXPR. The elaborator routes a block to its body-op sequence and an effectful if/match to control-flow IR (Operation::If / Operation::Match over the branch blocks’ operations), while a pure-value if lowers to Term::IfExpr (evaluated by resolve_term_to_value). The _ => {} catch-all in mutate_lower::lower_body_stmt, which silently dropped unrecognized statements, is replaced by a loud diagnostic. Tracked in #74.

Amendment 2 (2026-06-09) — value-position match realized by the IfExpr desugar; constant-pattern subset; OE0203

Amendment 1 prescribed an Operation::Match control-flow IR alongside Operation::If. The implementation that landed realizes the value half of match differently — and the prescription is amended to match it.

Decision. A value-position match over constant patterns desugars at elaboration to the existing right-nested Term::IfExpr chainmatch s { P1 => v1, P2 => v2, _ => v3 }if s == P1 { v1 } else if s == P2 { v2 } else { v3 } — with no new IR (no protocol/drift change). For constant patterns the two semantics are identical: ordered first-match, the final arm’s value as the else-branch. The executable pattern subset is payloadless enum constant paths, Int/String/Bool/Date literals, or-patterns of those (consecutive conditions sharing a body), and the wildcard _; the richer §14 forms (binders, payload/record patterns, guards, type tests, is-outcomes) are refused loudly (OE1319) until they land. Exhaustiveness is required (OE0203 at ox check): a final _ arm, or full coverage of the scrutinee’s enum variants when statically known. The same desugar serves every value position — fn bodies, rule-body comparison operands (compiled to CompiledExpr::If in the reasoner), and mutate let RHS / update values / return values / require guards (evaluated by the existing Term::IfExpr arm of resolve_term_to_value); the Lean evalExpr gains the matching .matchExpr arm (evalMatchArms / patternMatchesConst).

Statement-position match (arms running effects) is now realized the same way — no Operation::Match: it lowers to a right-nested Operation::If chain over the arm blocks’ operation sequences (match s { P1 => { ops₁ }, P2 => { ops₂ }, _ => { ops₃ } }If(s == P1) { ops₁ } else { If(s == P2) { ops₂ } else { ops₃ } }), reusing the value desugar’s pattern classification and exhaustiveness (same constant subset, same OE0203 / OE1319 sinks; or-patterns expand to consecutive conditions sharing the arm’s operations; a wildcard-only match splices its arm unconditionally behind a hidden Operation::Let of the scrutinee, so an aborting scrutinee — a missing-field projection, say — aborts exactly as in the chained case, matching Lean’s evaluate-scrutinee-first order). Scrutinee Terms are pure, so per-link re-evaluation is effect-free. One scoping caveat: the Lean matchStmt/ifStmt semantics scope arm/branch-local bindings, while the Rust runtime threads one flat environment and currently leaks them past the match/if (pre-existing, recorded drift — #196). The enabling parser change threads an in-mutate-body flag so blocks at any nesting depth (match arms, if branches, for bodies) admit the full mutate statement set — which also fixed update not parsing inside if branches; blocks in value position (a let RHS block, a value-if branch, a value-match arm) refuse effectful statements loudly (OE1321) since value lowering reduces a block to its tail expression. On the Lean side, Stmt.matchStmt gains direct execution semantics (evalMatchStmtArms: ordered first-match, the selected arm’s block runs for effect, return propagates), intended-equivalent to the If-chain by construction. Stmt.matchStmt / Stmt.ifStmt are NOT yet removed as Amendment 1 ordered: the A1 encoding (exprStmt over Expr.block arms) needs block evaluation inside evalExpr, which merges the expression/statement interpreter cliques and requires evalExpr to carry control flow for return propagation — exactly the merged-clique re-mechanization A1 itself anticipated; doing it halfway would regress return-in-branch. The removal stays tied to that re-mechanization; the drift is documented at both constructors and in Argon.Runtime.MutationSemantics. OE1318 (Module::unsupported_mutation_forms) remains the gate for the genuinely unexecutable statement forms — emit now parses (EMIT_STMT) and lowers to Operation::Unsupported so it is refused there rather than as a generic parse error.