Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

RFD 0029 — Derived values and aggregate terms: body-level binding, aggregate sources, rounding

  • State: accepted — implemented in this PR
  • Opened: 2026-06-12
  • Decides: the quantitative-modeling wall of the v0.2.1 program — that a rule can derive a computed numeric value and compare two aggregates. One mechanism: a body-level binding atom x = expr (single =, distinct from the comparison ==) that binds a fresh variable to the value of expr, where expr ranges over bound vars, literals, field projections, arithmetic over the exact tower, and aggregate expressions. Aggregate sources extend the existing comprehension form with comma-separated body atoms and admit relation atoms as sources; the brace form stays count/exists-only. Grouping is the outer bound variables (the standard Datalog reading). A rounding builtin family (round, round_half_even, trunc) is added to rule-body expressions and the fn/compute plane, banker’s rounding the money default. The exact-Decimal mutate-body status note is killed (the path already routes through the exact canonical core). The CLI/demo harness accepts decimal arguments for Decimal/Money/Real params. Settles register items R-B2, R-B3, R-B4, R-M4, R-M5, R-M7, and the harness half of R-B10.
  • Prior art: Datalog/Soufflé assignment (x = expr binds; range-restriction over the computed value) — Soufflé’s = constraint and arithmetic functors are the direct precedent ([Jordan, Scholz, Subotić, Soufflé: On Synthesis of Program Analyzers, CAV 2016]; the Soufflé manual’s “Assignments” and “Aggregates” chapters). Stratified-aggregate semantics follow Faber–Pfeifer–Leone 2011 (the aggregated predicate must live in a strictly-lower stratum).
  • Layer: language surface — RFD + reference draft → Lean → code (workflow table). The reasoner is the layer where Rust leads (AGENTS.md); the Lean obligation is recorded for #134’s catch-up scope, not mechanized here.

1. The problem

On origin/main @ v0.2.0, tax and accounting are not authorable. The domain-modeling lane proved it against the binary:

  • A rule head cannot carry a computed value. derive Tax(t, owed) :- appliesTo(b, t), owed == t.income * b.rate refuses with OE1303 — == is a filter over already-bound operands, never an assignment, so owed is never positively bound (R-B2).
  • Two aggregates cannot be compared. The double-entry invariant sum(debits) == sum(credits) is inexpressible: a comprehension aggregate parses only as the RHS of a field-path comp-op ___ comparison; on the LHS or in leading position it is OE0001 (R-B3).
  • Aggregates have no relation source and no grouped form. sum(amt for amt in rel(e, _, amt)) refuses (the comprehension source must be a materialized collection field); group by … having refuses with OE0007 (R-B4, R-M7).

Three adjacent gaps compound it: the §7 status claims mutate-body arithmetic is Real/f64 with “exact Decimal pending” (R-M4); there are no rounding primitives in any plane (R-M5); and the demo/CLI harness rejects a decimal mutation argument outright (R-B10, harness half).

2. The design — one mechanism

2.1 Body-level binding: x = expr

A new rule-body atom form binds a fresh variable to the value of an expression:

pub derive Tax(t, owed) :- appliesTo(b, t), owed = t.income * b.rate;

x = expr (single =) is assignment, distinct from x == e (double =, a filter). expr ranges over: bound variables, literals, field projections (t.income), arithmetic over the exact tower (a * b, a + b, exact /), nullary/other intrinsics, and aggregate expressions (§2.2).

This is the established PL concept — Datalog/Soufflé assignment. It is not a new IR constructor: the binding lowers to the existing AtomIR::Compute { result, expr } (the RFD 0020 D5 Map operator), whose result variable the range-restriction safety check already treats as a binding source (a Compute result is bound once its input variables are bound, closed to a fixpoint). The work is purely in the parser (recognize the = atom shape) and the lowering (emit Compute, relationalizing any Proj/aggregate sub-terms exactly as the existing comparison-operand hoist does).

Freshness and range restriction. The LHS x must be a FRESH variable, and the binding must be range-restricted. Two distinct refusal paths, never a silent Null:

  • Freshness (OE1335 BindingLhsAlreadyBound). If x is already bound — by a prior positive predicate atom, a head parameter bound elsewhere, a projection, an aggregate result, or an earlier binding — the = was silently degrading into an equality FILTER (x joined against the computed value) rather than a binding, collapsing the two readings of = (bind vs. compare) with no diagnostic. It refuses, naming x: use == to compare, or pick a fresh name. This is the path a rebind x = x + 1 takes when x is otherwise bound — the LHS is not fresh.
  • Range restriction (OE1303 RuleNotRangeRestricted). x = expr binds x positively iff every variable in expr is itself positively bound. An unbound RHS variable, or a pure self-reference / cycle where the result variable is bound by nothing else (x = x + 1 as x’s only binder; x = y, y = x), leaves x unbound under the binding fixpoint and refuses as unsafe. OE1303’s wording is extended to name the binding form.

So x = x + 1 refuses via OE1335 (freshness) when x is already bound elsewhere, or via OE1303 (range restriction) when x has no other binder — the two codes name two genuinely different errors. The freshness check runs first, so the freshness diagnostic wins when both could apply.

Why x = expr, not aggregate-on-LHS comparison sugar. a = sum(...), b = sum(...), a == b already solves the leading/LHS aggregate position (R-B3) with the same mechanism. We do not add a second spelling (sum(...) == sum(...) as direct comparison sugar): one mechanism, one spelling (house rule — nothing silently dropped, nothing redundantly admitted). A user who writes sum(...) == sum(...) directly is hoisted by the existing aggregate-operand relationalization (the aggregate becomes a Compute-bound var), so it also works — but the canonical, documented form is the binding.

2.2 Aggregates as bindable expressions

An aggregate call is a bindable expression:

pub derive trial(acct, bal) :- acct: Account,
    debits  = sum(e.amount for e in Entry, posted(e, acct), e.side == "D"),
    credits = sum(e.amount for e in Entry, posted(e, acct), e.side == "C"),
    bal     = debits - credits;

Because the aggregate binds a variable, comparing two aggregates is just comparing two bound variables (R-B3): a = sum(...), b = sum(...), a == b.

2.3 Aggregate sources — extend the comprehension, don’t add a form (R-B4)

The existing comprehension sum(e.value for e in S where φ) is extended so the body after the source is a comma-separated list of additional body atoms, replacing the single where φ:

agg( proj for binder in Source [, atom]* )
  • Source may be a type extent (for e in Entry) or a relation atom is expressed as one of the trailing atoms (posted(e, acct)).
  • The trailing atoms are ordinary rule-body filter atoms (predicates, comparisons, type tests, membership) lowered with the same machinery as the outer body.
  • Variables from the outer rule body are visible inside the comprehension. That visibility is the grouping (§2.4).

The legacy single-where form (… where φ) stays accepted as sugar for one trailing atom, so every existing example keeps parsing.

Bindings are NOT a trailing-atom form (decided under the no-hollow rule). An earlier draft of this enumeration listed nested bindings as a trailing-atom kind. They are not implemented and are removed from the design: the comprehension cond is a boolean-filter chain (right-nested &&), which has no assignment shape, and a fresh variable introduced inside an aggregate sub-scope is not safety-checked there (the outer range-restriction and freshness checks do not descend into aggregate bodies — a recorded conservative gap). Admitting a half-checked binding inside the fold would open a new silent-unsafe surface, and the useful shape is already expressible without it: sum(w for u in T, rel(t, u), w = u.v) is exactly sum(u.v for u in T, rel(t, u)) (project into the fold directly), and anything richer binds in the OUTER body and aggregates the bound variable. So a binding x = expr in a trailing position is refused with a directed hint, OE1336 BindingInComprehension, naming both rewrites — rather than a generic parse error from the unconsumed =. The comparison == is a real trailing filter and is unaffected.

The brace form stays count/exists-only. sum(expr){atoms} does not parse as a value aggregate; it refuses with a directed hint pointing at the comprehension form (OE1331). We do not add sum(expr){…} — one spelling for value aggregates (the comprehension), one for cardinality (the brace). This keeps the brace’s existing count { p in Person, R(a, p) } membership-join shape (tracked separately under #308) the only brace surface.

2.4 Grouping = the outer bound variables (R-M7)

Grouping is the standard Datalog reading: the group key is exactly the set of outer rule-body variables free in the aggregate. pub derive balance(acct, s) :- acct: Account, s = sum(e.amount for e in Entry, posted(e, acct)); groups per acct — the aggregate is evaluated once per binding of the outer variables, and acct appears in the aggregate body (posted(e, acct)), so each acct gets its own fold. This already matched the reasoner’s evaluation model (the aggregate sub-pipeline is seeded with the outer binding); the binding atom makes the result nameable so it can flow to the head or a second aggregate.

The SQL-ish group by … having keeps refusing (OE0007) but now with a directed hint showing the binding-form equivalent — a group by acct having sum(x) > 0 becomes acct: Account, g = sum(x for …), g > 0. We do not build group by: the Datalog grouping is the spelling, and a second one would be a silent-redundant surface.

Empty-group semantics — a deliberate split. When a group folds over no rows, the aggregators divide by role, and this is the documented behavior the double-entry example relies on:

  • sum / count / count_distinct emit a value for the empty group0. An additive/cardinality fold has a well-defined identity (the empty sum is zero, the empty count is zero), so the grouped row is produced with that identity.
  • min / max / avg drop the row — they have no value over an empty set (no least/greatest element; the mean is 0/0), so no grouped row is produced rather than fabricating one.

This split is load-bearing for the double-entry invariant (§4): sum(debits) == sum(credits) must catch an entry with credits but no debits. Because sum emits 0 for the empty debit side, the comparison is 0 == credits — which fails and flags the entry as unbalanced. Were sum to drop the empty group, the row would vanish and the imbalance would pass silently. The behavior is fixed in fold_aggregate (compiler/crates/oxc-reasoning/src/executor/eval.rs) and pinned by a test asserting BOTH halves (empty_group_split_sum_count_emit_zero_minmaxavg_drop).

2.5 Semantics and stratification

  • Stratification. An aggregate over a derived predicate requires the aggregated predicate stratified strictly below the rule (Faber–Pfeifer–Leone). This is already enforced: an in-SCC aggregate edge is recursion-through-aggregation and refuses loudly with OE1317 (issue #174). No change.
  • WFS interaction (#250). Aggregates evaluate over the definitely-true extent. If the aggregated atom set draws on a relation with a non-empty well-founded-undefined companion ($undefined::R), the fold would silently treat undefined as false — the #250 leak. We refuse the rule loudly at runtime for now (OE1332 / ReasoningError::AggregateOverUndefined) rather than fold undefined-as-false. #250 is the designed follow-up (three-valued aggregate intervals); this RFD does not close it, it stops the silent leak.

2.6 Exact-Decimal mutate-body path (R-M4)

The §7 status note (“mutate-body arithmetic is Real/f64; exact Decimal pending”) is false at HEAD and is removed. Mutate-body arithmetic (require guards, let/field expressions) already routes through eval_binary_values — the canonical exact core (RFD 0016) that the #272/#277 work made the single arithmetic path: pure-Int stays checked Int, any Real/Decimal operand promotes to exact BigRational, / is the exact field operation, no f64 in the arithmetic path. Money (carried as a structured CBOR value) folds through the same rational core. We delete the stale note and flip the §7 status row to ✓/✓, and add a money-arithmetic regression that proves 0.1 + 0.2 == 0.3 exactly in a mutate body.

2.7 Rounding (R-M5)

A builtin function family, available in both rule-body expressions and the fn/compute plane, exact-tower in/out (Decimal stays Decimal, never via f64):

BuiltinMeaning
round(x)nearest integer, ties away from zero
round(x, n)nearest multiple of 10^-n, ties away from zero
round_half_even(x, n)nearest multiple of 10^-n, ties to even (banker’s rounding)
trunc(x, n)toward zero at 10^-n

round_half_even is the money default — financial rounding rounds half to even to avoid the upward bias of half-away-from-zero. Documented in the book’s money section. All four operate on BigRational and return an exact value (integral results collapse to Int, as elsewhere in the tower). round / round_half_even / trunc are reserved builtin names: they resolve by surface name in both the rule plane (compile_expr) and the fn/compute plane (resolve_term_to_value), so a user-defined fn round does not shadow them. A wrong-arity call is OE1333 (RoundingBuiltinArity).

2.8 CLI/demo harness decimal arguments (R-B10, harness half)

demo.toml mutation args accept decimal values for Decimal/Money/Real params, two spellings:

  • A quoted decimal string via an explicit table form { decimal = "0.22" } / { money = "100.50" } — exact, parsed directly to the rational carrier (never via f64). This aligns with serve’s coerce_money_arg convention (a Money amount is a decimal string).
  • A bare TOML float (rate = 0.22) — converted via its shortest round-tripping decimal representation (the same value the user typed), then parsed exactly. The conversion is documented: a bare float is read as the shortest decimal that round-trips to that f64, so 0.22 becomes exactly 22/100, not the f64 artifact 0.2200000000000000011….

The serve-side wire half of R-B10 (decimal coercion at the HTTP boundary) belongs to arc2 and is not touched here.

3. Diagnostics

New codes (allocated next-free in the 13xx rule band, grammar.toml as truth, cargo xtask gen):

CodeNameMeaning
OE1331ValueAggregateBraceFormA value aggregate (sum/min/max/avg) was written in brace form sum(expr){…}; directs to the comprehension form sum(expr for x in S, …).
OE1332AggregateOverUndefinedAn aggregate folds over a relation with well-founded-undefined atoms; refused rather than silently treating undefined as false (#250 leak guard). The refusal is whole-relation, not per-group.
OE1333RoundingBuiltinArityA rounding builtin (round/round_half_even/trunc, §2.7) was called with the wrong number of arguments.
OE1334BindingLhsNotSimpleVarA binding atom x = expr (§2.1) has a non-fresh-variable LHS — a dotted projection (x.f = e) or a ::-qualified path (m::x = e); a binding’s LHS must be a bare identifier.
OE1335BindingLhsAlreadyBoundA binding atom x = expr (§2.1) names an LHS x already bound by a positive predicate, a head param, a projection, an aggregate result, or an earlier binding; the = would silently act as an equality filter. Use == to compare, or pick a fresh name. (Freshness path; see §2.1.)
OE1336BindingInComprehensionA binding x = expr appears as a comprehension trailing atom (§2.3), which is not a trailing-atom form; directs to project into the fold directly or bind in the outer body.

The Rust error variant for OE1333 is RuleCompileError::RoundingBuiltinArity, renamed from BuiltinArity to match the catalog name (only the rounding builtins reach this arity check).

Extended:

  • OE1303 (RuleNotRangeRestricted) — wording extended to name the binding form x = expr as a binding source and as a consumer of its RHS variables.
  • OE0007 (GroupByNotExecuting) — message extended with the binding-form equivalent hint.

4. Worked example: double-entry accounting

examples/double_entry_v0 is the journey’s exact wall made a running proof: accounts, journal entries with debit/credit lines, the balance invariant sum(debits) == sum(credits) as a check, and a trial-balance query that sums per account. It is corpus-pinned (an integration test asserts its extents) and demo.toml-driven.

5. Lean obligations

The reasoner is the layer where Rust leads (AGENTS.md): well-founded semantics, factorized aggregates, and the join machinery execute in Rust ahead of the Lean fixpoint. The binding atom and aggregate-source extension are evaluation-layer changes that ride the existing AtomIR::Compute / AtomIR::Aggregate constructors — already in the @[language_interface] wire shape — so no drift-gate change is required. Mechanizing the body-binding and grouped-aggregate semantics is added to issue #134’s catch-up scope (the Reasoning/Datalog/ layer), not mechanized in this PR.

6. Disposition against the register

ItemDisposition
R-B2 (derived value head)x = expr binding atom → AtomIR::Compute; head carries the bound var. LHS freshness enforced (OE1335) — a rebind-as-filter refuses loudly.
R-B3 (compare two aggregates)a = sum(…), b = sum(…), a == b via the binding atom.
R-B4 (aggregate sources)Comprehension extended: comma-separated trailing filter atoms + relation-atom sources; brace value-aggregate refused (OE1331). Trailing bindings are not a form — refused with a directed hint (OE1336); project into the fold or bind in the outer body.
R-M4 (exact mutate arith)Already exact via eval_binary_values; stale §7 note removed; money regression added.
R-M5 (rounding)round/round(_,n)/round_half_even/trunc builtins, exact tower, banker’s-default.
R-M7 (grouped aggregation)Grouping = outer bound vars (Datalog); group by … having refuses with binding-form hint.
R-B10 (harness half)demo.toml accepts { decimal/money = "…" } and bare floats (shortest-decimal).