RFD 0029 — Derived values and aggregate terms: body-level binding, aggregate sources, rounding
- State: accepted — implemented in this PR
- Opened: 2026-06-12
- Decides: the quantitative-modeling wall of the v0.2.1 program — that a rule can derive a
computed numeric value and compare two aggregates. One mechanism: a body-level binding atom
x = expr(single=, distinct from the comparison==) that binds a fresh variable to the value ofexpr, whereexprranges over bound vars, literals, field projections, arithmetic over the exact tower, and aggregate expressions. Aggregate sources extend the existing comprehension form with comma-separated body atoms and admit relation atoms as sources; the brace form stayscount/exists-only. Grouping is the outer bound variables (the standard Datalog reading). A rounding builtin family (round,round_half_even,trunc) is added to rule-body expressions and the fn/compute plane, banker’s rounding the money default. The exact-Decimal mutate-body status note is killed (the path already routes through the exact canonical core). The CLI/demo harness accepts decimal arguments forDecimal/Money/Realparams. Settles register items R-B2, R-B3, R-B4, R-M4, R-M5, R-M7, and the harness half of R-B10. - Prior art: Datalog/Soufflé assignment (
x = exprbinds; range-restriction over the computed value) — Soufflé’s=constraint and arithmetic functors are the direct precedent ([Jordan, Scholz, Subotić, Soufflé: On Synthesis of Program Analyzers, CAV 2016]; the Soufflé manual’s “Assignments” and “Aggregates” chapters). Stratified-aggregate semantics follow Faber–Pfeifer–Leone 2011 (the aggregated predicate must live in a strictly-lower stratum). - Layer: language surface — RFD + reference draft → Lean → code (workflow table). The reasoner is the layer where Rust leads (AGENTS.md); the Lean obligation is recorded for #134’s catch-up scope, not mechanized here.
1. The problem
On origin/main @ v0.2.0, tax and accounting are not authorable. The domain-modeling lane proved
it against the binary:
- A rule head cannot carry a computed value.
derive Tax(t, owed) :- appliesTo(b, t), owed == t.income * b.raterefuses with OE1303 —==is a filter over already-bound operands, never an assignment, soowedis never positively bound (R-B2). - Two aggregates cannot be compared. The double-entry invariant
sum(debits) == sum(credits)is inexpressible: a comprehension aggregate parses only as the RHS of afield-path comp-op ___comparison; on the LHS or in leading position it is OE0001 (R-B3). - Aggregates have no relation source and no grouped form.
sum(amt for amt in rel(e, _, amt))refuses (the comprehension source must be a materialized collection field);group by … havingrefuses with OE0007 (R-B4, R-M7).
Three adjacent gaps compound it: the §7 status claims mutate-body arithmetic is Real/f64 with
“exact Decimal pending” (R-M4); there are no rounding primitives in any plane (R-M5); and the
demo/CLI harness rejects a decimal mutation argument outright (R-B10, harness half).
2. The design — one mechanism
2.1 Body-level binding: x = expr
A new rule-body atom form binds a fresh variable to the value of an expression:
pub derive Tax(t, owed) :- appliesTo(b, t), owed = t.income * b.rate;
x = expr (single =) is assignment, distinct from x == e (double =, a filter). expr
ranges over: bound variables, literals, field projections (t.income), arithmetic over the exact
tower (a * b, a + b, exact /), nullary/other intrinsics, and aggregate expressions (§2.2).
This is the established PL concept — Datalog/Soufflé assignment. It is not a new IR
constructor: the binding lowers to the existing AtomIR::Compute { result, expr } (the RFD 0020 D5
Map operator), whose result variable the range-restriction safety check already treats as a binding
source (a Compute result is bound once its input variables are bound, closed to a fixpoint). The
work is purely in the parser (recognize the = atom shape) and the lowering (emit Compute,
relationalizing any Proj/aggregate sub-terms exactly as the existing comparison-operand hoist
does).
Freshness and range restriction. The LHS x must be a FRESH variable, and the binding must be
range-restricted. Two distinct refusal paths, never a silent Null:
- Freshness (OE1335
BindingLhsAlreadyBound). Ifxis already bound — by a prior positive predicate atom, a head parameter bound elsewhere, a projection, an aggregate result, or an earlier binding — the=was silently degrading into an equality FILTER (xjoined against the computed value) rather than a binding, collapsing the two readings of=(bind vs. compare) with no diagnostic. It refuses, namingx: use==to compare, or pick a fresh name. This is the path a rebindx = x + 1takes whenxis otherwise bound — the LHS is not fresh. - Range restriction (OE1303
RuleNotRangeRestricted).x = exprbindsxpositively iff every variable inexpris itself positively bound. An unbound RHS variable, or a pure self-reference / cycle where the result variable is bound by nothing else (x = x + 1asx’s only binder;x = y, y = x), leavesxunbound under the binding fixpoint and refuses as unsafe. OE1303’s wording is extended to name the binding form.
So x = x + 1 refuses via OE1335 (freshness) when x is already bound elsewhere, or via OE1303
(range restriction) when x has no other binder — the two codes name two genuinely different errors.
The freshness check runs first, so the freshness diagnostic wins when both could apply.
Why x = expr, not aggregate-on-LHS comparison sugar. a = sum(...), b = sum(...), a == b
already solves the leading/LHS aggregate position (R-B3) with the same mechanism. We do not add
a second spelling (sum(...) == sum(...) as direct comparison sugar): one mechanism, one spelling
(house rule — nothing silently dropped, nothing redundantly admitted). A user who writes
sum(...) == sum(...) directly is hoisted by the existing aggregate-operand relationalization (the
aggregate becomes a Compute-bound var), so it also works — but the canonical, documented form is
the binding.
2.2 Aggregates as bindable expressions
An aggregate call is a bindable expression:
pub derive trial(acct, bal) :- acct: Account,
debits = sum(e.amount for e in Entry, posted(e, acct), e.side == "D"),
credits = sum(e.amount for e in Entry, posted(e, acct), e.side == "C"),
bal = debits - credits;
Because the aggregate binds a variable, comparing two aggregates is just comparing two bound
variables (R-B3): a = sum(...), b = sum(...), a == b.
2.3 Aggregate sources — extend the comprehension, don’t add a form (R-B4)
The existing comprehension sum(e.value for e in S where φ) is extended so the body after the
source is a comma-separated list of additional body atoms, replacing the single where φ:
agg( proj for binder in Source [, atom]* )
Sourcemay be a type extent (for e in Entry) or a relation atom is expressed as one of the trailing atoms (posted(e, acct)).- The trailing atoms are ordinary rule-body filter atoms (predicates, comparisons, type tests, membership) lowered with the same machinery as the outer body.
- Variables from the outer rule body are visible inside the comprehension. That visibility is the grouping (§2.4).
The legacy single-where form (… where φ) stays accepted as sugar for one trailing atom, so every
existing example keeps parsing.
Bindings are NOT a trailing-atom form (decided under the no-hollow rule). An earlier draft of
this enumeration listed nested bindings as a trailing-atom kind. They are not implemented and are
removed from the design: the comprehension cond is a boolean-filter chain (right-nested &&),
which has no assignment shape, and a fresh variable introduced inside an aggregate sub-scope is not
safety-checked there (the outer range-restriction and freshness checks do not descend into aggregate
bodies — a recorded conservative gap). Admitting a half-checked binding inside the fold would open a
new silent-unsafe surface, and the useful shape is already expressible without it: sum(w for u in T, rel(t, u), w = u.v) is exactly sum(u.v for u in T, rel(t, u)) (project into the fold directly), and
anything richer binds in the OUTER body and aggregates the bound variable. So a binding x = expr in
a trailing position is refused with a directed hint, OE1336 BindingInComprehension, naming both
rewrites — rather than a generic parse error from the unconsumed =. The comparison == is a real
trailing filter and is unaffected.
The brace form stays count/exists-only. sum(expr){atoms} does not parse as a value
aggregate; it refuses with a directed hint pointing at the comprehension form (OE1331). We do not
add sum(expr){…} — one spelling for value aggregates (the comprehension), one for cardinality
(the brace). This keeps the brace’s existing count { p in Person, R(a, p) } membership-join shape
(tracked separately under #308) the only brace surface.
2.4 Grouping = the outer bound variables (R-M7)
Grouping is the standard Datalog reading: the group key is exactly the set of outer rule-body
variables free in the aggregate. pub derive balance(acct, s) :- acct: Account, s = sum(e.amount for e in Entry, posted(e, acct));
groups per acct — the aggregate is evaluated once per binding of the outer variables, and acct
appears in the aggregate body (posted(e, acct)), so each acct gets its own fold. This already
matched the reasoner’s evaluation model (the aggregate sub-pipeline is seeded with the outer
binding); the binding atom makes the result nameable so it can flow to the head or a second
aggregate.
The SQL-ish group by … having keeps refusing (OE0007) but now with a directed hint showing the
binding-form equivalent — a group by acct having sum(x) > 0 becomes
acct: Account, g = sum(x for …), g > 0. We do not build group by: the Datalog grouping is the
spelling, and a second one would be a silent-redundant surface.
Empty-group semantics — a deliberate split. When a group folds over no rows, the aggregators divide by role, and this is the documented behavior the double-entry example relies on:
sum/count/count_distinctemit a value for the empty group —0. An additive/cardinality fold has a well-defined identity (the empty sum is zero, the empty count is zero), so the grouped row is produced with that identity.min/max/avgdrop the row — they have no value over an empty set (no least/greatest element; the mean is0/0), so no grouped row is produced rather than fabricating one.
This split is load-bearing for the double-entry invariant (§4): sum(debits) == sum(credits) must
catch an entry with credits but no debits. Because sum emits 0 for the empty debit side, the
comparison is 0 == credits — which fails and flags the entry as unbalanced. Were sum to drop the
empty group, the row would vanish and the imbalance would pass silently. The behavior is fixed in
fold_aggregate (compiler/crates/oxc-reasoning/src/executor/eval.rs) and pinned by a test asserting
BOTH halves (empty_group_split_sum_count_emit_zero_minmaxavg_drop).
2.5 Semantics and stratification
- Stratification. An aggregate over a derived predicate requires the aggregated predicate stratified strictly below the rule (Faber–Pfeifer–Leone). This is already enforced: an in-SCC aggregate edge is recursion-through-aggregation and refuses loudly with OE1317 (issue #174). No change.
- WFS interaction (#250). Aggregates evaluate over the definitely-true extent. If the
aggregated atom set draws on a relation with a non-empty well-founded-undefined companion
(
$undefined::R), the fold would silently treat undefined as false — the #250 leak. We refuse the rule loudly at runtime for now (OE1332 /ReasoningError::AggregateOverUndefined) rather than fold undefined-as-false. #250 is the designed follow-up (three-valued aggregate intervals); this RFD does not close it, it stops the silent leak.
2.6 Exact-Decimal mutate-body path (R-M4)
The §7 status note (“mutate-body arithmetic is Real/f64; exact Decimal pending”) is false at
HEAD and is removed. Mutate-body arithmetic (require guards, let/field expressions) already
routes through eval_binary_values — the canonical exact core (RFD 0016) that the #272/#277 work
made the single arithmetic path: pure-Int stays checked Int, any Real/Decimal operand
promotes to exact BigRational, / is the exact field operation, no f64 in the arithmetic path.
Money (carried as a structured CBOR value) folds through the same rational core. We delete the
stale note and flip the §7 status row to ✓/✓, and add a money-arithmetic regression that proves
0.1 + 0.2 == 0.3 exactly in a mutate body.
2.7 Rounding (R-M5)
A builtin function family, available in both rule-body expressions and the fn/compute plane,
exact-tower in/out (Decimal stays Decimal, never via f64):
| Builtin | Meaning |
|---|---|
round(x) | nearest integer, ties away from zero |
round(x, n) | nearest multiple of 10^-n, ties away from zero |
round_half_even(x, n) | nearest multiple of 10^-n, ties to even (banker’s rounding) |
trunc(x, n) | toward zero at 10^-n |
round_half_even is the money default — financial rounding rounds half to even to avoid the
upward bias of half-away-from-zero. Documented in the book’s money section. All four operate on
BigRational and return an exact value (integral results collapse to Int, as elsewhere in the
tower). round / round_half_even / trunc are reserved builtin names: they resolve by surface
name in both the rule plane (compile_expr) and the fn/compute plane (resolve_term_to_value), so a
user-defined fn round does not shadow them. A wrong-arity call is OE1333 (RoundingBuiltinArity).
2.8 CLI/demo harness decimal arguments (R-B10, harness half)
demo.toml mutation args accept decimal values for Decimal/Money/Real params, two spellings:
- A quoted decimal string via an explicit table form
{ decimal = "0.22" }/{ money = "100.50" }— exact, parsed directly to the rational carrier (never via f64). This aligns with serve’scoerce_money_argconvention (a Money amount is a decimal string). - A bare TOML float (
rate = 0.22) — converted via its shortest round-tripping decimal representation (the same value the user typed), then parsed exactly. The conversion is documented: a bare float is read as the shortest decimal that round-trips to that f64, so0.22becomes exactly22/100, not the f64 artifact0.2200000000000000011….
The serve-side wire half of R-B10 (decimal coercion at the HTTP boundary) belongs to arc2 and is not touched here.
3. Diagnostics
New codes (allocated next-free in the 13xx rule band, grammar.toml as truth, cargo xtask gen):
| Code | Name | Meaning |
|---|---|---|
| OE1331 | ValueAggregateBraceForm | A value aggregate (sum/min/max/avg) was written in brace form sum(expr){…}; directs to the comprehension form sum(expr for x in S, …). |
| OE1332 | AggregateOverUndefined | An aggregate folds over a relation with well-founded-undefined atoms; refused rather than silently treating undefined as false (#250 leak guard). The refusal is whole-relation, not per-group. |
| OE1333 | RoundingBuiltinArity | A rounding builtin (round/round_half_even/trunc, §2.7) was called with the wrong number of arguments. |
| OE1334 | BindingLhsNotSimpleVar | A binding atom x = expr (§2.1) has a non-fresh-variable LHS — a dotted projection (x.f = e) or a ::-qualified path (m::x = e); a binding’s LHS must be a bare identifier. |
| OE1335 | BindingLhsAlreadyBound | A binding atom x = expr (§2.1) names an LHS x already bound by a positive predicate, a head param, a projection, an aggregate result, or an earlier binding; the = would silently act as an equality filter. Use == to compare, or pick a fresh name. (Freshness path; see §2.1.) |
| OE1336 | BindingInComprehension | A binding x = expr appears as a comprehension trailing atom (§2.3), which is not a trailing-atom form; directs to project into the fold directly or bind in the outer body. |
The Rust error variant for OE1333 is
RuleCompileError::RoundingBuiltinArity, renamed fromBuiltinArityto match the catalog name (only the rounding builtins reach this arity check).
Extended:
- OE1303 (
RuleNotRangeRestricted) — wording extended to name the binding formx = expras a binding source and as a consumer of its RHS variables. - OE0007 (
GroupByNotExecuting) — message extended with the binding-form equivalent hint.
4. Worked example: double-entry accounting
examples/double_entry_v0 is the journey’s exact wall made a running proof: accounts, journal
entries with debit/credit lines, the balance invariant sum(debits) == sum(credits) as a check,
and a trial-balance query that sums per account. It is corpus-pinned (an integration test asserts
its extents) and demo.toml-driven.
5. Lean obligations
The reasoner is the layer where Rust leads (AGENTS.md): well-founded semantics, factorized
aggregates, and the join machinery execute in Rust ahead of the Lean fixpoint. The binding atom and
aggregate-source extension are evaluation-layer changes that ride the existing AtomIR::Compute /
AtomIR::Aggregate constructors — already in the @[language_interface] wire shape — so no drift-gate
change is required. Mechanizing the body-binding and grouped-aggregate semantics is added to issue
#134’s catch-up scope (the Reasoning/Datalog/ layer), not mechanized in this PR.
6. Disposition against the register
| Item | Disposition |
|---|---|
| R-B2 (derived value head) | x = expr binding atom → AtomIR::Compute; head carries the bound var. LHS freshness enforced (OE1335) — a rebind-as-filter refuses loudly. |
| R-B3 (compare two aggregates) | a = sum(…), b = sum(…), a == b via the binding atom. |
| R-B4 (aggregate sources) | Comprehension extended: comma-separated trailing filter atoms + relation-atom sources; brace value-aggregate refused (OE1331). Trailing bindings are not a form — refused with a directed hint (OE1336); project into the fold or bind in the outer body. |
| R-M4 (exact mutate arith) | Already exact via eval_binary_values; stale §7 note removed; money regression added. |
| R-M5 (rounding) | round/round(_,n)/round_half_even/trunc builtins, exact tower, banker’s-default. |
| R-M7 (grouped aggregation) | Grouping = outer bound vars (Datalog); group by … having refuses with binding-form hint. |
| R-B10 (harness half) | demo.toml accepts { decimal/money = "…" } and bare floats (shortest-decimal). |