Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

RFD 0021 — The reasoner execution engine (as built)

  • State: committed
  • Opened: 2026-06-06
  • Decides: the concrete query-execution engine of oxc-reasoning — the realization of RFD 0020’s Phase 2 (executor unification, the join engine, the optimizer) and the reasoner-side of Phase 3 (graph-native operators, factorization). It records the decisions settled by building: the execution model, the join algorithms, the optimizer passes, the Map operator, the factorization layer, the physical-relation layer, the incrementality discipline, and the correctness methodology. It also records two framing corrections that building surfaced — to RFD 0018 D3 (“fork Feldera”) and to the meaning of “full BYODS.” Built and merged across PRs #98, #102, #103, #104, #105, #108, #111, #114, #116, #118, #121, #122, #124 (and the forall triage #120).

This RFD records what was built and why. RFD 0020 is the umbrella vision; RFD 0018 is the incrementality plan; this is the engine that exists on main. It is Lean-first where it touches reasoning semantics: the executor conforms to the strict-stratified perfect model (spec/lean/Argon/Reasoning/Fixpoint.lean) and is held to it by differential testing against the semi-naive oracle. The rest — join algorithms, physical layout, optimizer, factorization — is engine architecture the Lean does not mechanize (per AGENTS.md scope), settled from first principles.


Question

RFD 0020 set the composable-engine vision but, at the time, the pipeline was “designed, not wired”: LogicalPlan/optimizer/PhysicalPlan/TierExecutor were orphaned, SemiNaiveExecutor::execute was a stub, the live path was AtomIR → CompiledRule → evaluate_to_fixpoint directly, storage was decode-per-tuple BTreeMap, and there were no graph-native operators, no computed-term evaluation, and no factorization.

How is the reasoner’s query-execution engine actually realized — execution model, join algorithms, optimizer, computed terms, factorization, physical-relation layer, and the incrementality discipline — and what concrete decisions did building it settle?


Context

  • RFD 0020 (umbrella) chose: one composable IR; graph-native operators (D5); BYODS (D6); factorization in v1 (D7); a rule-based optimizer (D8); tier dispatch (D9); CQRS storage (D10).
  • RFD 0018 (incrementality) chose: DBSP as the recursive-tier engine, originally forking Feldera dbsp (D3); AFT pair-encoded Z-sets (D1); arrangements on InternalId (D7); semi-naive as oracle (D9). Its Phase-0 spike de-risked IVM (<1 ms incremental at 10 M facts).
  • The constraint that shaped every slice: a correctness-first, no-placeholder, no-hollow discipline — each optimization had to be the genuine mechanism (not a known-incomplete shortcut), validated against an independent oracle, with a correct fallback for the cases it doesn’t yet handle.
  • A coordination seam with the parallel write-path track: the RelationCatalog public API (get/get_mut/ensure/insert/iter + the CatalogEntry.relation field). The reasoner owns the internal eval/join/representation; the write path consumes via the API.

Decision

D1 — Execution model: CompiledRule is the executed form; the “operator pipeline” is the set of operators it evaluates (#98)

The semi-naive CompiledRule evaluator (executor/eval.rs) is the executed form, routed through Engine::evaluate + ConvergencePolicy dispatch (the RFD 0003 seam made real; the six magic 1000s removed; SemiNaiveExecutor::execute real). RFD 0020’s “operator pipeline” (D2/D4) is realized as the set of operators the executor evaluatesPredicate (join), Comparison (filter), Naf (anti-join), Aggregate, Compute (Map) — not a separate operator-tree interpreter. Why (build-correctly-once): an operator-tree LogicalPlan that round-tripped back to the CompiledRule executor would be throwaway scaffolding the moment an operator-tree executor arrives; the optimizable LogicalPlan IR is reserved for that executor when graph-native physical operators + factorization genuinely demand it. The dead logical/physical/optimizer scaffolding stays reserved, not wired.

D2 — The join engine: a Free-Join hybrid over a polymorphic physical layer

Joins dispatch by body shape, all anchored to the binary binding-extension evaluator as the differential oracle:

  • Indexed joins (#102) — arrangement-by-bound-key, tuples decoded once at build time. Replaced the O(|prior| × |rel|) decode-per-tuple nested scan. Output ordering byte-identical → a pure drop-in.
  • Persistent arrangements (#104) — arrangements cached across a stratum’s semi-naive iterations, evicted after each merge for exactly the relations whose extent changed (merge_into_state is the sole mutation point). Staleness is structurally impossible; stable relations (EDB, earlier strata) build their index once.
  • Worst-case-optimal join (#114, time-optimal #116) — for pure positive-predicate cyclic bodies (the triangle), a variable-at-a-time Generic Join over per-atom prefix tries: at each variable, drive the multiway intersection from the smallest candidate set and probe the others by membership (#116 — the leapfrog discipline that makes it O(M log N), not O(N²); #114 alone was memory-optimal but time-O(N²)). Binary stays for acyclic/linear-recursive bodies — the Free-Join hybrid. Gated by atoms ≥ distinct vars (perf-only; WCOJ is correct for any conjunctive body). Differential-tested identical to binary.
  • BYODS / CSR index-free adjacency (#118) — the arrangement is polymorphic (enum { Bucketed, Csr }, RFD 0020 D6). A binary relation keyed on a single column (the edge-traversal pattern) builds a CSR adjacency (sorted sources + contiguous neighbours); probing is a binary search + a contiguous slice. Built ephemerally from the canonical BTreeMapCatalogEntry.relation is unchanged, so coordination-free.

D3 — The optimizer: SIP reorder + projection-collapse (#103, #105, #122)

In Engine::evaluate, each rule body is reordered (optimizer/reorder.rs): filters first, then the most-constrained predicate (most already-bound / constant argument positions), then aggregates, with a cardinality tie-break (smaller non-rule-head relation first; a rule-head relation’s pre-eval size is unknown → deferred, never falsely “smallest”). Safety is structural — a greedy that only ever places an atom whose consumed variables are already bound (a correct over-approximation excluding NAF/aggregate-locals). Semantics-preserving.

Projection-collapse (#122) — eager projection-pushdown in the naive pass: after each atom, variables that are now dead (bound so far, but not in the head and not referenced by any later atom) are cleared and the binding set deduped, so a projected-away fan-out (HasBoth(p) :- hasPhone(p,ph), hasEmail(p,e) → stays at |distinct p|, never |ph|×|e|) collapses without enumeration. Result-preserving.

D4 — The Map operator: computed terms (#108, #111)

CompiledExpr + CompiledAtom::Compute evaluate computed scalar terms (exact-BigRational arithmetic with Int-collapse, comparisons → Bool, &&/||) — what CompiledTerm (variable/constant only) could not represent, the gate behind the issue-#56 “won’t evaluate” wall. The drift-gated AtomIR::Compute substrate node (Lean + protocol, the write-path track) lowers via compile_expr. The Map operator is linear → trivially incremental. Non-scalar terms (field projection, application) are a loud compile error, never a silent drop.

D5 — Factorization: the f-representation for the high-value cases (#121, #122, #124)

Realized for independent-factor bodies (groups sharing no body-local variable), computed over the factorized form rather than the enumerated cross-product:

  • count (#121) = per-group sizes.
  • projection-collapse (#122) — see D3 (the main-eval analog).
  • value folds (#124) — a value fold whose single projection variable lives in one group folds over that group; the others scale (sum: × ∏ other sizes) or gate (min/max/avg/count_distinct: repetition-invariant, gated on the others being non-empty).

Connected bodies, constant/outer-bound projections, and nested factorization stay on the correct flat path — real follow-ups, not silent shortcuts. Every path is differential-tested identical to the flat fold. This is the genuine f-representation: the general FBindings representation (deferred) would compute these the same way — these generalize, they are not placeholders.

D6 — Incrementality is an additive outer loop, not a rewrite — the DBSP model, not a Feldera fork (revises RFD 0018 D3)

Every operator is a pure Z-set → Z-set function: linear (commutes with the delta operator — Map, Filter, projection) or with an explicit delta-form over arrangements (the join product rule, antijoin’s signed-weight form). The substrate already is the incremental substrate: Relation = BTreeMap<tuple, i64-weight> is a Z-set, the persistent arrangements are DBSP’s integrated indexed state, and the semi-naive delta loop is the join product rule applied across iterations. Therefore cross-mutation IVM is an additive outer loop (integrate/differentiate at the boundaries + delta-seeded recursion), never an operator rewrite. Holding this discipline as each operator is built is what guarantees it.

This revises RFD 0018 D3 (“fork Feldera dbsp”) → “adopt the DBSP model; build our own operators.” Feldera’s flat binary-join Z-sets are the wrong substrate for the factorization and WCOJ that RFD 0020 D7 commits to for v1; grafting them onto Feldera is swimming upstream. We build our own operators on the DBSP model and keep Feldera + the semi-naive evaluator as differential oracles. RFD 0018’s Phase-0 spike still stands — it de-risked IVM-as-approach, not Feldera-the-codebase.

D7 — Cross-query reuse needs the persisted read-model, not a CatalogEntry.relation rep-swap (corrects the “full BYODS” framing)

The catalog is rebuilt per query (query_derive → materialize_predicates → RelationCatalog::new(); the Store holds no materialized catalog; query results are cached, the catalog is not). So making CatalogEntry.relation a persistent dense-InternalId representation buys nothing across queries — it dies with the catalog — and within-query reuse is already captured by the persistent arrangements (#104) + CSR (#118). The genuine cross-query-reuse and incrementality win is to persist the materialized read-model on the Store and maintain it incrementally on mutation — the CQRS persisted read-model + IVM (RFD 0020 D10, RFD 0018 Phase 4) — a storage-and-write-path-coordinated effort, not a reasoner representation change.

D8 — Correctness methodology: the independent oracle

Every optimization is validated against an independent implementation that must produce identical results: the binary/flat evaluator is the differential oracle for indexed joins, WCOJ, CSR, projection-collapse, and factorization; the semi-naive evaluator is the oracle for the eventual DBSP/IVM. Library code has no unwrap/expect/panic; BTreeMap/BTreeSet only; every PR is fmt + clippy clean and cargo nextest-green. This is why the engine could be transformed under load without regressions, and why a structurally-impossible-staleness or byte-identical-output argument backs each slice.


Rationale

  • Build-correctly-once over speculative IR (D1). The optimizable operator-tree LogicalPlan is real architecture, but wiring it to round-trip through the proven CompiledRule executor would be scaffolding discarded the moment an operator-tree executor lands. Optimizing the executed form and deferring the tree to its real consumer is the honest sequencing.
  • The oracle is the load-bearing safety property (D8). WCOJ, CSR, factorization, and projection-collapse are intricate and easy to get subtly wrong; anchoring each to the binary/flat evaluator turned “is it correct?” into a test. (#116 exists because the oracle-and-complexity analysis caught that #114 was memory-optimal but time-O(N²).)
  • The Z-set discipline is what keeps incrementality cheap to add (D6). Because the substrate is already a Z-set with arrangements, and every operator is a differentiable Z-set function, IVM is a wrapping, not a rewrite — which is precisely why forking Feldera (whose model can’t carry factorization/WCOJ) is the wrong trade.
  • Facts beat framings (D7). “Full BYODS for cross-query reuse” sounded right until the per-query catalog rebuild was verified; surfacing that prevented a large, coordinated, marginal-value rep-swap.

Alternatives considered

  • Wire the operator-tree LogicalPlan interpreter now. Rejected (D1): the LogicalPlan → CompiledRule round-trip is throwaway once an operator-tree executor exists, and the round-trip for aggregates/NAF is complex; defer the tree to its real consumer.
  • Fork Feldera dbsp (RFD 0018 D3 as written). Rejected (D6): flat binary-join Z-sets are the wrong substrate for factorization/WCOJ; adopt the model, build our own operators.
  • “Full BYODS” = make CatalogEntry.relation a persistent rep. Rejected (D7): the catalog is rebuilt per query, so it delivers no cross-query reuse; the real win is the persisted read-model.
  • WCOJ everywhere / always-on factorization. Rejected: WCOJ regresses acyclic bodies; factorization helps only specific shapes. Both are gated, with the proven path as the default and fallback.

Consequences

Built and on main: a query engine that is expressive (computed terms), optimized (SIP + cardinality reorder, indexed + persistent arrangements, projection-collapse), worst-case-optimal on cyclic graph patterns, index-free-adjacency-capable (CSR), factorized (count / projection / value-folds), and incrementality-ready (every operator a pure Z-set function). All correctness-first; the forall over-derivation was triaged to the lowering with a validated count-equality fix recipe (#120, handed to the write-path track).

Deferred (each a real follow-up, not a gap in what’s built):

  • delta-path projection-collapse (recursive rules) and nested factorization;
  • the general FBindings representation flowing through the eval — gated on real-workload evidence (no real models exist yet, so the winning-shape frequency is unknown);
  • the persisted read-model + IVM (D7) — the cross-query-reuse and incrementality payoff, the next big coordinated effort;
  • WFS / SLG for recursion-through-negation (OE1309 under strict stratification) — a real-model driver (RFD 0018, Gustavo’s breach calculus);
  • cardinality statistics for cost-based / per-stratum reorder.

Coordination: the RelationCatalog public-API seam stays the boundary with the write-path track; the catalog-owned (persistent) BYODS representation, when built, is the coordinated slice (touches CatalogEntry.relation).


Open questions / tracked-future

  • When does the general FBindings representation earn its cost? It is the uniform home for factorization, but it is the largest operator-model change; building it ahead of a real workload that exhibits the fan-out shapes would be over-engineering. Decide when real models exist.
  • The persisted read-model + IVM — the design (CQRS read model on the Store, generation-driven invalidation, incremental maintenance with retraction) is RFD 0020 D10 / RFD 0018 Phase 4; it is the next major, coordinated effort and warrants its own RFD when started.
  • Recursion-through-negation — needs WFS (the SLG kernel, RFD 0018 D11); strict stratification correctly rejects it (OE1309) today.