RFD 0018 — Production reasoner: the incremental DBSP engine

State: accepted — partially implemented (WFS executor + Engine::evaluate shipped; DBSP fork / persisted-IVM not yet)
Opened: 2026-06-05
Decides: the concrete realization of RFD 0003’s DBSPExecutor — the data model, evaluation engine, storage model, provenance, and invalidation that take Argon’s reasoner from a correct demo (cold, non-incremental, nested-loop semi-naive) to production-grade (“working and usable” at scale). Resolves RFD 0003’s deferred open question (“Stratum boundary policy with retractions → defer to the DBSP integration RFD”). Fixes the scope cut (RP-009 §4) and answers RP-009’s open questions (§9). Scopes the ARS substrate research record (vault ars-substrate R-3.1–R-3.8) down to what we build now vs design-for vs defer.

This RFD is the architecture decision record for the production reasoner. It is Lean-first where it touches semantics (the engine conforms to spec/lean/Argon/Reasoning/; divergence is a bug). It commits a plan, not code; its Phase 0 is a de-risking spike that gates the central engineering bet before any irreversible code lands.

Question

RP-009 mandates: bring the rule engine from correct MVP to production-grade, because it is the foundation the entire language sits on. RFD 0003 settled how multiple backends compose under one Engine/TierExecutor interface, named DBSPExecutor as the incremental recursive-tier backend, and explicitly deferred the engine itself — its data model, its IVM/retraction semantics, its storage and invalidation — to “the DBSP integration RFD.” This is that RFD.

Concretely: What is the production engine? — (1) the value/data model that lets a bilattice (Truth4) truth domain ride an incremental dataflow substrate that needs an abelian group; (2) the evaluation substrate (build vs fork vs wrap, and which); (3) how facts are stored and read at scale (the event log is the source of truth, but a cold per-query re-materialization is O(database) and dies at scale); (4) how derived state is incrementally maintained, persisted, and invalidated; (5) how provenance composes with incrementality; (6) how it conforms to the Lean; and (7) the scope cut + build order that gets us there correctly the first time, designed so the features we’ll need soon (defeasibility, metric-temporal, well-founded semantics over cycles) integrate neatly without a rewrite.

Context

Current state (verified against `origin/main` @ b7a24bc)

The engine is a sound, well-tested, semi-naive stratified-Datalog evaluator — correct on the Datalog + stratified-NAF + exact-rational-aggregate fragment, at small scale. It is not production-grade. The load-bearing gaps:

Cold + non-incremental. oxc-runtime::query_derive calls materialize_predicates(module) (a full scan of all IofAssertion/RelationTuple/IndividualPropertyAssertion events into a fresh RelationCatalog) then evaluate_to_fixpoint from scratch — every query. No derived state persists across queries; no incremental update on mutation. Every query is O(all events) + O(rules × facts × iterations), cold. This is the headline scaling failure.
Nested-loop joins, decode-per-tuple. extend_bindings/unify iterate every tuple of a relation and decode_tuple (CBOR) it inside the innermost join loop; Relation<Vec<u8>> = BTreeMap<CBOR-full-tuple, weight>. No join-key index, no arrangement (arrangement_body is reserved + inert). The dominant cost is O(iterations · rules · |prior| · |rel|) CBOR decodes.
Executor split. The live path is the free fn executor::eval::evaluate_to_fixpoint; SemiNaiveExecutor::execute is a stub returning an empty catalog; the RFD-0003 TierExecutor trait and the DataFusion-shaped logical/physical/optimizer layers are orphaned scaffolding.
Generation counter wired to the wrong cache. oxc-storage-pg has runtime_generations (bumped atomically with appends, invalidating a projection cache), and oxc-serve reads it — but only to memoize the hydrated Store (raw replayed events), not derived reasoning state. The §19.6 runtime-Salsa db (OxcRuntimeDatabase/EventLogInput) does not exist (oxc-runtime doesn’t depend on salsa). So there is no derived-state invalidation today.
Correctness items. Modal box/diamond erase to the inner atom — unsound for anti-rigid types (the Lean proves box(anti_rigid) → false in StaticDischarge.lean). The 1000-iteration cap is a hard Err (not silent), but is a literal at every call site, not a configured convergence policy.

What already works and must stay green: stratified Datalog + NAF; exact-rational sum/count/min/max/avg/count_distinct folds (RFD 0011/0016); the keystone end-to-end tests (oxc-runtime/tests/keystone.rs); Apt-Blair-Walker stratification (compile/stratify.rs, Tarjan SCC); and — partially — a Governatori three-stratum defeasible evaluator and per-standpoint federated Truth4 evaluation.

What the substrate requires (Lean-first conformance bar)

The only fully-mechanized reasoning semantics is strict-stratified perfect-model evaluation (Reasoning/Fixpoint.lean: Cat1-monotone to fixpoint, then Cat2-NAF once, per axis in topological order; Theorems terminate/unique/stable, zero sorry). Consequences for the engine:

Order-independence (stratified_fixpoint_unique): within-stratum rule/axis order is semantically irrelevant ⇒ any strategy converging to the same perfect model is conformant — DBSP, semi-naive, indexed, all sound. Engine choice is a performance decision, not a correctness one.
NAF order-sensitivity (cat2Apply_sublist): Cat2 extension is monotone only under List.Sublist; the engine must preserve relative Cat2 rule order across deltas.
Acyclicity is necessary (Necessity.lean): cyclic axis-dependency ⇒ non-unique. Cycle rejection (OE1309) is correct, not conservative.
Modal: box(anti_rigid) → false is proven and must be honored (or the case honestly refused).
WFS-by-default, #[brave] stable models, the defeasibility proof-tag engine, DatalogMTL operators, and aggregate OWA-intervals are book-promised but not Lean-backed — they are the “design-for / defer” set, not the conformance floor.

What the north-star application needs (RP-009 §4, the residential-lease + accounting overlay)

A shallow, narrow rule program (≈28 derive rules, one true positive recursion — BreachedAt over the propositional-content tree, with stratified not Met — one NAF pipeline). Its load-bearing needs: recursive stratified Datalog with field-navigation joins; aggregates in derive bodies incl. aggregate-comprehension where with rule-atom/predicate filters; stratified NAF; function application + projected-field terms as rule-atom arguments; an Allen-interval + date-arithmetic builtin; forall; denial constraints. It does not today exercise WFS-over-cycles, standpoints, or modal operators, and it models legal exceptions monotonically (conditional propositional content), not via rule defeat. But defeasibility and metric-temporal reasoning are confirmed near-term needs — design for them now, build them next.

Prior research: input, re-derived (not authority)

Per AGENTS.md the vault is research, not authority — the decision below stands on this RFD’s own rationale; the prior record is reconciled and re-derived, not deferred to. The vault ars-substrate program produced a decision record (R-3.1–R-3.8): a two-strategy architecture — DBSP for the bottom-up recursive tier (R-3.3, Feldera dbsp / differential-dataflow), SLG (chalk-engine-shaped) for top-down WFS/upper tiers (R-3.1), bilattice answers via the AFT pair construction (R-3.2), Salsa above both (R-3.4), provenance as a side-track (R-3.6). Four validation passes against that record surfaced the decisive fact:

The single most load-bearing unproven claim — recursive WFS riding DBSP via coupled alternating-fixpoint circuits, which the research itself says has no published precedent — is co-extensive with the WFS-over-cycles feature we are deferring. Everything our scoped target needs (positive recursion, stratified NAF, stratified aggregates, defeasibility’s Maher three-stratum compilation, stratified-NAF DatalogMTL) sits in DBSP’s proven, standard fragment.

So “build it correctly once” is more achievable than RP-009 implies: the scary part is the part we defer, and the AFT pair-encoding keeps the door open to add it (via SLG) without a data-model rewrite.

Decision

A new module realizing RFD 0003’s DBSPExecutor, plus the storage/invalidation/provenance substrate around it. Twelve decisions:

D1 — Value model: AFT pair-encoded Z-sets (the keystone)

Each Truth4 atom is a pair of ℤ-weighted Z-sets: tt (told-true / evidence-for support) and tf (told-false / evidence-against support), in the Belnap evidence-pair encoding (is = (1,0), not = (0,1), CAN/U = (0,0), BOTH = (1,1); a coordinate is present iff its support weight is non-zero). Each stream is a standard Z-set over an abelian group, so DBSP’s incremental machinery (chain rule, distinct, recursion via δ₀/∫) applies per-stream unmodified. Bilattice operations are per-coordinate: info-meet ⊗ = pointwise AND/min; info-join ⊕ (federation → BOTH) = pointwise OR/max; truth meet/join = the Belnap 4×4 table per atom; negation swaps the two streams (the sole cross-stream coupling). This gives, for free: K3 per-standpoint (⊕ statically unreachable within a standpoint; a clash is a diagnostic), FDE BOTH only at the federation boundary — matching the mechanized K3-per-standpoint / FDE-at-federation split (Foundation/Bilattice.lean, Standpoint/Federation.lean; per-standpoint state in Reasoning/State.lean) exactly. Cost ≈ 2× monotone Datalog (3× with symmetric T/F provenance).

Lean gate (D1), discharged. The value-level correspondence is mechanized in Standpoint/PairEncoding.lean (zero sorry/sorryAx; audited): encode/readoff form a bijection, and neg = stream-swap, ⊕ = pointwise-OR, ⊗ = pointwise-AND, plus federate evaluated in the pair representation reads back to the canonical Truth4 fold (readoff_pfederate). Mechanizing this corrected the encoding: the operations above (swap / pointwise OR-AND) hold under the evidence-for/against labels here, not the Fitting consistent-pair labels (T = (1,1), …) an earlier draft printed — under which neg is not a pure swap and ⊕ is not pointwise-max. (This is the for/against form the Phase-0 spike already used.) The ℤ-multiplicity→support bridge for the assert-only fragment, and the retraction case, are the separate D5 obligation.

This is the keystone because it is also the shared substrate for the deferred SLG kernel (R-3.1): SLG’s answer-subsumption over a 2-D lattice is literally this pair. Adding WFS later is a new fixpoint strategy over the same state — not a rewrite.

D2 — Three time axes, kept strictly separate

Transaction/circuit time = the IVM stream index (each kernel commit is a Z-set delta: +1 assert / −1 retract, paired by axiom_id).
Valid time (VT) = a payload column on the Z-set key, never the circuit clock. Derived VT = intersection of body-atom VTs; derived TT = materialization tx-time.
Transaction time (TT) = the durable audit axis (tx_from/tx_to) for “AS OF” reads.

The tuple representation is interval-aware from day one even though metric-temporal ships later (D11), because retrofitting intervals is a rewrite. Metric-temporal reasoning is a separate DRedMTL maintainer over interval-set deltas (no engine implements interval-Z-sets), not a modified DBSP circuit.

D3 — Engine: fork Feldera `dbsp` (own it; do not wrap, do not fork differential-dataflow)

Revised by RFD 0021 D6. The in-memory reasoner adopts the DBSP model (every operator a pure Z-set→Z-set function, so incrementality is an additive outer loop) but does not fork Feldera’s dbsp crate: its flat binary-join Z-sets are the wrong substrate for the WCOJ / factorization / BYODS representations 0021 builds on. The Phase-0 spike below still stands as evidence that the model decouples latency from DB size; what changed is the substrate it runs on.

We fork Feldera dbsp (Apache-2.0) — the Lean-verified Z-set algebra itself — and own it under oxc-reasoning::executor::dbsp. We do not fork differential-dataflow/timely: DD’s value is its distributed/multi-worker machinery, which we would immediately strip on a single-node per-tenant engine. (This refines RFD 0003’s DBSPExecutor row, which said “differential dataflow / timely” — that was the stale default.) Strip: timely exchange/workers/progress-tracking, distributed coordination, any SQL frontend. Keep: the sorted Z-set traces/arrangements and the operator set (map, filter, join, antijoin = stratified NAF via distinct(I₁ − I₁⋈I₂), distinct, consolidate, iterate). Build on top: the D1 pair-encoding, the D2 interval payload, the D5 provenance side-track.

From kora-reason-rl (MIT/Apache, Argon-family — kora-reason-rl is itself “forked from the Orca kernel”) we borrow (lift into our tree, BTreeMap-convert, attribute), not depend on: its bit-parallel Warshall transitive-closure fast path (tc.rs/bitmatrix.rs) as a tier-dispatched operator; its forward/recursive derivation-tree provenance; its stratification (as a cross-check of ours); and its DatalogMTL→time-guarded-Datalog compilation approach (for D11). We skip its incremental.rs (DRed — over-delete+rederive, strictly inferior to DBSP exact deltas, falls back to full re-materialize on additions) and its hand-rolled datafrog_eval.rs (except as a differential-test oracle). This realizes RFD 0003’s “vendor Kora, own it” decision, scoped to the genuinely-reusable pieces.

D4 — Storage: CQRS — event-log write model + persisted, graph-optimized read model

Adopt the event-sourcing / CQRS split explicitly:

Write model = the append-only event log (axiom_events) — the single source of truth (audit, time-travel, bitemporal). It is the Z-set delta stream the engine consumes. Append scales; scanning to re-materialize is what dies — and that is exactly what D6/D7 eliminate.
Read model = a persisted, indexed, graph-optimized materialized representation — the durable form of the DBSP arrangements (D7). In Phase 1 it is in-memory; Phase 4 persists it so cold-start does not replay the whole log. Its layout borrows graph-database storage techniques (CSR / index-free adjacency, native edge stores) for traversal-heavy relations — Argon’s data is a graph (individuals + relation-edges).

This commits the scaling contract: no O(database) operation on any hot path, at any scale — mutation→query is O(Δ) (D6); cold-start is O(read the persisted read model), not O(replay log) (Phase 4). It aligns with the spec §20 reserved CQRS catalog tables and the StorageBackend “per-tenant LSM, XTDB-inspired, recency-sharded” slot. The StorageBackend trait is a clean seam now (Phase 1) so the persisted read model slots in without a reasoner rewrite, even though its implementation is Phase 4.

D5 — Provenance: dual-track (mandatory)

Provenance cannot ride the IVM Z-set stream (PosBool is an idempotent semiring with no additive inverse — Amsterdamer 2011 — so it breaks DBSP’s chain rule; answer-subsumption and variant dedup also erase per-derivation identity). So: the Z-set stream carries counting-multiplicity (ℕ) for IVM; why-provenance lives in the separate append-only derivation log (the existing D-097 PosBool-DNF column on axiom_events), joined by the erasing homomorphism ℕ[X] → PosBool[X]. Cheap (O(1)/conjunct: append-on-assert, remove-by-witness-on-retract, empty-DNF ⇒ retracted); bilattice = per-coordinate (separate T-witness and F-witness DNFs); each conjunct carries its own VT interval. Commit dual-track from day one — it is nearly free given the event log exists, but retrofitting onto a single-track answer table is expensive.

D6 — Incrementality + invalidation: Salsa above, generation-driven, one coherent path

Salsa sits above the engine (session granularity, per-(tenant, fork); the engine owns its internal incrementality). Build the §19.6 runtime OxcRuntimeDatabase with EventLogInput + the u64 generation counter as Salsa inputs, route query_derive through it, and unify it with oxc-serve’s hand-rolled runtime_cache so there is exactly one invalidation path: event-log delta → DBSP circuit (warm arrangements) → Salsa session cache, with the generation bump (already atomic with appends in oxc-storage-pg) as the single invalidation signal. “Warm derived state across queries” (Phase 1) is the prerequisite milestone; “warm across restarts” (Phase 4, via D4’s persisted read model) follows.

D7 — Tuple/index layout: typed, sorted arrangements on `InternalId`

Replace BTreeMap<CBOR-Vec<u8>, weight> + decode-per-tuple with typed, decode-once, partial-key-indexable arrangements (Materialize-style immutable sorted batches + trace) keyed on the existing InternalId (8-byte NonZeroU64, [kind:8 | partition:16 | sequence:40], range-scannable). All layouts are sorted (B-tree / arrangements / interval-trees), honoring the determinism-by-observability rule (RP-006) and the BTreeMap-not-HashMap convention. Automatic index selection (Subotić-style bipartite matching, ~500 LoC) is a compile-time pass portable from the Souffle literature (“steal the ideas, don’t port the C++”).

D8 — Conformance harness: Lean-first + dual-oracle differential testing

The correctness target is the strict-stratified perfect model. The DBSP engine is differential-tested against (a) the retained semi-naive evaluator as an independent oracle and (b) the Lean semantics on generated stratified programs, in addition to keeping the ~600 workspace tests + keystone.rs green throughout. Implement the proven modal discharge (box(anti_rigid) → false) statically; refuse the rest.

D9 — Semi-naive is retained as oracle + cold/one-shot fallback (not retired)

DBSP subsumes semi-naive (its recursive operator is semi-naive internally; a circuit fed the database as one delta-from-empty computes the same bottom-up fixpoint), so the algorithm is never lost. The standalone semi-naive evaluator is kept as: (a) the independent differential oracle (D8) — validating the new engine against an independent implementation is the correct practice; and (b) a registered cold/one-shot fallback executor in the RFD-0003 dispatch (it wins only for query-once-tear-down workloads where arrangement maintenance is pure overhead). It is not grown — all expressivity (D11) lands once, in DBSP. One rule-IR + one Z-set semantics + one RelationCatalog exchange, two backends — not “build twice.” This realizes RFD 0003’s “DBSP preferred; SemiNaive stays as fallback + reference implementation.”

D10 — Executor unification

Collapse the live-eval / TierExecutor-stub split: DBSPExecutor becomes the real recursive-tier path through the RFD-0003 dispatch (registered first, ahead of SemiNaiveExecutor); fold compile/stratify’s stratification into the PhysicalPlan strata so the physical plan is the real execution unit; either move evaluate_to_fixpoint’s body into SemiNaiveExecutor::execute or delete the trait method’s stub. Fix the stale eval.rs “naive” header and replace the literal 1000 caps with a configured convergence/divergence policy (divergence stays a first-class, explained Err).

D11 — Scope cut (RP-009 §4)

Legend: BUILD-NEXT = ships on the stratified bottom-up core in this effort’s own later phases (no new kernel); BUILD-SOON = the next major arc after this effort (the SLG kernel), reached through the RFD-0003 dispatch seam; DEFER = not foreclosed, no current design work.

IN (this effort): recursive stratified Datalog (joins, field-navigation, collection iterators, comparisons); full in-body aggregates incl. aggregate-comprehension where with rule-atom/predicate filters; stratified NAF; function application + projected-field terms as rule-atom arguments; Allen-interval + date-arithmetic builtins; forall; denial constraints (assert ⇒ error); occurrence-typing type-tests; modal discharge soundness; indexing; incrementality; durable+warm derived state; observability; the scaling contract to 10M facts.
DESIGN-FOR-NOW, BUILD-NEXT (ride the stratified bottom-up core; no SLG kernel): defeasibility (Maher three-stratum at the closure tier; proof tags +Δ/−Δ/+∂/−∂ as derived-predicate labellings; defeater chains in the provenance side-track); metric-temporal (DatalogMTL compiled to time-guarded interval-Datalog; a separate DRedMTL maintainer over interval deltas).
DESIGN-FOR-NOW, BUILD-SOON (the next major arc, via the RFD-0003 seam): WFS-over-cycles — the SLG kernel (R-3.1). OE1309 stays reject-with-a-good-error now, architected as the future dispatch-to-SLG trigger. Coupling: ambiguity-propagating defeasibility = WFS-of-translation (Maier–Nute; +∂‖ = well-founded-true), so the SLG arc serves both WFS and the richer defeasibility.
DEFER, don’t foreclose: full Kripke modal; FOL/SMT (unsafe logic); KoraExtensionExecutor (DL via std::owl); standpoint-translation hardening; per-tenant eviction (a 10⁹ / ~2 TB-provenance concern); #[brave] stable models.

D12 — Phased plan, with a gating spike (no phase rebuilds a prior phase)

Phase 0 — de-risking spike (GATE before committing the fork). On a forked/wrapped dbsp: a bilattice-pair-encoded recursive rule (the overlay’s Met-style rule, or reachability) with stratified NAF, over bitemporal-interval-keyed tuples. Acceptance: a post-mutation query does not re-materialize — incremental mutation→query latency decouples from total DB size (latency ∝ |Δ|, not |DB|); cross-stream pair-encoding throughput is acceptable on a real rule mix. If the public dbsp API proves too lossy for the pair-encoding or the interval payload, that is the signal to fork internals (which we do anyway).
- RESULT — PASSED (2026-06-05). Ran on stock dbsp 0.277 (rustc-1.92-compatible), release, 1 worker. (A) recursive transitive closure swept 1K→1M components (10K→10M derived facts): full re-eval grew ≈linearly (9.4 ms→5.78 s) while the steady-state incremental step stayed flat at ~400–850 µs — at 10M facts a mutation propagates in <1 ms vs 5.78 s full (~6,800×), ratio growing with DB size. The decoupling holds. (C) a valid-time [lo,hi) interval carried as a payload column (interval-intersection join) preserved the same flat incremental step (~390–863 µs) — VT rides as data, not the clock (§D2 validated; MTL can compile to interval-Datalog on this base). (B) the AFT pair-encoding (two ℤ-streams) + stratified NAF (antijoin) computed correctly at 1M (active = 500K, T/F disjoint within source, BOTH only at federation) with pair-encoding overhead 1.46× — under the ~2× the construction predicts (§D1 validated). Three Path-A frictions surfaced and reinforce D3 (fork & own): (i) upstream MSRV churn (latest dbsp needs rustc 1.93; we used 0.277); (ii) relations >65,535 records silently read empty without a storage backend configured; (iii) .output() (delta mailbox) reads empty at scale — must use .accumulate_output() (materialized snapshot). A vendored fork eliminates all three by giving us the spine/storage/read-model directly. (Harness + full results live at .local/spikes/dbsp-phase0/ — local-only / gitignored: a throwaway Path-A measurement harness, not shipped and not independently checkable from this tree. The tables and method above are the auditable record; reproduction is three cargo run --release -- {A,C,B} … lines against dbsp = "=0.277.0".)
Phase 1 — the build-once engine core. Forked dbsp IVM + D1 pair-encoding + D7 arrangements (on InternalId) + D10 executor unification + D5 dual-track provenance carriers + D6 runtime-Salsa/generation wiring (unified with serve’s cache) + the D4 StorageBackend seam. Conformance: D8 differential testing. Lean gate (Lean-first): the D5 retraction homomorphism (commutes-with-deletion) and the D1 pair-encoding↔bilattice correspondence are mechanized before the incremental-retract and cross-stream-negation paths they underwrite are trusted (see Consequences/Lean). Bench: mutation→query vs DB size (the headline).
Phase 2 — expressivity on the real engine (unblocks the overlay end-to-end; closes RP-008’s operational-evaluator debt as a by-product): aggregate-comprehension rule-atom/predicate filters, function application in rule bodies, projected-field rule-atom args, Allen/date builtins, modal discharge soundness.
Phase 3 — design-for features: defeasibility (proof-tag propagation) and metric-temporal (interval-Datalog + DRedMTL), each Lean-first + reject/accept tests.
Phase 4 — scale + ops: persisted graph read model (D4) + restart-warmth (checkpoint vs snapshot-replay), observability (structured logging/metrics/slow-query), resource limits, concurrency model, the benchmark suite to the 10M-fact target, IAM in oxc-serve. (Per-tenant eviction enters here only if the scale target rises.)

Each phase is a CI-gated PR sequence, complete + tested + benchmarked — no hollow features; a thing is done only when it runs and is proven.

Rationale

Why DBSP at all (vs indexed-semi-naive-with-caching). The product thesis is “the data system IS the reasoner”: a mutation must make a subsequent query cheap. That is incremental view maintenance, and DBSP is the provably-incremental, Lean-verified Z-set IVM substrate the codebase already committed to (runtime/relation.rs is “the day-one commitment to DBSP-shaped data”; arrangement_body is reserved). Indexed-semi-naive-with-caching gets indexing but not principled incrementality; it would be a stepping-stone we’d replace — i.e., build twice.

Why fork dbsp, not differential-dataflow, and not wrap. Single-node per-tenant means DD’s distributed/timely machinery is surface we’d strip, not value we’d keep; dbsp is the algebra itself with no distribution layer and a verified core (matching the Lean-first bar). Owning the fork (vs wrapping) is required to thread the pair-encoding, interval payloads, and provenance through the operators — and to honor the house constraints by fencing unsafe/HashMap behind a deny.toml-reviewed boundary rather than inheriting them opaquely.

Why the scope is safe to build correctly once. The only unproven piece in the ARS record (recursive-WFS-on-DBSP) is exactly the WFS feature we defer to the SLG kernel; everything in D11-IN is DBSP’s proven fragment. The AFT pair-encoding (D1) is what makes the deferral non-binding: WFS/SLG and the richer defeasibility slot in over the same state, via the same RFD-0003 dispatch seam, with no data-model rewrite. So designing-for-soon costs us a clean trait seam (D6/D10) and a value model (D1) we want regardless — not speculative engine work.

Why CQRS / dual representation. A single append-only log is a fine write model but a terrible read path at scale (re-materialization is O(DB)). Separating a persisted, indexed, graph-optimized read model (the durable DBSP arrangements) from the event-log write model is the standard event-sourcing answer, it is what the spec §20 already reserves, and it is what makes the no-O(DB)-on-any-hot-path contract achievable at all scales.

Why dual-track provenance, from day one. It is forced by three independent facts (group-vs-semiring impossibility, answer subsumption, variant dedup); it is nearly free given the event log already carries the PosBool-DNF column; and retrofitting it onto a single-track answer table later would be a rewrite of the hot path.

Why keep semi-naive. Validating the new engine against an independent implementation — not against itself — is the correct way to earn “completely correct.” Retiring the reference implementation to save tidiness is the wrong trade for a foundation. DBSP subsumes it, so this costs nothing in duplicated expressivity.

Alternatives considered

Indexed semi-naive + cross-query caching (no IVM). Smaller, stays in the BTreeMap world, gets indexing — but delivers caching, not principled incrementality, and would be replaced by DBSP. Rejected as a stepping-stone we’d build twice. (Its good ideas — arrangements, auto index selection — are absorbed by D7.)
Fork differential-dataflow (the ARS-record default Path B). Rejected: its distinguishing value is distributed/timely scale-out we don’t need; we’d fork-then-amputate. dbsp is the cleaner thing to own single-node.
Wrap Feldera dbsp as a black-box dependency (ARS Path A). Rejected as the product path (kept only as a throwaway Phase-0 measurement harness): wrapping can’t thread the pair-encoding/interval/provenance through operator internals, and drags in unsafe/HashMap behind an API we don’t control. “Build correctly once” + the fork-and-strip preference favor owning it.
DRedc as the shipping recursive substrate, DBSP as a v0.3 migration (ARS W5.S2/D-103). This existed as the safe fallback precisely because recursive-WFS-on-DBSP was unproven. Since we stay strict-stratified (where DBSP is proven), DRedc would be a stepping-stone we’d replace — rejected. (This reconciles the R-3.3-vs-D-103 internal inconsistency in the ARS record for our scope.)
Single-track provenance (on the answer row / on-stream). Structurally impossible under the bilattice + group structure (D5 rationale). Rejected.
Build recursive-WFS-on-DBSP now (the novel coupled-fixpoint construction). Unproven, no published precedent; the ARS record itself names it the gating risk. Rejected for this effort — WFS goes to the SLG kernel via the seam.
Event-log only (no persisted read model). The status quo; O(DB) re-materialization; dies at scale. Rejected (D4).
Retire semi-naive. Loses the independent oracle and the cold/one-shot fallback. Rejected (D9).

Consequences

Code structure. New oxc-reasoning::executor::dbsp (the forked, owned engine) registered first in the RFD-0003 dispatch; SemiNaiveExecutor implemented for real and retained as oracle/fallback; the orphaned logical/physical/optimizer scaffolding either wired into the real path or removed; oxc-runtime gains a salsa dependency and the OxcRuntimeDatabase; a StorageBackend trait seam for the future persisted read model; tc.rs/provenance/MTL-compile borrowed from kora-reason-rl under attribution.
Dependencies. The vendored dbsp fork enters via compiler/deny.toml allow-list + an explicit constraint review; unsafe is minimized and fenced (our new code stays #![forbid(unsafe_code)]); HashMap is permitted only in the vendored fork where it cannot affect observable order, justified in this RFD. num-rational/num-bigint already present (RFD 0016).
Lean (Lean-first for new semantics; evaluation-strategy obligations trail). AGENTS.md puts substrate semantics in the Lean-first lane, so the new obligations split by kind:
- Gates the code it underwrites (Lean-first — these are new semantics not covered by the forward perfect-model theorems): (a) the ℕ[X]→PosBool[X] homomorphism commuting with deletion under DBSP’s chain rule (D5) — incremental retract is genuinely new semantics (stratified_fixpoint_unique is the forward model only), so it is mechanized before Phase 1’s incremental-retract path is trusted; (b) the D1 pair-encoding ↔ mechanized bilattice correspondence (representation adequacy for cross-stream negation = stream-swap, and ⊕-only-at-federation = Foundation/Bilattice.lean/Standpoint/Federation.lean) — mechanized before Phase 1 relies on the cross-stream coupling.
- Trails implementation (evaluation strategy — reduces to the already-proven perfect model, sound for any converging strategy by stratified_fixpoint_unique): the operational RuleIR→extent evaluator (closes RP-008).
- All anchored on porting Bogaerts–Cruz-Filipe 2024 (AFT-in-Coq) to Lean 4.
Conformance. The ~600 tests + keystone.rs stay green throughout; new differential + per-feature tests per phase.
Workflow. Branch off origin/main (this RFD is on rfd/0018-production-reasoner); per-PR CI-gated; commit at checkpoints; no “done” without a running/benchmarked proof.
Spec. Resolves RFD 0003’s deferred retraction/IVM open question; refines its DBSPExecutor row (Feldera dbsp fork, not DD/timely). The reference (spec/reference/src/{17,19,20}) follows once the design lands.
Relationship to RFD 0017 (refinement classification). RFD 0017’s two refinement kinds map cleanly onto this engine, with no scope expansion: a defined (iff) refinement is a derived-membership rule — iof(x, C) :- iof(x, parentᵢ), P(x) — i.e. an incrementally-maintained view (a mutation touching P(x) updates C-membership as a Z-set delta; this is the IVM win applied to classification), and is already covered by D11-IN’s recursive stratified Datalog. A primitive (where) refinement is a write-time invariant (OE0668 RefinementInvariantViolated) — a constraint check at the mutation boundary (Cat3-shaped), not a derived relation. RFD 0017 explicitly defers the iff realization theorem (iof(x,C) ↔ iof(x,parent) ∧ P(x)) to RP-007-adjacent work; this engine is its operational home, and that theorem is part of the D8 / RP-008 Lean conformance debt.

Open questions / tracked-future

The Phase-0 spike — PASSED (2026-06-05). Decoupling (∝|Δ| not |DB|) confirmed to 10M facts; AFT pair-encoding + stratified NAF correct at 1M with 1.46× overhead; VT-as-payload preserves incrementality. The three Path-A frictions found (MSRV churn; >65k-record relations need a storage backend; .output() reads empty at scale, must use .accumulate_output()) all reinforce D3 — fork & own. See the Phase-0 RESULT block in §Decision/D12.
AS OF tx-time = X × Salsa backdating — prototype with the Phase-1 Salsa layer; fallback = disable backdating for AS OF queries (ARS R-W5S2-04).
ℕ[X]→PosBool[X] commutes with deletion under DBSP’s chain rule (ARS R-W5.S3-4) — new semantics; gates Phase 1’s incremental-retract code (Lean-first; see Consequences/Lean). Not “tracked-future” — a Phase-1 prerequisite, listed here for visibility.
Persisted-read-model layout — a research item: graph-DB storage internals (CSR / index-free adjacency, native edge stores, triple-store indexing) → the Phase-4 read-model design.
PG schema canonicalization — spec §20.3 (TSTZRANGE+GIST, generated columns, JSONB derivation) vs the shipped migrations (INT8 ns columns, projection-index tables, TEXT derivation). Decide canonical before Phase 4 scaling.
The SLG kernel arc (WFS + ambiguity-propagating defeasibility) — the next major effort after Phase 2; its own RFD, building on D1’s pair-encoding and the D10 dispatch seam.
Per-tenant eviction — tracked, not built; enters when a tenant exceeds ~100 GB (provenance ≈ 2 TB/tenant at 10⁹ is the binding constraint); recommended composition = snapshot-at-τ₀ + continuous provenance compaction + tenant archival.
Cross-stream operator throughput for the pair-encoding — RESOLVED by the Phase-0 spike: measured at 1.46× at 1M facts, under the ~2× the construction predicts.

Keyboard shortcuts

Argon Design Records (RFDs)