RFD 0033 — The ad-hoc query and mutation surface

  • State: accepted — implemented
  • Opened: 2026-06-14
  • Decides: that arbitrary, not-pre-declared (ad-hoc) queries and mutations are a first-class, default-on capability of the Argon runtime — submitted as source text at request time, parsed, lowered, and executed against the loaded module — with a deployment opt-out that restricts a server to declared invocables only. Establishes that the generic submission path is the substrate, and the declared pub query / pub mutate forms are a thin named wrapper over it — not the only door. Builds on RFD 0014 (the serving surface), RFD 0015 (the mutate body / Operation IR), RFD 0020 / RFD 0021 (the Engine/CompiledRule evaluation seam), and RFD 0022 (the build evaluability gate, whose runtime analogue this RFD must define).
  • Implements (as built): the query-provider Schema interface (oxc_types::Schema) with two parity-gated backends — WorkspaceSchema (build-time, over ASTs) and oxc_runtime::ModuleSchema (runtime, over a loaded .oxbin); the checker (oxc-check) fully routed through it; the runtime frontend (oxc-parser/oxc-check/oxc-instantiate now linked into oxc-runtime); Store::eval_{query,mutation}_source (parse → full type-check → lower → run, ill-typed bodies refused and never run); the POST /v1/{query,mutation}/adhoc HTTP surface + ox query --eval CLI; the AdhocPolicy opt-out (default-on); and the build-vs-runtime agreement gate (oxc-runtime/tests/adhoc_agreement.rs) asserting byte-identical diagnostics + lowered IR. The persisted-projection-cache / IVM materialization arc is a separate follow-on, out of this RFD’s scope.

Question

A data system you cannot query ad hoc is not a database. Argon’s design intent — stated repeatedly and recorded since 2026-05-29 — is that the runtime accepts arbitrary queries and mutations at request time, not only the “stored-procedure” pub query / pub mutate declarations that lower into .oxbin. The declared forms are meant to be a convenience layer over a generic ad-hoc path. A deployment may turn ad-hoc off (lock down to declared-only) for safety, but that is a gate you enable, not a default-closed wall.

Today that path is unbuilt at the edges, and — separately — the project’s own notes and one prior analysis have repeatedly mis-described it as “rejected by design.” It is not. This RFD settles:

  1. What the ad-hoc surface is (wire shape, CLI shape, semantics), for queries and mutations together.
  2. How a body submitted as source text is compiled at runtime, given that the compiler frontend is not currently linked into the serving binary.
  3. The resolution context: how names and types in an ad-hoc body resolve against the loaded module rather than a build-time Salsa workspace.
  4. How much type-checking an ad-hoc body receives (answer: the full amount), and what decidability-tier admittance applies at runtime (answer: build-gate parity by default, configurable).
  5. The security model: default-on, the deployment-level opt-out, and affordance parity — ad-hoc is governed by the same uniform capability scoping as declared invocation, never an ad-hoc-specific leash.

Context

The framing matters because it has been wrong. The corrected, code-verified picture:

The reasoner is rule-as-data, and the compile step already runs at request time. In Store::query_body_dispatch (compiler/crates/oxc-runtime/src/lib.rs:6799–6824) the runtime decodes a query’s AtomIR body + head Term, calls oxc_reasoning::compile::compile_rule(short, &head, &atoms) at dispatch time, pushes the fresh CompiledRule onto the module’s rules, and runs Engine::evaluate(&rules, &mut catalog, …). The engine consumes &[CompiledRule] as plain data; it has no notion of “pre-declared.” The only thing tying this to a declaration is the source of atoms/head: find_query_decls(name) looks them up from .oxbin-loaded QueryDeclBodys rather than from the request.
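A minimal sketch of the compile-at-dispatch shape described above, using toy stand-in types. Nothing here is the real oxc-reasoning API: Atom, CompiledRule, compile_rule, evaluate, and dispatch_adhoc are all simplified, hypothetical analogues, and the toy engine only handles unary atoms over the head variable. The point it illustrates is that the engine sees rules as plain data and cannot tell where the atoms came from.

```rust
// Toy stand-ins only; these are NOT the real oxc-reasoning types.
#[derive(Clone, Debug, PartialEq)]
struct Atom {
    predicate: String,
    var: String,
}

#[derive(Clone, Debug, PartialEq)]
struct CompiledRule {
    name: String,
    head_var: String,
    body: Vec<Atom>,
}

/// Hypothetical analogue of compile_rule: plain data in, plain data out.
fn compile_rule(name: &str, head_var: &str, atoms: &[Atom]) -> CompiledRule {
    CompiledRule {
        name: name.to_string(),
        head_var: head_var.to_string(),
        body: atoms.to_vec(),
    }
}

/// Toy engine. Facts are (predicate, individual) pairs; a rule yields every
/// individual that satisfies all of its body atoms. Toy restriction: an atom
/// over any variable other than the head variable never matches.
fn evaluate(rules: &[CompiledRule], facts: &[(String, String)], goal: &str) -> Vec<String> {
    let mut individuals: Vec<&String> = facts.iter().map(|(_, ind)| ind).collect();
    individuals.sort();
    individuals.dedup();
    let mut rows = Vec::new();
    for rule in rules.iter().filter(|r| r.name == goal) {
        for ind in &individuals {
            let satisfied = rule.body.iter().all(|atom| {
                atom.var == rule.head_var
                    && facts.iter().any(|(p, i)| p == &atom.predicate && i == *ind)
            });
            if satisfied {
                rows.push((*ind).clone());
            }
        }
    }
    rows
}

/// The atoms arrive from the request, not from a stored declaration; the
/// engine call is the same either way.
fn dispatch_adhoc(
    module_rules: &[CompiledRule],
    facts: &[(String, String)],
    head_var: &str,
    atoms: &[Atom],
) -> Vec<String> {
    let mut rules = module_rules.to_vec();
    rules.push(compile_rule("__adhoc", head_var, atoms));
    evaluate(&rules, facts, "__adhoc")
}
```

In this sketch a declared query would differ only in where head_var and atoms are looked up before the same compile_rule and evaluate calls.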

The mutation interpreter is already general and decl-agnostic. Store::run_body_op (oxc-runtime/src/lib.rs:3548+) interprets an Operation sequence (InsertIof, InsertTuple, Update, For, If, Return, … — oxc-protocol/src/core_ir.rs:389) and does not take the MutationDecl; the decl is consulted only for argument validation and capability checks at the boundary. The storage write methods (emit_iof_assertion, emit_relation_tuple, emit_individual_property_assertion) are origin-agnostic.
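To make "decl-agnostic" concrete, here is a hedged toy interpreter. The variant names InsertIof, If, and Return come from the list above, but their field shapes, the Store struct, and this run_body_op signature are assumptions made for illustration; the real interpreter also enforces atomicity and delta guards, which this sketch does not model.

```rust
use std::collections::BTreeSet;

// Variant names follow the RFD text; the field shapes are assumed.
#[derive(Debug, Clone)]
enum Operation {
    InsertIof { individual: String, concept: String },
    If { concept: String, individual: String, then: Vec<Operation> },
    Return,
}

/// Toy store: a set of (individual, concept) instance-of assertions.
#[derive(Default, Debug)]
struct Store {
    iof: BTreeSet<(String, String)>,
}

/// Interprets an operation sequence. Note what is absent from the
/// signature: no declaration is passed in, so the interpreter cannot care
/// whether the operations came from a stored decl or from a request.
/// Returns false if a Return stopped the body early, true otherwise.
fn run_body_op(store: &mut Store, ops: &[Operation]) -> bool {
    for op in ops {
        match op {
            Operation::InsertIof { individual, concept } => {
                store.iof.insert((individual.clone(), concept.clone()));
            }
            Operation::If { concept, individual, then } => {
                // The condition reads the store as already modified by
                // earlier operations in the same body (read-your-writes).
                if store.iof.contains(&(individual.clone(), concept.clone()))
                    && !run_body_op(store, then)
                {
                    return false;
                }
            }
            Operation::Return => return false,
        }
    }
    true
}
```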

So the constraint is not semantic. It is three concrete wiring gaps:

  1. No request field for a body. DispatchDescriptor is { qualified_path, args, return_type } (oxc-serve/src/lib.rs:2671); resolution is query_decls.get(qualified_path) → 404 if absent. There is nowhere to put a body. (The runtime refusal ARGON_RUNTIME_UNSUPPORTED_QUERY_BODY at oxc-serve/src/lib.rs:6749 is a narrower executor gap — field projections in bodies are not yet executable — not an ad-hoc policy.)

  2. The compiler frontend is not linked into the serving binary. oxc-serve and oxc-runtime depend on oxc-reasoning, oxc-protocol, oxc-oxbin (+ storage) — and not oxc-parser, oxc-check, oxc-instantiate, oxc-resolver, or oxc-db (verified in both Cargo.tomls). The runtime can compile pre-lowered IR but cannot turn source text into IR.

  3. Name/type resolution is build-time. The frontend’s full type-checker (oxc-check) is bound to a Salsa OxcDb/Workspace/resolver.
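The first gap can be pictured as a missing variant in the request type. The sketch below is purely illustrative: Submission, Resolved, and resolve are hypothetical names, the real DispatchDescriptor also carries return_type, and the actual wire shape is left to the implementing PRs.

```rust
use std::collections::BTreeMap;

/// Hypothetical request shape. The Declared variant mirrors the existing
/// name-based lookup; the Adhoc variant is the "place to put a body".
#[derive(Debug, PartialEq)]
enum Submission {
    Declared { qualified_path: String, args: BTreeMap<String, String> },
    Adhoc { source: String, args: BTreeMap<String, String> },
}

#[derive(Debug, PartialEq)]
enum Resolved {
    StoredBody(String),
    RequestBody(String),
    NotFound, // corresponds to the 404 for an unknown qualified path
}

/// Declared submissions resolve through the decl table; ad-hoc submissions
/// carry their body with them and need no lookup.
fn resolve(decls: &BTreeMap<String, String>, submission: &Submission) -> Resolved {
    match submission {
        Submission::Declared { qualified_path, .. } => match decls.get(qualified_path) {
            Some(body) => Resolved::StoredBody(body.clone()),
            None => Resolved::NotFound,
        },
        Submission::Adhoc { source, .. } => Resolved::RequestBody(source.clone()),
    }
}
```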

The fourth fact reshapes the whole design and is why this is tractable:

Lowering is already decoupled from Salsa. oxc_parser::parse(source_text: &str) -> Parse (oxc-parser/src/lib.rs:44) is standalone: string in, parse tree out, no DB. Rule-body lowering is body_to_atoms_ctx(list: &SyntaxNode, ctx: &LowerCtx) -> Vec<AtomIR> (atom_lower.rs:49), and LowerCtx (expr_lower.rs:116) resolves names through plain closures (resolve_type: &dyn Fn(&str) -> Option<NameRef>, plus enum-variant and field-optionality resolvers), not Salsa. In oxc-instantiate/src/lower.rs, every &dyn OxcDb use is parse_file(db, file): the DB’s only job in the lowering data path is to produce the parse tree.
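A small sketch of why a closure-based context is easy to re-back. The resolve_type field mirrors the closure signature quoted above, but this LowerCtx has only that one field, NameRef is a toy newtype, and lower_atom / lower_with_catalog are hypothetical helpers that parse a trivial "Name(var)" string rather than a real syntax tree.

```rust
use std::collections::HashMap;

// Toy stand-in for the real NameRef.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NameRef(u32);

/// Reduced context: the lowering code only ever sees a closure, so it does
/// not know (or care) what answers the question.
struct LowerCtx<'a> {
    resolve_type: &'a dyn Fn(&str) -> Option<NameRef>,
}

/// Hypothetical lowering of a "Name(var)" atom through the context.
fn lower_atom(text: &str, ctx: &LowerCtx) -> Result<(NameRef, String), String> {
    let (name, rest) = text
        .split_once('(')
        .ok_or_else(|| format!("malformed atom: {}", text))?;
    let var = rest
        .strip_suffix(')')
        .ok_or_else(|| format!("malformed atom: {}", text))?;
    let concept = (ctx.resolve_type)(name.trim())
        .ok_or_else(|| format!("unknown type: {}", name.trim()))?;
    Ok((concept, var.trim().to_string()))
}

/// Backs the closure with an in-memory catalog (standing in for indexes a
/// loaded module would expose) instead of a build-time workspace.
fn lower_with_catalog(
    catalog: &HashMap<String, NameRef>,
    text: &str,
) -> Result<(NameRef, String), String> {
    let resolve = |name: &str| catalog.get(name).copied();
    let ctx = LowerCtx { resolve_type: &resolve };
    lower_atom(text, &ctx)
}
```

Swapping the backing store changes only the body of the closure; lower_atom is untouched.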

The genuinely Salsa-heavy component is oxc-check (reference resolution + full type inference via resolve_path(db, workspace, file, …) and lower_type_expr(…)), and it runs separately, after lowering. So “decouple frontend lowering from Salsa” splits into two very different tasks:

  • (a) Lowering is already call-site-decoupled. The work is to build a LowerCtx whose closures are backed by the runtime Module / .oxbin catalog instead of the build-time file pre-pass. Small.
  • (b) Type-checking is Salsa-bound. Reproducing it at runtime — or deciding ad-hoc bodies get lighter validation — is the large, separable decision.

What the runtime already exposes for (a): Module (oxc-runtime/src/lib.rs) carries concept_id, concept_id_by_short_name, relation_id, ancestor_concept_ids_including_self, resolve_predicate_key, resolve_rule_name, resolve_mutation_invocable, symbol_path, and (today private) declared_field. The .oxbin declaration bodies (oxc-protocol/src/storage.rs) carry field declarations with type expressions, refinement predicates, relation arg concepts/cardinalities, and query/mutation parameter types — encoded as CBOR. The information needed to back the LowerCtx closures exists; Module simply doesn’t yet expose a resolution surface over it (notably: resolving names inside a CBOR-encoded TypeExpr, field-type lookup, and a parameter catalog).

Decision

Adopt a two-tier surface, with the generic path as substrate and declared decls as a wrapper.

Tier A — the generic submission substrate

A submitted body flows through the same runtime seam declared invocables already use:

  • Queries: (head Term, Vec<AtomIR>) → compile_rule → appended to module rules → Engine::evaluate → rows. (This is literally the query_body_dispatch path with the IR sourced from the request instead of find_query_decls.)
  • Mutations: Vec<Operation> (+ params) → the existing run_body_op interpreter, under the same atomicity, read-your-writes, and delta-guard contract as declared mutations (RFD 0015 / RFD 0019).

Tier A is reachable in two framings, in priority order:

  1. Source text (the product surface): the request carries an Argon query/mutation body string. The runtime parses and lowers it (Tier B) to the IR above, then runs it. This is what ox query '<body>', a REPL, and a /v1/query HTTP endpoint use.
  2. Pre-lowered IR (the substrate boundary): the IR itself is the unit Tier A executes. It is the internal contract the source-text path compiles down to, and declared decls already produce it. Whether IR is also a public client surface is left open deliberately (§Open) — it is a performance/optimization question (a precompiled/prepared-statement analogue), and the answer should be whatever is correct once the prepared-body / caching design is worked out, not a guess made here. Note that IR submitted directly would bypass the type-checker, so if exposed it must carry its own validation story — another reason to settle it with the performance design rather than now.

Declared pub query/pub mutate become wrappers: their dispatch resolves a name to stored IR and then enters the same Tier A execution. No second engine path.

Tier B — runtime parse + lower + check (the resolution contract)

Parsing is the easy part: oxc_parser::parse(source_text: &str) -> Parse is already standalone (no DB). The hard part — resolving and type-checking the body against the loaded schema — is solved by a single proven pattern, not by carrying source and not by a second checker.

The query-provider pattern (decision #3, refined 2026-06-15). Across every mature separately-compiled language, including rustc (.rmeta as a query provider: tcx.type_of(def_id) is answered from local HIR or by decoding metadata, dispatched only by local-vs-extern), Go (go/types Importer), and OCaml/GHC/SML/Scala/F# (rehydrate serialized data into the same Env / TyThing / StaticEnv / typed-tree the checker already consumes), the dominant, unanimous design is one type-checker whose environment access is an interface, answered either from source (local) or from already-resolved serialized facts (imported / loaded). Nobody re-elaborates the dependency’s source; nobody forks the checker. The PL-theory framing is the same (external prior art, cited as ideas not authority): F-ing modules’ “signatures are views over the kernel’s type structure, not a parallel type system,” and the .olean / .ttc interface-file precedent that the serialized environment is the type-checking source-of-truth.

Concretely for Argon:

  • Introduce a Schema interface — the narrow set of environment-access operations the frontend actually performs: resolve a name to a declared concept/relation/struct/enum; a concept’s fields and their types; subsumption/parent edges; relation argument arities and types; enum variants; query/mutation parameter types; and each concept’s world assumption (CWA/OWA) (so three-valued OWA refinement checking can’t silently diverge — a substrate-research caveat).
  • oxc-parser (standalone), oxc-instantiate body-lowering (already (&SyntaxNode, &LowerCtx), no DB), and oxc-check all resolve through Schema. The build-time backend answers from the Salsa workspace / ASTs (today’s code, behavior unchanged); the runtime backend answers from the loaded module. The inference and lowering logic is shared and untouched — only the environment-access surface is abstracted. This is the rustc local-vs-extern split, not a rewrite of the type system.
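The one-checker, two-backends split can be sketched as follows. This is a heavily reduced illustration, not the real oxc_types::Schema: the trait below has only two of the operations listed above, WorkspaceBackend and ModuleBackend are toy stand-ins for the build-time and runtime backends, and check_field_access is a hypothetical checker fragment.

```rust
use std::collections::HashMap;

/// Hypothetical, much-reduced environment-access interface.
trait Schema {
    fn resolve_concept(&self, name: &str) -> Option<u32>;
    fn field_type(&self, concept: u32, field: &str) -> Option<String>;
}

/// Build-time stand-in: answers by walking source-like declarations.
/// A concept's id is its position in the declaration list.
struct WorkspaceBackend {
    decls: Vec<(String, Vec<(String, String)>)>,
}

/// Runtime stand-in: answers from indexes built once at load.
struct ModuleBackend {
    ids: HashMap<String, u32>,
    fields: HashMap<(u32, String), String>,
}

impl Schema for WorkspaceBackend {
    fn resolve_concept(&self, name: &str) -> Option<u32> {
        self.decls.iter().position(|(n, _)| n == name).map(|i| i as u32)
    }
    fn field_type(&self, concept: u32, field: &str) -> Option<String> {
        let (_, fields) = self.decls.get(concept as usize)?;
        fields.iter().find(|(f, _)| f == field).map(|(_, t)| t.clone())
    }
}

impl Schema for ModuleBackend {
    fn resolve_concept(&self, name: &str) -> Option<u32> {
        self.ids.get(name).copied()
    }
    fn field_type(&self, concept: u32, field: &str) -> Option<String> {
        self.fields.get(&(concept, field.to_string())).cloned()
    }
}

/// The checker logic is written once, against the interface only.
fn check_field_access(schema: &dyn Schema, concept: &str, field: &str) -> Result<String, String> {
    let id = schema
        .resolve_concept(concept)
        .ok_or_else(|| format!("unknown concept `{}`", concept))?;
    schema
        .field_type(id, field)
        .ok_or_else(|| format!("concept `{}` has no field `{}`", concept, field))
}
```

Because check_field_access never names a backend, the diagnostics it produces depend only on the answers the interface returns, which is the property the parity discipline relies on.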

The runtime backend reads a projection over the event log; it serializes nothing new. This follows from how Argon storage works today (verified in current code, not assumed): storage is a single append-only axiom_events log (oxc-protocol’s AxiomEvent; the axiom_events table in oxc-storage-pg), and Module already builds its concept/relation/field indexes from that event stream at load. So the runtime Schema is a reader over the catalog projection the Module already builds from declaration events, not an embedded copy of source and not a separate schema section. The data it needs (resolved field TypeExprs, parent ids, relation arg types, params, refinement predicates) is already in the .oxbin decl bodies. We do not add a redundant representation of facts the log already holds; we expose them through the interface.

The artifact-identity and drift-guard machinery already exists; the Schema backend keys on it. The separate-compilation literature is unanimous that cross-boundary type identity must be a persistent content hash (rustc DefPathHash + StableCrateId; SML content-derived PIDs), never a structural match or an allocation-order stamp. Argon’s .oxbin already implements this: per-section BLAKE3 content hashes and a composition signature (oxc-oxbin/src/composition_signature.rs, content_hash.rs, section.rs), a multi-axis version preamble with strict-producer/liberal-consumer gating checked at the load site before any body section (versioning.rs; reader.rs), and a load-time tier gate (validation.rs layer1_valid → OE1204). So the boundary is already guarded two ways: a hard version/format header (deterministic refusal of an incompatible artifact) plus content fingerprints over the sections. The runtime Schema backend identifies its schema by the loaded module’s composition signature and section hashes; nothing new is invented here.

The genuine residual is narrower than “no identity”: artifact-level identity is solid, but it is not yet threaded to per-event / per-symbol identity inside the store — the storage-side gap where module_id is effectively constant, so two schemas’ symbol ids can collide at the event level (a known storage defect). Schema resolution must carry module/artifact identity down to per-symbol resolution; fixing that is shared with the storage-identity work, not additive to it.

Type-checking: full, no shortcuts (decision #1)

An ad-hoc body receives the same, complete type-checking a declared body gets — name resolution, reference checking, full inference — via the same oxc-check logic, now resolving through Schema. There is no “lighter validation” tier and no unchecked-but-executed path; a half-checked ad-hoc surface would be exactly the hollow feature the house rules forbid.

Parity is enforced as a canonical-input contract + agreement test — the same discipline the repo already runs at the Lean↔Rust boundary (the @[language_interface] drift test in oxc-protocol, where one logical contract is checked across two representations). Schema is the only way the frontend may touch the environment — no caller reaches around it to the AST or the catalog (make-illegal-states-unrepresentable) — and a CI agreement test asserts that the same body checked against the build-time and runtime Schema backends yields byte-identical diagnostics and identical lowered IR. Drift is a defect, gated like any spec↔code drift.
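A hedged sketch of what such an agreement check looks like in miniature. This is not the contents of adhoc_agreement.rs: CheckOutput, check, and first_disagreement are hypothetical, the "checker" only resolves names, and each backend is modelled as a bare resolver closure.

```rust
/// What a check run produces in this toy: diagnostics plus "lowered IR".
#[derive(Debug, PartialEq, Eq)]
struct CheckOutput {
    diagnostics: Vec<String>,
    lowered_ir: Vec<u8>,
}

/// Toy checker: every name in the body either resolves (its id is emitted
/// into the IR bytes) or produces a diagnostic.
fn check(resolve: &dyn Fn(&str) -> Option<u8>, body: &[&str]) -> CheckOutput {
    let mut out = CheckOutput { diagnostics: Vec::new(), lowered_ir: Vec::new() };
    for &name in body {
        match resolve(name) {
            Some(id) => out.lowered_ir.push(id),
            None => out.diagnostics.push(format!("unresolved name `{}`", name)),
        }
    }
    out
}

/// Runs every body in the corpus against both backends and returns the
/// first body on which diagnostics or IR differ, or None if they agree.
fn first_disagreement<'a>(
    build: &dyn Fn(&str) -> Option<u8>,
    runtime: &dyn Fn(&str) -> Option<u8>,
    corpus: &'a [Vec<&'a str>],
) -> Option<&'a [&'a str]> {
    corpus
        .iter()
        .map(|body| body.as_slice())
        .find(|body| check(build, body) != check(runtime, body))
}
```

A CI gate built this way would fail whenever first_disagreement returns Some, treating the offending body as a reproducer for the drift.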

Decidability-tier admittance (decision #2)

By default an opted-in deployment admits the same tier ceiling as the build evaluability gate (RFD 0022) — ad-hoc bodies are held to the identical decidability bar as declared ones. The load-time tier gate that enforces parity already exists (oxc-oxbin/src/validation.rs layer1_valid, refusing max_tier_claimed beyond the runtime’s capability with OE1204); an ad-hoc body’s classified tier is checked against the same ceiling. The ceiling is intended to be configurable per deployment (a server may set a lower ad-hoc ceiling for untrusted callers) — that per-call/lenient mode is not yet built (today’s gate is artifact-level strict) — but the default is parity, and a deployment may not silently admit more than the build gate would.
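The admittance rule above (parity by default, a deployment may lower but never raise the ceiling) can be written down in a few lines. This is a sketch under assumptions: Tier as a small ordered integer, TierRefusal, effective_ceiling, and admit are all hypothetical names, and the per-deployment override it models is described above as not yet built.

```rust
/// Hypothetical decidability tier; a higher number means more expressive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Tier(u8);

#[derive(Debug, PartialEq, Eq)]
struct TierRefusal {
    claimed: Tier,
    ceiling: Tier,
}

/// The deployment may configure a lower ad-hoc ceiling, but an override
/// above the build gate is clamped: it can never admit more than the build
/// gate would.
fn effective_ceiling(build_gate: Tier, deployment_override: Option<Tier>) -> Tier {
    match deployment_override {
        Some(configured) => configured.min(build_gate),
        None => build_gate,
    }
}

/// Admits a body whose classified tier is at or below the effective ceiling.
fn admit(
    body_tier: Tier,
    build_gate: Tier,
    deployment_override: Option<Tier>,
) -> Result<(), TierRefusal> {
    let ceiling = effective_ceiling(build_gate, deployment_override);
    if body_tier <= ceiling {
        Ok(())
    } else {
        Err(TierRefusal { claimed: body_tier, ceiling })
    }
}
```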

Security: affordance parity, deployment-level control only

The governing principle (decision #4): ad-hoc queries and mutations have the same affordances as everything else. Ad-hoc is not a hobbled subset of the declared surface — it is the surface, with declared forms as the named convenience layer over it. We do not special-case what ad-hoc may express, read, or write relative to a declared invocable. The Postgres test applies: a system you cannot freely query and mutate is not a useful system.

Control is therefore deployment-level, applied uniformly, never an ad-hoc-specific leash:

  • Ad-hoc submission is on by default. A deployment may opt out to restrict to declared invocables only (lock-down), or run read-only (a normal database posture, not an ad-hoc penalty) — these are the same kinds of switches any database exposes.
  • Whatever capability / RBAC / tenant / fork / standpoint scoping exists applies equally to declared and ad-hoc invocation. An ad-hoc mutation that a caller’s capabilities permit is exactly as permitted as the equivalent declared mutation.
  • One capability exception — forget. Physical erasure (forget) is gated on the build-time #[allow_forget] grant, which is a declaration-site capability. A runtime-submitted body has no declaration site and so cannot confer it on itself; an ad-hoc forget is therefore refused (OE0730). This is not an ad-hoc-specific leash on affordance — it is that a request cannot forge a build-time capability grant (the same reason an ad-hoc body cannot, say, mark itself #[brave]). A declared #[allow_forget] mutate still erases; an ad-hoc body cannot. We record this as the deliberate exception to the otherwise-unqualified parity rather than pretend parity is total (originally this section asserted no Forget gate at all — that was the bug, not the code).
  • This is orthogonal to the generic-entity-write denial (POST /v1/entities → 404, oxc-serve/src/lib.rs:9565): that is an untyped-blob REST shape, a different axis. Ad-hoc writes go through the typed mutate/Operation mechanism with the full mutate affordance set. The two must not be conflated.
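The control model in the bullets above can be summarized as one authorization function. The sketch is illustrative only: AdhocPolicy is named in this RFD but its variants here are assumed, and Origin, Refusal, and authorize are hypothetical. A single boolean stands in for the whole capability / RBAC / tenant check.

```rust
/// Deployment-level switch; default-on. Variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AdhocPolicy {
    Enabled,
    DeclaredOnly,
}

impl Default for AdhocPolicy {
    fn default() -> Self {
        AdhocPolicy::Enabled
    }
}

/// Where the body came from. Only a declaration site can carry the
/// build-time forget grant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Origin {
    Declared { allow_forget: bool },
    Adhoc,
}

#[derive(Debug, PartialEq, Eq)]
enum Refusal {
    AdhocDisabled,
    MissingCapability,
    /// The RFD cites OE0730 for the ad-hoc case.
    ForgetNotGranted,
}

fn authorize(
    policy: AdhocPolicy,
    origin: Origin,
    caller_may_write: bool,
    uses_forget: bool,
) -> Result<(), Refusal> {
    // Deployment-level opt-out: the only check that looks at "ad-hoc-ness".
    if origin == Origin::Adhoc && policy == AdhocPolicy::DeclaredOnly {
        return Err(Refusal::AdhocDisabled);
    }
    // Capability scoping is identical for declared and ad-hoc bodies.
    if !caller_may_write {
        return Err(Refusal::MissingCapability);
    }
    // Forget needs a declaration-site grant, which a request cannot carry.
    if uses_forget && !matches!(origin, Origin::Declared { allow_forget: true }) {
        return Err(Refusal::ForgetNotGranted);
    }
    Ok(())
}
```

Note that the capability check in the middle never inspects origin; that is the affordance-parity property in code form.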

Forward compatibility: heterogeneous stores (keep this seam clean)

The stated future is specialized stores — relational / columnar / blob — that are “part of the Argon knowledge graph,” queried uniformly, with per-data placement configured in ox.toml. That design is not settled here, but this RFD must not foreclose it. Two principles, grounded in current-repo design intent (RFD 0020) and external prior art (database catalog/connector SPIs; the BYODS work, Sahebolamri et al., OOPSLA 2023):

  • Schema stays strictly store-agnostic. Schema answers type questions only; it must never know where bytes live. Physical placement is a separate layer — RFD 0020’s BYODS (D6: a physical Relation is an interface; representations coexist) plus the RuntimeStorageBackend seam, selected per-relation by ox.toml placement. This is the OBDA shape (data stays in place, queried through the ontology; ox.toml placement is the R2RML analogue), and the catalog/connector SPIs (Calcite Schema/Table.getRowType, Trino ConnectorMetadata) confirm the split: the engine owns the type system; sources map into it and never own planner type semantics.
  • The ad-hoc path lowers to LogicalPlan, not to a single in-memory catalog. RFD 0020 D2 already decided that ad-hoc queries, declared rules, and the type-checker goal all lower to one shared LogicalPlan (the IR scaffolded but currently dead in oxc-reasoning/src/logical/). Lowering ad-hoc bodies to that IR — rather than hard-wiring the current materialize_predicates pull-everything-into-memory model — is what keeps the surface multi-store-ready by construction. When pushdown arrives it follows the proven contract: an optimization never an obligation, negotiated as (handle-that-absorbed-work, remainder) with the residual always re-checkable in-engine (Trino/FDW), capability modeled as binding patterns (a blob/KV store can’t free-scan, TSIMMIS), and shippability gated on determinism + identical both-sides semantics.
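The (absorbed work, remainder) pushdown contract mentioned above can be illustrated with a toy source that absorbs only equality filters. Everything here is hypothetical (Filter, Row, Source, KeyOnlyStore, query); it is not RFD 0020's design, only a sketch of the negotiation shape in which the engine re-checks whatever the source did not absorb.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Filter {
    Eq(String, i64),
    Gt(String, i64),
}

type Row = BTreeMap<String, i64>;

/// Convenience constructor for the toy rows used in this sketch.
fn row(id: i64, age: i64) -> Row {
    let mut r = Row::new();
    r.insert("id".to_string(), id);
    r.insert("age".to_string(), age);
    r
}

fn holds(filter: &Filter, row: &Row) -> bool {
    match filter {
        Filter::Eq(col, v) => row.get(col) == Some(v),
        Filter::Gt(col, v) => row.get(col).map_or(false, |x| x > v),
    }
}

trait Source {
    /// Offered a set of filters, a source applies the ones it can and hands
    /// back the rows plus the remainder it did NOT apply.
    fn scan(&self, filters: &[Filter]) -> (Vec<Row>, Vec<Filter>);
}

/// A store that can only do equality lookups (a key-value-like capability).
struct KeyOnlyStore {
    rows: Vec<Row>,
}

impl Source for KeyOnlyStore {
    fn scan(&self, filters: &[Filter]) -> (Vec<Row>, Vec<Filter>) {
        let (absorbed, remainder): (Vec<Filter>, Vec<Filter>) = filters
            .iter()
            .cloned()
            .partition(|f| matches!(f, Filter::Eq(..)));
        let rows = self
            .rows
            .iter()
            .filter(|r| absorbed.iter().all(|f| holds(f, r)))
            .cloned()
            .collect();
        (rows, remainder)
    }
}

/// Pushdown is an optimization, never an obligation: the engine applies the
/// remainder itself, so the result is correct whatever the source absorbed.
fn query(source: &dyn Source, filters: &[Filter]) -> Vec<Row> {
    let (rows, remainder) = source.scan(filters);
    rows.into_iter()
        .filter(|r| remainder.iter().all(|f| holds(f, r)))
        .collect()
}
```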

This RFD is, in effect, the realization of RFD 0020 D11 (“ad-hoc queries and mutations are first-class … gating is an engine policy, not a language restriction”); its new contribution is the runtime-frontend mechanism (the Schema query-provider, content-addressed identity, parity discipline) that D11 left unspecified.

Rationale

  • Reuse over reinvention. The execution substrate (compile-at-dispatch for queries, the general Operation interpreter for mutations) already exists and already runs at request time. Tier A is mostly routing: let the IR come from a request. This is why “ad-hoc is impossible by design” was always wrong.
  • Lowering is already where we need it. Because parse is DB-free and LowerCtx is closure-based, the runtime lowering path is a Module-backed resolver + a dependency edge — not a rewrite of lowering.
  • One frontend, no drift. Reusing oxc-instantiate lowering and oxc-check type-checking against a Module-backed context (rather than runtime-only reimplementations) keeps build-time and runtime behavior identical, honoring the spec↔code drift discipline. Byte-for-byte diagnostic agreement is the acceptance test.
  • Full parity, no shortcuts. Ad-hoc bodies are type-checked exactly as declared bodies are (decision #1) and hold the same decidability ceiling by default (decision #2). A partially-checked ad-hoc surface would be a hollow feature; we do not ship one.
  • Ad-hoc is the surface, not a sandbox. Declared forms are sugar over the generic path; ad-hoc has full affordance parity (decision #4). Control is deployment-level and uniform, never an ad-hoc-specific restriction.
  • Default-on matches the product. Locking down is a deployment choice, not the substrate’s posture.

Alternatives considered

  1. Declared-only forever (status quo). Rejected: contradicts the stated design intent; “a database you can’t query ad hoc isn’t a database.”
  2. Source text only, IR never public. Likely, but not decided here: whether IR is also a public (prepared-statement-style) surface is folded into the performance/caching design (decision #3, §Open) so the answer is the correct one rather than a guess.
  3. A separate runtime-only frontend fed by an .oxbin catalog (decision-#3 option B). Rejected: faster to stand up but creates a second lowering/checking path that drifts from the build-time one — the exact failure mode the intent-node/drift-gate discipline exists to prevent.
  4. Ship ad-hoc with reduced/“lighter” validation first, full type-checking later. Rejected (decision #1): a half-checked surface is a hollow feature. Full oxc-check parity is in scope from the start, which is what pulls the checker into the runtime frontend.
  5. A special capability leash on ad-hoc writes (extra gates on Update/retract because they are ad-hoc). Rejected (decision #4): ad-hoc has affordance parity; control is uniform and deployment-level. The lone exception is forget, refused for ad-hoc — but that is not a leash on affordance, it is that forget’s #[allow_forget] capability is conferred at a declaration site a request doesn’t have, so the request can’t forge it (see Security).
  6. A generic untyped entity-write endpoint (POST /v1/entities). Rejected/kept-absent: ad-hoc writes belong to the typed mutate/Operation mechanism, not an untyped blob surface.

Consequences

  • New runtime dependencies: oxc-serve/oxc-runtime gain the frontend — oxc-parser, oxc-instantiate, and (per decision #1) oxc-check / oxc-resolver / oxc-types, once their environment access is routed through Schema. This is a substantial change to the runtime’s relationship to the frontend (the runtime/AGENTS.md “the reasoner was not built here” tombstone framing and the oxc-runtime/oxc-serve intent nodes all need updating). Introducing Schema as the sole environment-access contract — with the build-time backend over ASTs and the runtime backend over the event-log projection — is the bulk of the engineering and lands as its own arc before the surface is wired.
  • Artifact identity + drift guard already exist; per-symbol identity is the residual. Artifact identity (composition signature + per-section content hashes) and the version/tier load gates are already built (oxc-oxbin: composition_signature.rs, content_hash.rs, versioning.rs, validation.rs). The runtime backend reuses them. What remains is threading that identity to per-event/per-symbol resolution (the storage-side module_id collision gap) so two schemas’ symbol ids can’t alias — shared with the storage-identity fix, not additive.
  • New Schema-backing Module surface: name/type/parameter/world-assumption/refinement resolution over the CQRS catalog projection (additive; the facts are already in the .oxbin decl bodies — no new serialized representation, no embedded source).
  • New wire + CLI surface: a generic submission request shape and ox query '<body>' / REPL entry (exact shapes in the implementing PRs).
  • Spec/Lean: per the repo workflow, this is language-surface — RFD + reference draft → Lean → code. The reference (spec/reference/) gains an ad-hoc-submission section; the Lean substrate is unaffected in its semantics (an ad-hoc rule is just a rule), but the storage/runtime contract may need to record that evaluation admits request-sourced rules, and the security/opt-out posture should be described where the serving contract lives.
  • The “rejected by design” framing is retired in code comments, AGENTS nodes, and project memory.

Open questions

Decisions #1–#4 are settled above, and the resolution mechanism is settled as the query-provider Schema interface (one checker, build-time backend over ASTs, runtime backend over the event-log projection — the rustc/Go model). What remains genuinely open:

  1. The exact Schema operation set. The minimal trait surface (it must cover name→declaration resolution, field/param types, subsumption edges, enum variants, world-assumption, and refinement metadata) and how much it reuses the indexes Module already builds (concept_ids, relation_signatures, etc.) vs. adds. Identity/fingerprint is not open — the artifact already carries it (composition signature + section hashes); the backend keys on that. Lazy per-name materialization (the Idris .ttc pattern) is a future optimization, not needed for v1 since Module already eagerly indexes the (small) schema.
  2. The performance / prepared-body design (decision #3). The load-bearing open thread: compile-caching of recurring ad-hoc bodies (keyed by body hash + composition signature — the content-hash machinery already exists), whether a public prepared-IR fast-path is the correct surface, and how Salsa incrementality is reused at runtime. The IR-submission question is answered here, not in isolation.
  3. Materialization model. Ad-hoc reads today inherit the full in-memory materialize_predicates build (oxc-reasoning; SemiNaiveExecutor). The intended replacement — a content-addressed, generation-invalidated projection cache (the read-model section is already reserved in .oxbin and invalidation exists in oxc-storage-pg get_projection_cache, but it is not populated; the DBSP/IVM executor is drop-in-ready but gated) — is a real forward arc. The ad-hoc path should target that Engine/projection-cache seam rather than entrench the full-scan, and this overlaps the external/foreign-relation (“market oracle”) thread.
  4. Standpoint / fork / bitemporal scoping. Ad-hoc bodies need the same as_of / standpoint / fork context as declared dispatch; query_body_dispatch currently refuses across-standpoint parameterized bodies (oxc-runtime/src/lib.rs:6787). The ad-hoc path must reach full parity here, so that refusal is a gap to close, not a boundary.
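As a concrete picture of the caching idea in open question 2 (compile-caching keyed by body hash + composition signature), here is a minimal sketch. It is an assumption-laden illustration, not a proposed design: CacheKey, CompileCache, and get_or_compile are hypothetical, the compiled IR is modelled as raw bytes, and a hand-written 64-bit FNV-1a hash stands in for the BLAKE3 content hashing the artifact actually uses. A real cache would want a collision-resistant hash.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a; a stand-in for a real content hash such as BLAKE3.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// The same body text compiled against a different schema must not share an
/// entry, so the schema's identity is part of the key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    body_hash: u64,
    composition_signature: String,
}

struct CompileCache {
    entries: HashMap<CacheKey, Vec<u8>>,
    compiles: usize, // how many times the compile closure actually ran
}

impl CompileCache {
    fn new() -> Self {
        CompileCache { entries: HashMap::new(), compiles: 0 }
    }

    /// Returns cached IR for (body, signature) or compiles and stores it.
    fn get_or_compile(
        &mut self,
        body: &str,
        signature: &str,
        compile: impl FnOnce(&str) -> Vec<u8>,
    ) -> Vec<u8> {
        let key = CacheKey {
            body_hash: fnv1a(body.as_bytes()),
            composition_signature: signature.to_string(),
        };
        if let Some(ir) = self.entries.get(&key) {
            return ir.clone();
        }
        self.compiles += 1;
        let ir = compile(body);
        self.entries.insert(key, ir.clone());
        ir
    }
}
```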