RFD 0001 — ID architecture

  • State: discussion
  • Opened: 2026-05-27
  • Decides: identifier types used across oxc-protocol, oxc-oxbin, oxc-instantiate, oxc-runtime, oxc-storage-mem, oxc-storage-pg, oxc-reasoning.

Question

Argon currently identifies every declared symbol, every event, every partition key, and every runtime tuple element with a 16-byte uuid::Uuid. Across ~286 call sites and ~21 distinct identifier roles, this is one type doing many jobs. The reasoner — which IS the data system — sees UUIDs in every Z-set tuple and every storage index entry. Is uuid::Uuid the right identifier type for Argon, and if not, what is?

Context

How identifiers flow through Argon

A modeler writes Argon source. The compiler lowers it to a sequence of typed axiom events stored in a .oxbin artifact. The runtime loads the artifact into a Module, opens a Store against a StorageBackend, executes mutations that emit more events, and answers queries by reasoning over the resulting fact set.

Identifiers appear at every stage:

  • Source: qualified_path: String (“demo::Person”).
  • AST: UUIDs minted from Uuid::new_v5(WORKSPACE_NS, facet ++ qualified_path).
  • Wire (.oxbin): UUIDs in 70+ body fields and 7 mandatory event-header fields.
  • Storage indexes: UUIDs in BTreeMap keys.
  • Reasoner Z-sets: UUIDs encoded as Value::Id(Uuid) in tuple bytes.
  • Hot path: UUIDs everywhere; 16 bytes per identifier, 17 bytes per identifier with CBOR framing.

Why the choice is load-bearing

In a system whose hot path involves billions of tuple comparisons in Z-set joins, the identifier IS the data structure. Concrete costs at scale:

  • AxiomEvent header: ~145 bytes of identifiers per event (7 mandatory 16-byte UUIDs plus one 32-byte SHA-256 digest). At a billion events, that is ~145 GB of identifiers in event headers alone.
  • Storage indexes: every LiveKey and HistKey entry carries two UUIDs (tenant + fork) plus a kind tag, ~40 bytes per index entry.
  • Z-set tuples: every Value::Id is 17 CBOR-encoded bytes (a 1-byte byte-string head plus 16 payload bytes), so a 3-argument relation tuple spends 3 × 17 = 51 bytes on IDs alone.
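The per-ID byte counts used throughout this RFD follow from how CBOR frames each value. The sketch below is a minimal check under stated assumptions: each identifier is written as a bare CBOR byte string (a UUID) or a bare unsigned integer (a packed 64-bit id), heads use the shortest encoding from RFC 8949, and any enum tag that Argon's real Value encoding may add is ignored. The helper names are hypothetical.

```rust
/// Hypothetical helper: size in bytes of a CBOR data-item head (RFC 8949, section 3)
/// for argument `n`, assuming the shortest ("preferred") encoding.
fn cbor_head_len(n: u64) -> usize {
    match n {
        0..=23 => 1,                 // argument fits in the initial byte
        24..=0xFF => 2,              // initial byte + 1-byte argument
        0x100..=0xFFFF => 3,         // initial byte + 2-byte argument
        0x1_0000..=0xFFFF_FFFF => 5, // initial byte + 4-byte argument
        _ => 9,                      // initial byte + 8-byte argument
    }
}

/// A 16-byte UUID as a CBOR byte string (major type 2): a head carrying the length, then the bytes.
fn cbor_uuid_len() -> usize {
    cbor_head_len(16) + 16
}

/// A u64 id as a CBOR unsigned integer (major type 0): the value itself is the head's argument.
fn cbor_u64_len(id: u64) -> usize {
    cbor_head_len(id)
}

fn main() {
    // Hypothetical packed id whose top byte is non-zero, so its value is >= 2^56.
    let packed_id = (1u64 << 56) | 42;
    let (uuid, packed) = (cbor_uuid_len(), cbor_u64_len(packed_id));
    println!("per id: UUID {uuid} bytes, packed u64 {packed} bytes");
    println!("3-arg tuple: {} vs {} bytes", 3 * uuid, 3 * packed);
}
```

A u64 only pays the full 9 bytes when its value is at least 2^32; a packed id with a non-zero top byte always does, so 9 bytes is a safe upper bound per packed id.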

Survey: what comparable systems do

We studied three prior-art systems carefully (Kora, Nous, and a generic UUIDv8 graph-DB proposal). Each made specific design choices that don’t map cleanly to Argon:

  • Kora (/Users/ivanleon/Code/wt/eidos/main/): Iri(NonZeroU32) backed by a process-wide LazyLock<ThreadedRodeo> interner. Per-engine ConceptIndex with TOP=0, BOTTOM=1 sentinels — owl:Thing / owl:Nothing baked in. Global static state. Tightly coupled to OWL semantics.
  • Nous (/Users/ivanleon/Code/wt/orca-mvp/main/crates/nous/): define_id! macro generating newtyped u64 ids derived from FNV-1a 48-bit hashes of qualified IRIs. Sparse → dense IdBridge rebuilt per schema; DERIVED_ID_BIT = 1<<63 overload on IndividualId. 48-bit hash collides at ~2^24 entries.
  • Generic graph-DB recommendation (UUIDv8 outer + 64-bit InternalId inner, with HLC timestamp inside the UUIDv8): the HLC timestamp inside the wire ID breaks byte-deterministic builds, which is a non-negotiable Argon property.

None of these are right for Argon directly. The substrate-neutrality requirement (no OWL Thing/Nothing), the per-build determinism requirement (no HLC inside wire IDs), and the multi-axis partition model (tenant × fork × standpoint × module — none of which are subordinate to the others) all push toward a custom design.

Decision

Argon defines ten identifier types, each tuned to its identity-source and role. No uuid::Uuid anywhere. The uuid crate is dropped from the workspace.

The types

  • Iri(Arc<str>). Bytes: variable. Identity source: modeler-authored. Scope: I/O boundary. Role: qualified path ("demo::Person"); the surface contract, never in hot paths.
  • NameRef(NonZeroU32). Bytes: 4. Identity source: symbol-table position. Scope: per-workspace. Role: wire-format identifier for every declared symbol (concept, relation, module, standpoint, metatype, metarel, metaxis, trait, impl, struct, enum, rule, query, mutation, compute, sink, macro, test).
  • TenantId(NonZeroU32). Bytes: 4. Identity source: provisioning table. Scope: per-deployment. Role: tenant partition key.
  • ForkId(NonZeroU32). Bytes: 4. Identity source: fork table. Scope: per-tenant. Role: fork partition key.
  • IndividualId(NonZeroU64). Bytes: 8. Identity source: system-allocated counter. Scope: per-(tenant, fork). Role: dynamic individual identity. Replaces caller-provided UUIDs; external identifiers are data (a hasExternalId property), not identity.
  • EventId(NonZeroU64). Bytes: 8. Identity source: build-deterministic counter (compile time); HLC-derived Snowflake (runtime). Scope: per-(tenant, fork). Role: per-event identity. Layout: [tx_seconds: 32 | …].
  • AxiomKey([u8; 16]). Bytes: 16. Identity source: BLAKE3-128 of canonical body. Scope: per-(tenant, fork). Role: logical proposition identity; the same proposition asserted twice has the same AxiomKey.
  • InternalId(NonZeroU64). Bytes: 8. Identity source: built at Module::load, thrown away at unload. Scope: per-Module-load. Role: runtime-only hot-path id. Layout: [kind: 8 | partition: 16 | sequence: 40].
  • ContentId([u8; 32]). Bytes: 32. Identity source: BLAKE3-256 of body. Scope: content-derived. Role: cryptographic content hash; replaces SHA-256.
  • CompositionSignature([u8; 32]). Bytes: 32. Identity source: BLAKE3-256 of composition input. Scope: content-derived. Role: workspace composition signature; replaces SHA-256.

Amendment (2026-06-11, issue #270 / PR #285). The NameRef(NonZeroU32) type above maps to a Postgres INT4, and that mapping carries an invariant the original RFD left implicit in the Postgres encoder. It is recorded here so call-site comments can cite it: a NameRef’s Postgres wire mapping is INT4 with the high bit reserved, so the valid band is [1, 2^31-1]. The sqlx::Encode impl (oxc-protocol/src/ids.rs) enforces this: it converts through i32::try_from and refuses any value with the high bit set rather than letting it wrap negative on the wire. The 0 slot stays the NonZero* niche sentinel; the top bit is held in reserve.

Two consequences follow. First, any hash-derived NameRef (the property_id_for_field / reflective_sort_name_ref / individual_id_from_name stand-ins, which fold a BLAKE3 prefix into an id pending the symbol-table lift) must be masked into the band (& 0x7FFF_FFFF, with 0 folded to 1). Otherwise roughly half of all field names, those whose hash has the high bit set, would abort the mutation (PR #285). Second, folding a hash into 31 bits is not injective, so PR #285 adds a load-time collision gate (OE0231): two distinct Type::field pairs that alias one id are refused loudly instead of silently sharing a storage column.

The hash stand-ins are a bridge. The sequential interning table that derives NameRefs from canonical symbol-table position (Phase 3 below) is the production follow-up tracked at #270, and it retires both the mask and the gate.
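The mask, the encoder check, and the collision gate can be sketched in a few lines. This is an illustration under assumptions, not the code in oxc-protocol/src/ids.rs: fold_into_band, to_int4, and collision_gate are hypothetical names, the digest prefix is read big-endian, and the gate is a plain HashMap scan rather than whatever OE0231 actually runs.

```rust
use std::collections::HashMap;
use std::num::NonZeroU32;

/// Hypothetical stand-in: fold a 4-byte digest prefix into the NameRef band [1, 2^31 - 1].
fn fold_into_band(prefix: [u8; 4]) -> NonZeroU32 {
    let masked = u32::from_be_bytes(prefix) & 0x7FFF_FFFF; // clear the reserved high bit
    NonZeroU32::new(masked.max(1)).expect("max(1) is never zero") // fold the 0 niche to 1
}

/// Mirrors the described encoder rule: convert through i32::try_from and refuse
/// values with the high bit set instead of wrapping negative.
fn to_int4(id: NonZeroU32) -> Result<i32, String> {
    i32::try_from(id.get()).map_err(|_| format!("NameRef {} is outside [1, 2^31-1]", id.get()))
}

/// Hypothetical load-time gate: refuse when two distinct `Type::field` names fold to one id.
fn collision_gate<'a>(fields: &[(&'a str, NonZeroU32)]) -> Result<(), (&'a str, &'a str)> {
    let mut seen: HashMap<NonZeroU32, &'a str> = HashMap::new();
    for &(name, id) in fields {
        let prev = *seen.entry(id).or_insert(name); // first name to claim this id
        if prev != name {
            return Err((prev, name)); // two names alias one id: refuse loudly
        }
    }
    Ok(())
}

fn main() {
    let a = fold_into_band([0xDE, 0xAD, 0xBE, 0xEF]);
    println!("{a} -> INT4 {:?}", to_int4(a));
    let gate = collision_gate(&[("Person::name", a), ("Person::email", a)]);
    println!("gate: {gate:?}");
}
```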

Two ways identity is derived

The 10 types split cleanly on identity-source:

Content-addressed (deterministic from source content; same source → same byte):

  • NameRef — derived from symbol-table position, which is derived from canonical-sorted qualified paths.
  • AxiomKey — BLAKE3-128 of canonical body bytes.
  • ContentId — BLAKE3-256 of body bytes.
  • CompositionSignature — BLAKE3-256 of composition input.

Allocation-addressed (system-allocated counters, deterministic per-build):

  • TenantId, ForkId — provisioning tables (counters under operator control).
  • EventId — per-event counter (deterministic in build mode; Snowflake in runtime).
  • IndividualId — per-mutation counter (deterministic within a build pass).

Iri is a surface artifact; InternalId is a runtime-only artifact. Neither participates in wire identity.

BLAKE3 unification

AxiomKey (128 bits) and ContentId (256 bits) come from a single BLAKE3-256 invocation of the canonical body bytes. AxiomKey is the first 16 bytes; ContentId is the full 32 bytes. One hash invocation per event.
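A sketch of the single-invocation split, assuming the 32-byte digest has already been computed (with the blake3 crate that is `*blake3::hash(body).as_bytes()`). The newtypes and ids_from_digest are hypothetical. Because BLAKE3's shorter outputs are prefixes of its longer ones, the first 16 bytes equal what a 16-byte BLAKE3 output would have produced.

```rust
/// Hypothetical newtypes mirroring AxiomKey and ContentId.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct AxiomKey([u8; 16]);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ContentId([u8; 32]);

/// Split one 32-byte BLAKE3-256 digest of the canonical body bytes into both ids.
fn ids_from_digest(digest: [u8; 32]) -> (AxiomKey, ContentId) {
    let mut key = [0u8; 16];
    key.copy_from_slice(&digest[..16]); // AxiomKey = first 16 bytes
    (AxiomKey(key), ContentId(digest)) // ContentId = all 32 bytes
}

fn main() {
    // Stand-in digest so the sketch runs without the blake3 dependency.
    let digest: [u8; 32] = std::array::from_fn(|i| i as u8);
    let (key, content) = ids_from_digest(digest);
    println!("{:02x?}\n{:02x?}", key.0, content.0);
}
```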

BLAKE3 replaces SHA-256 throughout because it is ~3× faster, parallel-friendly, of the same cryptographic strength, and smaller in code size. The replacement is a one-time wire-format change tied to this RFD.

Per-lattice sentinels (engine-local, never on wire)

Within a single Module load, the reasoner builds a separate InternalId space for each lattice (each metatype’s subsumption lattice, each standpoint lattice, each refinement lattice). Within each lattice, the reasoner reserves:

  • The lattice’s ⊤ (universal) at the lowest available InternalId for that lattice’s kind+partition.
  • The lattice’s ⊥ (inconsistent) at the next one.

These are engine-local artifacts, recomputed at every Module::load, never written to the wire. The lattice’s actual top and bottom are declared concepts (e.g., the universal in a metatype’s subsumption lattice is a concept declared in stdlib like std::mlt::Class); the engine’s reservation is purely an evaluation-time optimization for short-circuit operations.
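The reservation and the short-circuit it enables might look like the following. This is a hypothetical sketch: LatticeSentinels, reserve, and meet_shortcut are illustrative names, ids are plain u64 rather than InternalId, and only meet is shown. The two laws it relies on hold in any bounded lattice: meeting with the top returns the other operand, and meeting with the bottom returns the bottom.

```rust
/// Hypothetical engine-local top/bottom reservation for one lattice.
/// Ids are plain u64 here; the real engine would use InternalId.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LatticeSentinels {
    top: u64,    // lowest available id for the lattice's kind+partition
    bottom: u64, // the next one
}

impl LatticeSentinels {
    /// Reserve two ids at Module::load; declared members are numbered after them.
    fn reserve(next_free: &mut u64) -> Self {
        let top = *next_free;
        *next_free += 2;
        LatticeSentinels { top, bottom: top + 1 }
    }

    /// Short-circuit cases of a meet (greatest lower bound). `None` means the
    /// reasoner has to consult the declared lattice.
    fn meet_shortcut(&self, a: u64, b: u64) -> Option<u64> {
        if a == self.bottom || b == self.bottom {
            Some(self.bottom) // bottom absorbs everything
        } else if a == self.top {
            Some(b) // top is the identity for meet
        } else if b == self.top || a == b {
            Some(a)
        } else {
            None
        }
    }
}

fn main() {
    let mut next_free = 100;
    let lattice = LatticeSentinels::reserve(&mut next_free);
    let (x, y) = (next_free, next_free + 1); // two declared concepts
    println!("{:?}", lattice.meet_shortcut(lattice.top, x));
    println!("{:?}", lattice.meet_shortcut(x, y));
}
```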

This differs from Kora/Nous, which reserve TOP=0 / BOTTOM=1 globally across all concept IDs — that’s an OWL-ism (a single owl:Thing for the whole ontology). Argon doesn’t have a single universal; each lattice has its own bounds, and they’re declared entities, not primitive IDs.

Type-distinct newtypes via the define_id! macro

Each ID type is a newtype with its own Display, Ord, Hash, Serialize, Deserialize, big-endian wire encoding, and NonZero* niche for Option<_>. All are generated by a single define_id! macro (inspired by Nous’s pattern). Crossing roles requires explicit conversion; there is no accidental TenantId/ForkId confusion at the type level.
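A minimal define_id!-style macro, assuming one NonZero integer per role. It is a hypothetical sketch, not the macro in oxc-protocol: the Serialize/Deserialize impls are omitted so the example needs no serde dependency, and Display prints the type name alongside the value.

```rust
use std::fmt;
use std::num::{NonZeroU32, NonZeroU64};

/// Hypothetical define_id!-style macro (serde impls omitted to stay dependency-free).
macro_rules! define_id {
    ($name:ident, $nz:ty, $raw:ty) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
        struct $name($nz);

        #[allow(dead_code)]
        impl $name {
            fn new(raw: $raw) -> Option<Self> {
                <$nz>::new(raw).map($name) // 0 is rejected: it is the niche
            }
            fn get(self) -> $raw {
                self.0.get()
            }
            /// Big-endian wire encoding: byte order agrees with numeric order.
            fn to_be_bytes(self) -> [u8; std::mem::size_of::<$raw>()] {
                self.0.get().to_be_bytes()
            }
            fn from_be_bytes(bytes: [u8; std::mem::size_of::<$raw>()]) -> Option<Self> {
                Self::new(<$raw>::from_be_bytes(bytes))
            }
        }

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                write!(f, "{}({})", stringify!($name), self.0)
            }
        }
    };
}

define_id!(TenantId, NonZeroU32, u32);
define_id!(ForkId, NonZeroU32, u32);
define_id!(EventId, NonZeroU64, u64);

fn main() {
    let tenant = TenantId::new(7).expect("non-zero");
    let fork = ForkId::new(7).expect("non-zero");
    // `tenant == fork` does not compile: the roles are distinct types.
    println!("{tenant} {fork} {:?}", EventId::new(1).map(EventId::to_be_bytes));
    // The NonZero niche keeps Option<TenantId> at 4 bytes.
    println!("{}", std::mem::size_of::<Option<TenantId>>());
}
```

One benefit of big-endian bytes is that sorting encoded ids byte-wise gives the same order as comparing them numerically, which keeps storage-index order and in-memory order aligned.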

Rationale

Why NameRef instead of UUIDv5(path)

The .oxbin format already mandates a symbol-table section (§D.5) with HDT-PFC-compressed canonical-sorted qualified paths. We’ve been routing around it by minting UUIDv5(path) instead of using the symbol-table position directly.

UUIDv5(path) gives:

  • 16 bytes per reference
  • Determinism (same path → same UUID)
  • Cryptographic collision resistance

NameRef = symbol-table position gives:

  • 4 bytes per reference
  • Determinism (canonical sort order)
  • Zero collision risk (by construction — positions are unique)
  • Native: the dictionary the format already requires

The savings are 12 bytes per declarative reference. Across hundreds of bodies in a real workspace with hundreds of references each, this is substantial. And the property — same source → same NameRef — is preserved.
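A sketch of position-based NameRef assignment, assuming byte-wise lexicographic order stands in for the .oxbin canonical sort (the real §D.5 ordering may differ) and capping positions at the INT4-safe band. build_name_refs is a hypothetical name.

```rust
use std::collections::BTreeMap;
use std::num::NonZeroU32;

/// Hypothetical sketch: assign each qualified path its 1-based position in canonical order.
/// Assumption: byte-wise lexicographic order stands in for the .oxbin canonical sort.
fn build_name_refs(paths: &[&str]) -> BTreeMap<String, NonZeroU32> {
    let mut sorted: Vec<&str> = paths.to_vec();
    sorted.sort_unstable();
    sorted.dedup(); // one entry per distinct path
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, path)| {
            // Stay inside the INT4-safe band [1, 2^31 - 1]; position 0 is the niche.
            let pos = i32::try_from(i + 1).expect("symbol table exceeds 2^31 - 1 entries");
            (path.to_string(), NonZeroU32::new(pos as u32).expect("pos >= 1"))
        })
        .collect()
}

fn main() {
    let a = build_name_refs(&["demo::Person", "demo::Address", "demo::Person"]);
    let b = build_name_refs(&["demo::Address", "demo::Person"]);
    assert_eq!(a, b); // input order and duplicates do not change the result
    for (path, name_ref) in &a {
        println!("{name_ref:>3}  {path}");
    }
}
```

Because an id is a position, adding one path renumbers every path that sorts after it; what the scheme guarantees is same source → same NameRef, not stability across edits.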

Why content-addressed AxiomKey at 128 bits, not 64

A 64-bit AxiomKey saves 8 bytes per event header, ~5% of the header. The cost: a birthday-collision probability of ~39% at 2^32 entries, which necessitates collision-handling machinery (either deterministic re-salting or content_id fallback verification on every lookup).

128 bits moves the birthday bound out to 2^64 entries; at 2^48 entries the collision probability is still only about 2^-33, so collision handling is unnecessary in practice. BLAKE3-128 is free, given we compute BLAKE3-256 for ContentId anyway. The 8 bytes saved on the event header aren’t where the storage wins live; those are in declarative _id fields (16 → 4 = 12 bytes saved each, multiplied across every body) and Z-set tuples (17 → 9 bytes per ID, multiplied across millions of tuples).
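The birthday figures can be checked with the standard approximation p ≈ 1 − exp(−n² / 2^(b+1)) for n random b-bit keys. The sketch evaluates it in log2 space so that 128-bit quantities never overflow, and it treats hash output as uniformly random, which is the usual modeling assumption for a cryptographic hash. The 64-bit case at 2^32 entries comes out near 0.39; the 128-bit case at 2^48 entries near 2^-33 (about 1.2e-10).

```rust
/// Hypothetical helper: birthday approximation for n = 2^n_log2 random `bits`-bit keys,
/// p ≈ 1 - exp(-n² / 2^(bits + 1)).
fn collision_probability(n_log2: f64, bits: u32) -> f64 {
    // Work in log2 space so 2^128-sized quantities never overflow.
    let x = (2.0 * n_log2 - (f64::from(bits) + 1.0)).exp2();
    -(-x).exp_m1() // 1 - e^(-x), accurate even when x is tiny
}

fn main() {
    for (label, n_log2, bits) in [
        ("64-bit AxiomKey, 2^32 entries", 32.0, 64),
        ("128-bit AxiomKey, 2^48 entries", 48.0, 128),
        ("128-bit AxiomKey, 2^64 entries", 64.0, 128),
        ("48-bit FNV-1a, 2^24 entries", 24.0, 48),
    ] {
        println!("{label}: {:.3e}", collision_probability(n_log2, bits));
    }
}
```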

The architectural rule: don’t compromise the wire format for bytes that aren’t on the hot path.

Why IndividualId is system-allocated, not caller-provided

Pattern: mature databases treat internal entity identity as a surrogate and external (caller-provided) identity as data. A typical Postgres schema uses BIGSERIAL PRIMARY KEY + external_id TEXT UNIQUE. Datomic uses partition-encoded entids, with :db/ident names and :db.unique/identity attributes for natural keys.

Argon mutations like register(external_id: Text) should:

  1. Allocate a fresh IndividualId(NonZeroU64) internally.
  2. Emit iof_assertion(IndividualId, Person).
  3. Emit hasExternalId(IndividualId, external_id) for the caller’s identifier.

The caller can query “find the Person where hasExternalId = ‘user_12345’” later. External identity is a property; internal identity is a surrogate. This is the pattern that scales and stays clean — and it lets IndividualId be 8 bytes (system-allocated) instead of 16 bytes (caller-provided UUID).
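The three-step flow can be sketched as follows. Everything here is hypothetical: the Event enum stands in for Argon's typed axiom events, IndividualAllocator for the per-(tenant, fork) counter, and find_by_external_id scans linearly where a real store would use a per-relation property index.

```rust
use std::num::NonZeroU64;

/// Hypothetical event shapes for the two assertions `register` emits.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    IofAssertion { individual: NonZeroU64, concept: &'static str },
    HasExternalId { individual: NonZeroU64, external_id: String },
}

/// Hypothetical per-(tenant, fork) allocator; a plain counter keeps allocation
/// deterministic within one build pass.
struct IndividualAllocator {
    next: u64,
}

impl IndividualAllocator {
    fn new() -> Self {
        IndividualAllocator { next: 1 } // 0 is the NonZeroU64 niche
    }

    fn allocate(&mut self) -> NonZeroU64 {
        let id = NonZeroU64::new(self.next).expect("counter starts at 1");
        self.next = self.next.checked_add(1).expect("IndividualId space exhausted");
        id
    }
}

/// Sketch of `register(external_id: Text)`: surrogate identity, external id as data.
fn register(alloc: &mut IndividualAllocator, external_id: &str) -> (NonZeroU64, Vec<Event>) {
    let id = alloc.allocate();
    let events = vec![
        Event::IofAssertion { individual: id, concept: "Person" },
        Event::HasExternalId { individual: id, external_id: external_id.to_string() },
    ];
    (id, events)
}

/// "Find the Person where hasExternalId = ..." as a linear scan.
fn find_by_external_id(events: &[Event], wanted: &str) -> Option<NonZeroU64> {
    events.iter().find_map(|e| match e {
        Event::HasExternalId { individual, external_id } if external_id == wanted => Some(*individual),
        _ => None,
    })
}

fn main() {
    let mut alloc = IndividualAllocator::new();
    let mut log = Vec::new();
    for ext in ["user_12345", "user_67890"] {
        let (_, events) = register(&mut alloc, ext);
        log.extend(events);
    }
    println!("{:?}", find_by_external_id(&log, "user_12345"));
}
```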

Why InternalId is runtime-only

InternalId layout [kind: 8 | partition: 16 | sequence: 40] is optimized for:

  • Cache-friendly Vec<u64>-indexed bitmaps (per-kind, per-partition).
  • Fibonacci-hashed U32-keyed sets in hot loops.
  • Zero-cost type discrimination via the kind byte.
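Packing and unpacking the [kind: 8 | partition: 16 | sequence: 40] layout takes a handful of shifts and masks. The sketch is hypothetical (the RFD does not fix method names or an overflow policy); it rejects sequences wider than 40 bits and the all-zero value, which is the NonZeroU64 niche.

```rust
use std::num::NonZeroU64;

const PARTITION_BITS: u32 = 16;
const SEQUENCE_BITS: u32 = 40;
const SEQUENCE_MASK: u64 = (1u64 << SEQUENCE_BITS) - 1;

/// Hypothetical runtime-only id with layout [kind: 8 | partition: 16 | sequence: 40].
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct InternalId(NonZeroU64);

impl InternalId {
    /// Returns None if `sequence` needs more than 40 bits, or if every field is 0
    /// (the all-zero value is the NonZeroU64 niche).
    fn pack(kind: u8, partition: u16, sequence: u64) -> Option<Self> {
        if sequence > SEQUENCE_MASK {
            return None;
        }
        let raw = (u64::from(kind) << (PARTITION_BITS + SEQUENCE_BITS))
            | (u64::from(partition) << SEQUENCE_BITS)
            | sequence;
        NonZeroU64::new(raw).map(InternalId)
    }

    /// Type discrimination is a single shift, with no table lookup.
    fn kind(self) -> u8 {
        (self.0.get() >> (PARTITION_BITS + SEQUENCE_BITS)) as u8
    }

    fn partition(self) -> u16 {
        (self.0.get() >> SEQUENCE_BITS) as u16 // `as` keeps the low 16 bits
    }

    fn sequence(self) -> u64 {
        self.0.get() & SEQUENCE_MASK
    }
}

fn main() {
    let id = InternalId::pack(3, 0x00BE, 42).expect("fits the layout");
    println!("{:#018x}: kind={} partition={} seq={}", id.0.get(), id.kind(), id.partition(), id.sequence());
}
```

Since kind occupies the top byte, the derived numeric Ord sorts ids by kind, then partition, then sequence. Each (kind, partition) block is therefore a contiguous range, and sequence can index a per-block Vec<u64> bitmap directly.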

But this layout is engine-policy, not modeler-visible. We reserve the right to renumber on compaction, change partition functions, etc. Making InternalId part of the wire format would couple wire to engine — wrong direction. It stays runtime-only; Module::load builds a NameRef ↔ InternalId dictionary; the reasoner operates entirely in InternalId space.
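A sketch of the per-load dictionary, assuming one kind and partition per dictionary and that kind 0 is never used (so every packed id is non-zero). IdDictionary and its methods are hypothetical. The forward map is a hash lookup, the reverse map is a dense Vec indexed by sequence, and nothing here is serialized.

```rust
use std::collections::HashMap;
use std::num::{NonZeroU32, NonZeroU64, NonZeroU8};

const SEQUENCE_MASK: u64 = (1u64 << 40) - 1;

/// Hypothetical NameRef <-> InternalId dictionary, built at Module::load and dropped
/// at unload. Ids use the [kind: 8 | partition: 16 | sequence: 40] layout.
struct IdDictionary {
    prefix: u64,                                  // kind + partition bits shared by every id here
    to_internal: HashMap<NonZeroU32, NonZeroU64>, // NameRef -> InternalId
    to_name: Vec<NonZeroU32>,                     // sequence -> NameRef (dense)
}

impl IdDictionary {
    /// Assumption: kind 0 is never used, so every packed id is non-zero.
    fn build(kind: NonZeroU8, partition: u16, names: &[NonZeroU32]) -> Self {
        let prefix = (u64::from(kind.get()) << 56) | (u64::from(partition) << 40);
        let mut dict = IdDictionary { prefix, to_internal: HashMap::new(), to_name: Vec::new() };
        for &name in names {
            if dict.to_internal.contains_key(&name) {
                continue; // one InternalId per NameRef
            }
            let seq = dict.to_name.len() as u64;
            assert!(seq <= SEQUENCE_MASK, "sequence space exhausted");
            let id = NonZeroU64::new(prefix | seq).expect("non-zero kind");
            dict.to_internal.insert(name, id);
            dict.to_name.push(name);
        }
        dict
    }

    fn internal(&self, name: NonZeroU32) -> Option<NonZeroU64> {
        self.to_internal.get(&name).copied()
    }

    fn name(&self, id: NonZeroU64) -> Option<NonZeroU32> {
        let raw = id.get();
        if (raw & !SEQUENCE_MASK) != self.prefix {
            return None; // belongs to another kind or partition
        }
        self.to_name.get((raw & SEQUENCE_MASK) as usize).copied()
    }
}

fn main() {
    let n = |v: u32| NonZeroU32::new(v).expect("non-zero NameRef");
    let kind = NonZeroU8::new(1).expect("non-zero kind");
    let dict = IdDictionary::build(kind, 0, &[n(5), n(9), n(5), n(2)]);
    let id = dict.internal(n(9)).expect("declared");
    println!("NameRef 9 -> {:#x} -> {:?}", id.get(), dict.name(id));
}
```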

Why no UUIDs

Five reasons:

  1. We control allocation. UUIDs solve the “globally unique without coordination” problem. Argon’s identifiers are either declared (NameRef from canonical symbol position), system-allocated (EventId, IndividualId), or content-derived (AxiomKey, ContentId). No coordination problem exists.

  2. Type safety. Collapsing all 21 identifier roles into one Uuid type throws away distinctions the compiler could check. Newtypes per role give compile-time discrimination.

  3. Storage efficiency. Replacing UUIDs with the right-sized type per role saves 60-80% of identifier bytes across the system.

  4. Wire format determinism. UUIDv4 is random; UUIDv7 embeds wall-clock time; UUIDv5 is locked to a single hash construction (SHA-1 over a namespace and a name). Custom types let us pick the determinism story per role (content hash for AxiomKey, sort position for NameRef, deterministic counter for EventId in build mode).

  5. No external compatibility need. Argon doesn’t need to interoperate with systems-that-mint-UUIDs at the identifier level. Federation happens at the Iri level (qualified paths), not at the binary-ID level.

Alternatives considered

Alt 1: Keep UUIDs everywhere

The status quo. Universal, well-understood. Costs: 145 bytes of identifiers per event header; 17-byte Z-set tuple elements; ~286 call sites with one type doing many jobs; full 16-byte width for low-cardinality partition keys.

Rejected. The hot-path costs and lack of type discrimination outweigh the familiarity benefit.

Alt 2: UUIDv8 outer + 64-bit InternalId inner (the generic doc recommendation)

A two-tier system with UUIDv8 (RFC 9562, custom layout) as the durable external identifier and a packed 64-bit InternalId for hot paths.

Rejected. The recommended UUIDv8 layout embeds an HLC timestamp, which breaks Argon’s byte-deterministic-build invariant. And the “external identifier” tier isn’t needed for Argon today; we don’t federate at the binary-ID level.

Alt 3: Snowflake-style 64-bit time-ordered IDs throughout

A single 64-bit ID type [time: 42 | shard: 10 | seq: 12] for everything.

Rejected. Conflates allocation-addressed (events) with content-addressed (declarative symbols). Concepts shouldn’t have time in their identity; events should. Single-type-for-everything is what we’re moving away from.

Alt 4: Nous’s FNV-1a 48-bit ConceptId

Hash the qualified path with FNV-1a, truncate to 48 bits, that’s the ID.

Rejected. For a 48-bit hash, the birthday bound puts a ~39% collision probability at 2^24 entries (~16.8M). Nous trusted IRIs not to collide; Argon shouldn’t. And it conflates declarative identity (paths) with run-of-the-mill IDs.

Alt 5: Kora’s per-engine ConceptIndex with HashMap<Iri, u32>

Each engine maintains its own Iri → u32 index, rebuilt per session.

Rejected as primary scheme (kept as inspiration for InternalId at Module::load time). Per-engine indexes don’t address the wire-format problem; they’re an in-memory representation.

Consequences

Wire format changes (major)

  • Bump oxbin_format_version major (Phase 2 of rollout — see below).
  • All _id: uuid::Uuid fields in oxc-protocol::storage::*Body types become typed: NameRef, EventId, AxiomKey, etc. per their role.
  • AxiomEvent header shrinks from 145 bytes of identifiers to 81 bytes (44% reduction).
  • .oxbin files produced under the old format are not readable under the new format. We have no externally-deployed .oxbin files; this is acceptable.

Code changes

  • uuid crate removed from workspace dependencies.
  • blake3 crate added (replaces sha2).
  • New oxc-ids crate (or oxc-protocol::ids module) carries the 10 types + define_id! macro.
  • oxc-instantiate::identity becomes the Iri ↔ NameRef ↔ symbol-table builder.
  • oxc-runtime::Module::load builds the NameRef ↔ InternalId dictionary.
  • oxc-reasoning::compile::Value becomes Value::Internal(InternalId) plus inline variants.
  • oxc-storage-mem indexes re-keyed on (TenantId, ForkId, kind) instead of (Uuid, Uuid, &'static str).

Performance

Expected wins:

  • Reasoner memory: ~30-50% reduction in Z-set tuple key storage (17 → 9 bytes per ID).
  • Event headers: 44% smaller (145 → 81 bytes).
  • Storage indexes: ~50% smaller (UUID → u32/u64 partition keys).
  • Hash operations: BLAKE3 ~3× faster than SHA-256.

Costs:

  • Module::load builds a dictionary (one pass over declared symbols; negligible).
  • Dropping the uuid crate removes a well-tested dependency; the custom types replacing it need their own test coverage.

Determinism preserved

Every wire-format identifier is either content-derived or allocated by a build-deterministic counter: NameRef from sort position, AxiomKey and ContentId from BLAKE3 of the body, and EventId from a build-deterministic counter in build mode. Same source → byte-identical .oxbin. Property preserved.

Phased rollout

The full design can be landed in four independently-shippable phases:

Phase 1 — runtime InternalId, no wire change (~3 days). oxc-reasoning::compile::Value::Internal(InternalId) replaces Value::Id(Uuid). Module::load builds a Uuid ↔ InternalId dictionary. Wire format unchanged. Win: ~30% reduction in reasoner Z-set memory.

Phase 2 — wire format break (~1 week). Add oxc-protocol::ids with all 10 types. Replace every _id: uuid::Uuid in body types. Drop uuid crate, add blake3. Bump oxbin_format_version major. Win: 44% reduction in event header size; type-safe identifiers throughout the wire.

Phase 3 — full Iri interner + symbol-table lift (~3 days). Iri(Arc<str>) with per-workspace arena. The .oxbin symbol-table section becomes the load-bearing dictionary it was designed to be. Win: cleanup of qualified_path: String duplication.

Phase 4 — dense engine structures (~1 week). Fibonacci-hashed U32Set, KindBitmap (Vec<u64> indexed by InternalId), RoleEdges-style packed adjacency. Per-lattice sentinel discipline. Win: foundation for future reasoner backends (SLG, DBSP, Kripke) sharing dense set primitives.
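Phase 4's Fibonacci-hashed U32Set might start from something like the sketch below: multiply the key by ⌊2^32/φ⌋ = 0x9E37_79B9 and keep the top bits as the slot index. The probing strategy (linear), the fixed power-of-two capacity without resizing, the use of 0 as the empty marker, and the U32Set API are all assumptions; the RFD only names the technique.

```rust
/// Fibonacci hashing: multiply by floor(2^32 / phi) = 0x9E37_79B9 and keep the top `bits` bits.
fn fib_hash(key: u32, bits: u32) -> usize {
    debug_assert!((1..=31).contains(&bits));
    (key.wrapping_mul(0x9E37_79B9) >> (32 - bits)) as usize
}

/// Hypothetical open-addressing set of non-zero u32 keys; a slot holding 0 is empty.
struct U32Set {
    slots: Vec<u32>,
    bits: u32,
    len: usize,
}

impl U32Set {
    fn with_bits(bits: u32) -> Self {
        U32Set { slots: vec![0; 1 << bits], bits, len: 0 }
    }

    /// Returns true if `key` was newly inserted. Panics on 0 or when full (no resizing).
    fn insert(&mut self, key: u32) -> bool {
        assert!(key != 0, "0 marks an empty slot");
        let mask = self.slots.len() - 1;
        let mut i = fib_hash(key, self.bits);
        loop {
            match self.slots[i] {
                0 => {
                    // Keep one slot free so probing in `contains` always terminates.
                    assert!(self.len < mask, "sketch does not resize");
                    self.slots[i] = key;
                    self.len += 1;
                    return true;
                }
                k if k == key => return false,
                _ => i = (i + 1) & mask, // linear probing
            }
        }
    }

    fn contains(&self, key: u32) -> bool {
        let mask = self.slots.len() - 1;
        let mut i = fib_hash(key, self.bits);
        loop {
            match self.slots[i] {
                0 => return false,
                k if k == key => return true,
                _ => i = (i + 1) & mask,
            }
        }
    }
}

fn main() {
    let mut set = U32Set::with_bits(4); // 16 slots
    for k in 1..=10 {
        set.insert(k);
    }
    println!("contains 7: {}, contains 11: {}", set.contains(7), set.contains(11));
}
```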

Open questions

  • Cross-workspace federation: when two workspaces’ .oxbin artifacts need to interoperate, what’s the bridge? Open. Likely: an explicit GlobalRef type at the federation boundary only, derived from (workspace_uuid, NameRef) or from full Iri. Out of scope for this RFD.

  • Distributed minting of EventId: the shard: 12 field is reserved but currently always 0. A future RFD addresses the distributed-minting protocol (Stateless Snowflake from container IP, range pre-allocation, or CRDT-style — see “Stateless Snowflake” Chinthareddy 2025).

  • External identifier indexing: when IndividualId is system-allocated and the caller’s identifier is data (a hasExternalId property), querying by external identifier requires a property-indexed lookup. The storage layer’s per-relation indexes (book §20.3.1) cover this, but specific query ergonomics for “find by external id” want a small SDK helper.

  • AxiomKey for non-data axioms: mutation_decl, query_decl, and compute_decl are declarative axioms (they have stable NameRef-based identity). Should their AxiomKey be BLAKE3(NameRef) or a special discriminator? Likely the former, for uniformity; settle this when the wire format is finalized.