RFD 0001 — ID architecture
- State: discussion
- Opened: 2026-05-27
- Decides: identifier types used across `oxc-protocol`, `oxc-oxbin`, `oxc-instantiate`, `oxc-runtime`, `oxc-storage-mem`, `oxc-storage-pg`, `oxc-reasoning`.
Question
Argon currently identifies every declared symbol, every event, every partition key, and every runtime tuple element with a 16-byte uuid::Uuid. Across ~286 call sites and ~21 distinct identifier roles, this is one type doing many jobs. The reasoner — which IS the data system — sees UUIDs in every Z-set tuple and every storage index entry. Is uuid::Uuid the right identifier type for Argon, and if not, what is?
Context
How identifiers flow through Argon
A modeler writes Argon source. The compiler lowers it to a sequence of typed axiom events stored in a .oxbin artifact. The runtime loads the artifact into a Module, opens a Store against a StorageBackend, executes mutations that emit more events, and answers queries by reasoning over the resulting fact set.
Identifiers appear at every stage:
| Stage | Identifier shapes |
|---|---|
| Source | qualified_path: String (“demo::Person”) |
| AST | UUIDs minted from Uuid::new_v5(WORKSPACE_NS, facet ++ qualified_path) |
| Wire (.oxbin) | UUIDs in 70+ body fields and 7 mandatory event-header fields |
| Storage indexes | UUIDs in BTreeMap keys |
| Reasoner Z-sets | UUIDs encoded as Value::Id(Uuid) in tuple bytes |
| Hot path | UUIDs everywhere; 16 bytes per identifier, 17 bytes per identifier with CBOR framing |
Why the choice is load-bearing
In a system whose hot path involves billions of tuple comparisons in Z-set joins, the identifier IS the data structure. Concrete costs at scale:
- AxiomEvent header: 145 bytes of identifiers per event (7 mandatory UUIDs + 1 SHA-256). At a billion events: ~145 GB of just IDs in the header.
- Storage indexes: every `(LiveKey, HistKey)` entry carries two UUIDs (tenant + fork) plus a kind tag. ~40 bytes per index entry.
- Z-set tuples: every `Value::Id` is 17 CBOR-encoded bytes. A 3-arg relation tuple is ~51 bytes (3 × 17) for IDs alone.
Survey: what comparable systems do
We studied three prior art systems carefully (Kora, Nous, the generic UUIDv8 graph-DB proposal). Each made specific design choices that don’t map cleanly to Argon:
- Kora (`/Users/ivanleon/Code/wt/eidos/main/`): `Iri(NonZeroU32)` backed by a process-wide `LazyLock<ThreadedRodeo>` interner. Per-engine `ConceptIndex` with `TOP=0`, `BOTTOM=1` sentinels (`owl:Thing`/`owl:Nothing` baked in). Global static state. Tightly coupled to OWL semantics.
- Nous (`/Users/ivanleon/Code/wt/orca-mvp/main/crates/nous/`): `define_id!` macro generating newtyped `u64` ids derived from FNV-1a 48-bit hashes of qualified IRIs. Sparse → dense `IdBridge` rebuilt per schema; `DERIVED_ID_BIT = 1<<63` overload on `IndividualId`. 48-bit hash collides at ~2^24 entries.
- Generic graph-DB recommendation (UUIDv8 outer + 64-bit InternalId inner, with HLC timestamp inside the UUIDv8): the HLC timestamp inside the wire ID breaks byte-deterministic builds, which is a non-negotiable Argon property.
None of these are right for Argon directly. The substrate-neutrality requirement (no OWL Thing/Nothing), the per-build determinism requirement (no HLC inside wire IDs), and the multi-axis partition model (tenant × fork × standpoint × module — none of which are subordinate to the others) all push toward a custom design.
Decision
Argon defines ten identifier types, each tuned to its identity-source and role. No uuid::Uuid anywhere. The uuid crate is dropped from the workspace.
The types
| Type | Bytes | Identity source | Scope | Role |
|---|---|---|---|---|
| `Iri(Arc<str>)` | n/a | Modeler-authored | I/O boundary | Qualified path: `"demo::Person"`. Surface contract; never in hot paths. |
| `NameRef(NonZeroU32)` | 4 | Symbol-table position | Per-workspace | Wire-format identifier for every declared symbol (concept, relation, module, standpoint, metatype, metarel, metaxis, trait, impl, struct, enum, rule, query, mutation, compute, sink, macro, test). |
| `TenantId(NonZeroU32)` | 4 | Provisioning table | Per-deployment | Tenant partition key. |
| `ForkId(NonZeroU32)` | 4 | Fork table | Per-tenant | Fork partition key (per-tenant scope). |
| `IndividualId(NonZeroU64)` | 8 | System-allocated counter | Per-(tenant, fork) | Dynamic individual identity. Replaces caller-provided UUIDs; external identifiers are data (a `hasExternalId` property), not identity. |
| `EventId(NonZeroU64)` | 8 | Build-deterministic counter (compile-time); HLC-derived Snowflake (runtime) | Per-(tenant, fork) | Per-event identity. Layout: `[tx_seconds: 32 \| …` |
| `AxiomKey([u8; 16])` | 16 | BLAKE3-128 of canonical body | Per-(tenant, fork) | Logical proposition identity. Same proposition asserted twice has the same `AxiomKey`. |
| `InternalId(NonZeroU64)` | 8 | Built at `Module::load`; thrown away at unload | Per-Module-load | Runtime-only hot-path id. Layout: `[kind: 8 \| partition: 16 \| sequence: 40]`. |
| `ContentId([u8; 32])` | 32 | BLAKE3-256 of body | Content-derived | Cryptographic content hash. Replaces SHA-256. |
| `CompositionSignature([u8; 32])` | 32 | BLAKE3-256 of composition input | Content-derived | Workspace composition signature. Replaces SHA-256. |
Amendment (2026-06-11, issue #270 / PR #285). The `NonZeroU32 → INT4` mapping above carries an invariant the original RFD left implicit in the Postgres encoder. Recorded here so call-site comments can cite it: a `NameRef`’s Postgres wire mapping is `INT4` with the high bit reserved, so the valid band is `[1, 2^31-1]`. This is enforced by the `sqlx::Encode` impl (`oxc-protocol/src/ids.rs`), which signed-converts through `i32::try_from` and refuses any value with the high bit set rather than letting it wrap negative on the wire. The `0` slot stays the `NonZero*` niche sentinel; the top bit is held in reserve.

Two consequences follow. First, any hash-derived `NameRef` (the `property_id_for_field` / `reflective_sort_name_ref` / `individual_id_from_name` stand-ins, which fold a BLAKE3 prefix into an id pending the symbol-table lift) must mask into the band (`& 0x7FFF_FFFF`, zero-folded to `1`), or a coin-flip of field names would set the high bit and abort the mutation (PR #285). Second, folding a hash into 31 bits is not injective; PR #285 adds the load-time collision gate (`OE0231`) so two distinct `Type::field` pairs that alias one id refuse loudly instead of silently sharing a storage column. The hash stand-ins are a bridge: the sequential interning table that derives `NameRef`s from canonical symbol-table position (Phase 3 below) is the production follow-up tracked at #270, and it retires both the mask and the gate.
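The band rule in the amendment can be illustrated with a short sketch. It is not the code in `oxc-protocol/src/ids.rs`; `fold_into_band` and `to_int4` are hypothetical names, and the hash prefix is assumed to arrive as a plain `u32`.

```rust
use std::num::NonZeroU32;

/// Largest id inside the INT4 band [1, 2^31 - 1].
const NAME_REF_MAX: u32 = 0x7FFF_FFFF;

/// Hypothetical helper: fold a 32-bit hash prefix into the valid band.
/// The mask clears the reserved high bit; a zero result is folded to 1
/// so the value still fits the NonZero niche.
fn fold_into_band(hash_prefix: u32) -> NonZeroU32 {
    let masked = hash_prefix & NAME_REF_MAX;
    NonZeroU32::new(masked.max(1)).expect("max(1) is never zero")
}

/// Hypothetical mirror of the encoder-side check: any value with the
/// high bit set fails the signed conversion instead of wrapping negative.
fn to_int4(id: NonZeroU32) -> Result<i32, std::num::TryFromIntError> {
    i32::try_from(id.get())
}
```

An unmasked id such as `0x8000_0000` fails `to_int4`, which is the abort the mask avoids. Because the mask discards one bit, distinct inputs can fold to the same id, which is why a collision gate is still needed.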
Two ways identity is derived
The 10 types split cleanly on identity-source:
Content-addressed (deterministic from source content; same source → same bytes):
- `NameRef`: derived from symbol-table position, which is derived from canonical-sorted qualified paths.
- `AxiomKey`: BLAKE3-128 of canonical body bytes.
- `ContentId`: BLAKE3-256 of body bytes.
- `CompositionSignature`: BLAKE3-256 of composition input.

Allocation-addressed (system-allocated counters, deterministic per-build):
- `TenantId`, `ForkId`: provisioning tables (counters under operator control).
- `EventId`: per-event counter (deterministic in build mode; Snowflake in runtime).
- `IndividualId`: per-mutation counter (deterministic within a build pass).
Iri is a surface artifact; InternalId is a runtime-only artifact. Neither participates in wire identity.
BLAKE3 unification
AxiomKey (128 bits) and ContentId (256 bits) come from a single BLAKE3-256 invocation of the canonical body bytes. AxiomKey is the first 16 bytes; ContentId is the full 32 bytes. One hash invocation per event.
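A minimal sketch of the prefix relationship, assuming the 32-byte digest has already been computed over the canonical body bytes (the hashing call itself is left out so the example needs no external crate). `ids_from_digest` is a hypothetical helper name.

```rust
/// 128-bit logical proposition identity: the first 16 bytes of the digest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct AxiomKey([u8; 16]);

/// 256-bit content hash: the full digest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ContentId([u8; 32]);

/// Derive both identifiers from one digest. The caller is assumed to pass
/// the BLAKE3-256 output of the canonical body; nothing is hashed here.
fn ids_from_digest(digest: [u8; 32]) -> (AxiomKey, ContentId) {
    let mut prefix = [0u8; 16];
    prefix.copy_from_slice(&digest[..16]);
    (AxiomKey(prefix), ContentId(digest))
}
```

Since the key is a prefix of the content id, a lookup that holds a `ContentId` can recover the `AxiomKey` without rehashing.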
BLAKE3 replaces SHA-256 throughout because: ~3× faster, parallel-friendly, same cryptographic strength, smaller code size. The replacement is a one-time wire-format change tied to this RFD.
Per-lattice sentinels (engine-local, never on wire)
Within a single Module load, the reasoner builds InternalId-space per lattice (per metatype’s subsumption lattice, per standpoint lattice, per refinement lattice). Within each lattice, the reasoner reserves:
- The lattice’s `⊤` (universal) at the lowest available `InternalId` for that lattice’s `kind`+`partition`.
- The lattice’s `⊥` (inconsistent) at the next one.
These are engine-local artifacts, recomputed at every Module::load, never written to the wire. The lattice’s actual top and bottom are declared concepts (e.g., the universal in a metatype’s subsumption lattice is a concept declared in stdlib like std::mlt::Class); the engine’s reservation is purely an evaluation-time optimization for short-circuit operations.
This differs from Kora/Nous, which reserve TOP=0 / BOTTOM=1 globally across all concept IDs — that’s an OWL-ism (a single owl:Thing for the whole ontology). Argon doesn’t have a single universal; each lattice has its own bounds, and they’re declared entities, not primitive IDs.
Type-distinct newtypes via the define_id! macro
Each ID type is a newtype with its own Display, Ord, Hash, Serialize, Deserialize, big-endian wire encoding, and NonZero* niche for Option<_>. Generated via a single define_id! macro (inspired by Nous’s pattern). Crossing roles requires explicit conversion — no accidental TenantId ↔ ForkId confusion at the type level.
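One possible shape for such a macro is sketched below. It is an illustration under assumptions, not the RFD's macro: it covers only `NonZeroU32`-backed ids, takes a display prefix as a second argument (an invented convention), and leaves out the `Serialize`/`Deserialize` impls so it compiles against the standard library alone.

```rust
use std::fmt;
use std::num::NonZeroU32;

/// Sketch of a `define_id!`-style macro for 4-byte identifiers.
macro_rules! define_id {
    ($name:ident, $prefix:literal) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
        pub struct $name(NonZeroU32);

        impl $name {
            /// Returns `None` for 0, which is reserved as the `Option` niche.
            pub fn new(raw: u32) -> Option<Self> {
                NonZeroU32::new(raw).map(Self)
            }

            /// Big-endian wire encoding.
            pub fn to_be_bytes(self) -> [u8; 4] {
                self.0.get().to_be_bytes()
            }
        }

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                write!(f, "{}:{}", $prefix, self.0)
            }
        }
    };
}

define_id!(TenantId, "tenant");
define_id!(ForkId, "fork");
```

With this shape, passing a `ForkId` where a `TenantId` is expected is a compile error, and `Option<TenantId>` stays 4 bytes because the zero value is the niche.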
Rationale
Why NameRef instead of UUIDv5(path)
The .oxbin format already mandates a symbol-table section (§D.5) with HDT-PFC-compressed canonical-sorted qualified paths. We’ve been routing around it by minting UUIDv5(path) instead of using the symbol-table position directly.
UUIDv5(path) gives:
- 16 bytes per reference
- Determinism (same path → same UUID)
- Cryptographic collision resistance
NameRef = symbol-table position gives:
- 4 bytes per reference
- Determinism (canonical sort order)
- Zero collision risk (by construction — positions are unique)
- Native: the dictionary the format already requires
The savings are 12 bytes per declarative reference. Across hundreds of bodies in a real workspace with hundreds of references each, this is substantial. And the property — same source → same NameRef — is preserved.
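The position-based derivation can be sketched as follows. This is a minimal illustration, assuming the canonical order is plain byte-wise ordering of qualified paths and that positions are 1-based so they fit `NonZeroU32`; `assign_name_refs` is a hypothetical helper, not the `.oxbin` symbol-table encoder.

```rust
use std::collections::BTreeSet;
use std::num::NonZeroU32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct NameRef(NonZeroU32);

impl NameRef {
    fn get(self) -> u32 {
        self.0.get()
    }
}

/// Assign each distinct path its 1-based position in sorted order.
/// Input order and duplicates do not affect the result.
fn assign_name_refs<'a>(
    paths: impl IntoIterator<Item = &'a str>,
) -> Vec<(&'a str, NameRef)> {
    // BTreeSet both deduplicates and sorts byte-wise.
    let sorted: BTreeSet<&'a str> = paths.into_iter().collect();
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, path)| {
            let pos = u32::try_from(i + 1).expect("symbol table exceeds u32 range");
            (path, NameRef(NonZeroU32::new(pos).expect("positions start at 1")))
        })
        .collect()
}
```

Two builds that declare the same set of paths get the same assignment regardless of declaration order. Adding a path shifts the positions of every path sorted after it, so in this sketch the ids are stable per workspace build, not across edits.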
Why content-addressed AxiomKey at 128 bits, not 64
A 64-bit AxiomKey saves 8 bytes per event header, ~5% of the header. The cost: a birthday-collision probability of ~39% at 2^32 entries, which necessitates collision-handling machinery (either deterministic re-salting or content_id fallback verification on every lookup).
128 bits moves the birthday bound out to ~2^64 entries (effectively unbounded), so collision handling is unnecessary. BLAKE3-128 is fast (free, given we compute BLAKE3-256 for ContentId anyway). The 8 bytes saved on the event header aren’t where the storage wins live; those are in declarative _id fields (16 → 4 = 12 bytes saved each, multiplied across every body) and Z-set tuples (17 → 9 bytes per ID, multiplied across millions of tuples).
The architectural rule: don’t compromise the wire format for bytes that aren’t on the hot path.
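The collision estimates in this section follow from the standard birthday approximation p ≈ 1 − exp(−n² / 2^(b+1)) for n uniformly distributed b-bit keys. The sketch below evaluates it; the function name is illustrative, and the formula assumes hash outputs behave as uniform random values.

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit keys collide: 1 - exp(-n^2 / 2^(bits + 1)).
fn birthday_collision_probability(n: f64, bits: i32) -> f64 {
    let exponent = -(n * n) / 2f64.powi(bits + 1);
    // exp_m1 computes e^x - 1 accurately for tiny x, so negate it.
    -exponent.exp_m1()
}
```

At n = 2^32 the approximation gives roughly 0.39 for 64-bit keys and a value below 10^-19 for 128-bit keys, which is the gap the 128-bit choice buys.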
Why IndividualId is system-allocated, not caller-provided
Pattern: databases conventionally treat internal entity identity as a surrogate, and external (caller-provided) identity as data. A common Postgres schema uses BIGSERIAL PRIMARY KEY + external_id TEXT UNIQUE. Datomic uses partition-encoded entids + :db/ident for natural keys.
Argon mutations like register(external_id: Text) should:
- Allocate a fresh `IndividualId(NonZeroU64)` internally.
- Emit `iof_assertion(IndividualId, Person)`.
- Emit `hasExternalId(IndividualId, external_id)` for the caller’s identifier.
The caller can query “find the Person where hasExternalId = ‘user_12345’” later. External identity is a property; internal identity is a surrogate. This is the pattern that scales and stays clean — and it lets IndividualId be 8 bytes (system-allocated) instead of 16 bytes (caller-provided UUID).
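The three steps of such a mutation can be sketched as below. The `Fact` enum and `IndividualAllocator` are hypothetical stand-ins for the runtime's event and counter types, and the allocator is a plain in-memory counter with no per-(tenant, fork) scoping.

```rust
use std::num::NonZeroU64;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct IndividualId(NonZeroU64);

/// Hypothetical stand-in for the events a mutation emits.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Fact {
    IofAssertion { individual: IndividualId, concept: &'static str },
    HasExternalId { individual: IndividualId, external_id: String },
}

/// Hypothetical surrogate-key allocator: a counter starting at 1.
struct IndividualAllocator {
    next: u64,
}

impl IndividualAllocator {
    fn new() -> Self {
        Self { next: 1 }
    }

    fn allocate(&mut self) -> IndividualId {
        let id = NonZeroU64::new(self.next).expect("counter starts at 1");
        self.next += 1;
        IndividualId(id)
    }
}

/// Sketch of `register`: allocate a surrogate id, then record the
/// caller's identifier as an ordinary property of that individual.
fn register(alloc: &mut IndividualAllocator, external_id: &str) -> Vec<Fact> {
    let individual = alloc.allocate();
    vec![
        Fact::IofAssertion { individual, concept: "Person" },
        Fact::HasExternalId { individual, external_id: external_id.to_owned() },
    ]
}
```

Note that this sketch allocates a new individual on every call, even for a repeated external id; whether `register` should first look up an existing `hasExternalId` fact is a policy question the sketch does not settle.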
Why InternalId is runtime-only
InternalId layout [kind: 8 | partition: 16 | sequence: 40] is optimized for:
- Cache-friendly `Vec<u64>`-indexed bitmaps (per-kind, per-partition).
- Fibonacci-hashed U32-keyed sets in hot loops.
- Zero-cost type discrimination via the `kind` byte.
But this layout is engine-policy, not modeler-visible. We reserve the right to renumber on compaction, change partition functions, etc. Making InternalId part of the wire format would couple wire to engine — wrong direction. It stays runtime-only; Module::load builds a NameRef ↔ InternalId dictionary; the reasoner operates entirely in InternalId space.
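A minimal sketch of packing and unpacking the `[kind: 8 | partition: 16 | sequence: 40]` layout. It assumes the fields are laid out from the most significant bit downward in the order written, which the RFD does not state explicitly; the method names are illustrative.

```rust
use std::num::NonZeroU64;

const PARTITION_BITS: u32 = 16;
const SEQUENCE_BITS: u32 = 40;
const SEQUENCE_MASK: u64 = (1 << SEQUENCE_BITS) - 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct InternalId(NonZeroU64);

impl InternalId {
    /// Pack the three fields into one word. Returns `None` when the
    /// sequence does not fit in 40 bits or the packed word is zero.
    fn pack(kind: u8, partition: u16, sequence: u64) -> Option<Self> {
        if sequence > SEQUENCE_MASK {
            return None;
        }
        let raw = (u64::from(kind) << (PARTITION_BITS + SEQUENCE_BITS))
            | (u64::from(partition) << SEQUENCE_BITS)
            | sequence;
        NonZeroU64::new(raw).map(Self)
    }

    /// Top 8 bits: type discrimination is a shift, with no lookup.
    fn kind(self) -> u8 {
        (self.0.get() >> (PARTITION_BITS + SEQUENCE_BITS)) as u8
    }

    /// Middle 16 bits; the cast drops the kind byte above them.
    fn partition(self) -> u16 {
        (self.0.get() >> SEQUENCE_BITS) as u16
    }

    /// Low 40 bits.
    fn sequence(self) -> u64 {
        self.0.get() & SEQUENCE_MASK
    }
}
```

Under this bit order, ids sort by kind first, then partition, then sequence, so a sorted run of ids is already grouped per kind and per partition.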
Why no UUIDs
Five reasons:
- We control allocation. UUIDs solve the “globally unique without coordination” problem. Argon’s identifiers are either declared (`NameRef` from canonical symbol position), system-allocated (`EventId`, `IndividualId`), or content-derived (`AxiomKey`, `ContentId`). No coordination problem exists.
- Type safety. All 21 identifier roles collapsing into one `Uuid` type is a regression. Newtypes per role give compile-time discrimination.
- Storage efficiency. Replacing UUIDs with the right-sized type per role saves 60-80% of identifier bytes across the system.
- Wire format determinism. UUIDv4 is random; UUIDv7 has wall-clock time; UUIDv5 is one hash family. Custom types let us pick the determinism story per role (content-hash for AxiomKey, sort-position for NameRef, deterministic-counter for EventId in build mode).
- No external compatibility need. Argon doesn’t need to interoperate with systems-that-mint-UUIDs at the identifier level. Federation happens at the Iri level (qualified paths), not at the binary-ID level.
Alternatives considered
Alt 1: Keep UUIDs everywhere
The status quo. Universal, well-understood. Costs: 145 bytes of identifiers per event header; 17-byte Z-set tuple elements; ~286 call sites with one type doing many jobs; full 16-byte width for low-cardinality partition keys.
Rejected. The hot-path costs and lack of type discrimination outweigh the familiarity benefit.
Alt 2: UUIDv8 outer + 64-bit InternalId inner (the generic doc recommendation)
A two-tier system with UUIDv8 (RFC 9562, custom layout) as the durable external identifier and a packed 64-bit InternalId for hot paths.
Rejected. UUIDv8 in any recommended form embeds a timestamp (HLC), which breaks Argon’s byte-deterministic-build invariant. And the “external identifier” tier isn’t needed for Argon today — we don’t federate at the binary-ID level.
Alt 3: Snowflake-style 64-bit time-ordered IDs throughout
A single 64-bit ID type [time: 42 | shard: 10 | seq: 12] for everything.
Rejected. Conflates allocation-addressed (events) with content-addressed (declarative symbols). Concepts shouldn’t have time in their identity; events should. Single-type-for-everything is what we’re moving away from.
Alt 4: Nous’s FNV-1a 48-bit ConceptId
Hash the qualified path with FNV-1a, truncate to 48 bits, that’s the ID.
Rejected. Collides at 2^24 entries (~16M). Nous trusted IRIs not to collide; Argon shouldn’t. And it conflates declarative identity (paths) with run-of-the-mill IDs.
Alt 5: Kora’s per-engine ConceptIndex with HashMap<Iri, u32>
Each engine maintains its own Iri → u32 index, rebuilt per session.
Rejected as primary scheme (kept as inspiration for InternalId at Module::load time). Per-engine indexes don’t address the wire-format problem; they’re an in-memory representation.
Consequences
Wire format changes (major)
- Bump `oxbin_format_version` major (Phase 2 of rollout; see below).
- All `_id: uuid::Uuid` fields in `oxc-protocol::storage::*Body` types become typed: `NameRef`, `EventId`, `AxiomKey`, etc. per their role.
- `AxiomEvent` header shrinks from 145 bytes of identifiers to 81 bytes (44% reduction).
- `.oxbin` files produced under the old format are not readable under the new format. We have no externally-deployed `.oxbin` files; this is acceptable.
Code changes
- `uuid` crate removed from workspace dependencies.
- `blake3` crate added (replaces `sha2`).
- New `oxc-ids` crate (or `oxc-protocol::ids` module) carries the 10 types + `define_id!` macro.
- `oxc-instantiate::identity` becomes the `Iri ↔ NameRef` ↔ symbol-table builder.
- `oxc-runtime::Module::load` builds the `NameRef ↔ InternalId` dictionary.
- `oxc-reasoning::compile::Value` becomes `Value::Internal(InternalId)` plus inline variants.
- `oxc-storage-mem` indexes re-keyed on `(TenantId, ForkId, kind)` instead of `(Uuid, Uuid, &'static str)`.
Performance
Expected wins:
- Reasoner memory: ~30-50% reduction in Z-set tuple key storage (17 → 9 bytes per ID).
- Event headers: 44% smaller (145 → 81 bytes).
- Storage indexes: ~50% smaller (UUID → u32/u64 partition keys).
- Hash operations: BLAKE3 ~3× faster than SHA-256.
Costs:
- `Module::load` builds a dictionary (one pass over declared symbols; negligible).
- Dropping the `uuid` crate removes a well-tested dependency; it is replaced with custom types that need testing.
Determinism preserved
Every wire-format identifier is deterministically derived from the source: content-derived for NameRef (sort position), AxiomKey (BLAKE3 of body), and ContentId (BLAKE3 of body); counter-derived for EventId (build-deterministic counter in build mode). Same source → byte-identical .oxbin. Property preserved.
Phased rollout
The full design can be landed in four independently-shippable phases:
Phase 1 — runtime InternalId, no wire change (~3 days). oxc-reasoning::compile::Value::Internal(InternalId) replaces Value::Id(Uuid). Module::load builds a Uuid ↔ InternalId dictionary. Wire format unchanged. Win: ~30% reduction in reasoner Z-set memory.
Phase 2 — wire format break (~1 week). Add oxc-protocol::ids with all 10 types. Replace every _id: uuid::Uuid in body types. Drop uuid crate, add blake3. Bump oxbin_format_version major. Win: 44% reduction in event header size; type-safe identifiers throughout the wire.
Phase 3 — full Iri interner + symbol-table lift (~3 days). Iri(Arc<str>) with per-workspace arena. The .oxbin symbol-table section becomes the load-bearing dictionary it was designed to be. Win: cleanup of qualified_path: String duplication.
Phase 4 — dense engine structures (~1 week). Fibonacci-hashed U32Set, KindBitmap (Vec<u64> indexed by InternalId), RoleEdges-style packed adjacency. Per-lattice sentinel discipline. Win: foundation for future reasoner backends (SLG, DBSP, Kripke) sharing dense set primitives.
Open questions
- Cross-workspace federation: when two workspaces’ `.oxbin` artifacts need to interoperate, what’s the bridge? Open. Likely: an explicit `GlobalRef` type at the federation boundary only, derived from `(workspace_uuid, NameRef)` or from full `Iri`. Out of scope for this RFD.
- Distributed minting of EventId: the `shard: 12` field is reserved but currently always 0. A future RFD addresses the distributed-minting protocol (Stateless Snowflake from container IP, range pre-allocation, or CRDT-style; see “Stateless Snowflake” Chinthareddy 2025).
- External identifier indexing: when `IndividualId` is system-allocated and the caller’s identifier is data (a `hasExternalId` property), querying by external identifier requires a property-indexed lookup. The storage layer’s per-relation indexes (book §20.3.1) cover this, but specific query ergonomics for “find by external id” want a small SDK helper.
- AxiomKey for non-data axioms: `mutation_decl`, `query_decl`, `compute_decl` are declarative axioms (have stable `NameRef`-based identity). Should their `AxiomKey` be `BLAKE3(NameRef)` or a special discriminator? Likely the former for uniformity; settle when wire format is finalized.