RFD 0044 — Package registry, workspaces, and distribution
- State: discussion
- Depends on: RFD 0030 (package dependencies: path deps, the on-ramp this extends), RFD 0022 (package-path addressing + the build gate), RFD 0038 (prelude & ambient scope: the import model), RFD 0043 (theory packages: the first publishable package, packages/mlt), RFD 0013 (toolchain distribution: the CDN infra this reuses; it explicitly deferred the registry to here)
- Blocks: a publishable/installable package ecosystem (milestone #11 “Package ecosystem”); packages/mlt distribution (the modeling team’s MLT vendor-vs-registry decision)
- Tracking: epic #688; children #689–#703
Question
Argon has path dependencies (RFD 0030) but no workspace, no lockfile, no version resolution, and no
package registry — every layer above local path deps is recognized-and-refused (OE1240) and deferred
to “a later RFD.” How should Argon distribute packages? Concretely: what is the registry substrate
(it must not be a GitHub repo), the workspace + lockfile + resolution model, and what does Argon’s
nominal type system dictate about identity across package versions?
Context
Where we are. ox is the package orchestrator (the cargo analogue) and oxc is the single-package compiler
(the rustc analogue). This is the committed frame (book §16), realizing Backpack-’17’s two-phase pipeline: ox computes a
wiring diagram + composition signature, and oxc instantiates per package. Path deps fold a dependency’s
modules into the consumer’s workspace and embed into one .oxbin; the compiler front-end already
resolves imports through resolved deps. version/edition parse but are inert; [workspace] does not
exist; the ~/.argon/packages content-addressed cache and ox.lock are reserved, not built.
The prototype (orca-mvp) is prior art, not ground truth. Its registry was a GitHub repo
(registry.json on a branch + GitHub Releases + the GitHub API), which we are replacing. It got real
things right — a deterministic tarball, a bivalent hash (a BLAKE3 byte hash and a constructs
Merkle root over per-declaration semantic signatures), a content-addressed cache, a lockfile, PubGrub —
and real things wrong: two manifest parsers that disagreed, a compiler that never saw resolved deps
(the deepest bug), two publish pipelines that produced different hashes, and one hardcoded GitHub repo
with a mutable index under concurrent writers.
Argon is unusually well-positioned. It already owns the two most expensive ingredients of a modern
registry — a content-addressed byte hash (content_hash) and a semantic Merkle root (constructs,
which no surveyed system has) — plus the exact S3 + CloudFront topology cache.nixos.org runs in
production (the infra oxup already uses). A five-system prior-art sweep (Unison, Nix, Go modules,
Dhall, Sigstore/TUF | Cargo, JSR, PubGrub, pnpm) converges on one architecture, recorded below; the
research lives at .local/research/package-system/DESIGN.md.
Decision
D1 — Concept identity is nominal/path; the resolved graph is single-version-per-package-name
Argon’s type identity is nominal, by qualified path — verified in both the substrate and the
research. In code: a concept is identified by NameRef (its canonical-symbol-table position for a
qualified path) and DefId = (file, start, name, kind, visibility), never by a hash of its structure;
subtyping is nominal end-to-end (Lean TypeSystem/Subtyping.lean: “Subtyping of named types is
nominal”; Rust oxc-check types_compatible = schema.concept_ancestors(a).contains(b)); the BLAKE3
content-ids that exist (ContentId/AxiomKey/CompositionSignature) are body/build fingerprints kept
separate from symbol identity. The vault’s identity research is explicit: “a naive content hash would
make every schema edit a new type — the opposite of what a nominal type system wants … Unison’s model
fits structural identity; Argon is largely nominal.”
It follows that:
- Person@1 and Person@2 are the same type by path. A field addition or refinement edit does not mint a new type; it trips the drift fingerprint, not identity. Identity is the full qualified path: pkg::mod_a::Person and pkg::mod_b::Person are simply two distinct concepts (like two Error types in different Rust modules), with no conflict.
- The resolved graph is single-version-per-package-name. A package name is one namespace root mapping to exactly one package (already enforced: OE1241, name = published [package].name; OE1243, a name cannot denote two package directories). This is the applicative shared-base model (track-F module research; D-77 shared immutable base appears once). This differs deliberately from Cargo, which permits multiple semver-incompatible versions to coexist via name-mangling: Argon must not, because two concepts at one path cannot both be “the” type. The resolver (D2) therefore resolves each package to exactly one version graph-wide, or fails loudly.
Cross-version compatibility is a drift question, not an identity question (see D4 / Open questions).
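The split between identity and drift in D1 can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the ConceptDecl type, the field strings (which are not Argon syntax), and the use of blake2b as a stand-in for BLAKE3, which the Python standard library does not provide.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConceptDecl:
    """Hypothetical, simplified view of one published concept declaration."""
    qualified_path: str          # e.g. "pkg::mod_a::Person"
    fields: Tuple[str, ...]      # illustrative stand-in for the semantic signature


def identity(decl: ConceptDecl) -> str:
    # Nominal identity: the qualified path and nothing else.
    return decl.qualified_path


def drift_fingerprint(decl: ConceptDecl) -> str:
    # Stand-in for a per-declaration semantic signature hash.
    # blake2b substitutes for BLAKE3 here (assumption, stdlib only).
    payload = "\n".join(sorted(decl.fields)).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=32).hexdigest()


person_v1 = ConceptDecl("pkg::mod_a::Person", ("name: String",))
person_v2 = ConceptDecl("pkg::mod_a::Person", ("name: String", "age: Nat"))
other = ConceptDecl("pkg::mod_b::Person", ("name: String",))

# Same path across versions: the same type, even though a field was added.
same_type = identity(person_v1) == identity(person_v2)
# The edit shows up in the drift fingerprint instead.
drifted = drift_fingerprint(person_v1) != drift_fingerprint(person_v2)
# Same short name under a different module path: a distinct concept.
distinct = identity(person_v1) != identity(other)
```

Note that pkg::mod_b::Person has the same fields as the first Person and therefore the same fingerprint in this sketch, yet it remains a different type, which is the point of keeping the hash out of identity.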
D2 — Version resolution is PubGrub over a SemVer VersionSet
Use pubgrub-rs (already proven in the prototype). PubGrub is generic over a VersionSet, so Argon can
later define its own range algebra (the 4-axis versioning) without being locked to caret SemVer; its
derivation-graph errors route into Argon’s OE-coded diagnostics (a named root cause + fix, fitting
the loud-over-silent ethos). Resolution enforces the D1 single-version invariant: one version per
package across the graph, or a loud refusal. Features/optional-deps are encoded as virtual packages from
the start. (MVS was considered — see Alternatives.)
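The single-version invariant that resolution must enforce can be sketched in isolation. This is not PubGrub and not the pubgrub-rs API; it is a deliberately naive illustration, assuming a simplified caret rule for major versions >= 1 and hypothetical names (select_single_version, ResolutionError), of "one version per package across the graph, or a loud refusal".

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


class ResolutionError(Exception):
    """Raised when no single version satisfies every dependent."""


def caret_matches(req: Version, candidate: Version) -> bool:
    # Simplified caret rule for major >= 1: same major, candidate >= req.
    # Real caret SemVer treats 0.x specially; that case is omitted here.
    return candidate[0] == req[0] and candidate >= req


def select_single_version(
    package: str,
    available: List[Version],
    requirements: Dict[str, Version],
) -> Version:
    """Pick the one version of `package` acceptable to every dependent.

    `requirements` maps a dependent's name to its caret requirement.
    """
    ok = [
        v for v in available
        if all(caret_matches(req, v) for req in requirements.values())
    ]
    if not ok:
        wanted = ", ".join(
            f"{dep} needs ^{'.'.join(map(str, req))}"
            for dep, req in sorted(requirements.items())
        )
        raise ResolutionError(
            f"no single version of '{package}' satisfies all dependents: {wanted}"
        )
    return max(ok)


available = [(1, 0, 0), (1, 2, 0), (1, 4, 1), (2, 0, 0)]

# Two dependents with compatible ranges converge on one version.
chosen = select_single_version(
    "mlt", available, {"app": (1, 2, 0), "lib": (1, 0, 0)}
)

# Incompatible ranges are refused rather than resolved to two copies.
try:
    select_single_version("mlt", available, {"app": (1, 2, 0), "lib": (2, 0, 0)})
    conflict_message = ""
except ResolutionError as err:
    conflict_message = str(err)
```

A real resolver would derive the root cause from the derivation graph; the error string here only hints at the "named root cause" shape of the diagnostics.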
D3 — The registry is a static, content-addressed store over our own CDN — not a GitHub repo
Three layers over S3 + CloudFront (argon.sharpe-dev.com, the infra oxup uses), with no trusted
live service on the read/integrity path:
- Layer A: immutable content-addressed blob store + sparse index. The S3 object key is the BLAKE3 content_hash (blobs/<blake3>), served immutable/cache-forever; beside each blob, a tiny signed metadata sidecar (Nix .narinfo shape: size, constructs root, dependency closure, provenance, signature). A Cargo-style sparse-index protocol (config.json + per-package append-only NDJSON version records, uniform hash-prefix sharding, mandatory ETag/If-None-Match). Yank is an append-only event, never in-place mutation; every object is write-once, which structurally removes the prototype’s concurrent-index contention. Publish source (.ar) as canonical with a per-file content manifest (JSR’s model); compile .oxbin on demand, cached by source-manifest hash. Nothing opaque is ever published.
- Layer B: a transparency log of both hashes. A Go-sumdb / Certificate-Transparency-style append-only Merkle log of package@version → (content_hash, constructs_root), with signed tree heads and static tiles on the same CDN; ox verifies inclusion + consistency proofs and fails loudly. Logging the semantic constructs root next to the byte hash makes the log tamper-evident over meaning, auditable at per-declaration granularity via subset Merkle proofs. This is a property no surveyed registry has, and it costs ~nothing once the log exists.
- Layer C: a thin TUF metadata cap. timestamp + snapshot for freshness and anti-rollback/freeze/mix-and-match (essential because CloudFront caches stale objects), over a threshold offline root + targets key for key-compromise survival and in-band rotation. This makes S3 + CloudFront fully untrusted transport; trust anchors in offline keys + the public log. Fulcio / keyless OIDC, delegated targets, and SLSA attestations are deferred until many external publishers exist.
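The inclusion check that Layer B asks of ox can be sketched with the Merkle tree hash and audit path construction used by Certificate Transparency (RFC 6962). The choice of SHA-256, the entry encoding, and all function names are assumptions; the RFD does not fix them. Signature verification of the tree head and consistency proofs are omitted.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(entry: bytes) -> bytes:
    return _h(b"\x00" + entry)           # 0x00 prefix domain-separates leaves


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)    # 0x01 prefix marks interior nodes


def _split(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    return 1 << ((n - 1).bit_length() - 1)


def tree_root(entries: List[bytes]) -> bytes:
    n = len(entries)
    if n == 0:
        return _h(b"")
    if n == 1:
        return leaf_hash(entries[0])
    k = _split(n)
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def audit_path(index: int, entries: List[bytes]) -> List[bytes]:
    # Sibling hashes from the leaf up to the root (log side).
    n = len(entries)
    if n <= 1:
        return []
    k = _split(n)
    if index < k:
        return audit_path(index, entries[:k]) + [tree_root(entries[k:])]
    return audit_path(index - k, entries[k:]) + [tree_root(entries[:k])]


def root_from_proof(index: int, size: int, leaf: bytes, path: List[bytes]) -> bytes:
    # Recompute the root from a leaf hash and its audit path (client side).
    if size == 1:
        if path:
            raise ValueError("proof longer than expected")
        return leaf
    if not path:
        raise ValueError("proof shorter than expected")
    k = _split(size)
    if index < k:
        return node_hash(root_from_proof(index, k, leaf, path[:-1]), path[-1])
    return node_hash(path[-1], root_from_proof(index - k, size - k, leaf, path[:-1]))


def verify_inclusion(entry: bytes, index: int, size: int,
                     path: List[bytes], signed_root: bytes) -> bool:
    if not 0 <= index < size:
        return False
    try:
        return root_from_proof(index, size, leaf_hash(entry), path) == signed_root
    except ValueError:
        return False


# Hypothetical log entries binding package@version to both hashes.
log = [
    f"pkg{i}@1.0.0 content={i:02x} constructs={i + 100:02x}".encode()
    for i in range(7)
]
root = tree_root(log)
proof = audit_path(3, log)
ok = verify_inclusion(log[3], 3, len(log), proof, root)
tampered = verify_inclusion(
    b"pkg3@1.0.0 content=ff constructs=ff", 3, len(log), proof, root
)
```

Because each entry carries the constructs root alongside the byte hash, a proof of this shape covers the semantic fingerprint as well as the bytes.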
Because the registry is just static files, a local directory or file:// is a conformant registry —
which yields offline builds, air-gapped mirrors, CI fixtures, and a “local registry” for free, with no
special-casing. The ~/.argon/packages content-addressed cache (D-78, fail-closed) is retained; ox vendor covers fully-pinned reproducible builds.
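A directory-backed registry of this kind can be sketched in a few functions. The layout (blobs/<hash>, index/<package>.ndjson), the record fields, and the function names are hypothetical; the real index uses hash-prefix sharding and signed sidecars, and blake2b stands in for BLAKE3.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def blob_key(data: bytes) -> str:
    # blake2b stands in for BLAKE3 (not in the Python standard library).
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def put_blob(root: Path, data: bytes) -> str:
    key = blob_key(data)
    path = root / "blobs" / key
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():            # write-once: never overwrite an object
        path.write_bytes(data)
    return key


def get_blob(root: Path, key: str) -> bytes:
    data = (root / "blobs" / key).read_bytes()
    if blob_key(data) != key:        # fail closed on any mismatch
        raise ValueError(f"integrity failure for blob {key}")
    return data


def append_record(root: Path, package: str, record: dict) -> None:
    path = root / "index" / f"{package}.ndjson"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def live_versions(root: Path, package: str) -> dict:
    """Replay the append-only log; a yank is a later event, not an edit."""
    versions: dict = {}
    path = root / "index" / f"{package}.ndjson"
    for line in path.read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        if rec["event"] == "publish":
            versions[rec["version"]] = rec["content_hash"]
        elif rec["event"] == "yank":
            versions.pop(rec["version"], None)
    return versions


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp)
    key = put_blob(registry, b"example package source")
    append_record(registry, "mlt",
                  {"event": "publish", "version": "1.0.0", "content_hash": key})
    append_record(registry, "mlt",
                  {"event": "publish", "version": "1.1.0", "content_hash": key})
    append_record(registry, "mlt", {"event": "yank", "version": "1.0.0"})
    resolved = live_versions(registry, "mlt")
    fetched = get_blob(registry, resolved["1.1.0"])
```

The same functions would work unchanged against a mirror or a CI fixture directory, which is the sense in which a local directory needs no special-casing.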
The alternative considered and rejected is an OCI registry (ECR): standard auth/mirroring, but heavier and a dependency we don’t need given we already own a CDN.
D4 — v1 is minimal-correct; trust hardening and the correctness-oracle edge are follow-ons
v1: one unified manifest with [workspace] virtual manifests + inheritance + a shared ox.lock;
PubGrub resolution; a static content-addressed sparse-registry client with integrity verification; one
authoritative deterministic publish builder + the ox package CLI; packages/mlt as the first
published package. Follow-ons: the transparency log (Layer B), the TUF cap (Layer C), tokenless-OIDC +
Sigstore/Rekor provenance, and the JSR-inspired publish-time correctness oracle — the registry runs
ox check + the tier classifier + the drift gate at publish and publishes correctness metadata
(decidability tier, CWA/OWA cleanliness, silent-accept count, provenance) as first-class, hard-weighted
data. Argon’s trust-first posture turns the registry into a correctness oracle, not an opaque host.
Rationale
- Nominal/path identity (D1) is forced, not chosen: it is what the substrate already implements and what a nominal-plus-refinement type system requires. The registry design conforms to the substrate, not the reverse. Unison’s “no dependency hell” does not come for free here: the problem does not vanish, it relocates into cross-version compatibility, which Argon answers with constructs drift rather than by silently re-identifying types.
- Single-version-per-package-name is the only coherent rule when identity is the path: it is already enforced, it matches the applicative shared-base model, and it gives a stronger guarantee than Cargo’s name-mangling, which is appropriate for a KR language where vocabulary identity must be stable.
- PubGrub (D2) is greenfield-appropriate (no legacy resolver to preserve bug-for-bug), already in hand, and its error quality + generic VersionSet are direct wins.
- The static content-addressed CDN (D3) is the convergent state of the art (Nix’s binary cache, Go’s proxy + sumdb, Cargo’s sparse index, JSR’s static API) and reuses infra we own; the transparency log of the semantic root is where Argon’s existing constructs signature lets it go beyond the surveyed prior art.
- Source-published + compiled-on-demand keeps packages auditable (the Deno lesson: URL-as-identity was the mistake, the hash was the safety net) and avoids opaque binaries.
Alternatives
- GitHub-repo-as-registry (the prototype, D-27). Rejected: mutable index under concurrent writers, no namespacing/mirroring, couples distribution to a VCS host. The static CA store subsumes its every use.
- OCI registry (ECR). Rejected for v1 (see D3): heavier, an unneeded dependency.
- MVS instead of PubGrub. Considered seriously: Go’s minimal version selection is deterministic, lock-free, and carries a genuine safety argument for a KR language (“a transitive release has no effect until you ask”). Rejected for v1 because PubGrub is already in hand, gives superior errors, and its generic VersionSet future-proofs Argon’s own range algebra; the MVS safety intuition is preserved by the single-version invariant + the publish-time compatibility gate.
- Unison-style structural / content-addressed type identity. Rejected at the language level (D1): it fits structural identity, but Argon is nominal, so a content hash would mint a new type on every edit.
Consequences
- The OE1240 manifest refusal of registry/version dep-forms is replaced by real resolution; version/edition become load-bearing; [workspace] lands; ox.lock lands (bivalent: content_hash + constructs root).
- A new authoritative deterministic publish builder is the only artifact producer (the prototype’s two-pipeline divergence does not recur).
- The registry infra reuses the oxup CDN/account; the toolchain CDN and the package registry remain distinct surfaces sharing transport.
- packages/mlt becomes installable, unblocking the MLT vendor-vs-registry decision.
- Two substrate prerequisites become correctness floors (Open questions): freezing the constructs canonicalization, and authoring the breaking-change taxonomy.
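What a bivalent lock entry buys can be shown with a small comparison. The LockEntry shape, the status names, and the function are hypothetical, and how ox treats each outcome is not specified in this RFD; the sketch only shows that two hashes distinguish a byte-level change from a semantic one.

```python
from dataclasses import dataclass
from enum import Enum


class LockStatus(Enum):
    MATCH = "match"
    BYTES_ONLY = "bytes changed, semantics unchanged"
    SEMANTIC_DRIFT = "semantic drift"


@dataclass(frozen=True)
class LockEntry:
    """Hypothetical shape of one ox.lock row (field names are assumptions)."""
    name: str
    version: str
    content_hash: str
    constructs_root: str


def check_against_lock(entry: LockEntry, content_hash: str,
                       constructs_root: str) -> LockStatus:
    # Check the semantic root first: drift in meaning is the stronger signal.
    if constructs_root != entry.constructs_root:
        return LockStatus.SEMANTIC_DRIFT
    if content_hash != entry.content_hash:
        return LockStatus.BYTES_ONLY
    return LockStatus.MATCH


locked = LockEntry("mlt", "1.1.0", content_hash="aa11", constructs_root="cc33")

unchanged = check_against_lock(locked, "aa11", "cc33")
reformatted = check_against_lock(locked, "bb22", "cc33")   # e.g. whitespace edit
redefined = check_against_lock(locked, "bb22", "dd44")     # a declaration changed
```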
Open questions
- The breaking-change / compatibility taxonomy (#697). Cross-version compatibility rides constructs drift: additive (new pub decl, widened bound) = compatible; narrowing a refinement, removing/renaming a pub decl, a CWA→OWA flip = breaking. The vault scoped a Java-binary-compatibility-style ruleset but never authored it. Per the mechanize-soundness-first directive, the compatibility condition is a scratch-Lean candidate before implementation. This is the soundness-bearing piece, and it ties to the keystone disjointness work (#628) and the R1 CWA/OWA write-side ruling.
- constructs canonicalization freeze (#696): landed. The semantic Merkle is specified, frozen, and versioned independently of the hash input (Dhall’s v6.0.0 lesson: the spec version is a constant, never folded into the hash), with cycle hashing pinned (Unison’s #x.n recipe). The canonicalization lives in oxc_protocol::constructs (the per-pub-declaration signature projection → BLAKE3 leaf → D-114-alphabetical Merkle root, with NAF clauses kept distinct from positive boundaries per the #697 oracle), wired to the build via oxc_workspace::constructs and recorded in the ox.lock constructs column. Vocabulary reconciled: the vault’s D-026 calls constructs “semantic identity”; functionally it is the drift fingerprint. content_hash = byte fingerprint, constructs = semantic drift/compatibility fingerprint, nominal identity = the qualified path (no separate identity hash).
- Namespacing/scopes. JSR’s scoped names (@scope/pkg, admins-not-owners) kill squatting and fit internal teams. Whether to adopt scopes from v1 or start flat is open.
- Asymmetric publish tokens (Cargo PASETO v3.public) vs the deferred OIDC path for the internal bootstrap window.
- An Argon-native non-SemVer VersionSet over the 4-axis versioning (#703), deferred until a concrete substrate need forces it.
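The relationship between the constructs root and the compatibility question can be sketched as follows. This is not the frozen oxc_protocol::constructs canonicalization: the leaf encoding, the tree shape, and the signature strings are assumptions, blake2b stands in for BLAKE3, and the classifier is deliberately conservative (any edit to an existing declaration counts as breaking, so a widened bound, which the taxonomy would treat as compatible, is not recognized here).

```python
import hashlib
from typing import Dict, List


def _h(data: bytes) -> bytes:
    # blake2b stands in for BLAKE3 (assumption, stdlib only).
    return hashlib.blake2b(data, digest_size=32).digest()


def leaf(path: str, signature: str) -> bytes:
    # One leaf per pub declaration: qualified path plus its signature text.
    return _h(b"leaf\x00" + path.encode() + b"\x00" + signature.encode())


def constructs_root(decls: Dict[str, str]) -> str:
    """Merkle root over per-declaration leaves, ordered by qualified path."""
    level: List[bytes] = [leaf(p, decls[p]) for p in sorted(decls)]
    if not level:
        return _h(b"empty").hex()
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(_h(b"node\x00" + level[i] + level[i + 1]))
            else:
                nxt.append(level[i])     # odd node is promoted unchanged
        level = nxt
    return level[0].hex()


def classify_drift(old: Dict[str, str], new: Dict[str, str]) -> str:
    removed = old.keys() - new.keys()
    changed = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    added = new.keys() - old.keys()
    if removed or changed:
        return "breaking"        # conservative: any edit to an existing decl
    if added:
        return "compatible"      # purely additive
    return "identical"


# Hypothetical declaration signatures (not Argon syntax).
v1 = {"pkg::Person": "name: String"}
v2 = {"pkg::Person": "name: String", "pkg::Address": "city: String"}
v3 = {"pkg::Address": "city: String"}

additive = classify_drift(v1, v2)     # a new pub decl was added
removal = classify_drift(v2, v3)      # pkg::Person was removed
roots_differ = constructs_root(v1) != constructs_root(v2)
```

Sorting leaves by path makes the root independent of declaration order in the source, so only semantic edits move it.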