RFD 0044 — Package registry, workspaces, and distribution

  • State: discussion
  • Depends on: RFD 0030 (package dependencies — path deps, the on-ramp this extends), RFD 0022 (package-path addressing + the build gate), RFD 0038 (prelude & ambient scope — the import model), RFD 0043 (theory packages — the first publishable package, packages/mlt), RFD 0013 (toolchain distribution — the CDN infra this reuses; it explicitly deferred the registry to here)
  • Blocks: a publishable/installable package ecosystem (milestone #11 “Package ecosystem”); packages/mlt distribution (the modeling team’s MLT vendor-vs-registry decision)
  • Tracking: epic #688; children #689–#703

Question

Argon has path dependencies (RFD 0030) but no workspace, no lockfile, no version resolution, and no package registry — every layer above local path deps is recognized-and-refused (OE1240) and deferred to “a later RFD.” How should Argon distribute packages? Concretely: what is the registry substrate (it must not be a GitHub repo), the workspace + lockfile + resolution model, and what does Argon’s nominal type system dictate about identity across package versions?

Context

Where we are. ox = the package orchestrator (cargo), oxc = the single-package compiler (rustc) — the committed frame (book §16), realizing Backpack-’17’s two-phase pipeline (ox computes a wiring diagram + composition signature; oxc instantiates per package). Path deps fold a dependency’s modules into the consumer’s workspace and embed into one .oxbin; the compiler front-end already resolves imports through resolved deps. version/edition parse but are inert; [workspace] does not exist; the ~/.argon/packages content-addressed cache and ox.lock are reserved, not built.

The prototype (orca-mvp) is prior art, not ground truth. Its registry was a GitHub repo (registry.json on a branch + GitHub Releases + the GitHub API), which we are replacing. It got real things right — a deterministic tarball, a bivalent hash (a BLAKE3 byte hash and a constructs Merkle root over per-declaration semantic signatures), a content-addressed cache, a lockfile, PubGrub — and real things wrong: two manifest parsers that disagreed, a compiler that never saw resolved deps (the deepest bug), two publish pipelines that produced different hashes, and one hardcoded GitHub repo with a mutable index under concurrent writers.

Argon is unusually well-positioned. It already owns the two most expensive ingredients of a modern registry: a content-addressed byte hash (content_hash) and a semantic Merkle root (constructs, which no surveyed system has). It also already runs the same object-store-behind-a-CDN topology that cache.nixos.org runs in production (here S3 + CloudFront, the infra oxup already uses). A prior-art sweep in two groups (Unison, Nix, Go modules, Dhall, Sigstore/TUF; then Cargo, JSR, PubGrub, pnpm) converges on one architecture, recorded below; the research lives at .local/research/package-system/DESIGN.md.

Decision

D1 — Concept identity is nominal/path; the resolved graph is single-version-per-package-name

Argon’s type identity is nominal, by qualified path — verified in both the substrate and the research. In code: a concept is identified by NameRef (its canonical-symbol-table position for a qualified path) and DefId = (file, start, name, kind, visibility), never by a hash of its structure; subtyping is nominal end-to-end (Lean TypeSystem/Subtyping.lean: “Subtyping of named types is nominal”; Rust oxc-check types_compatible = schema.concept_ancestors(a).contains(b)); the BLAKE3 content-ids that exist (ContentId/AxiomKey/CompositionSignature) are body/build fingerprints kept separate from symbol identity. The vault’s identity research is explicit: “a naive content hash would make every schema edit a new type — the opposite of what a nominal type system wants … Unison’s model fits structural identity; Argon is largely nominal.”

It follows that:

  1. Person@1 and Person@2 are the same type by path. A field addition or refinement edit does not mint a new type — it trips the drift fingerprint, not identity. Identity is the full qualified path: pkg::mod_a::Person and pkg::mod_b::Person are simply two distinct concepts (like two Error types in different Rust modules), no conflict.
  2. The resolved graph is single-version-per-package-name. A package name is one namespace root mapping to exactly one package (already enforced: OE1241 name = published [package].name; OE1243 a name cannot denote two package directories). This is the applicative shared-base model (track-F module research; D-77 shared immutable base appears once). This differs deliberately from Cargo, which permits multiple semver-incompatible versions to coexist via name-mangling: Argon must not, because two concepts at one path cannot both be “the” type. The resolver (D2) therefore resolves each package to exactly one version graph-wide, or fails loudly.

Cross-version compatibility is a drift question, not an identity question (see D4 / Open questions).
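
The split between identity and drift can be sketched in a few lines of Python. This is an illustration only, not Argon's implementation: the Concept class, same_type, and drifted are hypothetical names, the signature strings are placeholder text rather than Argon syntax, and BLAKE2b stands in for BLAKE3 because BLAKE3 is not in the Python standard library.

```python
from dataclasses import dataclass
from hashlib import blake2b


@dataclass(frozen=True)
class Concept:
    """Hypothetical stand-in for one published concept declaration."""
    path: str        # qualified path, e.g. "pkg::mod_a::Person"
    signature: str   # placeholder rendering of the declaration's signature

    def drift_fingerprint(self) -> str:
        # Stand-in for a constructs leaf; the real system uses BLAKE3.
        return blake2b(self.signature.encode(), digest_size=16).hexdigest()


def same_type(a: Concept, b: Concept) -> bool:
    # Nominal identity: only the qualified path is compared, never a hash.
    return a.path == b.path


def drifted(a: Concept, b: Concept) -> bool:
    # Drift is only meaningful between two versions of the same concept.
    return same_type(a, b) and a.drift_fingerprint() != b.drift_fingerprint()


person_v1 = Concept("pkg::mod_a::Person", "Person { name }")
person_v2 = Concept("pkg::mod_a::Person", "Person { name, age }")
other = Concept("pkg::mod_b::Person", "Person { name }")

print(same_type(person_v1, person_v2), drifted(person_v1, person_v2))
print(same_type(person_v1, other))
```

Adding a field changes the fingerprint but leaves same_type true, while the identically shaped declaration under pkg::mod_b is a different type because its path differs.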

D2 — Version resolution is PubGrub over a SemVer VersionSet

Use pubgrub-rs (already proven in the prototype). PubGrub is generic over a VersionSet, so Argon can later define its own range algebra (the 4-axis versioning) without being locked to caret SemVer; its derivation-graph errors route into Argon’s OE-coded diagnostics (a named root cause + fix, fitting the loud-over-silent ethos). Resolution enforces the D1 single-version invariant: one version per package across the graph, or a loud refusal. Features/optional-deps are encoded as virtual packages from the start. (MVS was considered — see Alternatives.)
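
To make the single-version invariant concrete, here is a deliberately naive backtracking resolver in Python. It is not PubGrub and has none of PubGrub's conflict learning or derivation-graph errors; it only shows what "one version per package across the graph, or a loud refusal" means operationally. The index contents, the package names other than mlt, and all version numbers are invented for illustration, and caret matching is simplified to major versions of 1 and above.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]
# Hypothetical index shape: package -> version -> {dependency: caret requirement}
Index = Dict[str, Dict[Version, Dict[str, Version]]]


def caret_ok(req: Version, v: Version) -> bool:
    # Simplified caret rule (major >= 1): same major, not older than the requirement.
    return v[0] == req[0] and v >= req


def resolve(index: Index, root_deps: Dict[str, Version]) -> Dict[str, Version]:
    """Pick exactly one version per package name, or raise."""

    def go(pending: List[Tuple[str, Version]],
           chosen: Dict[str, Version]) -> Optional[Dict[str, Version]]:
        if not pending:
            return chosen
        (name, req), rest = pending[0], pending[1:]
        if name in chosen:
            # Single-version invariant: a later requirement must accept the
            # version already chosen; a second copy is never added.
            return go(rest, chosen) if caret_ok(req, chosen[name]) else None
        for v in sorted(index.get(name, {}), reverse=True):  # newest first
            if caret_ok(req, v):
                deps = list(index[name][v].items())
                found = go(rest + deps, {**chosen, name: v})
                if found is not None:
                    return found
        return None  # backtrack

    solution = go(list(root_deps.items()), {})
    if solution is None:
        raise RuntimeError("no single-version assignment satisfies all requirements")
    return solution


EXAMPLE_INDEX: Index = {
    "mlt": {(1, 0, 0): {"core": (1, 0, 0)}, (1, 1, 0): {"core": (1, 2, 0)}},
    "core": {(1, 0, 0): {}, (1, 2, 0): {}, (2, 0, 0): {}},
    "app_util": {(1, 0, 0): {"core": (1, 0, 0)}},
}
```

With this index, resolve(EXAMPLE_INDEX, {"mlt": (1, 0, 0), "app_util": (1, 0, 0)}) settles on a single core version shared by both dependents, whereas a root that also demands core ^2 raises instead of admitting two copies of core.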

D3 — The registry is a static, content-addressed store over our own CDN — not a GitHub repo

Three layers over S3 + CloudFront (argon.sharpe-dev.com, the infra oxup uses), with no trusted live service on the read/integrity path:

  • Layer A — immutable content-addressed blob store + sparse index. The S3 object key is the BLAKE3 content_hash (blobs/<blake3>), served immutable/cache-forever; beside each blob a tiny signed metadata sidecar (Nix .narinfo shape: size, constructs root, dependency closure, provenance, signature). A Cargo-style sparse-index protocol (config.json + per-package append-only NDJSON version records, uniform hash-prefix sharding, mandatory ETag/If-None-Match). Yank is an append-only event, never in-place mutation — every object is write-once, which structurally removes the prototype’s concurrent-index contention. Publish source (.ar) as canonical with a per-file content manifest (JSR’s model); compile .oxbin on demand, cached by source-manifest hash. Nothing opaque is ever published.
  • Layer B — a transparency log of both hashes. A Go-sumdb / Certificate-Transparency-style append-only Merkle log of package@version → (content_hash, constructs_root), with signed tree heads and static tiles on the same CDN; ox verifies inclusion + consistency proofs and fails loudly. Logging the semantic constructs root next to the byte hash makes the log tamper-evident over meaning, auditable at per-declaration granularity via subset Merkle proofs — a property no surveyed registry has, costing ~nothing once the log exists.
  • Layer C — a thin TUF metadata cap. timestamp + snapshot for freshness and anti-rollback/freeze/mix-and-match (essential because CloudFront caches stale objects), over a threshold offline root + targets key for key-compromise survival and in-band rotation. This makes S3 + CloudFront fully untrusted transport; trust anchors in offline keys + the public log. Fulcio / keyless OIDC, delegated targets, and SLSA attestations are deferred until many external publishers exist.
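
The inclusion check that Layer B asks of the client can be sketched as follows. The tree construction follows the RFC 6962 scheme used by Certificate Transparency (SHA-256 with 0x00 leaf and 0x01 node prefixes); which hash and tree layout Argon's log will use is not decided in this RFD, so treat both as assumptions. The record encoding in the usage note is likewise hypothetical, and consistency proofs and signed tree heads are omitted.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    return 1 << ((n - 1).bit_length() - 1)


def tree_root(records: List[bytes]) -> bytes:
    # Requires at least one record.
    if len(records) == 1:
        return leaf_hash(records[0])
    k = split_point(len(records))
    return node_hash(tree_root(records[:k]), tree_root(records[k:]))


def inclusion_path(index: int, records: List[bytes]) -> List[bytes]:
    """Audit path for records[index], ordered from the leaf up to the root."""
    if len(records) == 1:
        return []
    k = split_point(len(records))
    if index < k:
        return inclusion_path(index, records[:k]) + [tree_root(records[k:])]
    return inclusion_path(index - k, records[k:]) + [tree_root(records[:k])]


def root_from_path(index: int, size: int, leaf: bytes, path: List[bytes]) -> bytes:
    if size == 1:
        if path:
            raise ValueError("proof too long")
        return leaf
    if not path:
        raise ValueError("proof too short")
    k = split_point(size)
    if index < k:
        return node_hash(root_from_path(index, k, leaf, path[:-1]), path[-1])
    return node_hash(path[-1], root_from_path(index - k, size - k, leaf, path[:-1]))


def verify_inclusion(record: bytes, index: int, size: int,
                     path: List[bytes], root: bytes) -> bool:
    # Recompute the root from the record and its audit path; any mismatch fails.
    try:
        return root_from_path(index, size, leaf_hash(record), path) == root
    except ValueError:
        return False
```

A log record here might be the bytes of a line such as "mlt@1.0.0 <content_hash> <constructs_root>". Because the leaf covers both hashes, changing either one invalidates the proof against the signed root.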

Because the registry is just static files, a local directory or file:// is a conformant registry — which yields offline builds, air-gapped mirrors, CI fixtures, and a “local registry” for free, with no special-casing. The ~/.argon/packages content-addressed cache (D-78, fail-closed) is retained; ox vendor covers fully-pinned reproducible builds.
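
A minimal sketch of why a plain directory qualifies, assuming a hypothetical layout: blobs/<hash> follows the text above, but the index/<name>.ndjson path, the event field names, and the function names are invented here, hash-prefix sharding and signed sidecars are left out, and BLAKE2b stands in for BLAKE3. The treatment of yanked versions (flagged in the index, still fetchable by hash) is also an assumption.

```python
import json
import tempfile
from hashlib import blake2b
from pathlib import Path
from typing import Dict


def digest(data: bytes) -> str:
    # Stand-in for the BLAKE3 content_hash (BLAKE3 is not in the stdlib).
    return blake2b(data, digest_size=32).hexdigest()


def append_event(root: Path, name: str, event: dict) -> None:
    index = root / "index" / f"{name}.ndjson"
    index.parent.mkdir(parents=True, exist_ok=True)
    with index.open("a", encoding="utf-8") as f:  # append-only, never rewritten
        f.write(json.dumps(event, sort_keys=True) + "\n")


def publish(root: Path, name: str, version: str, tarball: bytes) -> str:
    key = digest(tarball)
    blob = root / "blobs" / key
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():  # write-once: an existing blob is never overwritten
        blob.write_bytes(tarball)
    append_event(root, name, {"event": "publish", "version": version,
                              "content_hash": key})
    return key


def fold_index(root: Path, name: str) -> Dict[str, dict]:
    """Replay the event log into version -> {content_hash, yanked}."""
    versions: Dict[str, dict] = {}
    index = root / "index" / f"{name}.ndjson"
    for line in index.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if event["event"] == "publish":
            versions[event["version"]] = {"content_hash": event["content_hash"],
                                          "yanked": False}
        elif event["event"] == "yank":
            # A yank for an unknown version raises KeyError rather than passing silently.
            versions[event["version"]]["yanked"] = True
    return versions


def fetch_blob(root: Path, key: str) -> bytes:
    data = (root / "blobs" / key).read_bytes()
    if digest(data) != key:  # fail closed on any mismatch
        raise ValueError(f"content hash mismatch for blob {key}")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp)
        key = publish(registry, "mlt", "1.0.0", b"example source tarball")
        append_event(registry, "mlt", {"event": "yank", "version": "1.0.0"})
        print(fold_index(registry, "mlt")["1.0.0"]["yanked"])
        print(fetch_blob(registry, key) == b"example source tarball")
```

Nothing in the read path depends on a server: the client only reads files and recomputes hashes, so the same code would work against a CDN URL, a mirror, or a CI fixture directory.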

The alternative considered and rejected is an OCI registry (ECR): standard auth/mirroring, but heavier and a dependency we don’t need given we already own a CDN.

D4 — v1 is minimal-correct; trust hardening and the correctness-oracle edge are follow-ons

v1: one unified manifest with [workspace] virtual manifests + inheritance + a shared ox.lock; PubGrub resolution; a static content-addressed sparse-registry client with integrity verification; one authoritative deterministic publish builder + the ox package CLI; packages/mlt as the first published package. Follow-ons: the transparency log (Layer B), the TUF cap (Layer C), tokenless-OIDC + Sigstore/Rekor provenance, and the JSR-inspired publish-time correctness oracle — the registry runs ox check + the tier classifier + the drift gate at publish and publishes correctness metadata (decidability tier, CWA/OWA cleanliness, silent-accept count, provenance) as first-class, hard-weighted data. Argon’s trust-first posture turns the registry into a correctness oracle, not an opaque host.

Rationale

  • Nominal/path identity (D1) is forced, not chosen: it is what the substrate already implements and what a nominal-plus-refinement type system requires. The registry design conforms to the substrate, not the reverse. The dependency hell that Unison’s content-addressed identity avoids does not vanish under nominal identity; it relocates into cross-version compatibility, which Argon answers with constructs drift rather than by silently re-identifying types.
  • Single-version-per-package-name is the only coherent rule when identity is the path: it is already enforced, it matches the applicative shared-base model, and it gives a stronger guarantee than Cargo’s name-mangling — appropriate for a KR language where vocabulary identity must be stable.
  • PubGrub (D2) is greenfield-appropriate (no legacy resolver to preserve bug-for-bug), already in hand, and its error quality + generic VersionSet are direct wins.
  • The static content-addressed CDN (D3) is the convergent state of the art (Nix’s binary cache, Go’s proxy + sumdb, Cargo’s sparse index, JSR’s static API) and reuses infra we own; the transparency log of the semantic root is where Argon’s existing constructs signature lets it go beyond all surveyed prior art.
  • Source-published + compiled-on-demand keeps packages auditable (the Deno lesson: URL-as-identity was the mistake, the hash was the safety net) and avoids opaque binaries.

Alternatives

  • GitHub-repo-as-registry (the prototype, D-27). Rejected: mutable index under concurrent writers, no namespacing/mirroring, couples distribution to a VCS host. The static content-addressed store covers every use the GitHub repo served.
  • OCI registry (ECR). Rejected for v1 (see D3): heavier, an unneeded dependency.
  • MVS instead of PubGrub. Considered seriously — Go’s minimal version selection is deterministic, lock-free, and carries a genuine safety argument for a KR language (“a transitive release has no effect until you ask”). Rejected for v1 because PubGrub is already in hand, gives superior errors, and its generic VersionSet future-proofs Argon’s own range algebra; the MVS safety intuition is preserved by the single-version invariant + the publish-time compatibility gate.
  • Unison-style structural / content-addressed type identity. Rejected at the language level (D1): it fits structural identity, but Argon is nominal — a content hash would mint a new type on every edit.

Consequences

  • The OE1240 manifest refusal of registry/version dep-forms is replaced by real resolution; version/edition become load-bearing; [workspace] lands; ox.lock lands (bivalent: content_hash + constructs root).
  • A new authoritative deterministic publish builder is the only artifact producer (the prototype’s two-pipeline divergence does not recur).
  • The registry infra reuses the oxup CDN/account; the toolchain CDN and the package registry remain distinct surfaces sharing transport.
  • packages/mlt becomes installable, unblocking the MLT vendor-vs-registry decision.
  • Two substrate prerequisites become correctness floors (Open questions): freezing the constructs canonicalization, and authoring the breaking-change taxonomy.

Open questions

  1. The breaking-change / compatibility taxonomy (#697). Cross-version compatibility rides constructs drift: additive (new pub decl, widened bound) = compatible; narrowing a refinement, removing/renaming a pub decl, a CWA→OWA flip = breaking. The vault scoped a Java-binary-compatibility-style ruleset but never authored it. Per the mechanize-soundness-first directive, the compatibility condition is a scratch-Lean candidate before implementation. This is the soundness-bearing piece, and it ties to the keystone disjointness work (#628) and the R1 CWA/OWA write-side ruling.
  2. constructs canonicalization freeze (#696): landed. The semantic Merkle is specified, frozen, and versioned independently of the hash input (Dhall’s v6.0.0 lesson: the spec version is a constant, never folded into the hash), with cycle hashing pinned (Unison’s #x.n recipe). The canonicalization lives in oxc_protocol::constructs (the per-pub-declaration signature projection → BLAKE3 leaf → D-114-alphabetical Merkle root, with NAF clauses kept distinct from positive boundaries per the #697 oracle), wired to the build via oxc_workspace::constructs and recorded in the ox.lock constructs column. Vocabulary reconciled: the vault’s D-026 calls constructs “semantic identity”; functionally it is the drift fingerprint. In short: content_hash = byte fingerprint, constructs = semantic drift/compatibility fingerprint, nominal identity = the qualified path (no separate identity hash).
  3. Namespacing/scopes. JSR’s scoped names (@scope/pkg, admins-not-owners) kill squatting and fit internal teams. Whether to adopt scopes from v1 or start flat is open.
  4. Publish authentication for the internal bootstrap window: asymmetric publish tokens (Cargo PASETO v3.public) vs the deferred OIDC path.
  5. An Argon-native non-SemVer VersionSet over the 4-axis versioning (#703) — deferred until a concrete substrate need forces it.
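
For items 1 and 2, the shape of a per-declaration semantic root and of a coarse drift report can be sketched in Python. This is not the frozen oxc_protocol::constructs canonicalization: the leaf encoding, the tree shape, the function names, and the use of BLAKE2b in place of BLAKE3 are all assumptions, and "alphabetical" is read here as sorting leaves by qualified path. The report only separates added, removed, and changed declarations; deciding whether a changed signature is a widening or a narrowing is exactly the job of the unauthored taxonomy.

```python
from hashlib import blake2b
from typing import Dict, List


def leaf(path: str, signature: str) -> bytes:
    # Domain-separated leaf over one public declaration's signature.
    payload = b"leaf\x00" + path.encode() + b"\x00" + signature.encode()
    return blake2b(payload, digest_size=32).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        return blake2b(b"empty", digest_size=32).digest()
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                nxt.append(pair[0])  # an unpaired last node is promoted unchanged
            else:
                nxt.append(blake2b(b"node\x00" + pair[0] + pair[1],
                                   digest_size=32).digest())
        level = nxt
    return level[0]


def constructs_root(decls: Dict[str, str]) -> str:
    # Sorting by qualified path makes the root independent of declaration order.
    return merkle_root([leaf(p, decls[p]) for p in sorted(decls)]).hex()


def classify(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    return {
        "added": sorted(new.keys() - old.keys()),      # additive: compatible
        "removed": sorted(old.keys() - new.keys()),    # removal or rename: breaking
        "changed": sorted(p for p in old.keys() & new.keys()
                          if old[p] != new[p]),        # needs the taxonomy to judge
    }
```

Two packages with the same declarations in a different source order produce the same root, and any difference in the root can be localized to specific declaration paths by classify.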