A Transaction Science Open Standard

Capability per joule.
Not capability per parameter.

JouleClaw is a pure-Rust AI runtime that dispatches every operation through an energy-tiered cascade. Inference is the last resort, not the entry point. Frozen weights are less trustworthy than fresh, provenance-stamped retrieval. Every operation accounts honest microjoules where the hardware permits it, and honest millijoules where it doesn’t.

The runtime rests on four load-bearing primitives — conservation, cone, oracle, elicitor — that turn the harness into the deterministic surface and leave the model fungible underneath.

The cascade

Five tiers. Cheapest first.

Every operation walks the tiers top to bottom and stops at the first one that closes the query. A conforming runtime MUST NOT invoke a higher-cost tier before the cheaper tiers have returned Unresolvable.

L0 Cache (picojoules)
Content-addressed cache hit. The hottest path: a query that's been resolved before returns its prior answer for the cost of a hash lookup.

L1 Lawful (nanojoules)
A deterministic primitive answers the question. Arithmetic, unit conversions, regex extraction, finite-state walks: compute that does not need a model.

L2 Embed (sub-millijoules)
Matryoshka embeddings + hybrid (BM25 + ANN) retrieval against a local corpus. The cheapest path that actually looks at content.

L3 Model (joules)
Local model: SSM, ternary BitNet, multimodal VLM, diffusion. Inference is the last resort, not the entry point. Picked only when nothing cheaper resolves the query.

L4 Wire (tens of joules)
Remote frontier RPC. Explicit escape hatch, fired only when the local cascade returns Unresolvable and the operator's deployment permits.

L1 (Lawful) is meaningful only for text and code modalities. For image, audio, video, and 3D generation the cascade collapses to L0 + L2 + L3 + L4 — there is no deterministic compute path that produces a high-quality image from “draw me a cat.”
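
The walk described above can be sketched in a few lines of Rust. This is a minimal illustration, not the jouleclaw-cascade API: the Tier, Modality, Outcome, and Resolver types, the walk function, and the wire_permitted flag are all hypothetical names chosen for the example. It shows the two rules from the text: tiers are tried cheapest first, and L1 is skipped for non-text, non-code modalities.

```rust
/// Resolver tiers in ascending cost order, matching the tier list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier { L0Cache, L1Lawful, L2Embed, L3Model, L4Wire }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality { Text, Code, Image, Audio, Video, ThreeD }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome { Resolved(String), Unresolvable }

/// Hypothetical resolver shape: a tier plus a plain function pointer.
pub struct Resolver {
    pub tier: Tier,
    pub resolve: fn(&str) -> Outcome,
}

/// L1 (Lawful) only applies to text and code; every other tier always applies.
pub fn tier_applies(tier: Tier, modality: Modality) -> bool {
    tier != Tier::L1Lawful || matches!(modality, Modality::Text | Modality::Code)
}

/// Walk the tiers cheapest first and stop at the first one that resolves.
/// A costlier tier is only reached after every cheaper applicable tier
/// has returned Unresolvable.
pub fn walk(
    resolvers: &[Resolver],
    query: &str,
    modality: Modality,
    wire_permitted: bool, // assumption: a single deployment-level switch for L4
) -> Option<(Tier, String)> {
    let mut ordered: Vec<&Resolver> = resolvers.iter().collect();
    ordered.sort_by_key(|r| r.tier);
    for r in ordered {
        if !tier_applies(r.tier, modality) {
            continue;
        }
        if r.tier == Tier::L4Wire && !wire_permitted {
            continue;
        }
        if let Outcome::Resolved(answer) = (r.resolve)(query) {
            return Some((r.tier, answer));
        }
    }
    None // the whole cascade returned Unresolvable
}
```

Sorting inside walk means callers cannot accidentally register a costly tier ahead of a cheap one; a real runtime would more likely fix the order at construction time.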

L0.5 — compiled skills

Between the cache and the lawful primitive sits the SkillTier: a recurring sub-task class that has been verified at a higher tier compiles to a deterministic skill and resolves here forever after. The cascade learns from itself; the model’s role shrinks with use.

L5–L10 — control plane

Above the resolvers, the cascade carries a control plane: routing, the multi-step agent, reflection, tuner, supervisor, and a per-tenant governor with kill switch. The resolver walk closes queries; the control plane decides which queries the runtime accepts in the first place.

The triangle

Four primitives. One harness.

Underneath the cascade, the runtime rests on four primitives that bound the model on every side: information conservation, the cone of novelty, the closure-gate oracle, and the deterministic elicitor. Each ships as a standalone crate with a serde-byte-equal conformance contract. Random-walk AI features — retrieval, geospatial, world model, agent harness — land as one-crate compositions over these four.

Conservation
jouleclaw-conservation

Information Conservation Law as a typed claim: every bit in the output must come from the spec, the requester-supplied delta, or a named substrate. The reference DeterministicVerifier emits one of Conforming | BelowSpec | Hallucination | MissingDelta. Hallucination is given a precise definition the runtime can refuse on.
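
To make the four verdicts concrete, here is a token-level sketch. It is an interpretation, not the reference DeterministicVerifier: treating "bits" as whole tokens, and the exact meanings given to BelowSpec and MissingDelta, are assumptions made for the example, as are the function name and parameters.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum ConservationVerdict { Conforming, BelowSpec, Hallucination, MissingDelta }

/// Assumed semantics, for illustration only:
/// - MissingDelta: the spec needs requester input and none was supplied.
/// - Hallucination: some output token has no source in spec, delta, or substrate.
/// - BelowSpec: every token is sourced, but a spec-required token is absent.
pub fn verify(
    spec_required: &[&str],
    needs_delta: bool,
    delta: &[&str],
    substrate: &[&str],
    output: &[&str],
) -> ConservationVerdict {
    if needs_delta && delta.is_empty() {
        return ConservationVerdict::MissingDelta;
    }
    // Everything the output is allowed to draw from.
    let allowed: HashSet<&str> = spec_required
        .iter()
        .chain(delta)
        .chain(substrate)
        .copied()
        .collect();
    if output.iter().any(|t| !allowed.contains(t)) {
        return ConservationVerdict::Hallucination;
    }
    let produced: HashSet<&str> = output.iter().copied().collect();
    if spec_required.iter().any(|t| !produced.contains(t)) {
        return ConservationVerdict::BelowSpec;
    }
    ConservationVerdict::Conforming
}
```

The point of the sketch is that Hallucination becomes a set-membership failure rather than a judgment call, which is what lets a runtime refuse on it.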

Cone
jouleclaw-cone

Typed classification space: ConeCoord { complexity, novelty } with novelty bounded to [0, 1]. A NoveltyBand dispatches each request to the cheapest tier whose preconditions hold by novelty, not by heuristic. The novelty estimator is pluggable; the reference is information-distance over signatures.
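
A sketch of how a bounded coordinate and band dispatch might look. Only ConeCoord, its two fields, the [0, 1] bound on novelty, and the NoveltyBand name come from the text; the band variants, the thresholds (0.25, 0.60), and the band-to-tier mapping are invented for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ConeCoord {
    complexity: f64,
    novelty: f64,
}

impl ConeCoord {
    /// Rejects novelty outside [0, 1] (NaN included) at construction,
    /// so every ConeCoord in circulation satisfies the bound.
    pub fn new(complexity: f64, novelty: f64) -> Option<Self> {
        let ok = complexity.is_finite() && (0.0..=1.0).contains(&novelty);
        ok.then_some(ConeCoord { complexity, novelty })
    }
    pub fn complexity(&self) -> f64 { self.complexity }
    pub fn novelty(&self) -> f64 { self.novelty }
}

/// Hypothetical bands; the real variant names are not given in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NoveltyBand { Seen, Derivable, Retrievable, Generative }

impl NoveltyBand {
    /// Thresholds here are placeholders, not published values.
    pub fn of(coord: ConeCoord) -> Self {
        match coord.novelty() {
            n if n == 0.0 => NoveltyBand::Seen,
            n if n < 0.25 => NoveltyBand::Derivable,
            n if n < 0.60 => NoveltyBand::Retrievable,
            _ => NoveltyBand::Generative,
        }
    }

    /// Cheapest tier worth trying first for this band (assumed mapping).
    pub fn cheapest_tier(self) -> &'static str {
        match self {
            NoveltyBand::Seen => "L0",
            NoveltyBand::Derivable => "L1",
            NoveltyBand::Retrievable => "L2",
            NoveltyBand::Generative => "L3",
        }
    }
}
```

Validating in the constructor keeps the dispatch match total: no branch has to handle an out-of-range novelty.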

Oracle
jouleclaw-oracle

Closure-gate Verdict { Pass { chain } | Fail { reasons } }. Distinct from the scoring primitives elsewhere in the workspace (calibration, critic, verification-tier) — those produce scores; the oracle produces a gate decision. CompositeOracle chains backends with short-circuit-on-first-fail.
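
The short-circuit behaviour can be sketched as follows. The Verdict shape and the CompositeOracle name come from the text; the Backend struct, its function-pointer check, and the choice to record passing backend names in chain are assumptions for the example.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Pass { chain: Vec<String> },
    Fail { reasons: Vec<String> },
}

/// Hypothetical backend: a name plus a check that either passes or explains why not.
pub struct Backend {
    pub name: &'static str,
    pub check: fn(&str) -> Result<(), String>,
}

pub struct CompositeOracle {
    pub backends: Vec<Backend>,
}

impl CompositeOracle {
    /// Runs backends in order and stops at the first failure.
    /// Later backends are never invoked once one has failed.
    pub fn judge(&self, candidate: &str) -> Verdict {
        let mut chain = Vec::new();
        for b in &self.backends {
            match (b.check)(candidate) {
                // Assumption: `chain` lists the backends that passed, in order.
                Ok(()) => chain.push(b.name.to_string()),
                Err(reason) => {
                    return Verdict::Fail {
                        reasons: vec![format!("{}: {}", b.name, reason)],
                    }
                }
            }
        }
        Verdict::Pass { chain }
    }
}
```

Because the result is a gate rather than a score, callers branch on Pass or Fail and never compare thresholds.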

Elicitor
jouleclaw-elicitor

Deterministic structured intake. The reference SlotWalker walks the requester through a template’s required slots until each has a typed value, then emits an IntentDraft. The model — if used at all — lives at the question, never at the answer. The determinism floor at the front door.
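
A slot walk of this kind might look like the sketch below. SlotWalker and IntentDraft are names from the text; the Slot, SlotType, and SlotValue types, the two supported value kinds, and every method signature are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotType { Text, Integer }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SlotValue { Text(String), Integer(i64) }

pub struct Slot { pub name: &'static str, pub ty: SlotType }

#[derive(Debug, PartialEq, Eq)]
pub struct IntentDraft {
    pub template: String,
    pub slots: BTreeMap<String, SlotValue>,
}

pub struct SlotWalker<'t> {
    template: &'t str,
    slots: &'t [Slot],
    filled: BTreeMap<String, SlotValue>,
}

impl<'t> SlotWalker<'t> {
    pub fn new(template: &'t str, slots: &'t [Slot]) -> Self {
        SlotWalker { template, slots, filled: BTreeMap::new() }
    }

    /// First required slot, in template order, that has no typed value yet.
    fn next_slot(&self) -> Option<&'t Slot> {
        let slots: &'t [Slot] = self.slots;
        slots.iter().find(|s| !self.filled.contains_key(s.name))
    }

    /// Name of the slot the requester should be asked about next.
    pub fn next_question(&self) -> Option<&'static str> {
        self.next_slot().map(|s| s.name)
    }

    /// Parse the raw answer against the current slot's type.
    /// A rejected answer leaves the walker on the same slot.
    pub fn answer(&mut self, raw: &str) -> Result<(), String> {
        let slot = self.next_slot().ok_or("all slots already filled")?;
        let raw = raw.trim();
        let value = match slot.ty {
            SlotType::Text if raw.is_empty() => {
                return Err(format!("{} must not be empty", slot.name))
            }
            SlotType::Text => SlotValue::Text(raw.to_string()),
            SlotType::Integer => raw
                .parse::<i64>()
                .map(SlotValue::Integer)
                .map_err(|e| format!("{}: {}", slot.name, e))?,
        };
        self.filled.insert(slot.name.to_string(), value);
        Ok(())
    }

    /// Emits a draft only once every required slot holds a typed value.
    pub fn finish(self) -> Option<IntentDraft> {
        if self.next_slot().is_some() {
            return None;
        }
        Some(IntentDraft { template: self.template.to_string(), slots: self.filled })
    }
}
```

Nothing in the walk is stochastic: the same template and the same answers always yield the same draft, which is the property the text calls the determinism floor.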

Each primitive carries a conformance/vectors.json fixture set + a verifier wired into cargo test. An implementation in any language is conformant iff its output for every vector matches the reference byte-for-byte under the documented serialization. The harness becomes the deterministic surface; the model underneath is fungible.

Why energy?

Tokens per second compresses. Joules don’t.

Physics-informed compute, aligned for ecology. Speed is a moving target — every model generation runs faster than the last. Energy is conserved. The cascade is what you build when the bill comes in physical units: measure energy, navigate accordingly, and remove the friction the compute can’t argue with.

The grid is already binding

Global data-centre electricity demand grew 17% in 2025; AI-specific demand grew 50%, with the IEA projecting 945 TWh by 2030. The Uptime Institute caps 2026 AI data-centre load at ~10 GW, not for lack of demand, but because grid and generation additions take 5–10 years.

The bill comes in physical units. The grid, not the model, is the binding constraint.

Energy is the optimisation gradient

Joules-per-token is now proposed as a standard efficiency metric — peer of FLOPs and latency — in current benchmarking work (TokenPowerBench, arXiv 2512.03024). Frontier inference shows an 8–20× efficiency headroom invisible to systems that don’t measure, and reasoning queries cost ~13× a standard query.

You optimise toward what you measure. Measure joules; navigate accordingly.

Computation friction is physical

FLOPS grew 50–60% per year for decades; memory bandwidth, ~23%. The memory wall isn’t metaphor — it’s where the joules go. Modern AI hardware is a documented mismatch to LLM decode (memory-bound, not compute-bound), and energy spent on memory accesses dominates total energy consumption.

The cascade is the architectural answer to the impedance mismatch — call it computation friction. Cheaper tiers short-circuit the friction the model would otherwise pay.

Compile-once, resolve-forever

The L6 agent carries a SkillCompiler hook: each successful sub-dispatch whose confidence clears a configurable floor increments a per-template counter, and at threshold the inducer compiles the resolution into a deterministic skill registered in the shared store. A SkillTier at L0.5 shadows the model on every subsequent matching query.

The confidence floor is load-bearing — low-confidence answers never advance the counter, so a fluky stochastic resolution cannot poison the deterministic surface.
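
The counter-and-floor mechanism can be sketched as below. This is an illustration under stated assumptions, not the L6 agent's code: the struct fields, the observe and skill_for methods, and the use of a stored answer string as a stand-in for a compiled skill are all hypothetical.

```rust
use std::collections::HashMap;

pub struct SkillCompiler {
    confidence_floor: f64,
    threshold: u32,
    counters: HashMap<String, u32>,
    /// Stand-in for the shared skill store: template -> deterministic answer.
    compiled: HashMap<String, String>,
}

impl SkillCompiler {
    pub fn new(confidence_floor: f64, threshold: u32) -> Self {
        SkillCompiler {
            confidence_floor,
            threshold,
            counters: HashMap::new(),
            compiled: HashMap::new(),
        }
    }

    /// Record one successful sub-dispatch. Returns true only on the call
    /// that pushes the template over the threshold and compiles the skill.
    pub fn observe(&mut self, template: &str, resolution: &str, confidence: f64) -> bool {
        // Low-confidence answers never advance the counter.
        if confidence < self.confidence_floor || self.compiled.contains_key(template) {
            return false;
        }
        let counter = self.counters.entry(template.to_string()).or_insert(0);
        *counter += 1;
        let count = *counter;
        if count >= self.threshold {
            self.compiled.insert(template.to_string(), resolution.to_string());
            return true;
        }
        false
    }

    /// What an L0.5 SkillTier lookup would consult before any model runs.
    pub fn skill_for(&self, template: &str) -> Option<&str> {
        self.compiled.get(template).map(String::as_str)
    }
}
```

With a floor of 0.9 and a threshold of 3, two confident resolutions followed by one at 0.5 leave the counter at 2; only a third confident resolution compiles the skill.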

The cascade isn’t ideology — it’s the design that drops out when you take the energy axis seriously and trust the gradient. Speed is a moving target. Joules are conserved.

Honest energy provenance

Every reading carries a Provenance tag.

The thermodynamic circuit breaker is only as honest as the worst counter in a request's span. JouleClaw refuses to claim accuracy a platform cannot deliver.

HwShunt

Real hardware shunt / coulomb counter. Intel and AMD x86 RAPL MSRs (~1 μJ resolution, ~10 ms window). NVIDIA discrete GPU cumulative-energy counter (~1 mJ). Jetson INA3221 i2c shunts (~10 mW).

Honest to the counter's resolution.

ModelBased

Vendor-provided estimate from frequency, voltage, utilisation. Apple Silicon IOReport — yes, including M5 Neural Accelerators. NVML power.draw on older GPUs. ROCm SMI on AMD discrete.

Millijoule-quantised model estimate, not a measurement. Sales talk that calls this microjoule precision gets corrected here.

Estimator

JouleClaw static cost model from architecture × precision × batch tables. Used only on platforms with no usable hardware counter — consumer AMD GPU, ARM PMU.

Wide tolerance bands by construction; auditors discount accordingly.

Vendor-energy measurement protocol

The taxonomy is one half of the story; the other half is actual readings from real silicon under named workloads. The jouleclaw-energy/conformance/ directory carries the protocol (PROTOCOL.md) and per-silicon fixture stubs (Intel/AMD RAPL, NVIDIA NVML, AMD ROCm, Apple IOReport, Huawei Ascend). The fixtures ship empty by design: numbers from blog posts would corrupt the provenance contract the trait exists to enforce. A third party with the hardware fills a fixture, signs it, and contributes it back.

The verifier asserts schema today; once a fixture fills, the same test also enforces idle ≤ lookup ≤ index ≤ 7B-token ≤ 70B-token monotonicity — misreporting can’t sneak through.
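
The monotonicity rule is easy to state in code. The sketch below assumes a fixture reduced to five readings in microjoules; the Fixture struct, its field names, and the unit are hypothetical and do not reflect the actual fixture schema in jouleclaw-energy/conformance/.

```rust
/// Hypothetical filled fixture: one reading per named workload, in microjoules.
pub struct Fixture {
    pub idle_uj: u64,
    pub lookup_uj: u64,
    pub index_uj: u64,
    pub token_7b_uj: u64,
    pub token_70b_uj: u64,
}

/// Enforces idle <= lookup <= index <= 7B-token <= 70B-token.
/// Returns the first adjacent pair that violates the ordering.
pub fn check_monotonic(f: &Fixture) -> Result<(), String> {
    let readings = [
        ("idle", f.idle_uj),
        ("lookup", f.lookup_uj),
        ("index", f.index_uj),
        ("7B-token", f.token_7b_uj),
        ("70B-token", f.token_70b_uj),
    ];
    for pair in readings.windows(2) {
        let (lo_name, lo) = pair[0];
        let (hi_name, hi) = pair[1];
        if lo > hi {
            return Err(format!("{} ({} uJ) exceeds {} ({} uJ)", lo_name, lo, hi_name, hi));
        }
    }
    Ok(())
}
```

Checking adjacent pairs is sufficient because the ordering is transitive; a fixture that reports a 70B token as cheaper than a cache lookup fails on at least one pair.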

Verification layer

Energy is the front. Verification is the back.

Joules account for what compute cost. Verification accounts for what compute ran, what it ran inside, and what evidence backs each claim it made. Both land at the same primitive: a content-addressable artifact with explicit provenance. No artifact, no promotion.

Attestation
jouleclaw-attest / -attest-sig / -attest-vendor

RATS RFC 9334 + EAT RFC 9711 envelope. Ed25519 signing (RustCrypto). Intel TDX TD-Quote, AMD SEV-SNP report, in-toto SLSA Statement parsers — each lifts vendor bytes into typed measurements the verifier can appraise.

Reference values
jouleclaw-corim (IETF draft)

Concise Reference Integrity Manifest. CoMID reference triples in CBOR (IETF wire) or JSON (dev). Tracks the draft-ietf-rats-corim revision honestly via a pinned constant on every manifest.

Supply chain
jouleclaw-oci / -slsa

OCI image digest + ORAS Distribution Spec v1.1 referrers as the content-addressed substrate. In-toto Statement + SLSA v1.1 Build (and draft Source / Dependencies / Verification) predicates — provenance, not just hashes.

Sandbox teeth
jouleclaw-sandbox / -sandbox-audit

RuntimeClaim + CapabilitySet (bwrap / gVisor / Kata / Firecracker / Wasmtime). The audit harness MEASURES the escape gap with a read-only EnvironmentProbe trait and a SandboxEscapeBench-shaped category taxonomy — no escape-proof claim is made, only honest contradictions surfaced.

Transform graph
jouleclaw-transform / -transform-trx

First cross-tool Entity / Transform / TransformResult schema with PROV-O provenance for OSINT- and bio-KG-style fan-out. The TRX adapter speaks Maltego XML so JouleClaw transforms interoperate with the existing ecosystem.

Mech-interp
jouleclaw-mechinterp / -mechinterp-loader

FeatureActivation tuple + Neuronpedia identifier (Gemma Scope 2 / SAELens / TransformerLens alignment). Safetensors SAE header loader. LabelEvidence + VerifiedLabel so a feature description carries the evidence kind that justifies treating it as load-bearing.

Every honest narrowing — “no signature crypto in v1,” “no per-vendor TEE parsing,” “no Maltego XML wire form,” “no SAE weight loading” — closes through one of these crates. The wave-6 envelope shape stays; wave-7 fills in the bytes.

JCR-1 — JouleClaw Receipt Specification

A receipt the regulator can verify without our code.

An efficiency claim is worth nothing in a field drowning in unverifiable ones. JCR-1 turns “we are efficient” into “here is proof this exact output cost this many joules — verify it yourself.” Canonical det-CBOR signed core (RFC 8949 §4.2.1), COSE_Sign1 envelope (RFC 9052 / Ed25519), 10 canonical conformance vectors, three independent verifying implementations across Rust, Python, and Go — byte-equal on all ten. The spec is the source of truth, not the reference code.
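
Byte-equality across languages depends on the deterministic encoding rules of RFC 8949 §4.2.1: shortest-form integer heads, definite lengths, and map keys sorted bytewise by their encoded form. The sketch below implements just enough of those rules for a map of text keys to unsigned integers. It is not the jouleclaw-jcr1 encoder, and the key names used with it (such as "tier" or "joules_uj") are hypothetical, not JCR-1 fields.

```rust
/// Encode a CBOR head (major type plus argument) in the shortest form.
fn head(major: u8, n: u64) -> Vec<u8> {
    let m = major << 5;
    match n {
        0..=23 => vec![m | n as u8],
        24..=0xff => vec![m | 24, n as u8],
        0x100..=0xffff => {
            let mut v = vec![m | 25];
            v.extend_from_slice(&(n as u16).to_be_bytes());
            v
        }
        0x1_0000..=0xffff_ffff => {
            let mut v = vec![m | 26];
            v.extend_from_slice(&(n as u32).to_be_bytes());
            v
        }
        _ => {
            let mut v = vec![m | 27];
            v.extend_from_slice(&n.to_be_bytes());
            v
        }
    }
}

/// Major type 0: unsigned integer.
pub fn uint(n: u64) -> Vec<u8> {
    head(0, n)
}

/// Major type 3: UTF-8 text string with a definite length.
pub fn text(s: &str) -> Vec<u8> {
    let mut v = head(3, s.len() as u64);
    v.extend_from_slice(s.as_bytes());
    v
}

/// Major type 5: map. Keys are sorted by the bytes of their encoding,
/// so the output does not depend on the caller's insertion order.
/// Duplicate keys are not checked in this sketch.
pub fn map(entries: &[(&str, u64)]) -> Vec<u8> {
    let mut enc: Vec<(Vec<u8>, Vec<u8>)> =
        entries.iter().map(|(k, v)| (text(k), uint(*v))).collect();
    enc.sort_by(|a, b| a.0.cmp(&b.0));
    let mut out = head(5, enc.len() as u64);
    for (k, v) in enc {
        out.extend(k);
        out.extend(v);
    }
    out
}
```

Because the length prefix is part of the encoded key, a short key such as "tier" sorts before a longer one such as "joules_uj" regardless of alphabetical order. That detail is a common source of cross-language mismatches, which is why conformance vectors compare bytes rather than decoded values.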

Honesty contract

Every reading carries a HwShunt | ModelBased | Estimator tag. The receipt floors at the worst counter in the span; non-shunt readings round to the 1 mJ resolution floor. A receipt cannot claim accuracy a platform cannot deliver.
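
The flooring rule can be sketched as follows. The three provenance tags and the 1 mJ floor come from the text; the Reading struct, the summarize function, and in particular the choice to round non-shunt readings up to the next whole millijoule are assumptions, since the rounding direction is not stated here.

```rust
/// Ordered best to worst, so `max` over a span yields the worst counter.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Provenance { HwShunt, ModelBased, Estimator }

pub struct Reading {
    pub microjoules: u64,
    pub provenance: Provenance,
}

const MJ_IN_UJ: u64 = 1_000;

/// Assumption: rounding goes up, so a receipt never understates energy.
fn round_up_to_mj(uj: u64) -> u64 {
    uj.saturating_add(MJ_IN_UJ - 1) / MJ_IN_UJ * MJ_IN_UJ
}

/// Total energy for a span plus the provenance the receipt may claim.
/// Returns None for an empty span.
pub fn summarize(span: &[Reading]) -> Option<(u64, Provenance)> {
    // The receipt floors at the worst counter in the span.
    let worst = span.iter().map(|r| r.provenance).max()?;
    let total: u64 = span
        .iter()
        .map(|r| match r.provenance {
            Provenance::HwShunt => r.microjoules,
            // Non-shunt readings are quantised to the 1 mJ floor.
            _ => round_up_to_mj(r.microjoules),
        })
        .sum();
    Some((total, worst))
}
```

A span with one 1,500 µJ shunt reading and one 2,300 µJ model-based reading would report 4,500 µJ under these assumptions, tagged ModelBased.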

Three independent implementations

Rust mints (jouleclaw-jcr1, det-CBOR encoder + COSE_Sign1 + 25 lib tests). Python verifies (cbor2 + cryptography). Go verifies (stdlib only, hand-written canonical encoder). Byte-equal across all three on every vector — the “vector-conformant standard” milestone.

Tamper-evident, four ways

Understated joules → re-encode mismatch. Faked HwShunt on an Estimator span → payload mismatch (can’t forge without the steward key). Forged tool-receipt HMAC → HMAC fail. Envelope bitflip → Ed25519 fail. Each failure surfaces a different error.

The promotion-collapse demo, signed

Vector 5: an L3 model resolved a query for 2,500,000 µJ. Vector 10: the same input_hash, after promotion, now resolves at L1 for 9 µJ, a ~278,000× collapse (2,500,000 / 9 ≈ 277,778). Both signed by the same steward; both verifiable side-by-side. The yang→yin learning curve made checkable in a single diff.

Specification text is CC-BY-4.0; the Rust reference implementation (jouleclaw-jcr1) is Apache-2.0.

Run it yourself

Two binaries. No external dependencies.

Each example below runs end-to-end with zero API keys and zero network calls. Clone the workspace, run the command, read the output. The harness either works on your machine or it doesn’t.

Cascade + triangle + JCR-1

A free-form question walks the cascade twice — first ask resolves at L3, second ask at L0. A templated request walks the elicitor, the cone, conservation, and the oracle to produce a verified candidate. The session closes with one signed JCR-1 receipt over total joules.

cargo run -p jouleclaw-stack \
  --example full_cascade_with_triangle

JouleClaw → JCR-1 → OpenPay

The pay-per-joule pitch made real. Cascade walk measures actual joules; a JCR-1 receipt signs the reading; an OpenPay settlement Transaction prices the joules via a documented function. A binding document links the three by a verifiable invariant. Three standards, three artifacts, one pipeline — composing at the wire layer.

cargo run -p jouleclaw-stack \
  --example openpay_per_joule_billing

More starter cascades live alongside these in jouleclaw-stack/examples/ — minimal cache-only, RAG with retrieval, the comprehend verb (L0–L2 only), and a per-intent ledger breakdown.

Reference implementation

Pure-Rust workspace. Apache-2.0.

The workspace ships the runtime, the cascade, the omni-modal generation tier, the harness, the MCP tool surface, and the signed receipt emitter. It also carries the full L0–L10 tier surface, the triangle primitives, skill compilation, durable promotion, energy-shaped approval gates, a branchable session tree, and the verification layer. JCR-1 rounds it out: a canonical, signed, third-party-verifiable receipt standard with three byte-equal reference implementations (Rust, Python, Go). Targets Apple Silicon M3/M4/M5, AMD Strix Halo / Ryzen AI Max, Intel Lunar Lake, NVIDIA Jetson, and discrete NVIDIA / AMD.

Cascade + runtime
jouleclaw-core / -cascade / -runtime

Foundation types, L0 cache + 7-axis coord router, chat/embed/generate/streaming with PLD + drafter spec decode.

Triangle primitives
jouleclaw-conservation / -cone / -oracle / -elicitor

Information Conservation Law + typed novelty/complexity dispatch + closure-gate Verdict + deterministic structured intake. Each carries its own conformance vectors.

Model tiers (L3)
jouleclaw-liquid / -prism / -lmm

CfC SSM (O(L) state), 1-bit/ternary BitNet, multimodal VLM on ternary backbone.

Omni-modal generation
jouleclaw-omni

Diffusion samplers (Stable Diffusion / SDXL / SD3 / Flux), MusicGen, Whisper, Gaussian3D, video, fusion.

Energy + provenance
jouleclaw-energy / -pack / -prov

EnergyCounter trait, .jc.toml sidecar declared-cost contract, Smart-Byte-envelope receipts.

Search-first retrieval
jouleclaw-fresh

SearchProvider + TrustTable + provenance envelope. Brave / Tavily / Exa adapters plug in here.

Harness + MCP
jouleclaw-cli / -mcp (jclaw)

Pi-class minimal harness. MCP tool surface with metered dispatch + joule-mcp CBOR profile (x-jouleclaw/joule-mcp@1).

Stewardship

The protocol is owned by no one.

JouleClaw is a Transaction Science open standard, alongside OpenPay, Smart Byte, EOC, WAI, ARL, Sandbox, and Map. The wire format and the right to fork are public. Transaction Science writes the reference implementation and runs the optional hosted services — the protocols themselves are owned by no one.

Code is Apache-2.0. Spec text is CC-BY-4.0. An implementation is conformant if it round-trips the published wire vectors.