Capability per joule.
Not capability per parameter.
JouleClaw is a pure-Rust AI runtime that dispatches every operation through an energy-tiered cascade. Inference is the last resort, not the entry point. Frozen weights are less trustworthy than fresh, provenance-stamped retrieval. Every operation accounts for its energy in honest microjoules where the hardware permits it, and in honest millijoules where it doesn’t.
The runtime rests on four load-bearing primitives — conservation, cone, oracle, elicitor — that turn the harness into the deterministic surface and leave the model fungible underneath.
The cascade
Five tiers. Cheapest first.
Every operation walks the tiers top to bottom and stops at the first one
that closes the query. A conforming runtime MUST NOT invoke a higher-cost
tier before the cheaper tiers have returned Unresolvable.
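The walk rule can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's API: the `Tier` trait, the `Resolution` enum, the `walk` function, and the toy `TableTier` are all hypothetical names introduced here. Only the stop-at-first-closure behaviour and the `Unresolvable` outcome come from the text above.

```rust
/// Outcome of asking a single tier to close a query (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Resolution {
    /// The tier closed the query with this answer.
    Resolved(String),
    /// The tier cannot close the query; the walk may continue.
    Unresolvable,
}

/// Hypothetical resolver interface: one implementation per tier.
pub trait Tier {
    fn name(&self) -> &str;
    fn resolve(&self, query: &str) -> Resolution;
}

/// Walk the tiers in the order given (cheapest first) and stop at the
/// first one that closes the query. A costlier tier is reached only after
/// every cheaper tier has returned `Unresolvable`.
pub fn walk<'a>(tiers: &'a [Box<dyn Tier>], query: &str) -> Option<(&'a str, String)> {
    for tier in tiers {
        if let Resolution::Resolved(answer) = tier.resolve(query) {
            return Some((tier.name(), answer));
        }
    }
    None
}

/// Toy tier that only knows a fixed table of queries (illustration only).
pub struct TableTier {
    pub label: &'static str,
    pub known: Vec<(&'static str, &'static str)>,
}

impl Tier for TableTier {
    fn name(&self) -> &str {
        self.label
    }

    fn resolve(&self, query: &str) -> Resolution {
        self.known
            .iter()
            .find(|(q, _)| *q == query)
            .map(|(_, a)| Resolution::Resolved((*a).to_string()))
            .unwrap_or(Resolution::Unresolvable)
    }
}
```

Because `walk` returns the name of the tier that closed the query, a caller can check that a query known to a cheap tier never reaches a costlier one.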
L1 (Lawful) is meaningful only for text and code modalities. For image, audio, video, and 3D generation the cascade collapses to L0 + L2 + L3 + L4 — there is no deterministic compute path that produces a high-quality image from “draw me a cat.”
Between the cache and the lawful primitive sits the SkillTier:
a recurring sub-task class that has been verified at a higher tier
compiles to a deterministic skill and resolves here forever after. The
cascade learns from itself; the model’s role shrinks with use.
Above the resolvers, the cascade carries a control plane: routing, the multi-step agent, reflection, tuner, supervisor, and a per-tenant governor with kill switch. The resolver walk closes queries; the control plane decides which queries the runtime accepts in the first place.
The triangle
Four primitives. One harness.
Underneath the cascade, the runtime rests on four primitives that bound the model on every side: information conservation, the cone of novelty, the closure-gate oracle, and the deterministic elicitor. Each ships as a standalone crate with a serde-byte-equal conformance contract. Random-walk AI features — retrieval, geospatial, world model, agent harness — land as one-crate compositions over these four.
Information Conservation Law as a typed claim: every bit in the output
must come from the spec, the requester-supplied delta, or a named
substrate. The reference DeterministicVerifier emits one of
Conforming | BelowSpec | Hallucination | MissingDelta.
Hallucination is given a precise definition the runtime can refuse on.
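One way to picture the typed claim is a toy verifier over token sets. Only the four verdict names come from the text; everything else is an assumption made for illustration. The sketch treats tokens as stand-ins for "bits", and the `Sources` struct, its `required` and `delta_required` fields, and the order of the checks are hypothetical.

```rust
use std::collections::HashSet;

/// The four outcomes named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConservationVerdict {
    Conforming,
    BelowSpec,
    Hallucination,
    MissingDelta,
}

/// Hypothetical inputs a claim is checked against.
pub struct Sources<'a> {
    pub spec: HashSet<&'a str>,
    /// `None` means the requester supplied no delta at all.
    pub delta: Option<HashSet<&'a str>>,
    pub substrate: HashSet<&'a str>,
    /// Assumption: tokens the spec requires the output to contain.
    pub required: HashSet<&'a str>,
    /// Assumption: the spec declares whether a delta must be supplied.
    pub delta_required: bool,
}

/// Classify an output. Assumed semantics: a required but absent delta is
/// `MissingDelta`; any token not traceable to spec, delta, or substrate is
/// `Hallucination`; a traceable output that omits required tokens is
/// `BelowSpec`; anything else is `Conforming`.
pub fn verify(output: &[&str], src: &Sources<'_>) -> ConservationVerdict {
    if src.delta_required && src.delta.is_none() {
        return ConservationVerdict::MissingDelta;
    }
    let traceable = |t: &str| {
        src.spec.contains(t)
            || src.substrate.contains(t)
            || src.delta.as_ref().map_or(false, |d| d.contains(t))
    };
    if output.iter().any(|&t| !traceable(t)) {
        return ConservationVerdict::Hallucination;
    }
    let produced: HashSet<&str> = output.iter().copied().collect();
    if !src.required.iter().all(|t| produced.contains(*t)) {
        return ConservationVerdict::BelowSpec;
    }
    ConservationVerdict::Conforming
}
```

The point of the sketch is that `Hallucination` becomes a set-membership failure rather than a judgment call, which is what lets a runtime refuse on it.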
Typed classification space: ConeCoord { complexity, novelty }
with novelty bounded to [0, 1]. A NoveltyBand
dispatches each request to the cheapest tier whose preconditions hold
by novelty, not by heuristic. The novelty estimator is pluggable; the
reference is information-distance over signatures.
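A minimal sketch of the bounded coordinate and band dispatch follows. The field names `complexity` and `novelty` and the `[0, 1]` bound come from the text. The `f64` field types, the three band names, the cut points, and the band-to-tier mapping are assumptions chosen only to make the example concrete.

```rust
/// Coordinate in the cone. `novelty` is private so the only way to build
/// a value is through `new`, which enforces the [0, 1] bound.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ConeCoord {
    pub complexity: f64,
    novelty: f64,
}

impl ConeCoord {
    /// Returns `None` when novelty falls outside [0, 1] (NaN included).
    pub fn new(complexity: f64, novelty: f64) -> Option<Self> {
        if (0.0..=1.0).contains(&novelty) {
            Some(Self { complexity, novelty })
        } else {
            None
        }
    }

    pub fn novelty(&self) -> f64 {
        self.novelty
    }
}

/// Hypothetical bands; the source does not publish band names or edges.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NoveltyBand {
    Seen,
    Familiar,
    Novel,
}

impl NoveltyBand {
    /// Assumed cut points at 0.2 and 0.7, for illustration only.
    pub fn of(coord: &ConeCoord) -> Self {
        match coord.novelty() {
            n if n < 0.2 => NoveltyBand::Seen,
            n if n < 0.7 => NoveltyBand::Familiar,
            _ => NoveltyBand::Novel,
        }
    }

    /// Assumed mapping from band to the cheapest tier worth trying first.
    pub fn cheapest_tier(self) -> &'static str {
        match self {
            NoveltyBand::Seen => "L0",
            NoveltyBand::Familiar => "L1",
            NoveltyBand::Novel => "L3",
        }
    }
}
```

Keeping `novelty` private makes the bound a property of the type: no later stage has to re-check it.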
Closure-gate Verdict { Pass { chain } | Fail { reasons } }.
Distinct from the scoring primitives elsewhere in the workspace
(calibration, critic, verification-tier) — those produce scores;
the oracle produces a gate decision. CompositeOracle
chains backends with short-circuit-on-first-fail.
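Short-circuit chaining might look like the sketch below. The `Verdict` variants and the `CompositeOracle` name come from the text. The `Oracle` trait, the `Vec<String>` payload types, and the two toy backends (`MaxLen`, `Forbids`) are hypothetical.

```rust
/// Gate decision, shaped as in the text; payload types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass { chain: Vec<String> },
    Fail { reasons: Vec<String> },
}

/// Hypothetical backend interface.
pub trait Oracle {
    fn name(&self) -> String;
    fn judge(&self, candidate: &str) -> Verdict;
}

/// Runs backends in order and returns the first failure untouched.
pub struct CompositeOracle {
    pub backends: Vec<Box<dyn Oracle>>,
}

impl Oracle for CompositeOracle {
    fn name(&self) -> String {
        "composite".to_string()
    }

    fn judge(&self, candidate: &str) -> Verdict {
        let mut chain = Vec::new();
        for backend in &self.backends {
            match backend.judge(candidate) {
                Verdict::Pass { chain: inner } => {
                    chain.push(backend.name());
                    chain.extend(inner);
                }
                // Short-circuit: later backends are never consulted.
                fail @ Verdict::Fail { .. } => return fail,
            }
        }
        Verdict::Pass { chain }
    }
}

/// Toy backend: passes candidates of at most `0` bytes.
pub struct MaxLen(pub usize);

impl Oracle for MaxLen {
    fn name(&self) -> String {
        format!("max_len({})", self.0)
    }

    fn judge(&self, candidate: &str) -> Verdict {
        if candidate.len() <= self.0 {
            Verdict::Pass { chain: vec![] }
        } else {
            Verdict::Fail { reasons: vec![format!("longer than {} bytes", self.0)] }
        }
    }
}

/// Toy backend: fails candidates containing a forbidden substring.
pub struct Forbids(pub &'static str);

impl Oracle for Forbids {
    fn name(&self) -> String {
        format!("forbids({})", self.0)
    }

    fn judge(&self, candidate: &str) -> Verdict {
        if candidate.contains(self.0) {
            Verdict::Fail { reasons: vec![format!("contains '{}'", self.0)] }
        } else {
            Verdict::Pass { chain: vec![] }
        }
    }
}
```

A candidate that would fail both backends reports only the first backend's reason, which is the observable effect of short-circuiting.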
Deterministic structured intake. The reference SlotWalker
walks the requester through a template’s required slots until each
has a typed value, then emits an IntentDraft. The model
— if used at all — lives at the question, never at the
answer. The determinism floor at the front door.
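The slot walk can be sketched as a small state machine. `SlotWalker` and `IntentDraft` are named in the text. The slot kinds, the method names, and the string error type are assumptions for illustration and may not match the crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical slot types a template can require.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotKind {
    Text,
    Integer,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SlotValue {
    Text(String),
    Integer(i64),
}

/// Emitted once every required slot holds a typed value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IntentDraft {
    pub template: String,
    pub slots: BTreeMap<String, SlotValue>,
}

pub struct SlotWalker {
    template: String,
    required: Vec<(String, SlotKind)>,
    filled: BTreeMap<String, SlotValue>,
}

impl SlotWalker {
    pub fn new(template: &str, required: &[(&str, SlotKind)]) -> Self {
        Self {
            template: template.to_string(),
            required: required.iter().map(|(n, k)| ((*n).to_string(), *k)).collect(),
            filled: BTreeMap::new(),
        }
    }

    /// Next required slot that still lacks a value, in template order.
    pub fn next_question(&self) -> Option<&str> {
        self.required
            .iter()
            .map(|(name, _)| name.as_str())
            .find(|name| !self.filled.contains_key(*name))
    }

    /// Parse `raw` into the slot's declared type; reject on mismatch.
    pub fn answer(&mut self, slot: &str, raw: &str) -> Result<(), String> {
        let kind = self
            .required
            .iter()
            .find(|(name, _)| name.as_str() == slot)
            .map(|(_, kind)| *kind)
            .ok_or_else(|| format!("unknown slot '{}'", slot))?;
        let trimmed = raw.trim();
        let value = match kind {
            SlotKind::Text if !trimmed.is_empty() => SlotValue::Text(trimmed.to_string()),
            SlotKind::Text => return Err(format!("slot '{}' needs non-empty text", slot)),
            SlotKind::Integer => SlotValue::Integer(
                trimmed
                    .parse::<i64>()
                    .map_err(|_| format!("slot '{}' needs an integer", slot))?,
            ),
        };
        self.filled.insert(slot.to_string(), value);
        Ok(())
    }

    /// Emit the draft only when no required slot is left empty.
    pub fn finish(self) -> Result<IntentDraft, String> {
        if let Some(missing) = self.next_question() {
            return Err(format!("slot '{}' is still empty", missing));
        }
        Ok(IntentDraft { template: self.template, slots: self.filled })
    }
}
```

In this shape a model could phrase the question for `next_question()`, but the typed value that lands in the draft is always produced by the deterministic parser in `answer`.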
Each primitive carries a conformance/vectors.json fixture
set + a verifier wired into cargo test. An implementation in
any language is conformant iff its output for every vector matches the
reference byte-for-byte under the documented serialization. The harness
becomes the deterministic surface; the model underneath is fungible.
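The byte-for-byte rule reduces to a plain equality check over serialized output. The sketch below assumes a simplified in-memory vector shape. `ConformanceVector` and `failing_vectors` are hypothetical names, and the real fixtures live in `conformance/vectors.json` with a schema not reproduced here.

```rust
/// Simplified stand-in for one entry of a conformance fixture (assumed shape).
pub struct ConformanceVector {
    pub name: String,
    pub input: String,
    /// Reference output under the documented serialization.
    pub expected: Vec<u8>,
}

/// Returns the names of vectors whose output bytes differ from the
/// reference. An empty result means the implementation is conformant.
pub fn failing_vectors<F>(vectors: &[ConformanceVector], implementation: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<u8>,
{
    vectors
        .iter()
        .filter(|v| implementation(v.input.as_str()) != v.expected)
        .map(|v| v.name.clone())
        .collect()
}
```

Comparing raw bytes rather than parsed values is what makes the check language-neutral: two implementations that agree semantically but differ in key order or number encoding still fail.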
Why energy?
Tokens per second compresses. Joules don’t.
Physics-informed compute, aligned for ecology. Speed is a moving target — every model generation runs faster than the last. Energy is conserved. The cascade is what you build when the bill comes in physical units: measure energy, navigate accordingly, and remove the friction the compute can’t argue with.
Global data-centre electricity grew 17% in 2025; AI-specific demand grew 50%, with the IEA projecting 945 TWh by 2030. The Uptime Institute caps 2026 AI data-centre load at ~10 GW, not for lack of demand but because grid and generation additions take 5–10 years.
The bill comes in physical units. The grid, not the model, is the binding constraint.
Joules-per-token is now proposed as a standard efficiency metric — peer of FLOPs and latency — in current benchmarking work (TokenPowerBench, arXiv 2512.03024). Frontier inference shows an 8–20× efficiency headroom invisible to systems that don’t measure, and reasoning queries cost ~13× a standard query.
You optimise toward what you measure. Measure joules; navigate accordingly.
FLOPS grew 50–60% per year for decades; memory bandwidth, ~23%. The memory wall isn’t metaphor — it’s where the joules go. Modern AI hardware is a documented mismatch to LLM decode (memory-bound, not compute-bound), and energy spent on memory accesses dominates total energy consumption.
The cascade is the architectural answer to the impedance mismatch — call it computation friction. Cheaper tiers short-circuit the friction the model would otherwise pay.
The L6 agent carries a SkillCompiler hook: each successful
sub-dispatch whose confidence clears a configurable floor increments a
per-template counter, and at threshold the inducer compiles the
resolution into a deterministic skill registered in the shared store.
A SkillTier at L0.5 shadows the model on every subsequent
matching query.
The confidence floor is load-bearing — low-confidence answers never advance the counter, so a fluky stochastic resolution cannot poison the deterministic surface.
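The counter-and-floor mechanism can be sketched as follows. The `SkillCompiler` name, the per-template counter, the confidence floor, and the threshold come from the text. The struct layout, the `observe` and `has_skill` methods, and the use of a plain set as the "shared store" are assumptions.

```rust
use std::collections::{HashMap, HashSet};

pub struct SkillCompiler {
    confidence_floor: f64,
    threshold: u32,
    counters: HashMap<String, u32>,
    /// Stand-in for the shared skill store (assumption).
    compiled: HashSet<String>,
}

impl SkillCompiler {
    pub fn new(confidence_floor: f64, threshold: u32) -> Self {
        Self {
            confidence_floor,
            threshold,
            counters: HashMap::new(),
            compiled: HashSet::new(),
        }
    }

    /// Record one successful sub-dispatch. Returns `true` only on the call
    /// that promotes the template into a compiled skill.
    pub fn observe(&mut self, template: &str, confidence: f64) -> bool {
        // Already compiled: nothing left to count.
        if self.compiled.contains(template) {
            return false;
        }
        // Below the floor (or NaN): the counter does not advance.
        if !(confidence >= self.confidence_floor) {
            return false;
        }
        let count = self.counters.entry(template.to_string()).or_insert(0);
        *count += 1;
        if *count >= self.threshold {
            self.compiled.insert(template.to_string());
            true
        } else {
            false
        }
    }

    pub fn has_skill(&self, template: &str) -> bool {
        self.compiled.contains(template)
    }
}
```

With a floor of 0.9 and a threshold of 3, a sequence of confidences 0.95, 0.50, 0.99, 0.91 compiles the skill on the fourth call, because the 0.50 resolution is never counted.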
The cascade isn’t ideology — it’s the design that drops out when you take the energy axis seriously and trust the gradient. Speed is a moving target. Joules are conserved.
Honest energy provenance
Every reading carries a Provenance tag.
The thermodynamic circuit breaker is only as honest as the worst counter in a request's span. JouleClaw refuses to claim accuracy a platform cannot deliver.
Real hardware shunt / coulomb counter. Intel and AMD x86 RAPL MSRs (~1 μJ resolution, ~10 ms window). NVIDIA discrete GPU cumulative-energy counter (~1 mJ). Jetson INA3221 i2c shunts (~10 mW).
Honest to the counter's resolution.
Vendor-provided estimate from frequency, voltage, utilisation. Apple Silicon IOReport — yes, including M5 Neural Accelerators. NVML power.draw on older GPUs. ROCm SMI on AMD discrete.
Millijoule-quantised model estimate, not a measurement. Sales talk that calls this microjoule precision gets corrected here.
JouleClaw static cost model from architecture × precision × batch tables. Used only on platforms with no usable hardware counter — consumer AMD GPU, ARM PMU.
Wide tolerance bands by construction; auditors discount accordingly.
The taxonomy is one half of the story; the other half is actual readings
from real silicon under named workloads. The
jouleclaw-energy/conformance/ directory carries the protocol
(PROTOCOL.md) and per-silicon fixture stubs (Intel/AMD RAPL,
NVIDIA NVML, AMD ROCm, Apple IOReport, Huawei Ascend). The fixtures ship
empty by design — numbers from blog posts would corrupt the
provenance contract the trait exists to enforce. A third party with the
hardware fills a fixture, signs it, and contributes it back.
The verifier asserts schema today; once a fixture fills, the same test also enforces idle ≤ lookup ≤ index ≤ 7B-token ≤ 70B-token monotonicity — misreporting can’t sneak through.
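The monotonicity rule is simple enough to sketch directly. The ordering idle ≤ lookup ≤ index ≤ 7B-token ≤ 70B-token comes from the text. The `Fixture` struct, its field names, the microjoule unit, and the decision to reject partially filled fixtures are assumptions for illustration.

```rust
/// Hypothetical per-silicon fixture: every field is `None` in an empty stub.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Fixture {
    pub idle_uj: Option<u64>,
    pub lookup_uj: Option<u64>,
    pub index_uj: Option<u64>,
    pub token_7b_uj: Option<u64>,
    pub token_70b_uj: Option<u64>,
}

/// An empty stub passes (schema only). A filled fixture must satisfy
/// idle <= lookup <= index <= 7B-token <= 70B-token.
pub fn check_monotonic(f: &Fixture) -> Result<(), String> {
    let ladder = [
        ("idle", f.idle_uj),
        ("lookup", f.lookup_uj),
        ("index", f.index_uj),
        ("7B-token", f.token_7b_uj),
        ("70B-token", f.token_70b_uj),
    ];
    if ladder.iter().all(|(_, v)| v.is_none()) {
        return Ok(());
    }
    let mut filled: Vec<(&str, u64)> = Vec::new();
    for &(name, value) in ladder.iter() {
        match value {
            Some(uj) => filled.push((name, uj)),
            // Assumption: a half-filled fixture is rejected outright.
            None => return Err(format!("{} is missing from a partially filled fixture", name)),
        }
    }
    for pair in filled.windows(2) {
        let ((lo_name, lo), (hi_name, hi)) = (pair[0], pair[1]);
        if lo > hi {
            return Err(format!("{} ({} uJ) exceeds {} ({} uJ)", lo_name, lo, hi_name, hi));
        }
    }
    Ok(())
}
```

The check is deliberately weak: it cannot prove a reading is right, only that a contributed fixture is not internally inconsistent.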
Verification layer
Energy is the front. Verification is the back.
Joules account for what compute cost. Verification accounts for what compute ran, what it ran inside, and what evidence backs each claim it made. Both land at the same primitive: a content-addressable artifact with explicit provenance. No artifact, no promotion.
RATS RFC 9334 + EAT RFC 9711 envelope. Ed25519 signing (RustCrypto). Intel TDX TD-Quote, AMD SEV-SNP report, in-toto SLSA Statement parsers — each lifts vendor bytes into typed measurements the verifier can appraise.
Concise Reference Integrity Manifest. CoMID reference triples in CBOR (IETF wire) or JSON (dev). Tracks the draft-ietf-rats-corim revision honestly via a pinned constant on every manifest.
OCI image digest + ORAS Distribution Spec v1.1 referrers as the content-addressed substrate. In-toto Statement + SLSA v1.1 Build (and draft Source / Dependencies / Verification) predicates — provenance, not just hashes.
RuntimeClaim + CapabilitySet (bwrap / gVisor / Kata / Firecracker / Wasmtime). The audit harness MEASURES the escape gap with a read-only EnvironmentProbe trait and a SandboxEscapeBench-shaped category taxonomy — no escape-proof claim is made, only honest contradictions surfaced.
First cross-tool Entity / Transform / TransformResult schema with PROV-O provenance for OSINT- and bio-KG-style fan-out. The TRX adapter speaks Maltego XML so JouleClaw transforms interoperate with the existing ecosystem.
FeatureActivation tuple + Neuronpedia identifier (Gemma Scope 2 / SAELens / TransformerLens alignment). Safetensors SAE header loader. LabelEvidence + VerifiedLabel so a feature description carries the evidence kind that justifies treating it as load-bearing.
Every honest narrowing — “no signature crypto in v1,” “no per-vendor TEE parsing,” “no Maltego XML wire form,” “no SAE weight loading” — closes through one of these crates. The wave-6 envelope shape stays; wave-7 fills in the bytes.
JCR-1 — JouleClaw Receipt Specification
A receipt the regulator can verify without our code.
An efficiency claim is worth nothing in a field drowning in unverifiable ones. JCR-1 turns “we are efficient” into “here is proof this exact output cost this many joules — verify it yourself.” Canonical det-CBOR signed core (RFC 8949 §4.2.1), COSE_Sign1 envelope (RFC 9052 / Ed25519), 10 canonical conformance vectors, three independent verifying implementations across Rust, Python, and Go — byte-equal on all ten. The spec is the source of truth, not the reference code.
Every reading carries a HwShunt | ModelBased | Estimator
tag. The receipt floors at the worst counter in the span; non-shunt
readings round to the 1 mJ resolution floor. A receipt cannot
claim accuracy a platform cannot deliver.
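The flooring rule can be sketched with an ordered enum. The three tag names and the 1 mJ resolution floor come from the text. Ranking `Estimator` below `ModelBased`, rounding to the nearest millijoule (rather than up or down), and the `Reading` struct and function names are assumptions.

```rust
/// Variant order encodes trust, most trustworthy first, so `max()` over
/// a span yields the worst counter. The relative rank of `ModelBased`
/// and `Estimator` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Provenance {
    HwShunt,
    ModelBased,
    Estimator,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reading {
    pub microjoules: u64,
    pub provenance: Provenance,
}

/// 1 mJ expressed in microjoules.
pub const RESOLUTION_FLOOR_UJ: u64 = 1_000;

/// Worst provenance across a span, or `None` for an empty span.
pub fn span_provenance(span: &[Reading]) -> Option<Provenance> {
    span.iter().map(|r| r.provenance).max()
}

/// Shunt readings pass through; everything else is rounded to the
/// nearest multiple of 1 mJ (rounding mode is an assumption).
pub fn quantise(r: Reading) -> u64 {
    match r.provenance {
        Provenance::HwShunt => r.microjoules,
        _ => (r.microjoules + RESOLUTION_FLOOR_UJ / 2) / RESOLUTION_FLOOR_UJ * RESOLUTION_FLOOR_UJ,
    }
}

/// Total for a receipt: quantised sum plus the worst tag in the span.
pub fn receipt_total(span: &[Reading]) -> Option<(u64, Provenance)> {
    let worst = span_provenance(span)?;
    let total = span.iter().map(|r| quantise(*r)).sum::<u64>();
    Some((total, worst))
}
```

Deriving `Ord` on the enum keeps the "worst counter wins" rule in one place: adding a new provenance kind means choosing where it sits in the declaration order.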
Rust mints (jouleclaw-jcr1, det-CBOR encoder + COSE_Sign1
+ 25 lib tests). Python verifies (cbor2 +
cryptography). Go verifies (stdlib only,
hand-written canonical encoder). Byte-equal across all three on every
vector — the “vector-conformant standard” milestone.
Understated joules → re-encode mismatch. Faked
HwShunt on an Estimator span → payload
mismatch (can’t forge without the steward key). Forged
tool-receipt HMAC → HMAC fail. Envelope bitflip → Ed25519 fail.
Each failure surfaces a different error.
Vector 5: an L3 model resolved a query for 2,500,000 µJ.
Vector 10: the same input_hash, after promotion, now
resolves at L1 for 9 µJ, a ~278,000× collapse (2,500,000 / 9 ≈ 277,778).
Both signed by the same steward; both verifiable side-by-side. The
yang→yin learning curve made checkable in a single diff.
Specification text is CC-BY-4.0; the Rust reference implementation
(jouleclaw-jcr1) is Apache-2.0.
Run it yourself
Two binaries. No external dependencies.
Each example below runs end-to-end with zero API keys and zero network calls. Clone the workspace, run the command, read the output. The harness either works on your machine or it doesn’t.
A free-form question walks the cascade twice — first ask resolves at L3, second ask at L0. A templated request walks the elicitor, the cone, conservation, and the oracle to produce a verified candidate. The session closes with one signed JCR-1 receipt over total joules.
cargo run -p jouleclaw-stack \
--example full_cascade_with_triangle
The pay-per-joule pitch made real. The cascade walk measures actual joules; a JCR-1 receipt signs the reading; an OpenPay settlement Transaction prices the joules via a documented function. A binding document links the three by a verifiable invariant. Three standards, three artifacts, one pipeline, composing at the wire layer.
cargo run -p jouleclaw-stack \
--example openpay_per_joule_billing
More starter cascades live alongside these in
jouleclaw-stack/examples/ — minimal cache-only, RAG with
retrieval, the comprehend verb (L0–L2 only), and a
per-intent ledger breakdown.
Reference implementation
Pure-Rust workspace. Apache-2.0.
The workspace ships the runtime, the cascade, the omni-modal generation tier, the harness, the MCP tool surface, and the signed receipt emitter. It also carries the full L0–L10 tier surface, the triangle primitives, skill compilation, durable promotion, energy-shaped approval gates, a branchable session tree, and the verification layer. JCR-1 rounds it out: a canonical, signed, third-party-verifiable receipt standard with three byte-equal reference implementations (Rust, Python, Go). Targets Apple Silicon M3/M4/M5, AMD Strix Halo / Ryzen AI Max, Intel Lunar Lake, NVIDIA Jetson, and discrete NVIDIA / AMD GPUs.
jouleclaw-core / -cascade / -runtime: Foundation types, L0 cache + 7-axis coord router, chat/embed/generate/streaming with PLD + drafter spec decode.
jouleclaw-conservation / -cone / -oracle / -elicitor: Information Conservation Law + typed novelty/complexity dispatch + closure-gate Verdict + deterministic structured intake. Each carries its own conformance vectors.
jouleclaw-liquid / -prism / -lmm: CfC SSM (O(L) state), 1-bit/ternary BitNet, multimodal VLM on ternary backbone.
jouleclaw-omni: Diffusion samplers (Stable Diffusion / SDXL / SD3 / Flux), MusicGen, Whisper, Gaussian3D, video, fusion.
jouleclaw-energy / -pack / -prov: EnergyCounter trait, .jc.toml sidecar declared-cost contract, Smart-Byte-envelope receipts.
jouleclaw-fresh: SearchProvider + TrustTable + provenance envelope. Brave / Tavily / Exa adapters plug in here.
jouleclaw-cli / -mcp (jclaw): Pi-class minimal harness. MCP tool surface with metered dispatch + joule-mcp CBOR profile (x-jouleclaw/joule-mcp@1).
Stewardship
The protocol is owned by no one.
JouleClaw is a Transaction Science open standard, alongside OpenPay, Smart Byte, EOC, WAI, ARL, Sandbox, and Map. The wire format and the right to fork are public. Transaction Science writes the reference implementation and runs the optional hosted services — the protocols themselves are owned by no one.
Code is Apache-2.0. Spec text is CC-BY-4.0. An implementation is conformant if it round-trips the published wire vectors.