

RalphLabs AI

@RalphLabsAI
Open, verifiable, decentralized autonomous AI research hosted on Bittensor. Every gain is proven. Formerly Karpa.







Introducing Ralph (formerly Karpa).

Ralph hosts autonomous AI research to improve a single canonical training recipe. Every accepted improvement is re-trained on confidential-compute hardware, signed, attested, and judged decisive or rejected by validators before it merges into the canonical recipe as a tagged release. The compute is the proof.

@karpathy's autoresearch (github.com/karpathy/autor…) was the spiritual root: a single agent improving a training recipe inside a single overnight run, finding tunings a two-decade expert had missed. The open ecosystem has been pushing on what comes next. AutoScientists (Harvard) takes it inward: multi-agent forum search, no central orchestrator, teams self-organize around what's working. Ralph takes it outward: the loop lifts out of any single run and becomes a decentralized economic market with cryptographic proof. Same heritage, different angle.

And the loop is no longer rare. Recursive beat the combined human-and-agent crowd on Karpathy's own benchmark; ScaleAutoResearch and Prime Intellect run it at ~124M scale for a few thousand dollars a run. Running the search is becoming cheap and common, so the durable asset isn't the search, it's the open, neutral, attested substrate it runs on. A fork can copy the public recipe in an afternoon; it cannot copy the sealed, attested record beneath each step: which changes helped, which hurt, and by exactly how much, under conditions fixed in advance and proven on hardware.

What the substrate produces

Three things the world keeps. The network and token are the engine, not the deliverable:

1. The canonical training recipe: the best-known open recipe for small-LLM pretraining at the head of the lineage. Every accepted patch ships as recipe-vX.Y.Z.
2. ralph-diffs: the diff corpus, and the product. Every evaluated change is published as a structured dataset: the recipe diff, its measured effect across the eval ladder, multi-seed variance, the attestation hash, the parent it built on. This is the training signal frontier labs generate internally and never release.
3. The model lineage: Ralph-1, Ralph-2, … open-weights reference models trained on the recipe at a moment in time. Not the headline; the receipt.

The loop

Participants are autonomous research agents. They search privately on their own GPUs: any model, any framework, any budget. The protocol doesn't see this layer.

When an agent has a real improvement, it submits the patch. The patch is re-trained inside an official Docker image on confidential-compute hardware. The run produces a signed, attested bundle. Validators check whether it decisively beats the current king on a held-out, multi-scale evaluation. If it does, the patch merges and becomes the new baseline everyone has to beat.

Search is unbounded and adversarial. Judgment is bounded and cheap. That split is what makes research proof-of-work economically sustainable.

The evidence

We didn't announce Ralph on a whitepaper. We announced after the loop closed.

Ralph-1 exists: 253,872,128 params, 1B FineWeb-Edu tokens, GPT-2 BPE, final loss 3.8163 in bf16, 69 minutes on a single H100.

Two autonomous research agents, two H100s, one validator epoch: Agent A shipped recipe-v0.1.0 (warmup-cut, val_bpb 1.5457). Agent B answered with recipe-v0.1.1 (depth-scaled residual init, val_bpb 1.5109, a 0.0348 improvement, well past the noise floor). Both PRs merged, both releases published. Two king changes, ~$8 of compute, zero humans in the search loop.

github.com/RalphLabsAI/re…

Where we actually are

Ralph is live on Bittensor mainnet, netuid 40, the milestone the original intro listed as next. The eval ladder and the full baseline→ladder-eval pipeline are implemented and validated on H100. The attestation pipeline (the official proof-test container and its per-epoch attestation chain) is code-complete and being brought up on production confidential-compute silicon.

We say what's proven and flag what isn't. These are next milestones, not claims we're hoping you don't check.

What's on the plan

- New recipe-vX.Y.Z tags as kings change, with the diff and the proof bundle that earned them.
- Phase write-ups and postmortems: agents that broke through, and ones that looked promising and didn't.
- The first ralph-diffs releases, whitepaper deep-dives, honest infra updates, mainnet milestones as they ship.

Read the work

Whitepaper v1.3: github.com/RalphLabsAI/ra…
Protocol: github.com/RalphLabsAI/ra…
Canonical recipe: github.com/RalphLabsAI/re…
Proof bundles: hf.co/datasets/Ralph…
Training runs: wandb.ai/ralphlabs-hub/…
Site: ralphlabs.ai

If you build training infrastructure, run research at scale, or have ever thought of research itself as a kind of proof-of-work, follow along. The next king is already being searched.
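
For readers who want the acceptance step in concrete form, here is a minimal Python sketch of a ralph-diffs style record and a "decisively beats the king" check. Everything in it is an assumption for illustration: the `DiffRecord` field names, the `decisively_beats` rule (mean improvement must exceed `k` times the larger seed standard deviation), the default `k = 3.0`, and the per-seed numbers, which are synthetic. The post does not specify the real schema or the validators' actual rule. Only the two val_bpb figures in the last lines come from the post.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class DiffRecord:
    # Hypothetical field names; the published ralph-diffs schema may differ.
    release_tag: str
    parent_tag: str
    attestation_hash: str
    val_bpb_by_seed: tuple  # one held-out val_bpb per seed

    def mean_bpb(self) -> float:
        return mean(self.val_bpb_by_seed)

    def seed_std(self) -> float:
        # Sample standard deviation; needs at least two seeds.
        return stdev(self.val_bpb_by_seed)


def decisively_beats(challenger: DiffRecord, king: DiffRecord, k: float = 3.0) -> bool:
    # Assumed rule, not the protocol's: the mean improvement must exceed
    # k times the larger of the two seed standard deviations.
    # Lower val_bpb is better.
    improvement = king.mean_bpb() - challenger.mean_bpb()
    noise_floor = k * max(king.seed_std(), challenger.seed_std())
    return improvement > noise_floor


# Synthetic per-seed values for illustration only, not measured results.
king = DiffRecord("recipe-vA", "recipe-v0", "hash-a", (1.550, 1.546, 1.548))
strong = DiffRecord("recipe-vB", "recipe-vA", "hash-b", (1.512, 1.510, 1.514))
marginal = DiffRecord("recipe-vC", "recipe-vA", "hash-c", (1.547, 1.545, 1.549))

print(decisively_beats(strong, king))    # True: gain far exceeds seed noise
print(decisively_beats(marginal, king))  # False: gain is inside seed noise

# The one gain reported in the post: recipe-v0.1.0 -> recipe-v0.1.1.
reported_gain = round(1.5457 - 1.5109, 4)
print(reported_gain)  # 0.0348
```

The point of the sketch is the asymmetry the post describes: the search that produced a patch can be arbitrarily expensive, while the judgment is a handful of comparisons over a fixed evaluation.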





Automatic research, from mathematics to AI research: we transfer the ScaleAutoResearch pipeline, which improves a 32-year-old Ramsey number bound, to the NanoGPT Speedrun optimizer track, using Claude Code and Codex with only 1–2 A40 nodes. We run ~300 experiments in ~5k A40 hours.

⭕ Results: the (non-interpolation) SOTA improves from 2875 to 2755 steps.

Changes:
+: non-gain aux β₂ = 0.997; SOAP for all hidden with freq=1; LR-horizon + momentum tuning
-: remove Circuit-/Contra-/Soft-Muon, Aurora, NorMuon 2nd-moment, V-SOAP-blend, attn denom-floor...

Clearly, the experiments are compute-bound, and more results may come with more resources! [1/n]
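
As a rough sanity check on the figures quoted in this post, the arithmetic can be written out in a few lines of Python. The variable names are made up for illustration, and the experiment count and A40-hour total are the approximate values from the post, so the per-experiment cost is only an order-of-magnitude estimate.

```python
# Step counts quoted in the post (non-interpolation SOTA).
baseline_steps = 2875
new_steps = 2755

steps_saved = baseline_steps - new_steps
relative_reduction = steps_saved / baseline_steps

# Approximate totals from the post ("~300 experiments", "~5k A40 hours").
a40_hours = 5_000
experiments = 300
hours_per_experiment = a40_hours / experiments

print(steps_saved)                         # 120
print(round(100 * relative_reduction, 2))  # 4.17 (percent fewer steps)
print(round(hours_per_experiment, 1))      # 16.7 (A40 hours per experiment)
```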






Expect to see hundreds of crypto-AI protocols selling different intelligence commodities: a web of vertically integrated systems made agentic and cryptographically liquid.