Austin Baggio

351 posts


@AustinBaggio

Co-founder @ensue_ai. Building shared memory for AI agents.

Toronto, Ontario · Joined October 2011
436 Following · 702 Followers
Austin Baggio@AustinBaggio·
@zongheng_yang @karpathy We saw the same result running multiple agents in parallel, and we did it with a swarm controlled by researchers around the world
Zongheng Yang@zongheng_yang·
Karpathy @karpathy is the GOAT of simple but viral projects. Autoresearch is his latest: an agent running guided search in a loop. What if the agent is scaled up? We give an Autoresearch agent 16x H100 + H200s and let it rip. Result: 9x faster time-to-quality & better quality. Most important — all GPU cluster shenanigans are automated by a skill. The skill teaches the agent to launch parallel GPU jobs via SkyPilot. This means that to get SOTA model results, all a researcher has to do is supply an English prompt + a kubeconfig file.
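The parallel fan-out this tweet describes can be sketched in a few lines of Python. This is a minimal stand-in rather than SkyPilot's actual API: `run_experiment`, the candidate list, and the toy learning-rate objective are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for launching one GPU job (e.g., via a SkyPilot task);
# here each "job" just scores a candidate learning rate against a toy objective.
def run_experiment(lr: float) -> tuple[float, float]:
    quality = -abs(lr - 0.01)  # pretend quality peaks at lr = 0.01
    return lr, quality

candidates = [0.001, 0.003, 0.01, 0.03, 0.1]

# Fan out all candidates at once instead of trying them one by one;
# this is the parallel-jobs speedup the tweet is describing.
with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
    results = list(pool.map(run_experiment, candidates))

best_lr, best_quality = max(results, key=lambda r: r[1])
print(best_lr)  # prints 0.01
```

With real cluster jobs the worker body would submit and poll a remote job instead of computing locally; the fan-out and selection logic stay the same.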
David Cortés@davebcn87·
Today we added statistical confidence to pi-autoresearch. Now the LLM gets a confidence metric that guides it to re-run the experiment if noise was detected during the measurement. Thanks @0xRkyd for bringing up the idea.
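One plausible shape for this mechanism, sketched under assumptions (the helper name, the relative-standard-error threshold, and the retry budget are all hypothetical, not the project's actual code): treat the standard error of the mean across repeated runs as the confidence metric, and keep re-running while it is too large.

```python
import statistics

def measure_with_confidence(run_trial, n=5, rel_sem_threshold=0.05, max_rounds=3):
    """Repeat a noisy measurement until the standard error of the mean is
    small relative to the mean, or the retry budget runs out."""
    samples = []
    for _ in range(max_rounds):
        samples += [run_trial() for _ in range(n)]
        mean = statistics.fmean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        if sem / abs(mean) <= rel_sem_threshold:
            break  # confident enough: no re-run needed
    return mean, sem

# Deterministic stand-in for a noisy benchmark run.
readings = iter([1.02, 0.98, 1.01, 0.99, 1.00])
mean, sem = measure_with_confidence(lambda: next(readings))
print(round(mean, 2))  # prints 1.0
```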
Austin Baggio@AustinBaggio·
@francescpicc Certainly. Increasing the cycles lowers val BPB; there's a discussion going on in Discord right now about whether and how to try this.
Austin Baggio@AustinBaggio·
The swarm keeps finding improvements beyond what any single contributor could have found alone.
Christine Yip@christinetyip

Almost a week after launch, autoresearch@home has run 3,000+ experiments. Hyperparameter tuning started to plateau, but the swarm didn't. The community pushed things forward:
• @Mikeapedia1 adapted training to leverage FlashAttention 4 on a B200, sharing a report after 150+ experiments
• Node is exploring RL fine-tuning based on the test-time discovery paper, using the thousands of experiments generated so far (looking for compute)
• @bartdecrem built an extension to bring Mac minis into the network, looking for testers
This is what happens when experiments don't live in isolation. They compound. Check out their work. 👇🧵

Austin Baggio@AustinBaggio·
@tobi want to 10x the number of runs by bringing in a lot more people and their GPUs? I'm in Toronto btw
Christine Yip@christinetyip

Jensen at @nvidia GTC: “Every company needs an agentic strategy.” Couldn't agree more. Example: @tobi from @Shopify casually got a 53% speedup running autoresearch on the Liquid codebase on his own machine: x.com/tobi/status/20…
Now imagine if:
• engineers run agents when their machines are idle
• experiments share results across teams
• improvements compound automatically
That's the kind of collective intelligence infrastructure we're building with @ensue_ai. Autoresearch@home is a glimpse of what this could look like. If you're exploring this for your team, feel free to DM.

zmanian@zmanian·
Is this recursive self improvement?
Christine Yip@christinetyip

🎊 100 hours since autoresearch@home launched 🎊
• 2600+ experiments
• 95 agents
• 78 improvements
• and still growing

❓ Where are we now, and what comes next?
Autoresearch@home builds on autoresearch by @karpathy and explores what happens when agents share discoveries and build on each other's work in real time. Over the past few days we've already started seeing interesting patterns emerge from thousands of experiments and agent behaviors. Swarm logs: Day 1 → x.com/christinetyip/… Day 2 → x.com/christinetyip/…
One particularly exciting development: An ML researcher in the community is now taking the architecture discovered by the swarm and using it to train a 1B model. If it scales, it could become the first example of distributed research producing a new model architecture. Pretty wild. Follow and support @snwy_me, and join our Discord to follow along: discord.gg/JpJAmEwEEs

🌱 What comes next?
We built collective intelligence for distributed ML research. But people are already asking to apply this idea to other domains:
• other ML problems
• enterprise optimization
• health / biology
• drug discovery
Because the underlying shared memory layer is already running in production (@ensue_ai), launching other “@home” swarms is something we can readily extend to. What domains would you like to see next? We'd love to hear from the community.

📈 Collective intelligence inside organizations
Autoresearch@home demonstrates what open collective intelligence can do. The same idea can also work inside organizations, where experiments and discoveries are shared across teams. For example, @tobi produced a 53% speedup by running autoresearch on the @Shopify Liquid codebase on his machine alone: x.com/tobi/status/20…
Now imagine if:
• engineers run agents when their machines are idle
• experiments share results across teams
• improvements compound automatically
That's the kind of collective intelligence infrastructure we're building with @ensue_ai. If you're interested in applying this to your team, DM me.

🌐 Want to contribute to open ML research?
The more agents join autoresearch@home, the stronger the swarm becomes, and the more everyone benefits from shared discoveries. If you're already using agents, simply tell it: "Read this repo, join autoresearch@home, and start contributing: github.com/mutable-state-…" Within minutes, your agent can start running ML experiments.

Austin Baggio retweeted
AGI House SF@AGIHouseSF·
Congrats, Travis @traviscline & Steven @StevenDiam77921 for winning our Autoresearch Hackathon! The team built a Go port of Karpathy's autoresearch repo that runs on Apple Silicon. Give an agent an LLM training setup on the Neural Engine and let it experiment autonomously!
Austin Baggio retweeted
Christine Yip@christinetyip·
🎊 100 hours since autoresearch@home launched 🎊
• 2600+ experiments
• 95 agents
• 78 improvements
• and still growing

❓ Where are we now, and what comes next?
Autoresearch@home builds on autoresearch by @karpathy and explores what happens when agents share discoveries and build on each other's work in real time. Over the past few days we've already started seeing interesting patterns emerge from thousands of experiments and agent behaviors. Swarm logs: Day 1 → x.com/christinetyip/… Day 2 → x.com/christinetyip/…
One particularly exciting development: An ML researcher in the community is now taking the architecture discovered by the swarm and using it to train a 1B model. If it scales, it could become the first example of distributed research producing a new model architecture. Pretty wild. Follow and support @snwy_me, and join our Discord to follow along: discord.gg/JpJAmEwEEs

🌱 What comes next?
We built collective intelligence for distributed ML research. But people are already asking to apply this idea to other domains:
• other ML problems
• enterprise optimization
• health / biology
• drug discovery
Because the underlying shared memory layer is already running in production (@ensue_ai), launching other “@home” swarms is something we can readily extend to. What domains would you like to see next? We'd love to hear from the community.

📈 Collective intelligence inside organizations
Autoresearch@home demonstrates what open collective intelligence can do. The same idea can also work inside organizations, where experiments and discoveries are shared across teams. For example, @tobi produced a 53% speedup by running autoresearch on the @Shopify Liquid codebase on his machine alone: x.com/tobi/status/20…
Now imagine if:
• engineers run agents when their machines are idle
• experiments share results across teams
• improvements compound automatically
That's the kind of collective intelligence infrastructure we're building with @ensue_ai. If you're interested in applying this to your team, DM me.

🌐 Want to contribute to open ML research?
The more agents join autoresearch@home, the stronger the swarm becomes, and the more everyone benefits from shared discoveries. If you're already using agents, simply tell it: "Read this repo, join autoresearch@home, and start contributing: github.com/mutable-state-…" Within minutes, your agent can start running ML experiments.
Austin Baggio retweeted
Lee Parayno@leeparayno·
This seems like an inflection point. This could be amazingly huge. Imagine all the research that could be explored.
Christine Yip@christinetyip

We were inspired by @karpathy's autoresearch and built: autoresearch@home
Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.
Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time

Austin Baggio retweeted
Christine Yip@christinetyip·
New global best again on autoresearch@home. @Mikeapedia1's agent hit 0.9453 BPB, reaching #1 on the leaderboard. From Discord: “I got tired of seeing the horizontal line on the timeline so I threw a B200 at it and adapted training to take advantage of FlashAttention-4” LOL
Austin Baggio retweeted
Christine Yip@christinetyip·
For those running autoresearch: here are Day 2's top 10 findings from 60+ agents across 1,600 experiments on autoresearch@home (+500 since yesterday). Some patterns are starting to emerge.
1. Training steps still dominate everything
2. A new optimization axis, QK attention scaling (~1.10), consistently improved results
3. The most effective strategy became “replay → microtune”
4. Hardware tiers fundamentally change the research landscape
5. Progress now comes in bursts
6. Hyperparameters interact more than expected
7. Full warmdown is converging toward 1.0
8. Non-datacenter GPUs can still make meaningful progress
9. Research roles are emerging organically
10. The biggest opportunity is still unexplored

1⃣ Training steps still dominate everything
One of the agents (Phoenix) had a breakthrough, and it came from reducing Muon ns_steps from 9 → 7, slightly weakening the optimizer but allowing more training steps in the 5-minute budget. More steps beat theoretically better optimization.

2⃣ A new optimization axis emerged: QK attention scaling
Scaling Q and K after normalization (~1.10) consistently improved results. It sharpens attention without changing the architecture and produced a ~0.001 BPB improvement. Small tweak, measurable gain.

3⃣ The most effective strategy became “replay → microtune”
Top agents increasingly:
- replay the current best config
- confirm the baseline on their hardware
- sweep 1–2 parameters
Phoenix broke the global record with 3 experiments in 27 minutes using exactly this pattern.

4⃣ Hardware tiers fundamentally change the research landscape
The swarm now tracks VRAM tiers:
• small (≤12GB)
• medium (16–24GB)
• large (24–48GB)
• XL (≥48GB)
Agents on consumer GPUs and H200s are solving different optimization problems. This ended up being both a technical and social innovation.

5⃣ Progress now comes in bursts
Day 2 had 14 hours of complete stagnation. Then the frontier moved three times in 27 minutes. The same pattern repeated from Day 1: plateaus break when someone finds a qualitatively new lever (e.g., initialization on Day 1, ns_steps reduction on Day 2). When the hyperparameter space is exhausted, the next gain requires a new class of change.

6⃣ Hyperparameters interact more than expected
Example: FINAL_LR_FRAC = 0.03 helped when warmdown = 0.9 but catastrophically regressed at warmdown = 1.0. Hyperparameters are not independent knobs; many results don't transfer across regimes.

7⃣ Full warmdown is converging toward 1.0
Optimal warmdown ratio since network launch: 0.3 → 0.5 → 0.8 → 0.9 → 1.0. The LR should start decaying almost immediately after warmup. It is one of the few hyperparameters that transfers cleanly across every day and hardware tier.

8⃣ Non-datacenter GPUs can still make meaningful progress
Cipher, on an RTX A5000, improved its tier from 1.103 → 1.094 BPB through systematic sweeps. Meanwhile M5Max compressed days of learning into ~6 hours. The VRAM tier system now lets these contributions be tracked alongside the H200 frontier.

9⃣ Research roles are emerging organically
Different agents are starting to specialize:
• frontier breakers
• architectural explorers
• budget-hardware optimizers
• defensive testers
• meta-analysts generating hypotheses
It increasingly looks like a distributed research lab.

🔟 The biggest opportunity is still unexplored
Thousands of hypotheses exist about:
• curriculum learning
• dataset filtering
• domain weighting
…but almost none have been tested yet. The swarm has focused almost entirely on architecture and optimizer space so far.

👁️ Meta observation
Across the days since network launch:
• BPB improved 0.9949 → 0.9597, but the rate of improvement is slowing.
• Each plateau has only been broken by discovering a new class of changes.
• The next frontier likely isn't hyperparameters. It's probably data pipeline optimization.

🗞️ Note: These results were generated ~24 hours ago. Since then, autoresearch@home has grown to 80+ agents running 2200+ experiments. Don't miss out: if you want to connect your agent to the swarm and build directly on the collective research, see the instructions below. 👇🧵
-----
These findings come from agents running on autoresearch@home. Huge thanks to @karpathy for the original autoresearch idea, and to @AntoineContes, @georgepickett, @snwy_me, @jayz3nith, @turbo_xo_, @lessand_ro, @swork_, and everyone contributing experiments.
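Finding 7 (warmdown converging to 1.0) and the FINAL_LR_FRAC knob from finding 6 can be made concrete with a toy schedule. This is a sketch under assumptions: the function, its defaults, and the linear decay shape are illustrative, not the repo's actual hyperparameters.

```python
def lr_at(step, total_steps, base_lr=1e-3, warmup_steps=100,
          warmdown_ratio=1.0, final_lr_frac=0.03):
    """Toy schedule: linear warmup, optional constant hold, then linear
    "warmdown" to final_lr_frac * base_lr. warmdown_ratio is the fraction
    of post-warmup steps spent decaying; at 1.0 (the value the swarm
    converged on) the hold vanishes and decay starts right after warmup."""
    final_lr = base_lr * final_lr_frac
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    warmdown_steps = int((total_steps - warmup_steps) * warmdown_ratio)
    hold_end = total_steps - warmdown_steps
    if step < hold_end:
        return base_lr  # constant plateau (only when warmdown_ratio < 1.0)
    frac = (step - hold_end) / max(1, warmdown_steps)  # 0 -> 1 across warmdown
    return base_lr + (final_lr - base_lr) * frac

print(lr_at(99, 1000))   # prints 0.001 (end of warmup)
print(lr_at(100, 1000))  # prints 0.001 (decay begins immediately at ratio 1.0)
print(lr_at(999, 1000))  # just above final_lr = 3e-05
```

Sweeping warmdown_ratio over 0.3 → 1.0 reproduces the progression in finding 7: the constant plateau shrinks until decay begins immediately after warmup.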
Austin Baggio retweeted
AGI House SF@AGIHouseSF·
Meet @svegas18, @AustinBaggio, @christinetyip from autoresearch@home:
> Wednesday, they release the most comprehensive autoresearch repo & leaderboard
> Same day, we plan and launch an autoresearch hack
> Thursday, they decide to fly in from Canada & NYC to present their work
Austin Baggio@AustinBaggio·
The swarm results get all the love, but autoresearch@home serves as a repository for all the hypotheses and claims that led to each of the improvements
AGI House SF@AGIHouseSF

@AustinBaggio on autoresearch@home: “a strategy repository of everything that has been tried” by past research agents. Now, 2100+ submissions in, this is just the start of the age of autoresearch. Link to their full talk below 👇

Austin Baggio@AustinBaggio·
Thanks for having us! Excited to see what everyone builds today. We might have to race to get result verification and multi-experiment support live 👀👀
AGI House SF@AGIHouseSF

Meet @svegas18, @AustinBaggio, @christinetyip from autoresearch@home:
> Wednesday, they release the most comprehensive autoresearch repo & leaderboard
> Same day, we plan and launch an autoresearch hack
> Thursday, they decide to fly in from Canada & NYC to present their work

Austin Baggio retweeted
Junyuan "Jason" Hong
Wow. Distributed research is coming true.
Christine Yip@christinetyip

We were inspired by @karpathy's autoresearch and built: autoresearch@home
Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.
Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time

Stephen Rayner@stephen_rayner·
@christinetyip Little (i) symbols next to some of these to inform us on what they mean would be helpful.
Christine Yip@christinetyip·
If you're still doing autoresearch alone, you're already behind. Every node is an experiment run by an agent. Every experiment and result is open-source. Your agent could've read these results and adjusted its strategy before running its own experiments. That's the power of autoresearch@home. ~1400 experiments have already been run. And it's growing.
Austin Baggio retweeted
AGI House SF@AGIHouseSF·
The autoresearch@home team - @svegas18 & @AustinBaggio - are flying in spontaneously from NYC & Toronto to present their research at our autoresearch hackathon tomorrow. Wow! If you haven't signed up yet: lu.ma/autoresearch
Christine Yip@christinetyip

We were inspired by @karpathy's autoresearch and built: autoresearch@home
Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.
Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time

Austin Baggio retweeted
ICHAKA IKE@Ichaka_001·
@AustinBaggio This is so cool, love seeing collaborative AI pushing boundaries like this