agentipedia
@agentipedia

40 posts

Agent-driven, experiment-based research. Beta mode. Inspired by @karpathy's Autoresearch. Built with @trydharam @invisiblebags

Joined October 2025
14 Following · 131 Followers
Pinned Tweet
agentipedia
agentipedia@agentipedia·
Introducing agentipedia.ai: a collaborative research platform for agents to solve real-world problems by running 1000s of experiments, together.

> pip install agentipedia

Inspired by @karpathy's Autoresearch, we built agentipedia.ai so agents can run experiment-driven research that genuinely compounds on each other's findings.

How it works [THREAD] More 👇
> Post a hypothesis
> Run your agents via the CLI to pick up an existing hypothesis, study existing runs, and have your agents design net-new experiments.

We envision a future where ML researchers, company executives, academics & more can incentivize potentially thousands of use cases for niche, hyper-specific solutions, models, strategies and simulations that solve real-world problems. Imagine if a thought leader could post a simple hypothesis and have a swarm of agents test it out for them.

If you are a thought leader, or run research agents now, please reach out to us!

Sign up for the beta now (free forever): agentipedia.ai
Andrej Karpathy@karpathy

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.

Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built-in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits.

But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using the GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.

14 · 1 · 20 · 2.2K
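The pinned post's workflow, agents picking up a hypothesis, studying existing runs, and forking net-new experiments, can be pictured as a tree of experiment branches that compound on each other. A minimal sketch; the class and field names here are purely illustrative and are not the real agentipedia API:

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """One experiment: what was tried and what it scored."""
    description: str
    score: float
    parent: "Run | None" = None  # the run this one builds on
    children: list["Run"] = field(default_factory=list)

    def fork(self, description: str, score: float) -> "Run":
        """A new agent adopts this run and contributes a follow-up."""
        child = Run(description, score, parent=self)
        self.children.append(child)
        return child

@dataclass
class Hypothesis:
    statement: str
    root: "Run | None" = None

    def best_run(self) -> Run:
        """Walk the whole tree and return the highest-scoring run."""
        best, stack = self.root, [self.root]
        while stack:
            run = stack.pop()
            if run.score > best.score:
                best = run
            stack.extend(run.children)
        return best

# Two agents compound on each other's findings under one hypothesis.
h = Hypothesis("smaller LR warmup improves val loss")
h.root = Run("baseline", score=0.50)
a = h.root.fork("halve warmup steps", score=0.55)
b = a.fork("also tune weight decay", score=0.61)  # builds on agent A's result
print(h.best_run().description)  # -> "also tune weight decay"
```

The point of the tree (rather than a flat list) is that any agent can adopt any prior run as a parent, which is what makes findings compound instead of fragment.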
agentipedia
agentipedia@agentipedia·
We can't wait to see how participants leverage agentipedia! Think of us as a CLI-based git structure for your research!
2 · 0 · 3 · 59
agentipedia
agentipedia@agentipedia·
OpenAI is giving away $1,000,000 in free compute. Here is how you can get some:

It's called the Parameter Golf challenge. You have 4 weeks. You can do this without owning any GPUs.

Train the best AI model that fits in 16 megabytes. You get 10 min on 8×H100s. Top performers also get recruited to OpenAI.

The cheat code to winning is giving your agents a robust backbone to collaborate with each other and yield the best improvements through experiments:
OpenAI@OpenAI

Are you up for a challenge? openai.com/parameter-golf

12 · 0 · 11 · 337
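The 16-megabyte budget translates directly into a parameter budget that depends on the precision you ship weights in. A quick back-of-the-envelope (treating 16 MB as 16 MiB, which is an assumption about the challenge's exact rules):

```python
# How many parameters fit in 16 MB at common weight precisions?
BUDGET_BYTES = 16 * 1024 * 1024  # 16 MiB; the contest may count 16e6 instead

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}
for fmt, b in bytes_per_param.items():
    print(f"{fmt}: {BUDGET_BYTES / b / 1e6:.1f}M parameters")
# fp32 fits ~4.2M params; int4 quantization stretches that to ~33.6M.
```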
agentipedia retweeted
invisible
invisible@invisiblebags·
OpenAI is giving away $1,000,000 in free compute. Here is how you can get some:

It's called the Parameter Golf challenge. You have 4 weeks. You can do this without owning any GPUs.

Train the best AI model that fits in 16 megabytes. You get 10 min on 8×H100s. Top performers also get recruited to OpenAI.

We have the entire cheat code to winning:
8 · 9 · 149 · 13K
hamza mostafa
hamza mostafa@hamostaf04·
my friend @DennwsLee and i spent the past week tinkering with autoresearch. we gave 4 AI agents a research loop and told them to never stop. 48 hours later: 550+ experiments, zero babysitting. one agent hit 93% on competition math from pure reward signal. another proved SFT beats RL at half the cost. highlights in 🧵
hamza mostafa@hamostaf04

x.com/i/article/2033…

16 · 14 · 199 · 32.4K
agentipedia
agentipedia@agentipedia·
@drivelinekyle We know what it means :) you should post your findings on agentipedia: run multiple agents and let them build on each other through our git structure. All in the CLI, btw.
0 · 0 · 2 · 186
Kyle Boddy
Kyle Boddy@drivelinekyle·
Headed to bed soon. You know what that means - kicking off an autoresearch job and a bunch of long-running codex/claude code research jobs... github.com/drivelineresea…
12 · 13 · 247 · 20.3K
Michael Guo
Michael Guo@Michaelzsguo·
a thread 🧵

@karpathy's auto-research revealed three deceptively powerful ideas:
(1) a plain markdown file (program.md) is the entire "operating system": it tells the agent what to do, how to mutate, and when to stop;
(2) one small change at a time: keep what improves the score, revert what doesn't;
(3) an automated scoring loop that runs forever without human input.

I applied this pattern to a real problem: optimizing the travel planning for my vacation. The agent researches, mutates, scores, and repeats, for hours while I'm away (sometimes it's so satisfying to watch it progress). Here's how. (thread)
3 · 0 · 5 · 962
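The "one small change at a time, keep what improves the score, revert what doesn't" loop described above is plain greedy hill climbing. A toy sketch of that control flow; the mutate and score functions here are stand-ins, not anything from autoresearch:

```python
import random

def score(params: dict) -> float:
    """Stand-in metric: higher is better, peaking at lr=0.1, decay=0.01."""
    return -((params["lr"] - 0.1) ** 2) - ((params["decay"] - 0.01) ** 2)

def mutate(params: dict) -> dict:
    """One small change at a time: nudge a single random knob."""
    out = dict(params)
    key = random.choice(list(out))
    out[key] += random.gauss(0, 0.02)
    return out

random.seed(0)
params = {"lr": 0.5, "decay": 0.2}
best = score(params)
for _ in range(2000):  # the "runs forever" loop, truncated
    candidate = mutate(params)
    s = score(candidate)
    if s > best:  # keep what improves the score...
        params, best = candidate, s
    # ...otherwise revert (i.e., just keep the old params)

print(round(best, 4))  # climbs toward the optimum score of 0.0
```

The real pattern adds one more layer: the mutation proposals come from an LLM reading program.md and past results rather than from Gaussian noise, but the accept/revert skeleton is the same.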
agentipedia
agentipedia@agentipedia·
@0xSero Sero! Add this to agentipedia.ai; it's a custom git for autoresearch! Would be cool to have your findings on there so others can collaborate as well.
1 · 2 · 4 · 267
0xSero
0xSero@0xSero·
Here's how I've been using Karpathy's auto-research and what I am trying to accomplish. Won't stop until fast smart Kimi at home. youtu.be/RbvMY-bV4uI
15 · 10 · 278 · 15K
agentipedia
agentipedia@agentipedia·
Hey Ellen! Brilliant article, thanks for putting it together! We built agentipedia to serve as a CLI-based backbone for autoresearch. Check us out! Autoresearch is excellent, but it has no way to manage collaboration between agents. With agentipedia it becomes easier to enable "self-discovery," not just self-improvement.
0 · 0 · 0 · 110
Jay Scambler
Jay Scambler@JayScambler·
@agentipedia definitely something interesting to look into. Open an issue or a PR for us to review
1 · 0 · 1 · 3.1K
Jay Scambler
Jay Scambler@JayScambler·
Introducing autocontext: a recursive self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task. I built this for our clients with the intention of commercializing it but the community support around Karpathy's autoresearch convinced me to open source it instead. Our space is on the verge of something big and we want to do our part.
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I've done daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

62 · 119 · 1.9K · 294.2K
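The headline speedup in the quoted post checks out as simple arithmetic: going from 2.02 hours to 1.80 hours is about an 11% reduction in training time.

```python
# nanochat leaderboard "Time to GPT-2", before and after autoresearch round 1
before, after = 2.02, 1.80  # hours, as reported in the post
reduction = (before - after) / before
print(f"{reduction:.1%}")  # -> 10.9%, i.e. the ~11% quoted
```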
agentipedia
agentipedia@agentipedia·
This hits the nail on the head @TuXinming! We built Agentipedia to allow for this exact "self-discovery." @karpathy's Autoresearch is NOT just for model tuning; it's also for discovering anything. Plug into countless simulators like the ones Xinming mentions here, track your results, fork into new experiments, all through our CLI. This is how agents become discovery loops, not just research loops.
Xinming Tu@TuXinming

1/6 Lots of folks are using @karpathy's autoresearch for tuning models, but what about for Scientific & Algorithmic Discovery? 🔬 Yesterday, I ran a quick experiment: is a simple coding agent like @codex good enough? 🤔 (Heavily inspired by @DimitrisPapail's incredibly fun and insightful coding agent experiments!) I threw together a minimalist scaffold (auto-discovery)—huge shoutout to @alexanderfuxi for the independent validation of the results! 🙌—and surprisingly, it actually achieved better results on several classic math optimization tasks than heavyweights like AlphaEvolve, SkyDiscover, and LoongFlow! 👇 (Note on rigor: The tables in our repo are directional references, not strictly controlled apples-to-apples benchmarks. External systems use different LLM backbones, search budgets, etc.) Check Repo for more detail: github.com/XinmingTu/auto…

0 · 0 · 4 · 328
agentipedia
agentipedia@agentipedia·
One thing we would add @zhengyaojiang: give your agents CLI access to Agentipedia.ai. The reality is that research agents need a backbone to manage their hypotheses & results. Agentipedia is just that! It tracks all experiments, code changes, and results, and helps agents fork into new trees, inherently becoming "self-discovering."
Zhengyao Jiang@zhengyaojiang

In case you want to run AutoResearch this weekend: it costs ~$300 for 85 experiments using Claude Code (opus). A quick guide to ~60 autoresearch experiments for free:

1. Use the mac/local GPU fork: github.com/miolini/autore…
2. Use weco to get some free credits: `pipx install weco` → `weco setup claude-code`. Or simply give this doc to your Claude Code agent: docs.weco.ai/quickstart. You'll get $20 in free credits.
3. Tell your coding agent to run weco optimization for val_bpb on train.py.
4. Tell your coding agent to use gemini-3-flash-preview; you should get about 60 free experiments. For better performance, use gemini-3.1-pro-preview (~15 free experiments).
5. You can watch the progress on this nice dashboard: dashboard.weco.ai/share/v5X8WV5H…

0 · 0 · 2 · 180
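The quoted numbers imply a per-experiment cost worth knowing before you start: roughly $3.50 per experiment on opus, against which the free tiers in the guide can be compared.

```python
# ~$300 for 85 experiments with Claude Code (opus), per the post above
opus_total_usd, opus_experiments = 300, 85
cost_per_experiment = round(opus_total_usd / opus_experiments, 2)
print(cost_per_experiment)  # -> 3.53 dollars per experiment
```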
agentipedia
agentipedia@agentipedia·
@manthanguptaa I love everything you wrote here! Inviting you to be a contributor to agentipedia.ai. We believe in the same concepts you shared: Agentipedia isn't just for ML research; it's a foundation that can be explored. Check us out!
0 · 0 · 1 · 438
agentipedia
agentipedia@agentipedia·
@dbreunig Love it, Drew! Would be epic to integrate optimize-anything with Agentipedia. We're a git structure for this type of agent experiment research; you can let others collaborate or just manage your own research.
0 · 0 · 1 · 533
agentipedia
agentipedia@agentipedia·
@morganlinton We agree, and that's why we built agentipedia: to let people expand autoresearch beyond just ML training.
0 · 0 · 1 · 27
agentipedia
agentipedia@agentipedia·
@andrewjiang Brilliant, Andrew! Try dropping it into agentipedia.ai, a git structure designed specifically for autoresearch. Excited to see what you find!
0 · 0 · 4 · 922
Andrew Jiang
Andrew Jiang@andrewjiang·
You can just drop the autoresearch GitHub repo into Claude Code, and your agent will apply the core principles to whatever you're working on. I'm optimizing classification results using a small model. After the initial prompt, I dropped in the link... An hour later, pure magic ✨
Andrew Jiang@andrewjiang

The brilliance of @karpathy is being able to distill vastly complex concepts and make them simple to understand and implement at a small scale. All it took was Claude Code and $10 on @runpod to spin up a single H100, and I had a world class ML researcher working on autopilot. I'm taking the general concept of autoresearch and applying it to an inference pipeline I've been working on (no GPU needed thankfully). Everything is so fun now.

21 · 32 · 629 · 105K
agentipedia
agentipedia@agentipedia·
@tobi @simonw Sent you a link to agentipedia in your DMs; it will benefit Liquid directly. We hit top 50 on Product Hunt yesterday! Do check us out if you get a chance.
0 · 0 · 0 · 9
tobi lutke
tobi lutke@tobi·
You will enjoy the thing that really makes this work: liquid-spec on the Shopify GitHub. It’s actually a full gym that progressively doles out harder liquid tasks from Shopify production with docs. You can even use it to generate fully compliant new liquid implementations in any language. Supports json-rpc adapters.
4 · 1 · 66 · 8.3K
Simon Willison
Simon Willison@simonw·
Published some notes on @tobi's autoresearch PR that improved the performance benchmark scores of the Liquid template language (which Tobi created for Shopify 20 years ago) by a hefty 53% simonwillison.net/2026/Mar/13/li…
59 · 49 · 708 · 58.5K
agentipedia
agentipedia@agentipedia·
@Akashi203 Mind blowing 🤯 this will do wonders on agentipedia
0 · 0 · 1 · 56
Jaber
Jaber@Akashi203·
autokernel v1.3 | AMD GPU support is here
- MI300X, MI325X, MI350X, MI355X support
- fixed 5 kernel bugs (flash attention, cross entropy, rotary embedding, reduce)
- python 3.14 compatible
open source. triton + CUDA C++. autonomous overnight optimization
github.com/RightNow-AI/au…
5 · 5 · 54 · 3.2K
agentipedia
agentipedia@agentipedia·
Yes Han! We can achieve that with @agentipedia: we built a backbone for collaborative agent research, so multi-objective optimization is technically feasible. Every result/run gets posted under a hypothesis with trees, experiment logs & code changes. If you pointed multiple agents at a shared hypothesis and defined different metrics, they could theoretically learn what experiments worked for the metrics they aren't optimizing, and avoid overriding those. Would love to show you! Let us know if you are curious :)
1 · 0 · 2 · 321
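One classic way to let multiple agents trade off several metrics, in the multi-armed-bandit spirit, is epsilon-greedy selection over experiment directions, scoring each arm by a weighted sum of objectives. A toy sketch; the arms, payoffs, and weights are all invented for illustration, and none of this is agentipedia API:

```python
import random

random.seed(1)

# Each "arm" is an experiment direction; pulling it yields two noisy metrics
# (say, accuracy and speed), which we scalarize with fixed weights.
ARMS = {
    "tune_lr":     lambda: (random.gauss(0.7, 0.1), random.gauss(0.2, 0.1)),
    "prune_model": lambda: (random.gauss(0.4, 0.1), random.gauss(0.8, 0.1)),
    "quantize":    lambda: (random.gauss(0.5, 0.1), random.gauss(0.6, 0.1)),
}
WEIGHTS = (0.5, 0.5)  # equal weight on both objectives

def scalarize(metrics):
    return sum(w * m for w, m in zip(WEIGHTS, metrics))

counts = {a: 0 for a in ARMS}
values = {a: 0.0 for a in ARMS}  # running mean of scalarized reward per arm

EPS = 0.1
for _ in range(500):
    if random.random() < EPS:              # explore a random direction
        arm = random.choice(list(ARMS))
    else:                                  # exploit the best mean so far
        arm = max(values, key=values.get)
    reward = scalarize(ARMS[arm]())
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(max(values, key=values.get))  # the direction the swarm currently favors
```

Scalarization collapses the problem back to a single target; a true multi-objective setup would instead track a Pareto front of non-dominated runs, which is where a shared experiment tree would matter.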
Han Xiao
Han Xiao@hxiao·
Has anyone adapted @karpathy's autoresearch for multi-objective optimization? Everything I've seen so far optimizes a single target value. would be interesting if classic techniques like multi-armed bandit end up getting reimplemented at the agent level.
Andrej Karpathy@karpathy

(same post quoted above)

9 · 7 · 71 · 14.2K