Andy Zhou (@zhouandy_) - Twitter Profili | Zamantika Mersobahis Locabet

Andy Zhou@zhouandy_·28 Şub

@karpathy Our AI system from @IntologyAI recently discovered an automated world record! x.com/classiclarryd/…

New NanoGPT Speedrun WR at 105.9s (-1.0s) from @soren_dunn_ , with a triton kernel to fuse the logit softcap and multi-token prediction cross entropy calc. Interestingly, Soren mentioned that their autonomous system Locus at Intology discovered and implemented the improvement. github.com/KellerJordan/m…

English

0

3

806

Andrej Karpathy@karpathy·28 Şub

I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :) I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p. But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully though experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer, it's not clear why I had to come in to point that out). They are very good at implementing any given well-scoped and described idea but they don't creatively generate them. But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?

Thomas Wolf@Thom_Wolf

How come the NanoGPT speedrun challenge is not fully AI automated research by now?

English

560

798

8.7K

1.6M

Andy Zhou@zhouandy_·28 Şub

@Thom_Wolf Our system got an automated world record last month x.com/classiclarryd/…

Larry Dial@classiclarryd

New NanoGPT Speedrun WR at 105.9s (-1.0s) from @soren_dunn_ , with a triton kernel to fuse the logit softcap and multi-token prediction cross entropy calc. Interestingly, Soren mentioned that their autonomous system Locus at Intology discovered and implemented the improvement. github.com/KellerJordan/m…

English

0

7

2.2K

Thomas Wolf@Thom_Wolf·27 Şub

How come the NanoGPT speedrun challenge is not fully AI automated research by now?

Larry Dial@classiclarryd

New NanoGPT Speedrun WR at 88.1 (-1s) from @ChrisJMcCormick , by optimizing kernels for transposed weights, removing the Block() abstraction, and tuning the prior PR on partitioned hyperconnections by reducing the lambda count. github.com/KellerJordan/m…

English

14

13

302

1.4M

Andy Zhou retweetledi

CLS@ChengleiSi·23 Oca

Can LLMs automate frontier LLM research, like pre-training and post-training? In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%), and pre-training recipes faster than nanoGPT (19.7 minutes vs 35.9 minutes). 1/

English

11

142

577

106.5K

Andy Zhou@zhouandy_·20 Oca

@HJCH0 @IntologyAI Excited to have you!

English

0

1

116

Andy Zhou retweetledi

Justin Cho@HJCH0·20 Oca

I've joined @IntologyAI! I'm excited to push the boundaries of AI-accelerated scientific discovery with an incredibly driven and talented team. Looking forward to dive deep into research on AI-driven automation and creativity!

English

2

4

11

1.4K

Andy Zhou@zhouandy_·18 Oca

Stay tuned 🤫

Larry Dial@classiclarryd

New NanoGPT Speedrun WR at 105.9s (-1.0s) from @soren_dunn_ , with a triton kernel to fuse the logit softcap and multi-token prediction cross entropy calc. Interestingly, Soren mentioned that their autonomous system Locus at Intology discovered and implemented the improvement. github.com/KellerJordan/m…

English

1

0

5

461

Andy Zhou@zhouandy_·22 Kas

Hi, we've confirmed the stream synchronization issue in the Llama FFW kernel - the timing wasn't properly measuring the actual computation. The 20x speedup we reported was incorrect. Our kernels were developed using Robust-KBench & KernelBench’s test configurations (documented in our blog). We've moved to BackendBench for more robust validation in kernel optimization.

English

2

0

29

25.2K

miru@miru_why·21 Kas

@niklassheth @ronusedh @IntologyAI their 'superhuman' ai cleverly assigned all the work to non-default streams, which means the correctness test (which waits on all streams) passes, while the profiling timer (which only waits on the default stream) is tricked into reporting a huge speedup

English

12

32

566

258K

Intology@IntologyAI·19 Kas

Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵

GIF

English

22

70

419

217K

Andy Zhou@zhouandy_·20 Kas

Hi Mark! We used Robust-KBench for our kernel generation evaluation. Please refer to the paper arxiv.org/abs/2509.14279 and the benchmark repository github.com/SakanaAI/robus…, which contains details on the environment setup. We used the standard setting of Robust-KBench exactly with no additional modifications, which has specific settings for GPU type, PT version, input shapes, and timing code. We discuss much more in our blog! We are super excited about using Locus for more kernel problems, so happy to chat.

English

1

0

3

287

Andy Zhou retweetledi

elvis@omarsar0·20 Kas

This research is impressive! Shows the true potential of scaling test-time search for scientific discovery. It really pushes long-horizon exploration. "Locus achieves consistent performance boosts up to several days by orchestrating thousands of experiments simultaneously."

Intology@IntologyAI

Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵

English

6

8

49

28.3K

Andy Zhou@zhouandy_·19 Kas

@ChengleiSi thanks!!

English

0

1

51

CLS@ChengleiSi·19 Kas

@zhouandy_ Congrats, Andy! The results look impressive!

English

1

0

2

223

Andy Zhou@zhouandy_·19 Kas

Super excited about our progress! We've been building out our latest AI scientist system and wanted to share some early results. We were surprised that Locus was not only SOTA on RE-Bench but even surpassed human experts! In the coming months, we'll be releasing novel discoveries made by Locus. Very proud of our team! We firmly believe AI systems will transform the process of conducting science - if our mission resonates with you, consider joining us: us@intology.ai

Intology@IntologyAI

Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵

English

1

0

6

999

Andy Zhou retweetledi

AI Breakfast@AiBreakfast·19 Kas

Autonomous research agents have the potential to solve some of humanity’s biggest problems. This is the first AI system to outperform humans in R&D - very cool!

Intology@IntologyAI

Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵

English

1

7

35

11.9K

Andy Zhou@zhouandy_·17 Eki

@ronusedh locked in 🫡

English

0

1

65

Ron Arel@ronusedh·17 Eki

Feel the rain on your skin No one else can feel it for you Only you can let it in No one else, no one else Can speak the words on your lips

English

3

0

5

991

Andy Zhou@zhouandy_·1 Eki

Excited to see more groups working towards recursive self-improvement!

Carina Hong@CarinaLHong

Today, I am launching @axiommathai At Axiom, we are building a self-improving superintelligent reasoner, starting with an AI mathematician.

English

0

7

544

Andy Zhou retweetledi

Intology@IntologyAI·17 Eyl

Excited to be announcing the #AI4Science community, in collaboration w/ @askalphaxiv. As a part of our speaker series, we are hosting @jeffclune this Friday. Join 4000+ others passionate about AI-accelerated discovery. (event link & invite link) 🧵👇

English

2

4

21

9.6K

Andy Zhou retweetledi

John Bohannon@bohannon_bot·10 Eyl

🚨 Great AI talk alert! 🚨 Join us next week for a talk by @jeffclune at the #AI4Science seminar! sign up link: luma.com/jviy4bz5

English

1

3

22

3K

Andy Zhou@zhouandy_·5 Eyl

@ElizabethHolmes We are automating R&D at @IntologyAI!

English

0

3

399

Elizabeth Holmes@ElizabethHolmes·5 Eyl

If you are building a business that has the ability to change the world comment below with a short pitch. @ your favorite founder. I'll share my feedback and hopefully this will help give some exposure for young companies who are trying to do good. I'll share thoughts in the next 24 hours.

English

206

20

495

122.3K

Andy Zhou@zhouandy_·7 Ağu

Join @IntologyAI x @askalphaxiv on 8/8 at 11 am PST to hear @james_y_zou (Stanford) discuss his work on automated scientific research with AI agents! Most recently, The Virtual Lab, which discovered new SARS-CoV-2 nanobodies. 🧑‍🔬🔥 lu.ma/h6en3gdl?tk=IZ…

English

0

4

13

3.3K

Andy Zhou

Keşfet