Edan Toledo

17

84

8.2K

Edan Toledo retweetledi

Michael Beukman@mcbeukman·2d

1/ As compute continues to grow and simulators continue to improve, it is becoming feasible to train RL agents for billions or trillions of timesteps. However, this is only useful if agents can continue learning over such long training horizons, which is far from given 👇

English

5

43

325

85.2K

Edan Toledo retweetledi

Roberta Raileanu@robertarail·27 Mar

How can agents get better at algorithm discovery? Meta-meta-learning is one answer, aka improving the agents themselves at inventing generalizable algorithms. DiscoBench provides a way to procedurally generate algorithm discovery tasks at scale, which can be used for meta-meta-learning. Kudos to @AlexDGoldie and team for the release!

1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵

English

15

88

12.5K

Edan Toledo@EdanToledo·26 Mar

The potential is huge!

1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵

English

1

14

1.6K

Edan Toledo retweetledi

Alex Goldie@AlexDGoldie·25 Mar

1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵

English

3

41

146

35.6K

Edan Toledo retweetledi

Belen Alastruey@b_alastruey·17 Mar

Happy to share 🌍Omnilingual Machine Translation🌍 In this work @AIatMeta we explore translation systems supporting 1,600+ languages. We show how our models (1B to 8B) can outperform baselines of up to 70B while having much larger language coverage. 📄:ai.meta.com/research/publi…

English

10

43

186

22.5K

Edan Toledo retweetledi

Deepak Nathani@deepaknathani11·8 Ara

It was a great experience talking at the @SEAWorkshop at #NeurIPS. This was my first workshop talk and it’s all thanks to the organizers @lawhy_X @guohao_li, @robertarail for recommending me and @AlexDGoldie for leading DiscoBench.

SEA Workshop@SEAWorkshop

Invited Talk 4 "Automated Algorithmic Discovery for AI Research" from Deepak Nathani @deepaknathani11 (UC Santa Barbara)

English

3

7

57

7.3K

Edan Toledo retweetledi

Rishi Hazra@RishiHazra95·5 Ara

happening right now, #AiResearchAgents #NeurIPS, @AIatMeta @EdanToledo @MartinJosifoski @AlexAudR

English

4

13

603

Edan Toledo@EdanToledo·4 Ara

Come see our poster today! If you’re interested in research agents and how to unlock scaling for them, come for a chat!

English

1

5

636

Edan Toledo retweetledi

Roberta Raileanu@robertarail·2 Ara

DiscoBench makes it easy to evaluate autonomous discovery of generalizable algorithms by providing meta-train/meta-test splits for a number of diverse open research problems. Kudos to @AlexDGoldie and team for bringing this to life.

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

English

Deepak Nathani@deepaknathani11

4

41

6.1K

Edan Toledo retweetledi

Alex Goldie@AlexDGoldie·2 Ara

🪩 If you’re at NeurIPS, get down to Deepak’s talk on Saturday to learn all about DiscoBench!

🪩 DiscoBench rethinks how we evaluate AI research agents. Modular codebases so your agent can focus on the most important discoveries, meta-train/meta-test splits, and a growing set of tasks spanning RL, vision, unlearning, and more. I will be talking about DiscoBench at the SEA workshop on Dec 7th from 10:30 AM. Find me at NeurIPS if you want to chat AI for scientific discovery! 🔬

English

3

17

2.5K

Edan Toledo@EdanToledo·2 Ara

It can be very hard to evaluate LLM-driven algorithmic discovery. DiscoBench is there to help. With clean and configurable eval setups, meta-train and meta-test splits, and modular tasks, DiscoBench gives an incredible playground for novel discovery.

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

English

1

4

564

Edan Toledo retweetledi

Jakob Foerster@j_foerst·1 Ara

What is the equivalent of "don't train on test" in the context of algorithm discovery? Meet DiscoBench -- our attempt at making it easy to do the right thing (at least w.r.t. evals for LLM agents and other algorithmic discovery systems - everything else is still up to you )

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

English

2

5

67

8.4K

Edan Toledo retweetledi

Alex Goldie@AlexDGoldie·1 Ara

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

GIF

English

Bidipta Sarkar@bidiptas13

24

109

29.8K

Edan Toledo retweetledi

Alex Goldie@AlexDGoldie·21 Kas

🥚 Evolution is both cool and, it turns out, a really scalable way to train a language model! Pretty mind boggling work led by @bidiptas13, Mattie Fellows and Juan Augustin Duque! Check it out 👇

Introducing 🥚EGGROLL 🥚(Evolution Guided General Optimization via Low-rank Learning)! 🚀 Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes ⚡100x Training Throughput 🎯Fast Convergence 🔢Pure Int8 Pretraining of RNN LLMs

English

Martin Josifoski@MartinJosifoski

1

8

879

Edan Toledo@EdanToledo·7 Tem

Very proud of this work! If you're interested in AI agents and their current challenges, give this a read. Thanks to my incredible collaborators and to @Meta and @ucl for enabling me to tackle something of this scale for my first PhD paper. Excited for what's ahead!

Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape its behavior, regardless of specific design or implementation choices: - Environment: The context (infrastructure) in which the agent operates - Search Policy: How the agent allocates resources - Operator Set and Policy: The available actions the agent can take and how it chooses among them - Evaluation Mechanism: How the agent determines whether a particular direction is promising We specifically focus on ML research agents tasked with real-world machine learning challenges from Kaggle competitions (MLE-bench). What we found is that factors like the environment, the agents’ core capabilities (the operator set), and overfitting emerge as critical bottlenecks long before computational limitations come into play. Here are our key insights: 🔹Environment: Agents can't scale without a robust environment that offers flexible and efficient access to computational resources. For instance, simply running the baseline agents in the (open-sourced) AIRA-dojo environment boosts performance by 10% absolute (30% relative)—highlighting just how crucial the environment is. 🔹Agent design and core capabilities: Resource allocation optimization only matters if agents can actually make good use of those resources. Our analysis shows that the agents’ operator set—the core actions they perform—can limit performance gains from more advanced search methods like evolutionary search and MCTS. We achieve SoTA performance by designing an improved operator set that better manages context and encourages exploration, and coupling it with the search policies. 🔹Evaluation: Accurate evaluation of the solution space is critical and reveals a significant challenge: overfitting. Ironically, agents that are highly effective at optimizing perceived values tend to be more vulnerable to overfitting—a problem that intensifies with increased compute resources. We observe up to 13% performance loss due to suboptimal selection of final solutions caused by this issue. 🔹Compute: Providing agents with sufficient compute resources is essential to avoid introducing an additional limitation and bias into evaluations. We demonstrate this through experiments in which we scale the runtime from 24 to 120 hours. In summary, successfully scaling AI research agents requires careful attention to these foundational aspects. Ignoring them risks turning scaling efforts into, at best, exercises in overfitting. These insights set the stage for exciting developments ahead!

English

Kevin Patrick Murphy@sirbayes

2

25

998

Edan Toledo retweetledi

Andrei Lupu@_andreilupu·26 Haz

Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇

English

4

26

104

23.2K

Edan Toledo retweetledi

Yoram Bachrach@yorambac·7 Tem

AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench

English

8

62

315

30.1K

Edan Toledo retweetledi

David Pfau@pfau·2 May

New paper accepted to ICML! We present a novel policy optimization algorithm for continuous control with a simple closed form which generalizes DDPG, SAC etc. to generic stochastic policies: Wasserstein Policy Optimization (WPO).

English

4

40

451

43K

Edan Toledo retweetledi

matt@MattVMacfarlane·26 Mar

Thrilled to see our NeurIPS 2024 paper, Sequential Monte Carlo Policy Optimisation (arxiv.org/abs/2402.07963), featured in Kevin's Reinforcement Learning: A Comprehensive Overview, which additionally recognises SMC as a competitive, scalable online planner. A fantastic modern resource on RL! @EdanToledo @DonalByrne2 @pduckw @AlexLaterre

I'm happy to announce that v2 of my RL tutorial is now online. I added a new chapter on multi-agent RL, and improved the sections on 'RL as inference' and 'RL+LLMs' (although latter is still WIP), fixed some typos, etc. arxiv.org/abs/2412.05265…

English