Edan Toledo

50 posts

@EdanToledo

PhD Student @AIatMeta & @UCL • Prev RE @InstaDeepAI • MPhil ACS @Cambridge_Uni • Reinforcement Learning • 🇿🇦🇬🇧

Joined September 2022
93 Following · 149 Followers
Pinned Tweet
Edan Toledo @EdanToledo
🚀 Excited to release Stoix! A new #OpenSource library for End-to-End Distributed (Synchronously) Single-Agent Reinforcement Learning in JAX. 🏛️ 🔗 github.com/EdanToledo/Sto…
1 reply · 17 reposts · 84 likes · 8.2K views
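Stoix's own API isn't shown in the tweet, so here is only a hedged sketch of the core JAX pattern that end-to-end RL libraries like it build on: jit-compiling a vmapped environment step so many rollouts execute in lockstep on one device. The toy environment below is illustrative, not from Stoix.

```python
import jax
import jax.numpy as jnp

# Illustrative toy, not Stoix's API: a trivial 1-D "environment" whose
# reward is closeness to the origin.
def env_step(state, action):
    new_state = state + action
    reward = -jnp.abs(new_state)
    return new_state, reward

# vmap batches the step over many parallel environments; jit compiles the
# whole batched step into a single fused device program.
batched_step = jax.jit(jax.vmap(env_step))

states = jnp.zeros(1024)          # 1024 environments stepped in parallel
actions = 0.1 * jnp.ones(1024)
states, rewards = batched_step(states, actions)
```

Because everything (environment, learner, replay) can live inside jit like this, the whole training loop compiles end-to-end, which is the main appeal of the approach.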
Edan Toledo retweeted
Michael Beukman @mcbeukman
1/ As compute continues to grow and simulators continue to improve, it is becoming feasible to train RL agents for billions or trillions of timesteps. However, this is only useful if agents can continue learning over such long training horizons, which is far from given 👇
5 replies · 43 reposts · 325 likes · 85.2K views
Edan Toledo retweeted
Roberta Raileanu @robertarail
How can agents get better at algorithm discovery? Meta-meta-learning is one answer, aka improving the agents themselves at inventing generalizable algorithms. DiscoBench provides a way to procedurally generate algorithm discovery tasks at scale, which can be used for meta-meta-learning. Kudos to @AlexDGoldie and team for the release!
Alex Goldie @AlexDGoldie

1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵

1 reply · 15 reposts · 88 likes · 12.5K views
Edan Toledo retweeted
Alex Goldie @AlexDGoldie
1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵
3 replies · 41 reposts · 146 likes · 35.6K views
Edan Toledo retweeted
Belen Alastruey @b_alastruey
Happy to share 🌍Omnilingual Machine Translation🌍 In this work @AIatMeta we explore translation systems supporting 1,600+ languages. We show how our models (1B to 8B) can outperform baselines of up to 70B while having much larger language coverage. 📄:ai.meta.com/research/publi…
10 replies · 43 reposts · 186 likes · 22.5K views
Edan Toledo @EdanToledo
Come see our poster today! If you’re interested in research agents and how to unlock scaling for them, come for a chat!
0 replies · 1 repost · 5 likes · 636 views
Edan Toledo retweeted
Roberta Raileanu @robertarail
DiscoBench makes it easy to evaluate autonomous discovery of generalizable algorithms by providing meta-train/meta-test splits for a number of diverse open research problems. Kudos to @AlexDGoldie and team for bringing this to life.
Alex Goldie @AlexDGoldie

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

0 replies · 4 reposts · 41 likes · 6.1K views
Edan Toledo retweeted
Edan Toledo @EdanToledo
It can be very hard to evaluate LLM-driven algorithmic discovery. DiscoBench is there to help. With clean and configurable eval setups, meta-train and meta-test splits, and modular tasks, DiscoBench gives an incredible playground for novel discovery.
Alex Goldie @AlexDGoldie

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

0 replies · 1 repost · 4 likes · 564 views
Edan Toledo retweeted
Jakob Foerster @j_foerst
What is the equivalent of "don't train on test" in the context of algorithm discovery? Meet DiscoBench -- our attempt at making it easy to do the right thing (at least w.r.t. evals for LLM agents and other algorithmic discovery systems; everything else is still up to you)
Alex Goldie @AlexDGoldie

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

2 replies · 5 reposts · 67 likes · 8.4K views
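The "don't train on test" analogy can be made concrete with a sketch of a meta-train/meta-test split over discovery tasks (the task names below are hypothetical placeholders, not DiscoBench's actual task list): the agent may optimise its algorithm only on the meta-train set and is judged on held-out tasks.

```python
import random

# Hypothetical task names for illustration; DiscoBench ships its own tasks.
tasks = [f"discovery_task_{i:02d}" for i in range(20)]

rng = random.Random(0)
rng.shuffle(tasks)

# The agent meta-learns its algorithm against meta_train only; meta_test is
# held out, so what gets measured is generalisation of the discovered
# algorithm, not memorisation of the evaluation tasks.
meta_train, meta_test = tasks[:15], tasks[15:]

assert not set(meta_train) & set(meta_test)  # no leakage between splits
```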
Edan Toledo retweeted
Alex Goldie @AlexDGoldie
🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵
1 reply · 24 reposts · 109 likes · 29.8K views
Edan Toledo retweeted
Alex Goldie @AlexDGoldie
🥚 Evolution is both cool and, it turns out, a really scalable way to train a language model! Pretty mind-boggling work led by @bidiptas13, Mattie Fellows and Juan Augustin Duque! Check it out 👇
Bidipta Sarkar @bidiptas13

Introducing 🥚EGGROLL 🥚(Evolution Guided General Optimization via Low-rank Learning)! 🚀 Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes ⚡100x Training Throughput 🎯Fast Convergence 🔢Pure Int8 Pretraining of RNN LLMs

0 replies · 1 repost · 8 likes · 879 views
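EGGROLL's low-rank machinery isn't detailed in the thread, but the vanilla evolution-strategies loop it scales up is simple to sketch. The objective and hyperparameters below are illustrative only:

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, lr=0.03, pop=64, iters=300, seed=0):
    """Backprop-free ES: estimate a search gradient from fitness-weighted
    Gaussian perturbations of the parameters."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))          # population noise
        fitness = np.array([f(theta + sigma * e) for e in eps])
        fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        theta = theta + lr / (pop * sigma) * eps.T @ fitness  # ascent step
    return theta

# Toy objective: maximise -||x - 3||^2, optimum at x = 3 in every dimension.
theta = evolution_strategies(lambda x: -np.sum((x - 3.0) ** 2), np.zeros(5))
```

Because the update needs only forward evaluations of `f`, the loop parallelises trivially across the population, which is what makes ES attractive at large scale.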
Edan Toledo @EdanToledo
Very proud of this work! If you're interested in AI agents and their current challenges, give this a read. Thanks to my incredible collaborators and to @Meta and @ucl for enabling me to tackle something of this scale for my first PhD paper. Excited for what's ahead!
Martin Josifoski @MartinJosifoski

Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough.

We break down an agent into four fundamental components that shape its behavior, regardless of specific design or implementation choices:
- Environment: the context (infrastructure) in which the agent operates
- Search Policy: how the agent allocates resources
- Operator Set and Policy: the available actions the agent can take and how it chooses among them
- Evaluation Mechanism: how the agent determines whether a particular direction is promising

We specifically focus on ML research agents tasked with real-world machine learning challenges from Kaggle competitions (MLE-bench). What we found is that factors like the environment, the agents' core capabilities (the operator set), and overfitting emerge as critical bottlenecks long before computational limitations come into play. Here are our key insights:

🔹 Environment: Agents can't scale without a robust environment that offers flexible and efficient access to computational resources. For instance, simply running the baseline agents in the (open-sourced) AIRA-dojo environment boosts performance by 10% absolute (30% relative), highlighting just how crucial the environment is.

🔹 Agent design and core capabilities: Resource allocation optimization only matters if agents can actually make good use of those resources. Our analysis shows that the agents' operator set, the core actions they perform, can limit performance gains from more advanced search methods like evolutionary search and MCTS. We achieve SoTA performance by designing an improved operator set that better manages context and encourages exploration, and coupling it with the search policies.

🔹 Evaluation: Accurate evaluation of the solution space is critical and reveals a significant challenge: overfitting. Ironically, agents that are highly effective at optimizing perceived values tend to be more vulnerable to overfitting, a problem that intensifies with increased compute resources. We observe up to a 13% performance loss due to suboptimal selection of final solutions caused by this issue.

🔹 Compute: Providing agents with sufficient compute resources is essential to avoid introducing an additional limitation and bias into evaluations. We demonstrate this through experiments in which we scale the runtime from 24 to 120 hours.

In summary, successfully scaling AI research agents requires careful attention to these foundational aspects. Ignoring them risks turning scaling efforts into, at best, exercises in overfitting. These insights set the stage for exciting developments ahead!

1 reply · 2 reposts · 25 likes · 998 views
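The four-component decomposition described above can be sketched as a minimal interface. Every name and the toy step logic below are hypothetical illustrations of the decomposition, not the AIRA-dojo API:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class ResearchAgent:
    # The four components from the thread, as plain callables.
    environment: Callable[[str], str]    # infrastructure: run code, return feedback
    operators: Sequence[Callable]        # action set: draft, debug, refine, ...
    search_policy: Callable[[Sequence[Callable]], Callable]  # allocates effort
    evaluator: Callable[[object], float] # scores candidates (where overfitting risk lives)

    def step(self, candidates):
        # One search step: pick an operator via the search policy, expand
        # every candidate with it, and keep the best per the evaluator.
        op = self.search_policy(self.operators)
        expanded = [op(c) for c in candidates]
        return max(candidates + expanded, key=self.evaluator)

# Toy usage: integer "solutions", one increment operator, target value 5.
agent = ResearchAgent(
    environment=lambda cmd: cmd,         # stub
    operators=[lambda c: c + 1],
    search_policy=lambda ops: ops[0],
    evaluator=lambda c: -abs(c - 5),
)
best = agent.step([0, 3])
```

Separating the evaluator out like this is what makes the thread's overfitting point visible: the loop maximises `evaluator`, so any gap between it and true quality is amplified by more search.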
Edan Toledo retweeted
Andrei Lupu @_andreilupu
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
4 replies · 26 reposts · 104 likes · 23.2K views
Edan Toledo retweeted
Yoram Bachrach @yorambac
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench
8 replies · 62 reposts · 315 likes · 30.1K views
Edan Toledo retweeted
David Pfau @pfau
New paper accepted to ICML! We present a novel policy optimization algorithm for continuous control with a simple closed form which generalizes DDPG, SAC etc. to generic stochastic policies: Wasserstein Policy Optimization (WPO).
4 replies · 40 reposts · 451 likes · 43K views
Edan Toledo retweeted
matt @MattVMacfarlane
Thrilled to see our NeurIPS 2024 paper, Sequential Monte Carlo Policy Optimisation (arxiv.org/abs/2402.07963), featured in Kevin's Reinforcement Learning: A Comprehensive Overview, which additionally recognises SMC as a competitive, scalable online planner. A fantastic modern resource on RL! @EdanToledo @DonalByrne2 @pduckw @AlexLaterre
Kevin Patrick Murphy @sirbayes

I'm happy to announce that v2 of my RL tutorial is now online. I added a new chapter on multi-agent RL, and improved the sections on 'RL as inference' and 'RL+LLMs' (although latter is still WIP), fixed some typos, etc. arxiv.org/abs/2412.05265…

1 reply · 8 reposts · 68 likes · 8.6K views
Edan Toledo retweeted
Dulhan Jayalath @DulhanJay
Efficient LLM reasoning over large data doesn't require massive contexts! 🫡 We show that a simple in-context method, PRISM, allows a 32k token model to outperform baselines and sometimes rival a 1M token model while saving up to 54% on token cost. w/ @GoogleDeepMind
5 replies · 42 reposts · 272 likes · 24K views