Sainbayar Sukhbaatar

1.4K posts


@tesatory

Research Scientist at FAIR @AIatMeta. Research: Memory Networks, Asymmetric Self-Play, CommNet, Adaptive-Span, System2Attention, ...

Joined May 2010
342 Following · 3.2K Followers
Ian Goodfellow @goodfellow_ian ·
I'd like to thank @daniel_rossett for his help in my recovery from the POTS version of Long COVID. Daniel was key in bringing me back from highly disabled and suffering to being able to do what I want to again. This X account is mostly focused on ML / AI. From that point of view, many of you know that in December 2024, I wasn't able to give the Test of Time award talk at NeurIPS, even by video call. Daniel started working with me in March 2025. By April, I started to have days of no POTS symptoms; by June I was off all heart-rate-lowering medications; by September I was back to work. I'm back to full exercise, running, lifting weights, mountain biking, and have even done things I hadn't done before I got sick, like riding Whistler Mountain Bike Park. I'm now getting the word out to help Daniel build a company that will bring this approach to more people.
169 replies · 83 reposts · 2.6K likes · 200.4K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Self-Improving Pretraining
We've updated our results given feedback:
- larger 8B baseline to match reward model size
- cross-task evals given different RM objectives
Overall, we see clear wins
[image attached]
4 replies · 25 reposts · 170 likes · 8.8K views
Sainbayar Sukhbaatar @tesatory ·
We also have a postdoc position if that's what you are looking for
Jason Weston@jaseweston

Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

0 replies · 0 reposts · 10 likes · 2.1K views
Sainbayar Sukhbaatar @tesatory ·
Our team is hiring! If you like to work on cool research projects, please apply :)
Jason Weston@jaseweston

Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement. Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

6 replies · 9 reposts · 152 likes · 19.8K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927
10 replies · 44 reposts · 263 likes · 32.7K views
Sainbayar Sukhbaatar @tesatory ·
If you are a PhD student at Berkeley or one of these universities, you can apply to our mentorship program and do research with us! The deadline is this Friday, though: linkedin.com/posts/adinawil…
0 replies · 3 reposts · 33 likes · 4.2K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Our co-improvement position paper is now on arXiv! (We've updated it, covering more existing work.) 📝: arxiv.org/abs/2512.05356 After >27 years of research, my first position paper! Short 🧵 (1/5) follows 👇 Synopsis: it's about building AI that collaborates on AI research *with us* to solve AI faster, and to help fix the alignment problem together. How? Build the AI with those collab skills (i.e., we create benchmarks! training data! methods! etc. for that). I've been personally inspired by @Yoshua_Bengio's recent talks on safety & AI research, and also from seeing Nicholas Carlini's COLM keynote where he said we researchers can all do our bit to help (paraphrased). So – hope this helps! 🙏
[image attached]
7 replies · 40 reposts · 245 likes · 28K views
Sainbayar Sukhbaatar reposted
Rimsha Bhardwaj @heyrimsha ·
Holy shit… Meta might've just solved self-improving AI 🤯 Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground. Here's the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together, an automatic curriculum with real-world grounding, so it never collapses into hallucinations. The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats every prior self-play method like R-Zero and Absolute Zero. This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.
[image attached]
39 replies · 78 reposts · 482 likes · 32.2K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🤝 New Position Paper !!👤🔄🤖 @j_foerst and I wrote a position piece on what we think is the path to safer superintelligence: co-improvement. Everyone is focused on self-improving AI, but (1) we don't know how to do it yet, and (2) it might be misaligned with humans. Co-improvement: instead, build AI that collaborates *with us* to solve AI faster, and to help fix the alignment problem together. More details in the paper! Read it here: 📝:github.com/facebookresear…
[image attached]
26 replies · 96 reposts · 509 likes · 84.8K views
Alex Rives @alexrives ·
Today CZI is announcing an unprecedented new scientific initiative to build the future of AI-powered biology. I am joining CZI to lead this initiative as Head of Science, and the EvolutionaryScale team is joining forces with Biohub. This is the first large scale scientific effort to combine frontier AI and frontier biology. I feel an incredible sense of optimism in this moment. There is a path to build predictive models of life that can fundamentally accelerate science, and unlock a new understanding of disease. biohub.org/blog/frontier-…
51 replies · 64 reposts · 601 likes · 225.6K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🌶️SPICE: Self-Play in Corpus Environments🌶️
📝: arxiv.org/abs/2510.24684
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum! 🔥
Outperforms standard (ungrounded) self-play. Grounding fixes hallucination & lack of diversity. 🧵1/6
[image attached]
8 replies · 56 reposts · 335 likes · 79.8K views
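The Challenger/Reasoner loop described in this thread can be sketched as a toy self-play round. Everything below (the two-document corpus, cloze-style task construction, and a random-guess reasoner) is my simplified stand-in for the paper's LLM roles and RL training, not the actual SPICE implementation:

```python
import random

# Toy sketch of a SPICE-style self-play round (hypothetical simplification):
# a Challenger mines a real document into a grounded, verifiable task,
# and a Reasoner must answer without seeing the source document.
corpus = [
    "The mitochondria is the powerhouse of the cell",
    "Attention weights are computed from query and key vectors",
]

def challenger_make_task(rng):
    """Challenger: pick a real document and turn it into a
    fact-grounded cloze question with a checkable answer."""
    doc = rng.choice(corpus)
    words = doc.split()
    i = rng.randrange(len(words))
    answer = words[i]
    question = " ".join(w if j != i else "____" for j, w in enumerate(words))
    return question, answer

def reasoner_answer(question, vocab, rng):
    """Reasoner: guesses the blank without access to the source.
    A random guess stands in for an LLM conditioned on the question."""
    return rng.choice(sorted(vocab))

def self_play_round(rng, vocab):
    question, answer = challenger_make_task(rng)
    guess = reasoner_answer(question, vocab, rng)
    # Grounded, verifiable reward: did the Reasoner recover the real fact?
    return 1.0 if guess == answer else 0.0

rng = random.Random(0)
vocab = {w for doc in corpus for w in doc.split()}
rewards = [self_play_round(rng, vocab) for _ in range(100)]
print(f"reasoner accuracy over 100 rounds: {sum(rewards) / 100:.2f}")
```

In the real method both roles would be trained on these rewards (the Challenger to find tasks at the frontier of the Reasoner's ability), which is what produces the automatic curriculum.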
Sainbayar Sukhbaatar reposted
Kyunghyun Cho @kchonyc ·
.@tesatory hasn’t aged since RAM’15! is that the magic of attention and memory? #COLM2025
[image attached]
0 replies · 2 reposts · 11 likes · 2.8K views
Sainbayar Sukhbaatar @tesatory ·
@QuackerEnte Not really because all we do is add a convolution operation without changing dimensions of things. So more compute, but same memory usage
1 reply · 0 reposts · 0 likes · 28 views
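The convolution idea from the Multi-Token Attention replies above can be sketched in a few lines: convolve the attention logit map with a small kernel so each attention weight depends on neighboring (query, key) pairs rather than a single pair, with shapes unchanged. This is my toy single-head illustration (hypothetical shapes, no causal mask, naive loops instead of a fused conv), not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mta_attention(q, k, v, kernel):
    """q, k, v: (T, d); kernel: small conv filter over the (T, T) logit map.
    Output shape matches plain attention, so memory use is unchanged."""
    logits = q @ k.T / np.sqrt(q.shape[-1])            # (T, T)
    kq, kk = kernel.shape
    padded = np.pad(logits, ((kq // 2,), (kk // 2,)))  # same-size convolution
    mixed = np.zeros_like(logits)
    T = logits.shape[0]
    for i in range(T):
        for j in range(T):
            # each mixed logit sees a neighborhood of query-key pairs
            mixed[i, j] = np.sum(padded[i:i + kq, j:j + kk] * kernel)
    return softmax(mixed) @ v                          # (T, d)

rng = np.random.default_rng(0)
T, d = 5, 4
q, k, v = rng.normal(size=(3, T, d))
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0  # identity kernel reduces to plain attention
out = mta_attention(q, k, v, kernel)
print(out.shape)  # → (5, 4)
```

With a learned non-identity kernel, each attention weight is conditioned on several nearby query and key vectors, which is the "more fine-grained and information rich" behavior described above; extra compute, same tensor shapes.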
QuackerEnte @QuackerEnte ·
@tesatory that sounds like a simple yet great idea! But does it use more memory?
1 reply · 0 reposts · 0 likes · 26 views
Sainbayar Sukhbaatar @tesatory ·
Heading to COLM! Presenting two papers: Multi-Token Attention for augmenting softmax attention for more precision, and COCONUT 🥥 for continuous CoT reasoning. Oh also speaking at RAM2 🐏 workshop about memory 🧠
1 reply · 2 reposts · 24 likes · 1.6K views
Sainbayar Sukhbaatar @tesatory ·
@QuackerEnte Each attention weight is conditioned on only one key and one query vector. Our method makes it possible to condition on multiple vectors, so it can be more fine-grained and information rich
1 reply · 0 reposts · 0 likes · 47 views
QuackerEnte @QuackerEnte ·
@tesatory I thought attention was multi-token by nature? What's the difference?
1 reply · 0 reposts · 1 like · 48 views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🌀New Self-Driven RL Method: RESTRAIN 🌀
📝: arxiv.org/abs/2510.02172
- RESTRAIN turns spurious votes → self-improving signals. No labels needed
- Does this through self-penalizing unreliable reasoning paths:
  ✔️ Uses all rollouts, not just the majority
  ✔️ Offsets low-consistency rollout advantage
  ✔️ Down-weights low-consensus prompts
📈 Results:
🔥 Beats existing techniques on both training-time (label-free) and test-time scaling, all without labels.
🔥 Nearly matches (and sometimes surpasses) gold-label RL
🧵(1/5)
[image attached]
4 replies · 39 reposts · 195 likes · 12.9K views
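The label-free signal described in the RESTRAIN thread can be sketched as follows. The function below is my illustration of the general idea (agreement among rollouts as a pseudo-reward, all rollouts contributing, low-consensus prompts down-weighted); the exact formulas and names are assumptions, not the paper's:

```python
from collections import Counter

def pseudo_advantages(rollout_answers):
    """rollout_answers: final answers sampled for one prompt.
    Returns (per-rollout advantage, prompt weight), with no gold labels.
    Toy stand-in for a RESTRAIN-style objective."""
    counts = Counter(rollout_answers)
    n = len(rollout_answers)
    # pseudo-reward: fraction of rollouts agreeing with this rollout's answer
    rewards = [counts[a] / n for a in rollout_answers]
    mean = sum(rewards) / n
    # centering penalizes low-consistency rollouts (negative advantage),
    # and every rollout contributes, not just the majority vote
    advantages = [r - mean for r in rewards]
    # down-weight prompts whose rollouts show low consensus
    prompt_weight = counts.most_common(1)[0][1] / n
    return advantages, prompt_weight

adv, w = pseudo_advantages(["42", "42", "41", "42"])
print(adv, w)  # → [0.125, 0.125, -0.375, 0.125] 0.75
```

A policy-gradient update would then scale each rollout's gradient by its advantage times the prompt weight, so unanimous-but-wrong prompts still dominate less than in naive majority-vote self-training.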
Sainbayar Sukhbaatar @tesatory ·
It’s fascinating that the brain has waves, i.e. oscillation frequencies. During focused, intense tasks, it switches to a higher frequency, kind of like CPU overclocking
0 replies · 1 repost · 11 likes · 547 views
Sainbayar Sukhbaatar reposted
Maria Lomeli @MariaLomeli_ ·
🚨New paper: Stochastic activations We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
8 replies · 16 reposts · 125 likes · 61.2K views