Sainbayar Sukhbaatar

1.4K posts


@tesatory

Research Scientist at FAIR @AIatMeta. Research: Memory Networks, Asymmetric Self-Play, CommNet, Adaptive-Span, System2Attention, ...

Joined May 2010
342 Following · 3.2K Followers
Ian Goodfellow @goodfellow_ian ·
I'd like to thank @daniel_rossett for his help in my recovery from the POTS version of Long COVID. Daniel was key in bringing me back from highly disabled and suffering to being able to do what I want to again. This X account is mostly focused on ML / AI. From that point of view, many of you know that in December 2024, I wasn't able to give the Test of Time award talk at NeurIPS, even by video call. Daniel started working with me in March 2025. By April, I started to have days of no POTS symptoms; by June I was off all heart-rate-lowering medications; by September I was back to work. I'm back to full exercise, running, lifting weights, mountain biking, and have even done things I hadn't done before I got sick, like riding Whistler Mountain Bike Park. I'm now getting the word out to help Daniel build a company that will bring this approach to more people.
169 replies · 83 reposts · 2.6K likes · 200.4K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Self-Improving Pretraining
We've updated our results given feedback:
- larger 8B baseline to match reward model size
- cross-task evals given different RM objectives
Overall, we see clear wins
[image attached]
4 replies · 25 reposts · 170 likes · 8.8K views
Sainbayar Sukhbaatar @tesatory ·
We also have a postdoc position if that's what you are looking for
Jason Weston@jaseweston

Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

0 replies · 0 reposts · 10 likes · 2.1K views
Sainbayar Sukhbaatar @tesatory ·
Our team is hiring! If you like to work on cool research projects, please apply :)
Jason Weston@jaseweston

Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement. Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

6 replies · 9 reposts · 152 likes · 19.8K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927
10 replies · 44 reposts · 263 likes · 32.7K views
Sainbayar Sukhbaatar @tesatory ·
If you are a PhD student at Berkeley or one of these universities, you can apply to our mentorship program and do research with us! The deadline is this Friday, though: linkedin.com/posts/adinawil…
0 replies · 3 reposts · 33 likes · 4.2K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
Our co-improvement position paper is now on arXiv! (We've updated it, covering more existing work.) 📝: arxiv.org/abs/2512.05356 After >27 years of research, my first position paper! Short 🧵 (1/5) follows 👇 Synopsis: it's about building AI that collaborates on AI research *with us* to solve AI faster, and to help fix the alignment problem together. How? Build the AI with those collab skills (i.e., we create benchmarks! training data! methods! etc. for that). I've been personally inspired by @Yoshua_Bengio's recent talks on safety & AI research, and also from seeing Nicholas Carlini's COLM keynote where he said we researchers can all do our bit to help (paraphrased). So – hope this helps! 🙏
[image attached]
7 replies · 40 reposts · 245 likes · 28K views
Sainbayar Sukhbaatar reposted
Rimsha Bhardwaj @heyrimsha ·
Holy shit… Meta might've just solved self-improving AI 🤯 Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground. Here's the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together, an automatic curriculum with real-world grounding, so it never collapses into hallucinations. The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats every prior self-play method like R-Zero and Absolute Zero. This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.
[image attached]
39 replies · 78 reposts · 482 likes · 32.2K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🤝 New Position Paper !!👤🔄🤖 @j_foerst and I wrote a position piece on what we think is the path to safer superintelligence: co-improvement. Everyone is focused on self-improving AI, but (1) we don't know how to do it yet, and (2) it might be misaligned with humans. Co-improvement: instead, build AI that collaborates *with us* to solve AI faster, and to help fix the alignment problem together. More details in the paper! Read it here: 📝:github.com/facebookresear…
[image attached]
26 replies · 96 reposts · 509 likes · 84.8K views
Alex Rives @alexrives ·
Today CZI is announcing an unprecedented new scientific initiative to build the future of AI-powered biology. I am joining CZI to lead this initiative as Head of Science, and the EvolutionaryScale team is joining forces with Biohub. This is the first large scale scientific effort to combine frontier AI and frontier biology. I feel an incredible sense of optimism in this moment. There is a path to build predictive models of life that can fundamentally accelerate science, and unlock a new understanding of disease. biohub.org/blog/frontier-…
51 replies · 64 reposts · 601 likes · 225.6K views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🌶️SPICE: Self-Play in Corpus Environments🌶️
📝: arxiv.org/abs/2510.24684
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum! 🔥
Outperforms standard (ungrounded) self-play. Grounding fixes hallucination & lack of diversity. 🧵1/6
[image attached]
8 replies · 56 reposts · 335 likes · 79.8K views
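The Challenger/Reasoner loop described in this thread can be sketched as a toy self-play round. Everything below (the two-document corpus, cloze-style task construction, and a random-guess reasoner) is my simplified stand-in for the paper's LLM roles and RL training, not the actual SPICE implementation:

```python
import random

# Toy sketch of a SPICE-style self-play round (hypothetical simplification):
# a Challenger mines a real document into a grounded, verifiable task,
# and a Reasoner must answer without seeing the source document.
corpus = [
    "The mitochondria is the powerhouse of the cell",
    "Attention weights are computed from query and key vectors",
]

def challenger_make_task(rng):
    """Challenger: pick a real document and turn it into a
    fact-grounded cloze question with a checkable answer."""
    doc = rng.choice(corpus)
    words = doc.split()
    i = rng.randrange(len(words))
    answer = words[i]
    question = " ".join(w if j != i else "____" for j, w in enumerate(words))
    return question, answer

def reasoner_answer(question, vocab, rng):
    """Reasoner: guesses the blank without access to the source.
    A random guess stands in for an LLM conditioned on the question."""
    return rng.choice(sorted(vocab))

def self_play_round(rng, vocab):
    question, answer = challenger_make_task(rng)
    guess = reasoner_answer(question, vocab, rng)
    # Grounded, verifiable reward: did the Reasoner recover the real fact?
    return 1.0 if guess == answer else 0.0

rng = random.Random(0)
vocab = {w for doc in corpus for w in doc.split()}
rewards = [self_play_round(rng, vocab) for _ in range(100)]
print(f"reasoner accuracy over 100 rounds: {sum(rewards) / 100:.2f}")
```

In the real method both roles would be trained on these rewards (the Challenger to find tasks at the frontier of the Reasoner's ability), which is what produces the automatic curriculum.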
Sainbayar Sukhbaatar reposted
Kyunghyun Cho @kchonyc ·
.@tesatory hasn’t aged since RAM’15! is that the magic of attention and memory? #COLM2025
[image attached]
0 replies · 2 reposts · 11 likes · 2.8K views
Sainbayar Sukhbaatar @tesatory ·
@QuackerEnte Not really because all we do is add a convolution operation without changing dimensions of things. So more compute, but same memory usage
1 reply · 0 reposts · 0 likes · 28 views
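The convolution idea from the Multi-Token Attention replies above can be sketched in a few lines: convolve the attention logit map with a small kernel so each attention weight depends on neighboring (query, key) pairs rather than a single pair, with shapes unchanged. This is my toy single-head illustration (hypothetical shapes, no causal mask, naive loops instead of a fused conv), not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mta_attention(q, k, v, kernel):
    """q, k, v: (T, d); kernel: small conv filter over the (T, T) logit map.
    Output shape matches plain attention, so memory use is unchanged."""
    logits = q @ k.T / np.sqrt(q.shape[-1])            # (T, T)
    kq, kk = kernel.shape
    padded = np.pad(logits, ((kq // 2,), (kk // 2,)))  # same-size convolution
    mixed = np.zeros_like(logits)
    T = logits.shape[0]
    for i in range(T):
        for j in range(T):
            # each mixed logit sees a neighborhood of query-key pairs
            mixed[i, j] = np.sum(padded[i:i + kq, j:j + kk] * kernel)
    return softmax(mixed) @ v                          # (T, d)

rng = np.random.default_rng(0)
T, d = 5, 4
q, k, v = rng.normal(size=(3, T, d))
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0  # identity kernel reduces to plain attention
out = mta_attention(q, k, v, kernel)
print(out.shape)  # → (5, 4)
```

With a learned non-identity kernel, each attention weight is conditioned on several nearby query and key vectors, which is the "more fine-grained and information rich" behavior described above; extra compute, same tensor shapes.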
QuackerEnte @QuackerEnte ·
@tesatory that sounds like a simple yet great idea! But does it use more memory?
1 reply · 0 reposts · 0 likes · 26 views
Sainbayar Sukhbaatar @tesatory ·
Heading to COLM! Presenting two papers: Multi-Token Attention for augmenting softmax attention for more precision, and COCONUT 🥥 for continuous CoT reasoning. Oh also speaking at RAM2 🐏 workshop about memory 🧠
1 reply · 2 reposts · 24 likes · 1.6K views
Sainbayar Sukhbaatar @tesatory ·
@QuackerEnte Each attention weight is conditioned on only one key and one query vector. Our method makes it possible to condition on multiple vectors, so it can be more fine-grained and information rich
1 reply · 0 reposts · 0 likes · 47 views
QuackerEnte @QuackerEnte ·
@tesatory I thought attention was multi-token by nature? What's the difference?
1 reply · 0 reposts · 1 like · 48 views
Sainbayar Sukhbaatar reposted
Jason Weston @jaseweston ·
🌀New Self-Driven RL Method: RESTRAIN 🌀
📝: arxiv.org/abs/2510.02172
- RESTRAIN turns spurious votes → self-improving signals. No labels needed
- Does this through self-penalizing unreliable reasoning paths:
  ✔️ Uses all rollouts, not just the majority
  ✔️ Offsets low-consistency rollout advantage
  ✔️ Down-weights low-consensus prompts
📈 Results:
🔥 Beats existing techniques on both training-time (label-free) and test-time scaling, all without labels.
🔥 Nearly matches (and sometimes surpasses) gold-label RL
🧵(1/5)
[image attached]
4 replies · 39 reposts · 195 likes · 12.9K views
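The label-free signal described in the RESTRAIN thread can be sketched as follows. The function below is my illustration of the general idea (agreement among rollouts as a pseudo-reward, all rollouts contributing, low-consensus prompts down-weighted); the exact formulas and names are assumptions, not the paper's:

```python
from collections import Counter

def pseudo_advantages(rollout_answers):
    """rollout_answers: final answers sampled for one prompt.
    Returns (per-rollout advantage, prompt weight), with no gold labels.
    Toy stand-in for a RESTRAIN-style objective."""
    counts = Counter(rollout_answers)
    n = len(rollout_answers)
    # pseudo-reward: fraction of rollouts agreeing with this rollout's answer
    rewards = [counts[a] / n for a in rollout_answers]
    mean = sum(rewards) / n
    # centering penalizes low-consistency rollouts (negative advantage),
    # and every rollout contributes, not just the majority vote
    advantages = [r - mean for r in rewards]
    # down-weight prompts whose rollouts show low consensus
    prompt_weight = counts.most_common(1)[0][1] / n
    return advantages, prompt_weight

adv, w = pseudo_advantages(["42", "42", "41", "42"])
print(adv, w)  # → [0.125, 0.125, -0.375, 0.125] 0.75
```

A policy-gradient update would then scale each rollout's gradient by its advantage times the prompt weight, so unanimous-but-wrong prompts still dominate less than in naive majority-vote self-training.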
Sainbayar Sukhbaatar @tesatory ·
It’s fascinating that the brain has waves, i.e. oscillation frequencies. During focused, intense tasks, it switches to a higher frequency, kind of like CPU overclocking
0 replies · 1 repost · 11 likes · 547 views
Sainbayar Sukhbaatar reposted
Maria Lomeli @MariaLomeli_ ·
🚨New paper: Stochastic activations We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
8 replies · 16 reposts · 125 likes · 61.2K views