Sainbayar Sukhbaatar

1.4K posts

@tesatory

Memory Networks, Asymmetric Self-Play, CommNet, Adaptive-Span, System2Attention, Feedback Transformer, Multi-Token Attention

Joined May 2010
341 Following · 3.2K Followers
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
💎Autodata: an agentic data scientist to create high quality data✨ We introduce a method for building agents that create high-quality training & evaluation data. Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model training*. We show how to train (meta-optimize) such a data scientist agent, so that it can create even stronger data. Our initial study with a specific practical implementation, Agentic Self-Instruct, shows strong gains on scientific reasoning problems compared to classical synthetic dataset creation methods. Overall, we believe this direction has the potential to change how we build AI data! Read more in the blog post: facebookresearch.github.io/RAM/blogs/auto…
0
103
615
41.2K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
DeepSeek-V4 uses our Hash routing approach developed back in 2021 -- see screenshot of their tech report! (Looks like a great model, congrats!) Bonus note: our same blogpost (& paper) back in 2021 also introduced 'looped transformers', but we called that staircase & ladder (see screenshot): parl.ai/projects/param… huggingface.co/deepseek-ai/De…
0
38
456
31.4K
Ian Goodfellow @goodfellow_ian
I'd like to thank @daniel_rossett for his help in my recovery from the POTS version of Long COVID. Daniel was key in bringing me back from highly disabled and suffering to being able to do what I want to again. This X account is mostly focused on ML / AI. From that point of view, many of you know that in December 2024, I wasn't able to do the test of time award talk at NeurIPS, even by video call. Daniel started working with me in March 2025. By April, I started to have days of no POTS symptoms, by June I was off all heart rate lowering medications, by September I was back to work. I'm back to full exercise, running, lifting weights, mountain biking, and have even done things I hadn't done before I got sick, like riding Whistler Mountain Bike Park. I'm now getting the word out to help Daniel build a company that will bring this approach to more people.
171
83
2.7K
206.2K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
Self-Improving Pretraining
We've updated our results given feedback:
- larger 8B baseline to match reward model size
- cross-task evals given different RM objectives
Overall, we see clear wins
4
25
170
9.1K
Sainbayar Sukhbaatar @tesatory
We also have a postdoc position if that's what you are looking for
Jason Weston @jaseweston

Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Apply here: metacareers.com/profile/job_de…
Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

0
0
11
2.1K
Sainbayar Sukhbaatar @tesatory
Our team is hiring! If you'd like to work on cool research projects, please apply :)
Jason Weston @jaseweston

Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement.
Apply here: metacareers.com/profile/job_de…
Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

6
9
151
19.9K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Apply here: metacareers.com/profile/job_de…
Location: NY, Seattle or Menlo Park.
Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927
10
44
262
33.2K
Sainbayar Sukhbaatar @tesatory
If you are a PhD student at Berkeley or one of these universities, you can apply to our mentorship program and do research with us! The deadline is this Friday, though: linkedin.com/posts/adinawil…
0
3
33
4.2K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
Our co-improvement position paper is now on arXiv! (We've updated it, covering more existing work.) 📝: arxiv.org/abs/2512.05356 After >27 years of research, my first position paper! Short 🧵 (1/5) follows 👇 Synopsis: it's about building AI that collaborates on AI research *with us* to solve AI faster, and to help fix the alignment problem together. How? Build the AI with those collab skills (i.e., we create benchmarks! training data! methods! etc. for that). I've been personally inspired by @Yoshua_Bengio's recent talks on safety & AI research, and also from seeing Nicholas Carlini's COLM keynote where he said we researchers can all do our bit to help (paraphrased). So – hope this helps! 🙏
7
40
245
28.2K
Sainbayar Sukhbaatar retweeted
Rimsha Bhardwaj @heyrimsha
Holy shit… Meta might’ve just solved self-improving AI 🤯

Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here’s the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together: an automatic curriculum with real-world grounding, so it never collapses into hallucinations.

The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats every prior self-play method like R-Zero and Absolute Zero.

This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.
39
78
477
32.3K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
🤝 New Position Paper !!👤🔄🤖 @j_foerst and I wrote a position piece on what we think is the path to safer superintelligence: co-improvement. Everyone is focused on self-improving AI, but (1) we don't know how to do it yet, and (2) it might be misaligned with humans. Co-improvement: instead, build AI that collaborates *with us* to solve AI faster, and to help fix the alignment problem together. More details in the paper! Read it here: 📝:github.com/facebookresear…
26
95
508
85.4K
Alex Rives @alexrives
Today CZI is announcing an unprecedented new scientific initiative to build the future of AI-powered biology. I am joining CZI to lead this initiative as Head of Science, and the EvolutionaryScale team is joining forces with Biohub. This is the first large scale scientific effort to combine frontier AI and frontier biology. I feel an incredible sense of optimism in this moment. There is a path to build predictive models of life that can fundamentally accelerate science, and unlock a new understanding of disease. biohub.org/blog/frontier-…
51
63
603
229.2K
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
🌶️SPICE: Self-Play in Corpus Environments🌶️
📝: arxiv.org/abs/2510.24684
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum!
🔥 Outperforms standard (ungrounded) self-play
Grounding fixes hallucination & lack of diversity
🧵1/6
8
54
332
80K
Sainbayar Sukhbaatar retweeted
Kyunghyun Cho @kchonyc
.@tesatory hasn’t aged since RAM’15! is that the magic of attention and memory? #COLM2025
0
2
11
2.8K
Sainbayar Sukhbaatar @tesatory
@QuackerEnte Not really, because all we do is add a convolution operation without changing the dimensions of anything. So more compute, but the same memory usage.
1
0
0
29
QE @QuackerEnte
@tesatory that sounds like a simple yet great idea! But does it use more memory?
1
0
0
27
Sainbayar Sukhbaatar @tesatory
Heading to COLM! Presenting two papers: Multi-Token Attention for augmenting softmax attention for more precision, and COCONUT 🥥 for continuous CoT reasoning. Oh also speaking at RAM2 🐏 workshop about memory 🧠
1
2
24
1.7K
Sainbayar Sukhbaatar @tesatory
@QuackerEnte In standard attention, each attention weight is conditioned on only one key and one query vector. Our method makes it possible to condition on multiple vectors, so it can be more fine-grained and information-rich.
1
0
0
48
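
To make the reply above concrete, here is a minimal pure-Python sketch of the idea as described in this thread: ordinary attention scores each (query, key) pair on its own, while a small convolution over the score matrix lets each weight also depend on neighboring queries and keys. The function names, the kernel values, and the choice to convolve before a causally masked softmax are assumptions made for illustration; this is not the paper's exact formulation.

```python
import math


def attention_logits(queries, keys):
    """Scaled dot-product scores: entry [i][j] depends on one query and one key."""
    d = len(queries[0])
    return [
        [sum(q[t] * k[t] for t in range(d)) / math.sqrt(d) for k in keys]
        for q in queries
    ]


def key_query_conv(logits, kernel):
    """Mix each score with scores of nearby (query, key) pairs.

    kernel[a][b] weights the score at query i - a and key j - b.
    Out-of-range and future (key > query) positions contribute nothing.
    The output has the same shape as the input.
    """
    n_q, n_k = len(logits), len(logits[0])
    out = [[0.0] * n_k for _ in range(n_q)]
    for i in range(n_q):
        for j in range(n_k):
            total = 0.0
            for a, kernel_row in enumerate(kernel):
                for b, w in enumerate(kernel_row):
                    qi, kj = i - a, j - b
                    if qi >= 0 and 0 <= kj <= qi:
                        total += w * logits[qi][kj]
            out[i][j] = total
    return out


def causal_softmax(logits):
    """Row-wise softmax over positions j <= i; future positions get weight 0."""
    rows = []
    for i, row in enumerate(logits):
        visible = row[: i + 1]
        m = max(visible)
        exps = [math.exp(x - m) for x in visible]
        z = sum(exps)
        rows.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return rows


# Toy self-attention over three tokens with 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = attention_logits(vectors, vectors)
mixed = key_query_conv(scores, [[0.5, 0.25], [0.25, 0.0]])  # hypothetical kernel
weights = causal_softmax(mixed)
```

With the 1x1 kernel `[[1.0]]` the sketch reduces to ordinary causal softmax attention, and the mixed score matrix always keeps the shape of the original one, which is in line with the "more compute, same memory" remark elsewhere in the thread.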
QE @QuackerEnte
@tesatory I thought attention was multi-token by nature? What's the difference?
1
0
1
49
Sainbayar Sukhbaatar retweeted
Jason Weston @jaseweston
🌀New Self-Driven RL Method: RESTRAIN 🌀
📝: arxiv.org/abs/2510.02172
- RESTRAIN turns spurious votes → self-improving signals. No labels needed
- Does this through self-penalizing unreliable reasoning paths:
✔️ Uses all rollouts, not just the majority
✔️ Offsets low-consistency rollout advantage
✔️ Down-weights low-consensus prompts
📈 Results:
🔥 Beats existing techniques on both training-time (label-free) and test-time scaling, all without labels.
🔥 Nearly matches (and sometimes surpasses) gold-label RL
🧵(1/5)
4
38
194
13K
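
As a rough picture of two of the points in the RESTRAIN post above (using every sampled answer rather than only the majority, and down-weighting prompts with weak consensus), here is a toy Python sketch. The function names and the specific weighting (vote share as the signal) are hypothetical choices for illustration only; the post does not give RESTRAIN's formulas, and this is not an implementation of them.

```python
from collections import Counter


def vote_shares(answers):
    """Fraction of sampled rollouts that ended in each distinct final answer."""
    counts = Counter(answers)
    return {answer: count / len(answers) for answer, count in counts.items()}


def consensus_weight(answers):
    """Hypothetical prompt weight: the vote share of the most common answer."""
    return max(vote_shares(answers).values())


# Four label-free rollouts for one prompt (made-up final answers).
rollout_answers = ["42", "42", "41", "42"]
shares = vote_shares(rollout_answers)        # every answer keeps a weight
weight = consensus_weight(rollout_answers)   # low consensus -> small weight
```

In this toy setup a prompt whose rollouts all disagree gets a small weight, while minority answers still carry a nonzero share instead of being discarded.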