Mahtab Sarvmaili
@MahtabSarvmaili

love AI 🤖

🇦🇶 Joined June 2020
721 Following · 74 Followers
770 posts
Mahtab Sarvmaili retweeted
Lakshya A Agrawal @LakshyAAAgrawal
Excited to share that my ICLR 2026 Oral Talk for GEPA is available on YouTube. I go deeper into why GEPA works better than prior optimization techniques, and touch on many other aspects of GEPA! youtu.be/HbGah-uP1fI
Quoting Lakshya A Agrawal @LakshyAAAgrawal:
Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷
Apr 24, Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater
Poster Session 4, 3:15 PM, Pavilion 3
x.com/LakshyAAAgrawa…
Let's recap what's happened since we released GEPA last year 🧵
9 replies · 46 reposts · 241 likes · 29.4K views
Mahtab Sarvmaili retweeted
Dwarkesh Patel @dwarkesh_sp
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It's a bit technical, but I encourage you to hang in there; it's really worth it. There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.

0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
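The first chapter's claim, that batch size drives token cost, is easy to make concrete. Below is a hedged back-of-the-envelope sketch of the standard bandwidth-bound decode argument; it is my own illustration rather than material from the lecture, and every constant (model size, bandwidth, KV-cache traffic) is an assumption.

```python
# Bandwidth-bound decode: each step streams all weights from HBM once,
# amortized over the batch, plus one KV-cache read per sequence.
# All constants are assumptions, not figures from the lecture.

WEIGHT_BYTES = 2 * 70e9   # assumed: 70B dense params in bf16
BW_BYTES_S = 3.35e12      # assumed: ~3.35 TB/s aggregate HBM bandwidth
KV_BYTES = 1e9            # assumed: KV-cache bytes read per sequence per step

for batch in (1, 8, 64, 256):
    step_s = (WEIGHT_BYTES + batch * KV_BYTES) / BW_BYTES_S  # one decode step
    throughput = batch / step_s                              # tokens/s across batch
    # Bigger batches amortize the weight read, so per-token cost falls until
    # KV reads dominate, while per-step latency only creeps up.
    print(f"batch={batch:4d}  step={step_s * 1e3:6.2f} ms  {throughput:9.0f} tok/s")
```

Arithmetic like this, combined with public API prices, is the kind of thing that lets you reverse-engineer how labs serve their models.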
146 replies · 595 reposts · 6.5K likes · 1.2M views
Mahtab Sarvmaili retweeted
Utkarsh @utk7arsh
I thought robotics was for PhDs and billion-dollar labs. Then I found this repo where NVIDIA open-sourced the entire stack for physical AI. Brain. Body. Physics. Simulation. Free. I wrote the full breakdown, plus the projects you can start building today to get ahead.
Quoting Utkarsh @utk7arsh:
x.com/i/article/2048…
14 replies · 77 reposts · 617 likes · 65.5K views
Mahtab Sarvmaili retweeted
Zhijing Jin @ZhijingJin
What happens when you put #LLM agents in a room and ask them to cooperate? They collapse. They free-ride. They form social networks. We spent 2+ years building a full research series on Multi-Agent LLM Safety. Here's a 50-min talk covering all of it: 🔗 youtube.com/watch?v=1MxpYJ…
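For readers unfamiliar with "free-riding": the textbook public-goods game below shows why defection is individually rational even when universal cooperation is collectively better. This is my own illustration of the concept, not code or numbers from the talk.

```python
# Classic public-goods game: each agent contributes 0 or 1; contributions
# are multiplied by r < n and split evenly among all n agents.

N_AGENTS = 4
MULTIPLIER = 1.6  # assumed: r < n, the regime where free-riding pays

def payoff(my_contrib: int, others_contribs: list[int]) -> float:
    pot = MULTIPLIER * (my_contrib + sum(others_contribs))
    return pot / N_AGENTS + (1 - my_contrib)  # my share of the pot + what I kept

others = [1, 1, 1]  # everyone else cooperates
print("cooperate:", payoff(1, others))  # 1.6
print("defect:   ", payoff(0, others))  # 2.2 -- defecting strictly wins
```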
4 replies · 12 reposts · 60 likes · 6.3K views
Mahtab Sarvmaili retweeted
Rishabh Agarwal @agarwl_
I gave a talk at ICLR 2026 about how we are scaling RL on frontier LLMs with 1T+ parameters, on experimental data from our physical lab at Periodic! Here's a rough recording of the talk:
13 replies · 171 reposts · 1.8K likes · 203.6K views
Mahtab Sarvmaili retweeted
Hanqi Yan @yan_hanqi
🧠 Mechanistic interpretability is obsessed with features. But what if gradients tell you more?
📐 Introducing GRADE — using gradient subspace dynamics to measure how far an LLM is from the correct answer, probing knowledge gaps at their root. 🔍
📄 Paper: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics
🔗 arxiv.org/pdf/2604.02830
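Going from the abstract alone, here is a hedged toy of what "gradient subspace" reasoning can look like; this is my guess at the flavor of the idea, not the paper's actual algorithm, with a tiny linear layer standing in for an LLM.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 8)        # stand-in for one LLM layer
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(x: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Gradient of the loss w.r.t. the layer's weights, flattened to a vector.
    model.zero_grad()
    loss_fn(model(x), target).backward()
    return model.weight.grad.flatten().clone()

# Build a "gradient subspace" from gradients on reference examples.
refs = torch.stack([flat_grad(torch.randn(1, 16), torch.tensor([i % 8]))
                    for i in range(32)])
_, _, Vt = torch.linalg.svd(refs, full_matrices=False)
basis = Vt[:4]  # assumed: keep the top-4 gradient directions

def subspace_alignment(g: torch.Tensor) -> float:
    # Fraction of the gradient's norm captured by the subspace; in this toy
    # picture, low alignment would flag a potential knowledge gap.
    return ((basis @ g).norm() / g.norm()).item()

g = flat_grad(torch.randn(1, 16), torch.tensor([3]))
print(f"alignment with reference gradient subspace: {subspace_alignment(g):.3f}")
```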
1 reply · 13 reposts · 197 likes · 11.2K views
Mahtab Sarvmaili retweeted
MIT CSAIL @MIT_CSAIL
Today, MIT & the IMO released MathNet, the world’s largest dataset of International Math Olympiad problems & solutions 🌍 MathNet is 5x larger than previous datasets & is sourced from over 40 countries across 4 decades: bit.ly/4u1bhBC
15 replies · 543 reposts · 2.1K likes · 193.6K views
Mahtab Sarvmaili retweeted
Nav Toor @heynavtoor
Your "hallucination-free" RAG system trusts its retrieval layer. Researchers just proved that 5 documents, planted in a database of 2.6 million, can hijack the LLM's answer 97% of the time. The attacker never touches your model. They never see your retriever. They just write a document. This is PoisonedRAG. 🧵
24 replies · 127 reposts · 577 likes · 44.5K views
Mahtab Sarvmaili retweeted
Nathan Lambert @natolambert
Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released:
- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation
I'm going to add question & answer videos throughout the lectures to go deeper on topics that need it, and potentially cover some topics that are too recent and in flux to go in print. I expect 10-15 videos in total over the next few months. At the same time, development around the code for the book is picking up. It's a great time to build the foundation for post-training methods. YT playlist and course landing page below.
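Since Lecture 2 covers rejection sampling, here is a hedged minimal sketch of that technique as it is typically used in post-training; this is my own illustration, not course code, and `generate` and `reward_model` are hypothetical stand-ins.

```python
import random

random.seed(0)

def generate(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n completions from the current policy model.
    return [f"{prompt} -> completion #{i}" for i in range(n)]

def reward_model(completion: str) -> float:
    # Stand-in for a learned reward model's scalar score.
    return random.random()

def rejection_sample(prompts: list[str], n: int = 8) -> list[tuple[str, str]]:
    # For each prompt, keep only the highest-reward completion; the resulting
    # (prompt, best completion) pairs become supervised fine-tuning data.
    return [(p, max(generate(p, n), key=reward_model)) for p in prompts]

print(rejection_sample(["explain RLHF briefly"], n=4))
```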
50 replies · 236 reposts · 1.7K likes · 184.5K views
Mahtab Sarvmaili retweeted
Vivo @vivoplt
Research papers you must read for AI Engineer interviews:
1. Attention Is All You Need (Transformers)
2. LoRA (Low-Rank Adaptation)
3. PEFT (Parameter-Efficient Fine-Tuning)
4. ViT (Vision Transformers)
5. VAE (Variational Autoencoder)
6. GANs (Generative Adversarial Networks)
7. BERT (Bidirectional Encoder Representations from Transformers)
8. Diffusion Models (Stable Diffusion)
9. RAG (Retrieval-Augmented Generation)
10. GPT (Generative Pre-trained Transformers)
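As a taste of item 2, a hedged minimal sketch of LoRA's core trick: freeze the pretrained weight and learn a low-rank update on top of it. This is my own illustration, not the paper's code; the dimensions, rank, and scaling are assumed.

```python
import torch

d_in, d_out, r = 64, 64, 4
W = torch.randn(d_out, d_in)                        # frozen pretrained weight
A = (torch.randn(r, d_in) * 0.01).requires_grad_()  # trainable down-projection
B = torch.zeros(d_out, r).requires_grad_()          # trainable, zero-init

x = torch.randn(8, d_in)
scale = 1.0 / r                          # assumed: the alpha/r scaling factor
y = x @ W.T + scale * (x @ A.T) @ B.T    # frozen path + low-rank update
# Only r * (d_in + d_out) = 512 parameters train, vs 4096 in W itself.
print(y.shape)  # torch.Size([8, 64])
```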
53 replies · 292 reposts · 2.5K likes · 109.6K views
Mahtab Sarvmaili retweeted
Didier Lopes @didier_lopes
This was a really good read. h/t @guohao_li
3 replies · 32 reposts · 393 likes · 45.5K views
Mahtab Sarvmaili retweeted
𒐪 @SHL0MS
Introducing Autoreason, a reasoning method inspired by @karpathy's AutoResearch that extends the strategy to subjective domains. The paper was co-written with Hermes Agent by @NousResearch, using a research-paper-writing skill developed while writing it. Paper + results below.
47 replies · 156 reposts · 1.4K likes · 305.6K views
Mahtab Sarvmaili retweeted
elvis @omarsar0
NEW paper from Meta. (bookmark this one)

What if the model wasn't just using the computer, but became the computer? New research from Meta AI and KAUST makes a serious case for Neural Computers (NCs).

The paper proposes NCs as learned runtimes where computation, memory, and I/O live inside a single latent state. Their first prototypes use video models to roll out terminal and GUI interfaces from prompts, pixels, and user actions.

Why does it matter? Today's agents still depend on external computers to store state, execute actions, and enforce system contracts. Neural Computers point to a different machine form: one where interface dynamics, working memory, and execution are learned together.

The early results are promising but grounded. CLI rendering improves, GUI cursor control reaches 98.7% with explicit visual supervision, and reprompting boosts arithmetic-probe accuracy from 4% to 83%. But symbolic reliability, stable reuse, and runtime governance remain open.

This is less "agents got better" and more "what comes after agents as a computing substrate?"

Paper: arxiv.org/abs/2604.06425
Learn to build effective AI agents in our academy: academy.dair.ai
15 replies · 92 reposts · 505 likes · 61K views
Mahtab Sarvmaili retweeted
Cas (Stephen Casper) @StephenLCasper
🧵🧵🧵 A provocation to the mechanistic interpretability researchers of the world...
6 replies · 6 reposts · 154 likes · 17.7K views
Mahtab Sarvmaili retweeted
Brian Roemmele @BrianRoemmele
We at The Zero-Human Company have been testing MemPalace by the amazing @bensig and Milla Jovovich and are absolutely blown away! It is a freaking masterpiece and we have deployed it to 79 employees at the company. Each worker will be testing and expanding on MemPalace. I will have a lot to say about how we are using it and how you should too.
Quoting Ben Sigman @bensig:
My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark, beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace: a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:
→ Your AI knows who you are before you type a single word: family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type: not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens: 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks: 100% recall on LongMemEval — first perfect score ever recorded, 500/500 questions, every question type at 100%. 92.9% on ConvoMem — more than 2x Mem0's score. 100% on LoCoMo — every multi-hop reasoning category, including temporal inference, which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave. MIT License. 100% Open Source.

github.com/milla-jovovich…
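The wings/halls/rooms structure plus local search is easy to picture with a hedged toy; this is my own sketch of the concept as described above, not MemPalace's actual code, and token overlap stands in for real semantic search.

```python
from collections import defaultdict

# palace[wing][hall] -> list of remembered facts
palace: dict = defaultdict(lambda: defaultdict(list))

def remember(wing: str, hall: str, fact: str) -> None:
    palace[wing][hall].append(fact)

def search(query: str) -> list[tuple[str, str, str]]:
    # Naive token-overlap scoring standing in for real semantic search.
    q = set(query.lower().split())
    hits = []
    for wing, halls in palace.items():
        for hall, facts in halls.items():
            for fact in facts:
                overlap = len(q & set(fact.lower().split()))
                if overlap:
                    hits.append((overlap, wing, hall, fact))
    return [(w, h, f) for _, w, h, f in sorted(hits, reverse=True)]

remember("family", "people", "my sister Ada lives in Lisbon")
remember("projects", "work", "shipping the eval harness in March")
print(search("where does my sister live"))
```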
69 replies · 96 reposts · 1.2K likes · 186.9K views
Mahtab Sarvmaili retweeted
Yacine Mahdid @yacinelearning
For those interested in distributed reinforcement learning: I just finished a ~1h tutorial on the echo2 framework by @Gradient_HQ. We cover:
- how to do async RL
- the infra split between rollout workers and a centralized learner (a minimal sketch of this pattern follows below)
- an interview with Gradient cofounder Eric Yang himself!
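Here is a hedged minimal sketch of that rollout-worker/centralized-learner split; it shows the general async-RL shape, not echo2's actual code, with a queue and a version counter standing in for real trajectories and gradient updates.

```python
import queue
import random
import threading

traj_q: queue.Queue = queue.Queue(maxsize=64)  # trajectories in flight
weights = {"version": 0}                       # stand-in for shared policy weights
stop = threading.Event()

def rollout_worker() -> None:
    while not stop.is_set():
        snapshot = weights["version"]          # act with a possibly stale policy
        rewards = [random.random() for _ in range(8)]
        try:
            traj_q.put((snapshot, rewards), timeout=0.1)
        except queue.Full:
            continue                           # learner is behind; retry

def learner(steps: int) -> None:
    for _ in range(steps):
        batch = [traj_q.get() for _ in range(4)]  # gather 4 trajectories
        weights["version"] += 1                   # stand-in for a gradient step
    stop.set()

workers = [threading.Thread(target=rollout_worker) for _ in range(4)]
for w in workers:
    w.start()
learner(steps=25)
for w in workers:
    w.join()
print("learner finished at weight version", weights["version"])
```

The key property: workers never wait for the learner's latest weights, so rollouts and updates overlap; the price is that trajectories may come from slightly stale policies.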
14 replies · 49 reposts · 402 likes · 40.8K views
Mahtab Sarvmaili retweeted
AVB @neural_avb
People interested in model interpretability, check out this gold: the "Circuits" thread, a series of exploratory research by Chris Olah himself and team when he was with OpenAI around 2020-2021. Circuits are sub-graphs of the network, consisting of a set of linked features and the weights between them. These articles try to "reverse engineer" neural nets by finding these subgraphs. Shoutout to @exploding_grad for unearthing this and sharing. This is what I'll be passively consuming this week, I guess... Link in attached tweet.
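The basic "circuits" move is concrete enough to sketch: read off the weights that connect one feature (channel) in an early layer to a feature in the next layer, since those weights are the edges of the circuit sub-graph. This is my own toy illustration, not code from the articles.

```python
import torch

conv1 = torch.nn.Conv2d(3, 8, kernel_size=5)   # early layer: 8 features
conv2 = torch.nn.Conv2d(8, 16, kernel_size=5)  # next layer: 16 features

# conv2.weight has shape (out_channels, in_channels, kH, kW); slicing one
# (out, in) pair gives the 5x5 kernel linking conv1 feature 2 -> conv2 feature 7.
edge = conv2.weight[7, 2].detach()
print(f"edge kernel shape={tuple(edge.shape)}, strength={edge.norm().item():.3f}")
# Circuits-style analysis ranks and visualizes these kernels to see how
# earlier features (e.g., curve detectors) excite or inhibit later ones.
```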
Quoting kendrick @exploding_grad:
@neural_avb If you want to go deep down the rabbit hole, start with the Distill circuits thread. distill.pub/2020/circuits/
9 replies · 32 reposts · 329 likes · 25.9K views