Mann Patel
312 posts

@punsbymann
post training @CapitalOne | prev ml @google, GLAMOR LAB @USC | rl, interpretability and all things compute
Los Angeles, CA · Joined March 2020
754 Following · 334 Followers

I think GLM 5.2 points to a 7-month gap currently.
It's around Opus 4.7-4.8 level, all told (modulo vision, which in Opus's case is garbage anyway). Mythos reached Preview status (≥ Opus 4.8, functionally) by early Feb 2026. This means full PRC Mythos ("Fable") by Nov-Dec '26.

Lunexa @Lunexalith
@teortaxesTex What's your current timeline for China to reach Fable class? GLM-5.2 certainly shortens the gap.

Pretraining fundamentally does not make sense anymore for anyone other than frontier labs. Plenty of people at enterprises and startups have "Pretrainitis" because it shows "impact" and gets them promoted, but fundamentally, it doesn't make sense.
There is probably higher ROI in partnering with a frontier lab to do prompt engineering, although it isn't as "sexy" as pretraining.


guess who fumbled the bag hard…congrats!!
Exa @ExaAILabs
We raised $250M in Series C funding at a $2.2B valuation, led by a16z. Exa is a search lab organizing the web's data for agents.
Mann Patel retweeted

Codex with gpt-5.5 xhigh discovered a math trick for full-vocabulary KL distillation:
jonathanc.net/blog/kl-cache-…
Mann Patel retweeted
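The trick itself is behind the truncated link and is not reproduced here. As a baseline for what "full-vocabulary KL distillation" refers to, the sketch below computes the standard forward KL(teacher || student) over every vocabulary entry at each position, rather than over a top-k slice of the teacher's distribution. It assumes raw logits for teacher and student over a shared vocabulary; the function names are hypothetical and the code is pure Python for readability, not the method from the linked post.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the full vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def full_vocab_kl(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """Forward KL(teacher || student) summed over every vocabulary entry at one position.

    Hypothetical helper: assumes teacher and student share the same vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary")
    log_p = log_softmax(teacher_logits)  # teacher log-probs
    log_q = log_softmax(student_logits)  # student log-probs
    # sum_v p(v) * (log p(v) - log q(v))
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def sequence_kl(teacher_seq: Sequence[Sequence[float]],
                student_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-position KL across a sequence (hypothetical reduction choice)."""
    total = sum(full_vocab_kl(t, s) for t, s in zip(teacher_seq, student_seq))
    return total / len(teacher_seq)
```

The cost of this exact form grows with vocabulary size at every position, which is what makes cheaper formulations of the full-vocabulary loss worth looking for in the first place.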

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action.
thinkingmachines.ai/blog/interacti…
Mann Patel retweeted

1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics!
We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics.
🔨 arxiv.org/pdf/2604.21691 🔧

Mann Patel retweeted

Model shaping is still a craft of a few. That's what AI agents are for: learning it and doing it for everyone else.
As part of the FrontierSWE benchmark, we built a 20-hour post-training task on @tinkerapi and found that the real bottleneck is research intuition.

@littewhite16806 the most interesting part of this little project was writing a Selenium verifier for the 16personalities site and letting the LLM answer the 5-scale questions haha

@punsbymann Nice work as a class project. Our method does not require training 8 different models. It looks like we generate several models from one, while your paper fuses several into one. Also, PEFT is different from pruning (e.g., pruning is not frozen). You're welcome to stop by our poster! 😀

Your language model isn’t one person—it’s everyone.
Check out Personality Subnetworks (ICLR 2026): a training-free framework to extract persona-specialized subnetworks. A step towards controllable and interpretable personalization in LLMs.
Paper: arxiv.org/pdf/2602.07164


@littewhite16806 your work is totally unique and kind of the opposite of mine! While you prune and show that personas already exist in the model (which sounds about right), I tried to train LoRAs that show composition of personas 😄
Mann Patel retweeted

the benchmark game has entered its IPO era

Claude @claudeai
Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
Mann Patel retweeted

🧵 Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?
1/n 💡 Why do methods like DPO and KTO actually improve reasoning? In standard alignment, we use preference pairs, but we don't fully understand what properties of the data drive downstream gains.
We investigate two distinct notions of quality:
- Generator-level Delta: the capability gap between the models producing the chosen vs. rejected traces
- Sample-level Delta: the fine-grained quality difference within a single pair (factuality, coherence, precision)
👇
Mann Patel retweeted
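To make concrete where a preference pair enters training, here is a minimal sketch of the standard per-pair DPO loss. It does not reproduce the thread's analysis, and KTO (which scores unpaired examples) is not covered. The inputs are assumed to be summed token log-probabilities of the chosen and rejected traces under the policy and a frozen reference model; `beta=0.1` is an illustrative default, not a value taken from the thread.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Assumption: each argument is the summed log-probability of a full trace.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # how much the policy up-weights the chosen trace
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # same for the rejected trace
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))
```

The loss sees a pair only through the difference of its two log-ratios, so any property that separates chosen from rejected traces (a gap between the generating models, or within-pair quality such as coherence) is, plausibly, what the gradient ends up rewarding; disentangling those two is the question the thread poses.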

This paper digs into what actually makes delta learning work. The authors show that it’s not just about the chosen trajectory being correct relative to the rejected one. What drives the gains is the coherence of reasoning across the trajectory.
🔗arxiv.org/abs/2604.08723