Mann Patel

312 posts

@punsbymann

post training @CapitalOne | prev ml @google, GLAMOR LAB @USC | rl, interpretability and all things compute

Los Angeles, CA · Joined March 2020

754 Following · 334 Followers

Pinned Tweet

Mann Patel@punsbymann·16 May

first day being noogler at @Google ✨

2.9K

Mann Patel@punsbymann·5d

@jietang @yacineMTB @elonmusk @teortaxesTex AURA

118

jietang@jietang·5d

@elonmusk @teortaxesTex won’t take that long

251

461

5.7K

1.7M

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·6d

I think GLM 5.2 points to a 7 months gap currently. It's around Opus 4.7-4.8 level, all told (modulo vision which in Opus's case is garbage anyway). Mythos reached Preview status (≥ Opus 4.8, functionally) by early Feb 2026. This means full PRC Mythos ("Fable") by Nov-Dec'26.

Lunexa@Lunexalith

@teortaxesTex What's your current timeline for china to reach Fable class ? GLM-5.2 certainly shorten the gap.

101

1.4K

517.3K

Mann Patel@punsbymann·12 Jun

@SemiAnalysis_ dylan i like my job, let me have it 😭

226

SemiAnalysis@SemiAnalysis_·12 Jun

Pretraining fundamentally does not make sense anymore for anyone other than frontier labs. Although there are a lot of people at enterprises & startups who have "Pretrainitis" to show “impact” and get promotions, fundamentally, it doesn’t make sense. There is probably higher ROI in partnering with a frontier lab to do prompt engineering, although it isn’t as “sexy” as pretraining.

635

70.4K

Mann Patel@punsbymann·6 Jun

@jmzoeee naur

234

JMZ@jmzoeee·6 Jun

my last day of tech week, back to the cave

Manhattan, NY 🇺🇸

923

Mann Patel@punsbymann·4 Jun

@radixark raise @ly_h990's salary, she is overworking herself! :3

Mann Patel@punsbymann·21 May

guess who fumbled the bag hard…congrats!!

Exa@ExaAILabs

We raised $250M in Series C funding at a $2.2B valuation, led by a16z. Exa is a search lab organizing the web's data for agents.

163

Mann Patel retweeted

Jonathan Chang@ChangJonathanC·12 May

Codex with gpt-5.5 xhigh discovered a math trick for full vocabulary kl distillation jonathanc.net/blog/kl-cache-…

312

40.5K

Mann Patel retweeted

Thinking Machines@thinkymachines·11 May

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

465

15.8K

7.8M

Mann Patel retweeted

Will Held@WilliamBarrHeld·11 May

To train better open models, we need predictable scaling. Delphi is Marin’s first step: we pretrained many small models with one recipe, then extrapolated 300× to predict a 25B-param / 600B-token run with just 0.2% error. Getting there took some work 🧵

460

138.6K

Mann Patel@punsbymann·1 May

everyone benchmarks pass@k. everyone deploys pass^k.
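A rough sketch of the gap this post points at, assuming each attempt succeeds independently with the same probability `p` (a simplification), and reading pass@k as "at least one of k attempts succeeds" and pass^k as "all k attempts succeed". The function names are illustrative, not taken from any benchmark library.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def pass_hat_k(p: float, k: int) -> float:
    """Chance that all k independent attempts succeed."""
    return p ** k


if __name__ == "__main__":
    # Illustrative per-attempt success rate; not a measured value.
    p = 0.5
    for k in (1, 2, 4, 8):
        print(f"k={k}: pass@k={pass_at_k(p, k):.4f}  pass^k={pass_hat_k(p, k):.4f}")
```

Under these assumptions the two metrics coincide at k=1 and move in opposite directions as k grows: pass@k rises toward 1 while pass^k falls toward 0.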

Mann Patel@punsbymann·27 Apr

@finbarrtimbers same

Mann Patel retweeted

finbarr@finbarrtimbers·27 Apr

The best part of my job is I get to play with GPUs all day and someone else pays for it

218

11.5K

Mann Patel retweeted

Jamie Simon@learning_mech·24 Apr

1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 arxiv.org/pdf/2604.21691 🔧

291

1.5K

305.4K

Mann Patel retweeted

Thoughtful@thoughtfullab·23 Apr

Model shaping is still a craft of a few. That's what AI agents are for: learning it and doing it for everyone else. As a part of FrontierSWE benchmark we built a 20-hour post-training task on @tinkerapi and found the real bottleneck is research intuition.

521

216.6K

Mann Patel@punsbymann·22 Apr

@littewhite16806 most interesting part of this little project was writing a selenium verifier to 16personalities site and letting llm answer 5 scale questions haha

Bo Hui@littewhite16806·22 Apr

@punsbymann Nice work as a class project. Our method does not require training 8 different models. It looks like we generate several models from one, and your paper fuses several into one. Also, PEFT is different from pruning (e.g., pruning is not frozen). Welcome to stop by our poster! 😀

Bo Hui@littewhite16806·17 Apr

Your language model isn’t one person—it’s everyone. Check out Personality Subnetworks (ICLR 2026): a training-free framework to extract persona-specialized subnetworks. A step towards controllable and interpretable personalization in LLMs. Paper: arxiv.org/pdf/2602.07164

345

36.1K

Mann Patel@punsbymann·22 Apr

@littewhite16806 your work is totally unique and kind of opposite to mine! while you prune and prove personas already exist in your model (which sounds about right), i tried to train Lora’s that show composition of persona😄

Mann Patel retweeted

Psyho@FakePsyho·16 Apr

the benchmark game has entered its IPO era

Claude@claudeai

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

489

8.6K

357.9K

Mann Patel retweeted

Michael Lee@ChiahsuanL·14 Apr

🧵 Decomposing the Delta: What Do Models Actually Learn from Preference Pairs? 1/n

💡 Why do methods like DPO and KTO actually improve reasoning? In standard alignment, we use preference pairs, but we don't fully understand what properties of the data drive downstream gains. We investigate two distinct notions of quality:

Generator-level Delta - the capability gap between the models producing the chosen vs. rejected traces
Sample-level Delta - the fine-grained quality difference within a single pair (Factuality, Coherence, Precision) 👇

357

Mann Patel@punsbymann·13 Apr

@f14bertolotti @ChiahsuanL @MingyangKevinZh 🤩

Mann Patel retweeted

Francesco Bertolotti@f14bertolotti·13 Apr

This paper digs into what actually makes delta learning work. The authors show that it’s not just about the chosen trajectory being correct relative to the rejected one. What drives the gains is the coherence of reasoning across the trajectory. 🔗arxiv.org/abs/2604.08723

4.3K
