Mann Patel

312 posts

@punsbymann

post training @CapitalOne | prev ml @google, GLAMOR LAB @USC | rl, interpretability and all things compute

Los Angeles, CA · Joined March 2020
754 Following · 334 Followers
Pinned Tweet
Mann Patel
Mann Patel@punsbymann·
first day being noogler at @Google
English
0
0
17
2.9K
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
I think GLM 5.2 points to a 7-month gap currently. It's around Opus 4.7-4.8 level, all told (modulo vision which in Opus's case is garbage anyway). Mythos reached Preview status (≥ Opus 4.8, functionally) by early Feb 2026. This means full PRC Mythos ("Fable") by Nov-Dec'26.
Lunexa@Lunexalith

@teortaxesTex What's your current timeline for China to reach Fable class? GLM-5.2 certainly shortens the gap.

English
63
101
1.4K
517.4K
SemiAnalysis
SemiAnalysis@SemiAnalysis_·
Pretraining fundamentally does not make sense anymore for anyone other than frontier labs. Although there are a lot of people at enterprises & startups who have "Pretrainitis" to show “impact” and get promotions, fundamentally, it doesn’t make sense. There is probably higher ROI in partnering with a frontier lab to do prompt engineering, although it isn’t as “sexy” as pretraining.
English
44
27
635
70.4K
JMZ
JMZ@jmzoeee·
my last day of tech week, back to the cave
Manhattan, NY 🇺🇸 English
1
0
4
923
Mann Patel retweeted
Thinking Machines
Thinking Machines@thinkymachines·
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
English
465
2K
15.8K
7.8M
Mann Patel retweeted
Will Held
Will Held@WilliamBarrHeld·
To train better open models, we need predictable scaling. Delphi is Marin’s first step: we pretrained many small models with one recipe, then extrapolated 300× to predict a 25B-param / 600B-token run with just 0.2% error. Getting there took some work 🧵
English
14
78
460
138.6K
Mann Patel
Mann Patel@punsbymann·
everyone benchmarks pass@k. everyone deploys pass^k.
English
0
0
0
38
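A minimal sketch of the two metrics the post contrasts, assuming the commonly used combinatorial estimators: pass@k as the chance that at least one of k sampled attempts succeeds, and pass^k as the chance that all k attempts succeed. The function names and the n/c/k parameters are illustrative and do not come from the post.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is correct), given c correct out of n."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets containing no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k samples are correct), given c correct out of n."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) counts the k-subsets made up only of correct samples.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 5 of 10 sampled attempts succeed, evaluated at k = 3.
    print(f"pass@3 = {pass_at_k(10, 5, 3):.3f}")
    print(f"pass^3 = {pass_hat_k(10, 5, 3):.3f}")
```

With the same 50% per-attempt success rate the two numbers diverge sharply as k grows, which is the gap between a benchmark that rewards any success and a deployment that needs every run to succeed.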
Mann Patel retweeted
finbarr
finbarr@finbarrtimbers·
The best part of my job is I get to play with GPUs all day and someone else pays for it
English
7
6
218
11.5K
Mann Patel retweeted
Jamie Simon
Jamie Simon@learning_mech·
1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 arxiv.org/pdf/2604.21691 🔧
English
54
291
1.5K
305.4K
Mann Patel retweeted
Thoughtful
Thoughtful@thoughtfullab·
Model shaping is still a craft of a few. That's what AI agents are for: learning it and doing it for everyone else. As a part of FrontierSWE benchmark we built a 20-hour post-training task on @tinkerapi and found the real bottleneck is research intuition.
English
10
53
521
216.6K
Mann Patel
Mann Patel@punsbymann·
@littewhite16806 most interesting part of this little project was writing a Selenium verifier for the 16personalities site and letting the LLM answer the 5-scale questions haha
English
0
0
1
23
Bo Hui
Bo Hui@littewhite16806·
@punsbymann Nice work as a class project. Our method does not require training 8 different models. It looks like we generate several models from one, and your paper fuses several into one. Also, PEFT is different from pruning (e.g., pruning is not frozen). Welcome to stop by our poster! 😀
English
2
0
1
54
Bo Hui
Bo Hui@littewhite16806·
Your language model isn’t one person—it’s everyone. Check out Personality Subnetworks (ICLR 2026): a training-free framework to extract persona-specialized subnetworks. A step towards controllable and interpretable personalization in LLMs. Paper: arxiv.org/pdf/2602.07164
English
8
42
345
36.1K
Mann Patel
Mann Patel@punsbymann·
@littewhite16806 your work is totally unique and kind of opposite to mine! while you prune and prove personas already exist in your model (which sounds about right), i tried to train LoRAs that show composition of personas 😄
English
0
0
1
18
Mann Patel retweeted
Michael Lee
Michael Lee@ChiahsuanL·
🧵 Decomposing the Delta: What Do Models Actually Learn from Preference Pairs? 1/n
💡 Why do methods like DPO and KTO actually improve reasoning? In standard alignment, we use preference pairs, but we don't fully understand what properties of the data drive downstream gains.
We investigate two distinct notions of quality:
Generator-level Delta - the capability gap between the models producing the chosen vs. rejected traces
Sample-level Delta - the fine-grained quality difference within a single pair (Factuality, Coherence, Precision) 👇
English
2
3
7
357
Mann Patel retweeted
Francesco Bertolotti
Francesco Bertolotti@f14bertolotti·
This paper digs into what actually makes delta learning work. The authors show that it’s not just about the chosen trajectory being correct relative to the rejected one. What drives the gains is the coherence of reasoning across the trajectory. 🔗arxiv.org/abs/2604.08723
English
1
12
43
4.3K