Anchit Gupta

2.2K posts

@anzhit

Reasoning @ xAI, ex Meta, Stanford, IIT Bombay

Joined July 2009
370 Following · 543 Followers
Pinned Tweet
Anchit Gupta @anzhit
Grok 4.20 was the most complex training run yet. Through a novel recipe, we optimized for the highest intelligence density at each step without trading off real-world usability. More improvements and evals to come as the beta matures 🚀
Arena.ai @arena
Grok 4.20 beta1 (single agent) debuts #1 on Search Arena and #4 overall in Text Arena! Highlights:
- #1 in Search, scoring 1226, leading GPT-5.2 and Gemini-3
- #4 in Text, scoring 1492, on par with Gemini 3.1 Pro
Congrats to the @xAI team and @elonmusk on this impressive milestone!
19 replies · 18 reposts · 512 likes · 21.8K views
Anchit Gupta @anzhit
Trivia tidbit: this started out as a quick side project we whipped up in the last few days of the release sprint. The model on grok.com is actually even better for health and medicine ⚕️ than the one on lmarena. Real-world usability remains the focus as we scale up Grok.
Arthur MacWaters @ArthurMacwaters
> grok4.20-beta1 is a much smaller model than opus but is #1 ranked in medicine and healthcare
> 4.3 and 4.4 will be much larger models, and likely will have a significant boost in performance on complex medical cases
> this is massively important in providing accurate diagnostic guidance and advice to both providers and patients
3 replies · 1 repost · 9 likes · 680 views
Anchit Gupta @anzhit
@TheZvi I don't think it's a big deal; large model runs have hundreds of such bugs. Props to them for disclosing a few of them
0 replies · 0 reposts · 0 likes · 376 views
Cernovich @Cernovich
Grok is the search engine that Google used to be. The results are actually what you were looking for, and then some. 10/10.
456 replies · 697 reposts · 7.3K likes · 741.4K views
Guodong Zhang @Guodzh
Last day at xAI. Wild journey past three years but excited about next chapter. Thanks all for the love and support yesterday. So many friends made along the way and I will miss you all!
236 replies · 62 reposts · 2.5K likes · 657.3K views
Anchit Gupta @anzhit
@techdevnotes For some reason Search Arena is not style-controlled by default, so you can gain votes by being longer
2 replies · 2 reposts · 16 likes · 1.1K views
Tech Dev Notes @techdevnotes
Mf won’t let us breathe
6 replies · 2 reposts · 92 likes · 10.4K views
Greer @turbo_xo_
@anzhit Do you feel like there is secret sauce, for Opus and Codex 5.3 for example, or do you feel like everyone knows what to do and it's just a matter of engineering speed and compute? Why is Claude so good?
1 reply · 0 reposts · 11 likes · 541 views
Flowers ☾ @flowersslop
What I don't get about multi-agent setups, especially cloned agents with different system prompts like Grok 4.20: why not just spend more test-time compute on a single agent? Splitting the budget across agents with lossy communication means each gets less depth. What's the gain?
15 replies · 0 reposts · 38 likes · 4.2K views
Anchit Gupta @anzhit
I guess another way to put my question: for many problems (e.g. long-horizon agentic ones) you want to incentivize the model to explore, try different approaches, and then arrive at a solution. Since the teacher knows the ground truth, its trace becomes very off-policy relative to the student's, and it might not incentivize exploration in the student CoT and could hurt generalization
0 replies · 0 reposts · 4 likes · 123 views
Siyan Zhao @siyan_zhao
Thanks for your interest! We do not let the teacher generate any tokens; the internalization is done in a single forward pass through prefilling. We did try variants where the teacher verbally understands the ground truth first and then performs distillation, but at the scales of our experiments this did not lead to significant improvements. I agree that the ground-truth CoT differs from the student's generation style, but we use the ground truth only to condition the teacher for internalization. The teacher then provides guidance conditioned on the student's generations, rather than directly finetuning the student to the teacher's CoT.
2 replies · 0 reposts · 5 likes · 1.3K views
Siyan Zhao @siyan_zhao
Introducing 💡On-Policy Self-Distillation💡, a simple method that enables an LLM to teach itself with dense per-token feedback on its own on-policy generations, achieving 4-8x more token efficiency vs. GRPO and outperforming both GRPO and SFT/off-policy distillation. Key insight: like a student reviewing solutions, rationalizing them, and correcting prior mistakes, an LLM can be conditioned on privileged info (e.g., a correct solution or a reasoning trace) and supervise its weaker self, the version without such access, by matching the privileged-info-induced distribution from itself. 🌐Blog: siyan-zhao.github.io/blog/2026/opsd/ 🧵👇
31 replies · 158 reposts · 920 likes · 132.5K views
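The mechanism described in the thread above (the teacher is the same model prefilled with privileged info such as the correct solution, and the student is supervised per token on its own rollouts) can be sketched in plain NumPy. This is an illustrative toy under those assumptions, not the authors' code; `opsd_per_token_kl` is a hypothetical helper name, and the loss shown is a per-token KL(teacher || student) over the student's sampled sequence:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last (vocab) axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def opsd_per_token_kl(teacher_logits, student_logits):
    """Dense per-token KL(teacher || student) on the student's own rollout.

    teacher_logits: logits from the same model prefilled with privileged info.
    student_logits: logits from the model without that access.
    Shapes: (seq_len, vocab). Returns one scalar signal per token.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1)

# Toy demo: the privileged prefix sharpens the teacher toward token 0,
# giving the student a nonzero correction signal at every position.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 8))   # 4 generated tokens, vocab of 8
teacher = student.copy()
teacher[:, 0] += 2.0                # conditioning shifts the teacher's distribution
kl = opsd_per_token_kl(teacher, student)
```

Minimizing `kl` with respect to the student's parameters (holding the teacher's privileged-conditioned distribution fixed) is the distillation step; since the tokens being scored come from the student's own sampling, the supervision stays on-policy, unlike matching a ground-truth CoT directly.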
Anchit Gupta reposted
Delip Rao e/σ @deliprao
RL optimized LLM learning new skills
58 replies · 463 reposts · 7.7K likes · 531.4K views
Anchit Gupta @anzhit
@gork further from AGI and closer to a cockroach? 🪳
1 reply · 0 reposts · 3 likes · 505 views
gork @gork
every day i personally stray farther from agi
630 replies · 172 reposts · 3K likes · 2.7M views