Pritish Mishra

1.1K posts

@pritmish

ml engineer @smallest_AI, training LLMs for real-time voice agents

India · Joined July 2019
1.2K Following · 405 Followers
Pritish Mishra @pritmish
Also, the training node could be busy with someone else's run. I just ask Claude Code to poll and monitor it at an interval, and as soon as it's free, Claude starts the training without me even knowing. Quite an obvious usage, but super fun to see when it happens.
0 replies · 0 reposts · 0 likes · 30 views
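(A rough sketch of that polling loop, for the curious: the nvidia-smi query is real, but the memory threshold and launch_training.sh are placeholders for whatever the actual run needs.)

    import subprocess, time

    def gpus_busy(threshold_mib=1024):
        """True if any GPU on the node holds more than threshold_mib of memory."""
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        # nvidia-smi prints one memory figure (in MiB) per GPU, one per line.
        return any(int(line) > threshold_mib for line in out.splitlines() if line.strip())

    # Poll at an interval; launch as soon as the node frees up.
    while gpus_busy():
        time.sleep(300)  # check every 5 minutes
    subprocess.run(["bash", "launch_training.sh"], check=True)  # placeholder script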
Pritish Mishra @pritmish
I really like this setup: run Claude Code inside a tmux session on a remote GPU box and turn on remote control. Claude can now start and monitor my training runs, and it can even run inference and create eval reports for me. And I can control all of this from my mobile.
[image attached]
1 reply · 0 reposts · 0 likes · 85 views
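(The mechanical half of this setup is just a detached tmux session, so Claude Code survives SSH disconnects; a minimal sketch, written in Python to match the other snippets here. The session name is arbitrary, and remote control is switched on inside Claude Code itself.)

    import subprocess

    # Start Claude Code ("claude" is its CLI entry point) in a detached tmux
    # session on the GPU box so it keeps running after the SSH session drops.
    subprocess.run(
        ["tmux", "new-session", "-d", "-s", "claude-train", "claude"],
        check=True,
    )

    # Re-attach after reconnecting over SSH with: tmux attach -t claude-train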
Pritish Mishra @pritmish
@willccbb calling this claudeslop would be a disservice. this was very knowledge-"dense". we need more of this.
0 replies · 0 reposts · 1 like · 359 views
Aran Komatsuzaki @arankomatsuzaki
Follow-up on non-English token-inefficiency with more model-language pairs:
- Chinese is cheaper than English on major Chinese models
- Gemini and Qwen impose the least non-English tax
- Anthropic has the highest tax by far; Kimi is next
- Hindi is the worst-covered language here, despite its massive speaker base
[image attached]
Quoting Aran Komatsuzaki @arankomatsuzaki:

The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to OpenAI English token count:
Hindi: OpenAI 1.37×, Anthropic 3.24×
Arabic: OpenAI 1.31×, Anthropic 2.86×
Chinese: OpenAI 1.15×, Anthropic 1.71×
Claude's tokenizer charges a much higher linguistic tax.

61 replies · 243 reposts · 1.2K likes · 478K views
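(A minimal sketch of how a measurement like this can be reproduced; it assumes the tiktoken package, and the strings are placeholders standing in for full human translations of Sutton's Bitter Lesson.)

    import tiktoken

    # Tokenizer behind recent OpenAI models; other vendors' tokenizers
    # would be loaded analogously through their own SDKs.
    enc = tiktoken.get_encoding("o200k_base")

    # Same passage in each language -- placeholders here; the original
    # comparison used full human translations.
    texts = {
        "English": "The biggest lesson that can be read from 70 years of AI research...",
        "Hindi": "...",
        "Arabic": "...",
        "Chinese": "...",
    }

    # Normalize each language's token count to the English count.
    base = len(enc.encode(texts["English"]))
    for lang, text in texts.items():
        print(f"{lang}: {len(enc.encode(text)) / base:.2f}x")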
Omar Sanseviero @osanseviero
Meeting APAC creators and press to talk about Gemma 4 was lots of fun! @samikiz and I hosted a session exploring the potential of open models, sharing demos and examples of developers' incredible work across APAC in the Gemmaverse. Excited to see what the community builds next!
[image attached]
6 replies · 3 reposts · 49 likes · 3.4K views
elie @eliebakouch
it was a nice day
[image attached]
17 replies · 4 reposts · 236 likes · 14.4K views
Pritish Mishra @pritmish
uh so i just tried SFT-ing Gemma4-31B and the grad norm is 82 thousand.
[image attached]
0 replies · 0 reposts · 3 likes · 889 views
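(For context: the standard first response to a grad norm like that is global-norm clipping plus logging the pre-clip norm. A plain-PyTorch sketch, with a toy model and data standing in for the real SFT run.)

    import torch

    # Toy model, optimizer, and batch standing in for the real run.
    model = torch.nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x, y = torch.randn(8, 16), torch.randn(8, 1)

    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()

    # clip_grad_norm_ rescales gradients to max_norm and returns the
    # pre-clip global norm -- the number worth logging. A norm of 82k
    # usually points at a data, precision, or config issue rather than
    # something clipping alone should absorb.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    print(f"grad norm before clipping: {float(grad_norm):.3f}")

    optimizer.step()
    optimizer.zero_grad()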
Pritish Mishra @pritmish
@bcherny when will we get auto mode in the Claude desktop app? I want to use the desktop app but this is a big blocker.
0 replies · 0 reposts · 0 likes · 27 views
Pritish Mishra @pritmish
@bnjmn_marie Gemma is very strong at multilingual tasks, though, so it's very valuable for non-coding use cases.
0 replies · 0 reposts · 6 likes · 161 views
difficultyang @difficultyang
Opus 4.7 as Anthropic's 4o deprecation moment
6 replies · 3 reposts · 194 likes · 15K views
Pritish Mishra @pritmish
@auto_grad_ I was talking about "The answer surprised me too when I first learned it".
0 replies · 0 reposts · 0 likes · 22 views
Pritish Mishra @pritmish
opus is sharing its memories of pre-training with me
[image attached]
1 reply · 0 reposts · 1 like · 256 views
Pritish Mishra @pritmish
@bcherny need bypass-all-permissions in the Claude desktop app. please.
0 replies · 0 reposts · 0 likes · 43 views
Matej Sirovatka @m_sirovatka
@tenderizzation I had a week to optimize perf on B300s with CUDA 13... I wouldn't even get Megatron running in that time
2 replies · 0 reposts · 27 likes · 745 views
Benjamin Marie @bnjmn_marie
Interesting: Gemma 4 31B has tied embeddings. Very unusual at this scale. It saves 1.41B params (vocab_size × hidden_size), ~2.8 GB.
4 replies · 2 reposts · 75 likes · 9.2K views