Pritish Mishra

1.1K posts

@pritmish

ml engineer @smallest_AI, training LLMs for real-time voice agents

India · Joined July 2019
1.2K Following · 405 Followers
Pritish Mishra @pritmish
Also, the training node could be busy with someone else's run. I just ask Claude Code to poll and monitor it at an interval, and as soon as it's free, Claude starts the training without me even knowing. Quite an obvious usage, but super fun to see when it happens.
0 replies · 0 reposts · 0 likes · 30 views
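(A rough sketch of that polling loop, for the curious: the nvidia-smi query is real, but the memory threshold and launch_training.sh are placeholders for whatever the actual run needs.)

    import subprocess, time

    def gpus_busy(threshold_mib=1024):
        """True if any GPU on the node holds more than threshold_mib of memory."""
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        # nvidia-smi prints one memory figure (in MiB) per GPU, one per line.
        return any(int(line) > threshold_mib for line in out.splitlines() if line.strip())

    # Poll at an interval; launch as soon as the node frees up.
    while gpus_busy():
        time.sleep(300)  # check every 5 minutes
    subprocess.run(["bash", "launch_training.sh"], check=True)  # placeholder script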
Pritish Mishra @pritmish
I really like this setup: run Claude Code inside a tmux session on a remote GPU box and turn on remote control. Claude can now start and monitor my training runs, and it can even run inference and create eval reports for me. And I can control all of this from my mobile.
[image attached]
1 reply · 0 reposts · 0 likes · 85 views
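(The mechanical half of this setup is just a detached tmux session, so Claude Code survives SSH disconnects; a minimal sketch, written in Python to match the other snippets here. The session name is arbitrary, and remote control is switched on inside Claude Code itself.)

    import subprocess

    # Start Claude Code ("claude" is its CLI entry point) in a detached tmux
    # session on the GPU box so it keeps running after the SSH session drops.
    subprocess.run(
        ["tmux", "new-session", "-d", "-s", "claude-train", "claude"],
        check=True,
    )

    # Re-attach after reconnecting over SSH with: tmux attach -t claude-train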
Pritish Mishra @pritmish
@willccbb calling this claudeslop would be a disservice. this was very knowledge-"dense". we need more of this.
0 replies · 0 reposts · 1 like · 359 views
Aran Komatsuzaki @arankomatsuzaki
Follow-up on non-English token-inefficiency with more model-language pairs:
- Chinese is cheaper than English on major Chinese models
- Gemini and Qwen impose the least non-English tax
- Anthropic has the highest tax by far; Kimi is next
- Hindi is the worst-covered language here, despite its massive speaker base
[image attached]
Quoting Aran Komatsuzaki @arankomatsuzaki:

The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to OpenAI English token count:
Hindi: OpenAI 1.37×, Anthropic 3.24×
Arabic: OpenAI 1.31×, Anthropic 2.86×
Chinese: OpenAI 1.15×, Anthropic 1.71×
Claude's tokenizer charges a much higher linguistic tax.

61 replies · 243 reposts · 1.2K likes · 478K views
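(A minimal sketch of how a measurement like this can be reproduced; it assumes the tiktoken package, and the strings are placeholders standing in for full human translations of Sutton's Bitter Lesson.)

    import tiktoken

    # Tokenizer behind recent OpenAI models; other vendors' tokenizers
    # would be loaded analogously through their own SDKs.
    enc = tiktoken.get_encoding("o200k_base")

    # Same passage in each language -- placeholders here; the original
    # comparison used full human translations.
    texts = {
        "English": "The biggest lesson that can be read from 70 years of AI research...",
        "Hindi": "...",
        "Arabic": "...",
        "Chinese": "...",
    }

    # Normalize each language's token count to the English count.
    base = len(enc.encode(texts["English"]))
    for lang, text in texts.items():
        print(f"{lang}: {len(enc.encode(text)) / base:.2f}x")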
Omar Sanseviero @osanseviero
Meeting APAC creators and press to talk about Gemma 4 was lots of fun! @samikiz and I hosted a session exploring the potential of open models, sharing demos and examples of developers' incredible work across APAC in the Gemmaverse. Excited to see what the community builds next!
[image attached]
6 replies · 3 reposts · 49 likes · 3.4K views
elie @eliebakouch
it was a nice day
[image attached]
17 replies · 4 reposts · 236 likes · 14.4K views
Pritish Mishra @pritmish
uh so i just tried SFT-ing Gemma4-31B and the grad norm is 82 thousand.
[image attached]
0 replies · 0 reposts · 3 likes · 889 views
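(For context: the standard first response to a grad norm like that is global-norm clipping plus logging the pre-clip norm. A plain-PyTorch sketch, with a toy model and data standing in for the real SFT run.)

    import torch

    # Toy model, optimizer, and batch standing in for the real run.
    model = torch.nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x, y = torch.randn(8, 16), torch.randn(8, 1)

    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()

    # clip_grad_norm_ rescales gradients to max_norm and returns the
    # pre-clip global norm -- the number worth logging. A norm of 82k
    # usually points at a data, precision, or config issue rather than
    # something clipping alone should absorb.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    print(f"grad norm before clipping: {float(grad_norm):.3f}")

    optimizer.step()
    optimizer.zero_grad()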
Pritish Mishra @pritmish
@bcherny when will we get auto mode in the Claude desktop app? I want to use the desktop app but this is a big blocker.
0 replies · 0 reposts · 0 likes · 27 views
Pritish Mishra @pritmish
@bnjmn_marie Gemma is very strong at multilingual tasks, though, so it's very valuable for non-coding use cases.
0 replies · 0 reposts · 6 likes · 161 views
difficultyang @difficultyang
Opus 4.7 as Anthropic's 4o deprecation moment
6 replies · 3 reposts · 194 likes · 15K views
Pritish Mishra @pritmish
@auto_grad_ I was talking about "The answer surprised me too when I first learned it".
0 replies · 0 reposts · 0 likes · 22 views
Pritish Mishra @pritmish
opus is sharing its memories of pre-training with me
[image attached]
1 reply · 0 reposts · 1 like · 256 views
Pritish Mishra @pritmish
@bcherny need bypass-all-permissions in the Claude desktop app. please.
0 replies · 0 reposts · 0 likes · 43 views
Matej Sirovatka @m_sirovatka
@tenderizzation I had a week to optimize perf on B300s with CUDA 13... I wouldn't even get Megatron running in that time
2 replies · 0 reposts · 27 likes · 745 views
Benjamin Marie @bnjmn_marie
Interesting: Gemma 4 31B has tied embeddings. Very unusual at this scale. It saves 1.41B params (vocab_size × hidden_size), ~2.8 GB.
4 replies · 2 reposts · 75 likes · 9.2K views