Mohit Kulkarni

414 posts

@hmmmmohit

🧠 Research @cohere | Prev @Harvard, @ETH_en, @IITKanpur

New York · Joined June 2020

1.3K Following · 224 Followers

Mohit Kulkarni@hmmmmohit·17h

huge. @NeocambrianAI is up to some amazing stuff

Abhinav Kukreja@kukreja_abhinav

Today, we announce @NeocambrianAI. The future of AI will be physical. But Physical AI has no internet-scale dataset to learn from. We’re building the data foundation of Physical AI: a high-fidelity, pre-training-scale database of Human Action. From India. For the world.

2.8K

Mohit Kulkarni reposted

Cohere@cohere·5d

Introducing: Cohere Command A+. We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

102

382

2.7K

711.7K

Mohit Kulkarni@hmmmmohit·Apr 30

@pHequals7 crazy

pH@pHequals7·Apr 30

meet Swift's 4th most trending developer 🤪

125

2.4K

Mohit Kulkarni@hmmmmohit·Mar 9

@kukreja_abhinav any blogs/people i should follow to keep up?

138

Abhinav Kukreja@kukreja_abhinav·Mar 9

If reading about developments in AI is making you nervous, for the love of God, do not even try to keep up with what they’re cooking in robotics.

4.7K

Mohit Kulkarni@hmmmmohit·Feb 16

This is pretty insane for the model size. GDNs really are super interesting architectures.

Benjamin Marie@bnjmn_marie

Let's do the KV cache math for Qwen3.5:
- KV heads: 2
- Head dimension: 256
- Gated attention layers: 15
- Bytes per element (BF16): 2
2 x 256 x 15 x 2 = 15,360 bytes
This is the same for K and V, so we multiply by 2: 30,720 bytes, roughly 31 KB per token of context.
At max context length (262,144 tokens): 30,720 x 262,144 = 8.05 GB
So at max context length, Qwen3.5 will only consume 8.05 GB, or 4.025 GB if quantized to FP8. It's small, and it's thanks to the use of 45 gated DeltaNet layers. If all 60 layers were normal attention layers, the full sequence would consume 32 GB.
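
The arithmetic in the post above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the configuration values (2 KV heads, head dimension 256, 15 attention layers, 262,144 tokens) are simply the figures quoted in the post, not values checked against a model card.

```python
def kv_cache_bytes_per_token(kv_heads, head_dim, attn_layers, bytes_per_elem):
    """Bytes of KV cache stored for one token of context."""
    per_tensor = kv_heads * head_dim * attn_layers * bytes_per_elem
    return 2 * per_tensor  # one tensor for keys, one for values


def kv_cache_total_bytes(context_len, **config):
    """Bytes of KV cache for a full context of `context_len` tokens."""
    return context_len * kv_cache_bytes_per_token(**config)


# Figures as quoted in the post (assumed, not independently verified).
QWEN35 = dict(kv_heads=2, head_dim=256, attn_layers=15, bytes_per_elem=2)
MAX_CONTEXT = 262_144

if __name__ == "__main__":
    variants = {
        "BF16, 15 attention layers": QWEN35,
        "FP8, 15 attention layers": dict(QWEN35, bytes_per_elem=1),
        "BF16, all 60 layers as attention": dict(QWEN35, attn_layers=60),
    }
    for name, config in variants.items():
        total = kv_cache_total_bytes(MAX_CONTEXT, **config)
        print(f"{name}: {total / 1e9:.2f} GB")
```

The GB figures use decimal gigabytes (10^9 bytes), which is the convention the post's 8.05 GB number implies.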

268

Mohit Kulkarni@hmmmmohit·Jan 13

@pHequals7 GLM 4.7 is good

pH@pHequals7·Jan 13

what are the best open-weight models that natively support multi-turn agentic steps (reasoning, tool calling and structured outputs)? anthropic, openai and grok seem to be the most well documented ones but are pricey. struggling with deepseek and minimax (on openrouter)

450

Mohit Kulkarni@hmmmmohit·Jan 6

@pHequals7 Given that enough friction still remains in creating software for the next few years, Indian IT services could actually benefit by commoditizing cheap labour + cheap tokens. I'd suppose this friction will exist for 2-3 years, but it's extremely hard to predict. GPUmaxxing is needed regardless.

396

Mohit Kulkarni reposted

pH@pHequals7·Jan 6

this and daniel gross's agitrades essay keeps ringing in my head "If you’re India, for example, where double-digit percentages of your GDP are literally IT services, what do you do when Claude and GPT-5 tokenize like vast portions of that flow" GOI should be GPU-maxxing

651

102.7K

Mohit Kulkarni@hmmmmohit·Jan 4

Growing increasingly concerned about AI efforts in India.

Mohit Kulkarni@hmmmmohit·Jan 3

@paraschopra Another very cool and useful way to think about the dimensionality of data is the Johnson–Lindenstrauss lemma. It basically says that any set of n points can be linearly mapped into k = O(log n / ε²) dimensions while preserving all pairwise distances to within a factor of 1 ± ε, and hence approximately preserving angles too.
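
A quick way to see the lemma in action is to project random points with a Gaussian matrix and measure how much the pairwise distances move. This is a minimal sketch in standard-library Python. The sizes (30 points, 200 to 100 dimensions), the seed, and the function names are arbitrary illustrative choices, and a scaled Gaussian matrix is only one of several constructions that satisfy the lemma.

```python
import math
import random


def random_projection(points, k, rng):
    """Project points into k dimensions with a Gaussian matrix scaled by 1/sqrt(k)."""
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def max_distance_distortion(original, projected):
    """Largest relative change in any pairwise Euclidean distance."""
    worst = 0.0
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            before = math.dist(original[i], original[j])
            after = math.dist(projected[i], projected[j])
            worst = max(worst, abs(after / before - 1.0))
    return worst


def demo(n=30, d=200, k=100, seed=0):
    """Return the worst-case distance distortion for n random points."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    projected = random_projection(points, k, rng)
    return max_distance_distortion(points, projected)


if __name__ == "__main__":
    print(f"max relative distortion: {demo():.3f}")
```

Note that the target dimension k depends only on the number of points and the tolerance ε, not on the original dimension d.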

737

Paras Chopra@paraschopra·Jan 3

Learned something very interesting today! Randomly projecting data that is not linearly separable onto a high-dimensional space is enough to make it linearly separable. Consider a dataset like XOR that you can't linearly separate. Now, if you project each 2D point onto a D (=50) dimensional space using *randomly* initialised basis vectors, each direction creates a tiny difference between the classes (e.g. gives 51-52% accuracy) because the expectation of the two classes differs slightly when randomly projected. So each randomly projected feature becomes a tiny discriminator, and when you aggregate over 20-50 such discriminators, a linear classifier is able to separate them perfectly by simply learning how much to weigh each feature. One intriguing possibility of this is that we're able to train deep networks because random projections make most of the data already separable, making the job of gradient descent easy.
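
The idea in the post above can be tried on the four XOR points directly. This is a minimal sketch under stated assumptions: the post does not say which non-linearity follows the random projection, so a ReLU with a random bias is assumed here (a purely linear projection cannot make XOR separable); the ±1 encoding, the perceptron as the linear classifier, the seed, and all function names are illustrative choices.

```python
import random

# XOR with inputs coded as -1/+1; label is +1 when the two inputs differ.
XOR_POINTS = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
XOR_LABELS = [-1, 1, 1, -1]


def random_relu_features(points, dim, rng):
    """Map 2D points to `dim` features: ReLU of a random affine projection (assumed)."""
    params = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    return [[max(0.0, w1 * x1 + w2 * x2 + b) for (w1, w2, b) in params]
            for (x1, x2) in points]


def train_perceptron(features, labels, max_epochs=10_000):
    """Plain perceptron; stops early once a full pass makes no mistakes."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def accuracy(features, labels, weights, bias):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        correct += (score > 0) == (y > 0)
    return correct / len(labels)


def demo(dim=50, seed=0):
    """Return (accuracy on raw 2D inputs, accuracy on random ReLU features)."""
    rng = random.Random(seed)
    raw = [list(p) for p in XOR_POINTS]
    lifted = random_relu_features(XOR_POINTS, dim, rng)
    raw_acc = accuracy(raw, XOR_LABELS, *train_perceptron(raw, XOR_LABELS))
    lifted_acc = accuracy(lifted, XOR_LABELS, *train_perceptron(lifted, XOR_LABELS))
    return raw_acc, lifted_acc


if __name__ == "__main__":
    raw_acc, lifted_acc = demo()
    print(f"raw inputs: {raw_acc:.2f}, random features: {lifted_acc:.2f}")
```

On the raw inputs no linear classifier can label more than three of the four points correctly, so any improvement comes from the random non-linear features rather than from training the projection itself.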

104

1.2K

287.6K

Mohit Kulkarni@hmmmmohit·Jan 1

@kukreja_abhinav Congrats!

Abhinav Kukreja@kukreja_abhinav·Jan 1

Starting the year off in our new office! ❤️☺️ happy new year everyone!

Abhinav Kukreja@kukreja_abhinav

Before the end of this financial year, we will build the best computational imaging & metrology lab in all of NCR, maybe all of India.

372

25.9K

Mohit Kulkarni@hmmmmohit·Nov 13

@Gravito841 What's the 99.99th percentile for people aged 20-29? Couldn't find any statistics for India or worldwide.

136

Mohit Kulkarni reposted

Sham Kakade@ShamKakade6·Oct 19

1/6 Introducing Seesaw: a principled batch size scheduling algo. Seesaw achieves theoretically optimal serial run time given a fixed compute budget and also matches the performance of cosine annealing at fixed batch size.

247

42.4K

Mohit Kulkarni reposted

Eran Malach@EranMalach·Oct 17

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: arxiv.org/pdf/2510.14826 🧵

418

115.4K

Mohit Kulkarni@hmmmmohit·Oct 3

@vikhyatk github.com/NvChad/NvChad: more opinionated, but better defaults

129

vik@vikhyatk·Oct 3

what is the omarchy of neovim configs? is lazyvim good?

14.1K

Mohit Kulkarni reposted

Mary Letey@maryiletey·Oct 1

New preprint! We study in-context learning (ICL) through the framework of task alignment: how well do pretraining tasks match the test task distribution? arxiv.org/abs/2509.26551

4.9K

Mohit Kulkarni reposted

Jascha Sohl-Dickstein@jaschasd·Sep 28

Title: Advice for a young investigator in the first and last days of the Anthropocene
Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and career! I will give practical advice, and concrete criteria to consider, when choosing research projects, and making professional decisions, in these last few years before AGI.
This is my current go-to academic talk. It's mostly targeted at early career scientists. It gets diverse and strong reactions. Let's try it here. Posting slides with speaker notes...
--
The title is a play on a very opinionated and pragmatic book by the Nobel Prize winner Ramón y Cajal, who is one of the founders of modern neuroscience. To get you in the right mindset, on the right we have a plot of GDP vs time. That is you, standing precariously on the top of that curve. You are thinking to yourself: I live in a pretty normal world. Some things are going to change, but the future is going to look mostly like a linear extrapolation of the present. And the plot should suggest that this may not be the right perspective on the future. This plot, by the way, looks surprisingly similar even if you plot it on a log scale. We didn't stabilize on our current rate of growth until around 1950.

271

1.8K

344.3K

Mohit Kulkarni@hmmmmohit·Sep 27

@kalomaze Seems a bit too much to call a scientist old just because you disagree on one thing. Is it that hard to believe that LLMs aren't the endgame?

kalomaze@kalomaze·Sep 27

sometimes people just get old and that's okay

5.2K

kalomaze@kalomaze·Sep 27

Richard Sutton@RichardSSutton

@GaryMarcus @ylecun @demishassabis You were never alone, Gary, though you were the first to bite the bullet, to fight the good fight, and to make the argument well, again and again, for the limitations of LLMs. I salute you for this good service!

534

60.4K

Mohit Kulkarni@hmmmmohit·Aug 28

@Gravito841 Isn't this your batch?

1.1K

Mohit Kulkarni@hmmmmohit·Aug 11

@Gravito841 Damn, I miss IITK. It's like the funniest place ever.

Mohit Kulkarni@hmmmmohit·Aug 11

@Gravito841 This is like Darwinian natural selection for Hall 2 kids. You can't take that away from them.

671
