

Sourav
@srvmshr
ML University of Tokyo. Prev: Microsoft Research RF, @virginia_tech. Personal opinions. Coasting life with @jnchrltte







Structural Innovation & Ultra-High Context Efficiency

🔹 Novel Attention: token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: world-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default across all official DeepSeek services.

4/n
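The core idea behind sparse attention is that each query attends to only a small subset of keys instead of the full sequence. Below is a minimal NumPy sketch of top-k sparse attention to illustrate the concept — the real DSA uses a learned indexer to select tokens, so this is an illustrative approximation, not DeepSeek's implementation; all names and shapes here are hypothetical.

```python
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    """q, k, v: (seq_len, d). Each query attends to only its top_k keys."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # full (seq, seq) scores
    # Keep the top_k entries per row; mask everything else to -inf.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    scores = scores + mask
    # Numerically stable softmax over the surviving entries.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
out = sparse_attention(q, q, rng.standard_normal((8, 16)), top_k=4)
print(out.shape)  # (8, 16)
```

With top_k fixed, per-query attention cost no longer grows with sequence length, which is where the long-context compute and memory savings come from (the score computation itself would also need a sparse/indexed kernel in a real system).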

Meet Kimi K2.6: Advancing Open-Source Coding

🔹 Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ Python (86.7), Math Vision w/ Python (93.2)

What's new:
🔹 Long-horizon coding: 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹 Motion-rich frontend: videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹 Agent Swarms, elevated: 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹 Proactive Agents: the K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹 Claw Groups (research preview): bring your own agents, command your friends', bots & humans in the loop.

K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…


I upgraded my Claude token counter tool to compare different models, and Opus 4.7 does appear to use 1.46x the tokens for text and up to 3x the tokens for images. It's priced the same as Opus 4.6 on a per-token basis, so this is effectively a pretty big price bump.
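The back-of-envelope math here: if per-token price is unchanged but token count rises, effective cost rises by the same ratio. A quick sketch using the post's 1.46x figure — the dollar rate and workload size below are placeholders, not Anthropic's actual pricing:

```python
# Effective cost scales directly with token count when the per-token
# price stays fixed. Rates are hypothetical, NOT real Anthropic pricing.
price_per_mtok = 15.0      # hypothetical $ per million tokens (same for both models)
tokens_old = 1_000_000     # tokens the old model uses for some workload
token_ratio = 1.46         # post's measured text-token ratio for Opus 4.7

cost_old = tokens_old / 1e6 * price_per_mtok
cost_new = tokens_old * token_ratio / 1e6 * price_per_mtok
print(cost_new / cost_old)  # effective price multiplier == token ratio
```

For image-heavy workloads, substituting the up-to-3x image ratio makes the effective bump correspondingly larger.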



Claude Opus 4.7 from @AnthropicAI takes #1 in Vision & Document Arena!

In Document Arena, Opus 4.7 lands +4 points over Opus 4.6 and +45 over the next non-Anthropic model, GPT-5.4 (#6). That's a huge ~70-point lead over Muse Spark and Gemini-3.1-Pro.

Real-world research work like literature reviews, legal analysis, clinical notes, and technical reports doesn't fit in a single prompt. Document Arena evaluates long-context document reasoning on real user workflows.

In Vision Arena it sweeps multiple categories:
- #1 Diagram
- #1 Homework
- #1 OCR

🧵 More updates in the thread for Vision Arena. Huge congrats to @AnthropicAI again for pushing the frontier.






There's a broadly held misconception in AI that methods that scale well are simple methods -- or even that simple methods usually scale. This is completely wrong. Pretty much none of the truly simple methods in ML scale well. SVMs, kNN, and random forests are some of the simplest methods out there, and they don't scale at all. Meanwhile, "train a transformer via backprop and gradient descent" is a very high-entropy method, easily 10x more complex than random forest fitting. But it scales very well. Further, given a simple method that doesn't scale, you can usually alter it to make it scale by adding a lot of complication. For instance, take a simple combinatorial search-based method (not scalable at all): you can make it scale by adding deep learning guidance, which blows up complexity. Scalability usually belongs to high-entropy, complex systems.

A sign of an intelligent person is the ability to simplify things, not complicate them.





