Sourav

1.2K posts


@srvmshr

ML University of Tokyo. Prev: Microsoft Research RF, @virginia_tech. Personal opinions. Coasting life with @jnchrltte

Tokyo-to, Japan · Joined April 2013
1K Following · 616 Followers
Sourav reposted
himanshu@himanshustwts·
DeepSeek's shipping speed is truly underrated. In just 16 months, they shipped 10+ architectural innovations across the training stack:
> MLA
> DeepSeekMoE
> Aux-loss-free load balancing
> GRPO
> R1-Zero
> DSA
> CSA (in v4)
> HCA (in v4)
> mHC
> Large-scale agentic task synthesis
and almost everything is open source. What am I missing?
24 replies · 57 reposts · 738 likes · 24.9K views
Sourav@srvmshr·
13!
0 replies · 0 reposts · 0 likes · 46 views
Sourav reposted
Dhruv@dhruvtwt_·
Why is no one talking about this? @nvidia is offering around 80 AI models via hosted APIs absolutely for free. You get access to MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek 3.2, GPT-OSS-120B, Sarvam-M, etc. This plugs straight into OpenClaude, OpenCode, Zed IDE, the Hermes agent, and even Cursor IDE.

Setup:
– Grab an API key: build.nvidia.com/models
– base_url = "integrate.api.nvidia.com/v1"
– api_key = "$NVIDIA_API_KEY"
– Select a model (e.g. minimaxai/minimax-m2.7)

If you're building or experimenting, this is basically free inference. Lock in and start building today anon. Thank me later.
527 replies · 1.8K reposts · 18.1K likes · 1.5M views
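The setup steps in the tweet can be sketched as a plain OpenAI-compatible HTTP request. The base URL is from the tweet; the `/chat/completions` path and payload shape follow the usual OpenAI-style convention, and the model slug `minimaxai/minimax-m2.7` is taken from the tweet as-is (NVIDIA's actual catalog names may differ), so treat all of these as assumptions:

```python
import json
import os

# Base URL from the tweet; endpoint path and payload shape are assumptions
# based on the usual OpenAI-compatible API convention.
BASE_URL = "https://integrate.api.nvidia.com/v1"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Return (url, headers, body) for a chat-completion request."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

url, headers, body = build_chat_request(
    "minimaxai/minimax-m2.7",                     # model slug from the tweet (unverified)
    "Hello!",
    os.environ.get("NVIDIA_API_KEY", "demo-key"),  # real key via env var
)
# To actually send it:
#   urllib.request.urlopen(urllib.request.Request(url, body, headers))
```

The same request dict also works through an OpenAI-style client by pointing its `base_url` at the NVIDIA endpoint, which is presumably how the editors and agents in the tweet integrate it.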
Sourav reposted
Sebastien Bubeck@SebastienBubeck·
Ramsey numbers are one of the most basic objects in combinatorics, a beautiful illustration of structure within chaos. They have been heavily studied for almost a century now, so it came as a real surprise to us when an internal version of GPT-5.5 proved a new elementary result about them:

\lim_{n\to \infty} R(k,n+1)/R(k,n) = 1 for all k

This was also known as Erdos problem #1014, although I personally think the more relevant bit is that it's a basic result about off-diagonal Ramsey numbers. As often happens (for now), the proof is reasonably simple in hindsight, although it is quite a high-wire act and it relies on some unexpected numerics (the "unexpected" part here is probably why this wasn't discovered before). Despite being simple, it's certainly the type of result that could now be taught in a combinatorics class.

Pdf of the proof (produced by @mehtaab_sawhney): cdn.openai.com/pdf/6dc7175d-d…
Lean verification of the proof (produced by Boris Alexeev): github.com/plby/lean-proo…
9 replies · 79 reposts · 497 likes · 64.2K views
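Typeset cleanly, the result stated in the tweet reads as follows, with $R(k,n)$ the usual off-diagonal Ramsey number; reading "for all $k$" as "for every fixed $k$" is my interpretation:

```latex
% Ratio result on off-diagonal Ramsey numbers (Erdos problem #1014)
\[
  \lim_{n \to \infty} \frac{R(k,\, n+1)}{R(k,\, n)} = 1
  \qquad \text{for every fixed } k .
\]
```

Informally: for fixed clique size $k$, consecutive off-diagonal Ramsey numbers grow by a vanishing relative amount as $n \to \infty$.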
Sourav reposted
GDP@bookwormengr·
DeepSeek V4 hits it out of the park and addresses the HBM shortage. DeepSeek proves why it is such a fundamental research lab. In addition to exceeding Opus 4.6 on Terminal Bench and virtually matching it on other performance metrics, the most notable advancement is this statement:

"In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2"

To understand the significance of this point, consider the diagram below, which shows the memory layout for Prefill and Decode nodes. If you implement Decode with Data and Expert Parallelism (DEP16) across 16 GPUs on a GB200 or GB300 NVL72 rack with DeepSeek V3.2, you are left with 104 GB or 176 GB of HBM per GPU respectively (assuming MoE parameters are stored in NVFP4). The remaining HBM per GPU dictates how large a batch size you can run for inference, which determines how many concurrent requests you can serve.

Consider a GB300 with 176 GB left:
1. For 128K context, you need 4.45 GB of HBM for KV cache, and you can serve only 36 concurrent requests.
2. For 256K context, you need 8.90 GB of HBM for KV cache, and you can serve only 18 concurrent requests.
3. For 512K context, you need 17.80 GB of HBM for KV cache, and you can serve only 9 concurrent requests.
4. For 1M context, you need 35.60 GB of HBM for KV cache, and you can serve only 4 concurrent requests.

You see the point. Now imagine you actually need 10 times less KV cache at 1M: it basically enables you to serve 10 times more requests with the same resources. Recall that Decode is memory-bound, not compute-bound, unlike Prefill. This is probably the most important contribution of DeepSeek V4.

@teortaxesTex @jukan05 @zephyr_z9
DeepSeek@deepseek_ai

Structural Innovation & Ultra-High Context Efficiency
🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default across all official DeepSeek services.
4/n

27 replies · 241 reposts · 1.6K likes · 207.8K views
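The concurrency arithmetic in the post above can be reproduced with a short back-of-the-envelope script. Assumptions are mine, not DeepSeek's: KV cache scales linearly with context length from the post's 4.45 GB-per-request figure at 128K (using binary sizes, so 1M = 8 × 128K tokens), and concurrency is simply floor(free HBM / KV per request). Under that crude model the counts come out slightly above the post's at short contexts (39 vs 36 at 128K), presumably because the post reserves some headroom:

```python
# Rough Decode-node sizing sketch for a GB300 with 176 GB of free HBM per GPU,
# comparing DeepSeek V3.2 with a hypothetical 10%-KV-cache V4 (per the post).

HBM_FREE_GB = 176.0    # free HBM per GPU on GB300 (figure from the post)
KV_GB_AT_128K = 4.45   # KV cache per request at 128K context (from the post)

def kv_cache_gb(context_tokens: int) -> float:
    """Per-request KV cache, assumed to scale linearly with context length."""
    return KV_GB_AT_128K * context_tokens / 131_072  # 128K = 131072 tokens

def max_concurrent(context_tokens: int, kv_scale: float = 1.0) -> int:
    """Requests that fit in free HBM; kv_scale=0.1 models V4's 10% KV cache."""
    return int(HBM_FREE_GB // (kv_cache_gb(context_tokens) * kv_scale))

for ctx in (131_072, 262_144, 524_288, 1_048_576):
    v32 = max_concurrent(ctx)
    v4 = max_concurrent(ctx, kv_scale=0.1)
    print(f"{ctx:>9} tokens: V3.2 fits ~{v32} requests, V4 (10% KV) ~{v4}")
```

At 1M context this gives 4 concurrent requests for V3.2 and roughly 10x that with the claimed 10% KV cache, which is exactly the post's point about Decode being memory-bound.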
Sourav@srvmshr·
@ml_angelopoulos The pricing shock is definitely upsetting. I can get a lot done with 4.6 without running out of my session limits. But 4.7 is a real guzzler
1 reply · 0 reposts · 1 like · 39 views
Anastasios Nikolas Angelopoulos
To be clear, 4 points over Opus 4.6 is nothing to write home about, and is not significant according to the CIs. The vibes on X have been correct about this model so far, in that it is different from, but not uniformly better than, Opus 4.6. Still, Opus 4.X remains king 👑
Arena.ai@arena

Claude Opus 4.7 from @AnthropicAI takes #1 in Vision & Document Arena!

In Document Arena, Opus 4.7 lands +4 points over Opus 4.6 and +45 over the next non-Anthropic model, GPT-5.4 (#6). That is a huge ~70-point lead over Muse Spark and Gemini-3.1-Pro. Real-world research work like literature reviews, legal analysis, clinical notes, and technical reports doesn't fit in a single prompt; Document Arena evaluates long-context document reasoning on real user workflows.

In Vision Arena it sweeps multiple categories:
- #1 Diagram
- #1 Homework
- #1 OCR

🧵 More updates in the thread for Vision Arena. Huge congrats to @AnthropicAI again for pushing the frontier.

1 reply · 0 reposts · 8 likes · 1.5K views
Pasquale Minervini@PMinervini·
@srvmshr @iamtrask Got an M3 Max as a laptop and sometimes missed being able to run CUDA software right away; the Spark fixed that very nicely
1 reply · 0 reposts · 1 like · 29 views
⿻ Andrew Trask@iamtrask·
Hey - if you don’t use a local LLM as your daily driver, why? What are the remaining problems?
116 replies · 6 reposts · 141 likes · 76.5K views
Sourav@srvmshr·
How does one delete tasks in @OpenAI Codex on the Web? I see only options for archiving. Can anyone chime in on why this feature is missing, especially on the Teams plan where we opt out of data retention? @OpenAIDevs @thsottiaux
0 replies · 0 reposts · 0 likes · 54 views
Sourav@srvmshr·
@iamtrask Spark is more tinker-friendly & a big W if you're writing anything remotely low-level (CUDA, fine-tuning, kernels, etc.). Otherwise yes, MLX + GGUFs make it pretty convenient
0 replies · 0 reposts · 1 like · 71 views
⿻ Andrew Trask@iamtrask·
@srvmshr Seems like the Mac Studio can at least be a great Mac if it doesn't work out, yeah?
1 reply · 0 reposts · 0 likes · 414 views
Sourav@srvmshr·
@thsottiaux @barret_zoph Please introduce "Delete Tasks" in the Codex web UI. Frankly, it keeps code information in the task panels even from disconnected GitHub accounts/repos. Our org doesn't share data for model improvement. This isn't okay. People report that some GitHub issues are pending around this.
0 replies · 0 reposts · 0 likes · 302 views
Tibo@thsottiaux·
Codex
Compute efficient ✅
Always up, never down ✅
Best at hardcore engineering ✅
Crazy good app, first to escape the terminal ✅
454 replies · 187 reposts · 5.1K likes · 2.4M views
Sourav@srvmshr·
@tarantulae Model distillation is quite likely. Opus 4.7 is downstream of Mythos. I have some serious concerns about a gimped version of the larger, more capable model, i.e. more token usage but variable quality
0 replies · 0 reposts · 1 like · 40 views
Christian S. Perone@tarantulae·
Interesting that Opus 4.7 seems to be using a new tokenizer, which means probably a new model, but Mythos is the new model, so I wonder what is going on 🤔
1 reply · 0 reposts · 1 like · 456 views
Sourav@srvmshr·
@liztansz Find the sessions you like & walk around to posters that interest you. The Q&A usually tells things that aren't in the paper. The socials in the evening are nice. Outlier tip: nice to have business cards if you talk to people. Food usually is mehh - don't count on it
1 reply · 0 reposts · 1 like · 34 views
liz tan@liztansz·
Can people please share their top conference tips? I'm going to ICLR next week!! Do people prep for conferences?? What do you do there? Do you just go to make friends? Please, this is my first conference!!
2 replies · 0 reposts · 3 likes · 176 views