Divyansh Singh
@L2_cache_miss
89 posts
L2 cache in gpus | prev research @adobe | cs grad @IITKanpur
Joined January 2026
87 Following · 2 Followers
Divyansh Singh @L2_cache_miss ·
@0xSero how does one get an API which doesn't use the data for training?
0 replies · 0 reposts · 1 like · 947 views
0xSero @0xSero ·
117.4M tokens for $2.24 for a genius.
[media]
123 replies · 80 reposts · 2.4K likes · 206.5K views
Divyansh Singh @L2_cache_miss ·
@MoonHeead @nrehiew_ yeah 😂 I have created a design reference for this sort of aesthetic and am creating walkthroughs for multiple papers… x.com/L2_cache_miss/…
[media]
Quoting Divyansh Singh @L2_cache_miss:
@nrehiew_ this is very addictive btw... my weekly limit was resetting tonight so I generated HTML walkthroughs of multiple papers from my reading list... it is so much fun to read this way first and then trace back in the paper... thanks a lot @nrehiew_
0 replies · 0 reposts · 0 likes · 28 views
wh @nrehiew_ ·
How I read papers now. This is an explainer by Claude about the new Compressed Sparse Attention that v4 uses to compress the KV cache.
[media]
Quoting wh @nrehiew_:
Now reading:
6 replies · 69 reposts · 699 likes · 55.5K views
Divyansh Singh @L2_cache_miss ·
@nrehiew_ this is very addictive btw... my weekly limit was resetting tonight so I generated HTML walkthroughs of multiple papers from my reading list... it is so much fun to read this way first and then trace back in the paper... thanks a lot @nrehiew_
0 replies · 0 reposts · 1 like · 68 views
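The tweet above describes the workflow only at a high level (asking Claude to turn papers into HTML walkthroughs, most likely in the Claude app itself). Below is a minimal sketch of doing something similar through the Anthropic Python SDK; the model alias, the prompt, and the paper.txt / walkthrough.html filenames are assumptions for illustration, not the author's actual setup.

```python
# Hedged sketch only: model alias, prompt, and file names are assumptions, not the
# author's actual workflow (the tweet suggests this was done in the Claude app itself).
import pathlib
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

paper_text = pathlib.Path("paper.txt").read_text()  # text extracted from the paper's PDF

msg = client.messages.create(
    model="claude-sonnet-4-5",   # assumed model alias; use whichever model you have access to
    max_tokens=8000,
    messages=[{
        "role": "user",
        "content": "Turn this paper into a single self-contained HTML walkthrough, "
                   "explaining it section by section with inline SVG figures:\n\n" + paper_text,
    }],
)

# The response is a list of content blocks; the first one holds the generated HTML.
pathlib.Path("walkthrough.html").write_text(msg.content[0].text)
```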
Divyansh Singh @L2_cache_miss ·
@nrehiew_ they all know how to extract full information from less
0 replies · 0 reposts · 3 likes · 262 views
wh @nrehiew_ ·
DSA, NSA, CSA, CIA, HCA, KDA, FBI. What do these have in common?
5 replies · 1 repost · 33 likes · 5.1K views
Divyansh Singh @L2_cache_miss ·
@GuggaLeunnam just keep posting more of these edits on X and then wait for the next Grok Imagine model
0 replies · 0 reposts · 1 like · 2.5K views
Gugga Leunnam @GuggaLeunnam ·
AI couldn't edit this
241 replies · 5.6K reposts · 41.5K likes · 1.5M views
Divyansh Singh @L2_cache_miss ·
what kind of visual wizardry library are these @claudeai folks using? this visual UI is literally so smooth
[media]
0 replies · 0 reposts · 0 likes · 6 views
Divyansh Singh @L2_cache_miss ·
@LLMJunky I see, thanks. I was looking for some theoretical backing instead of opinions; for some reason this feels like using an fp32 KV cache for bf16 model weights 😂
0 replies · 0 reposts · 1 like · 6 views
am.will @LLMJunky ·
@L2_cache_miss more accurate; this is the recommendation of the creator of the quant. you can run it at fp8. some say it doesn't matter, others say it does. 🤷‍♂️🤷‍♂️
1 reply · 0 reposts · 1 like · 63 views
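For context on the trade-off being debated here (KV-cache precision chosen separately from weight precision), vLLM does expose the cache dtype as its own knob. A minimal sketch follows, assuming a recent vLLM; the model name is a placeholder, not the checkpoint discussed in this thread.

```python
# Minimal sketch: the KV-cache dtype is configured independently of the weight format,
# so a low-precision checkpoint can be paired with either a 16-bit or an fp8 cache.
# "your-org/your-quantized-model" is a placeholder, not the model from the thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-quantized-model",
    kv_cache_dtype="fp8",      # default is "auto", which keeps the cache at the model dtype
    max_model_len=32768,
)

out = llm.generate(["Explain KV-cache quantization in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```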
am.will @LLMJunky ·
Minimax M2.7 running locally on just two baby GPUs. Here's a side-by-side of two leading local LLM serving engines for MiniMax M2.7 NVFP4: vLLM and SGLang. What do you think? These results are surprising to me; I expected them to scale more or less linearly.
[media]
Quoting am.will @LLMJunky:
Finally putting these RTX 6000s to good use. Minimax M2.7 running locally. There have been learning curves indeed. Running NVFP4 with a full 16-bit KV cache at a 140K context window. I can get a full 200K context window but only with vLLM, which is slower. I think I'll opt for the speed, I don't need 200K anyway. Dedicating my free time to leveling up my game. Thanks to everyone who's helping me. YKWYA 🫶
27 replies · 4 reposts · 117 likes · 18.3K views
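The quoted tweet names the key settings but not a launch command; here is a hedged sketch of a comparable vLLM setup under those stated assumptions (two GPUs via tensor parallelism, 16-bit KV cache, roughly 140K context). The checkpoint path is a placeholder, and the memory setting will depend on the actual hardware.

```python
# Sketch of the described configuration, not the poster's actual launch script.
from vllm import LLM

llm = LLM(
    model="path/to/minimax-m2.7-nvfp4",   # placeholder for the NVFP4 checkpoint
    tensor_parallel_size=2,               # split weights across the two GPUs
    max_model_len=140_000,                # ~140K context, as in the quoted tweet
    kv_cache_dtype="auto",                # keep the KV cache at the model's 16-bit dtype
    gpu_memory_utilization=0.92,          # leave a little headroom; tune per machine
)
```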
Divyansh Singh @L2_cache_miss ·
@danveloper @p_nawrot one reason might be that the per-layer compute time will be high enough to give room for prefetching the next layer's weights; otherwise you would just be waiting for the next layer's weights to come from the host
0 replies · 0 reposts · 1 like · 205 views
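The point in the reply above (overlap the next layer's host-to-device copy with the current layer's compute) can be shown with a plain PyTorch sketch. This is a generic illustration under assumed names (run_with_prefetch is made up), not FlexTensor's implementation.

```python
# Generic prefetch-overlap sketch, not FlexTensor's code. Weights live in pinned host
# memory; the next layer's copy is queued on a side CUDA stream while the current
# layer computes, so the GPU is not stalled waiting on PCIe transfers.
import torch
import torch.nn as nn

@torch.no_grad()
def run_with_prefetch(layers, x, device="cuda"):
    copy_stream = torch.cuda.Stream()

    def prefetch(layer):
        # Async host->device copy; non_blocking only overlaps if host tensors are pinned.
        with torch.cuda.stream(copy_stream):
            layer.to(device, non_blocking=True)

    prefetch(layers[0])
    for i, layer in enumerate(layers):
        # Wait until layer i's weights have actually arrived before using them.
        torch.cuda.current_stream().wait_stream(copy_stream)
        if i + 1 < len(layers):
            prefetch(layers[i + 1])   # overlap the next transfer with this layer's compute
        x = layer(x)
        layer.to("cpu")               # hand the VRAM back before the next layer lands
    return x

if torch.cuda.is_available():
    layers = [nn.Linear(4096, 4096) for _ in range(8)]
    for l in layers:
        for p in l.parameters():
            p.data = p.data.pin_memory()   # pinned memory enables truly async copies
    out = run_with_prefetch(layers, torch.randn(16, 4096, device="cuda"))
```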
Dan Woods @danveloper ·
@p_nawrot why use Llama-3.1-405B instead of something more modern like Qwen3.5-397B?
2 replies · 0 reposts · 11 likes · 1.6K views
Piotr Nawrot @p_nawrot ·
💾🚀 Run Llama-3.1-405B FP8 (410GB) on a single 180GB GPU #NVIDIA

Introducing FlexTensor — NVIDIA's new library that makes host RAM a transparent extension of your GPU memory. One call: flextensor.offload(model). No model rewrites, no framework changes. Works with vLLM, HuggingFace, and any PyTorch model.

Traditional offloading is reactive — move data when you run out of memory, stall the GPU while you wait. FlexTensor instead profiles your model's layer access patterns, then solves a knapsack optimization to schedule prefetches that overlap with compute. By the time a layer needs its weights, they're already there.

The freed VRAM gives vLLM more room for KV cache — enabling 4x longer contexts (8K→32K) or 4x larger batches. For video generation (Wan2.2-T2V-A14B on GB200): +0.1% overhead.

Handles FP8, custom Triton kernels, and multi-GPU. Profiles saved to disk — no warmup on repeated runs.

Check it out: github.com/ai-dynamo/flex…
[media]
14 replies · 33 reposts · 220 likes · 56K views
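The tweet mentions a knapsack optimization without showing one. Below is a toy 0/1-knapsack sketch of one simplified reading of that idea: given a VRAM budget, choose which layers to keep resident so the host-to-device transfer time avoided each forward pass is maximal. The function name and the per-layer sizes and times are made up for illustration and are not FlexTensor's actual scheduler.

```python
# Toy illustration only, not FlexTensor's scheduler. Value = host->device transfer time
# saved by keeping a layer resident; weight = the VRAM it occupies; capacity = spare VRAM.
def pick_resident_layers(sizes_gb, transfer_ms, budget_gb):
    scale = 10                                  # work in 0.1 GB units to keep the DP small
    cap = int(budget_gb * scale)
    weights = [int(round(s * scale)) for s in sizes_gb]

    # best[c] = (total ms saved, set of layer indices) achievable within capacity c
    best = [(0.0, frozenset()) for _ in range(cap + 1)]
    for i, w in enumerate(weights):
        for c in range(cap, w - 1, -1):         # classic 0/1 knapsack: capacity descending
            cand = best[c - w][0] + transfer_ms[i]
            if cand > best[c][0]:
                best[c] = (cand, best[c - w][1] | {i})
    return best[cap]

# Made-up per-layer sizes and streaming times, with 12 GB of VRAM left after the KV cache.
sizes = [5.0, 5.0, 3.0, 3.0, 2.0, 2.0]          # GB
times = [40.0, 40.0, 25.0, 25.0, 18.0, 18.0]    # ms to stream each layer from host RAM
saved_ms, resident = pick_resident_layers(sizes, times, budget_gb=12.0)
print(f"keep layers {sorted(resident)} resident, saving ~{saved_ms:.0f} ms per forward pass")
```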