Krishna Mohan

3.1K posts

@KMohan2006

Denoising present to hopefully get brighter future | loves diffusion models

GPU Joined May 2024

378 Following 2.5K Followers

Pinned Tweet

Krishna Mohan@KMohan2006·18 Feb

Flash Attention 1 forward pass CUDA kernel

238

15.3K

Krishna Mohan retweeted

maharshi@maharshii·4d

x.com/i/article/2034…

13.5K

Krishna Mohan@KMohan2006·18 Mar

@auto_grad_ @smallest_AI Congrats bro

Ishaan@auto_grad_·18 Mar

career update: i have joined @smallest_AI as a research engineer to work on improving small models and scaling them. hoping to contribute a lot to the team and product!

569

33.6K

Krishna Mohan@KMohan2006·16 Mar

@himanshustwts @smallest_AI Congrats bro

109

himanshu@himanshustwts·16 Mar

Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!

209

1.9K

63.6K

Krishna Mohan retweeted

Kimi.ai@Kimi_Moonshot·16 Mar

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…

330

2.1K

13.5K

4.9M

Krishna Mohan retweeted

Sebastian Raschka@rasbt·15 Mar

I (finally) put together a new LLM Architecture Gallery that collects the architecture figures all in one place! sebastianraschka.com/llm-architectu…

201

1.5K

8.2K

706.6K

Krishna Mohan retweeted

Neel Nanda@NeelNanda5·14 Mar

Out-of-context reasoning is one of the most fascinating developments in the science of how LLMs work. This primer by @OwainEvans_UK, one of the main discoverers of the phenomenon, is a great introduction

724

81.4K

Krishna Mohan retweeted

Claude@claudeai·13 Mar

1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.

1.2K

25.2K

5.6M

Krishna Mohan@KMohan2006·13 Mar

dimitri.ml/posts/why-diff…

Krishna Mohan@KMohan2006·13 Mar

Great blog on the future of diffusion LLMs

349

Krishna Mohan@KMohan2006·12 Mar

@Laz4rz Is it cod mw1

168

Lazarz@Laz4rz·12 Mar

i got this baby running in 4k 120hz on 5090 holy shit

2.2K

Krishna Mohan retweeted

Sebastian Raschka@rasbt·12 Mar

Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.

125

785

35.8K

Krishna Mohan@KMohan2006·12 Mar

Finally done with this: a simple diffusion LLM

427

Krishna Mohan@KMohan2006·11 Mar

the beauty of loss

591

Krishna Mohan retweeted

NVIDIA HPC Developer@NVIDIAHPCDev·10 Mar

🎉 CUDA 13.2 just dropped, and GPU programming just got simpler. This release expands CUDA Tile support to Ampere and Ada GPUs while delivering a stronger CUDA Python stack for cluster-scale workloads.
What's new:
✅ Install cuTile Python directly from PyPI: pip install cuda-tile
✅ Enhanced CUDA Python profiling and debugging across Numba-CUDA flows and Nsight tools
✅ Modern CUDA C++ and refreshed math libraries optimized for AI and HPC kernels
Ready to accelerate your workflows? 📝 Read the technical deep dive: nvda.ws/4rZtAq1

842

53.4K

Krishna Mohan retweeted

Pratyush Kumar@pratykumar·6 Mar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

209

1.3K

6.9K

728.2K

Krishna Mohan retweeted

OpenAI@OpenAI·5 Mar

We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. openai.com/index/reasonin…

235

322

2.8K

342K

Krishna Mohan retweeted

OpenAI@OpenAI·5 Mar

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

3.4K

23.7K

6.8M

Krishna Mohan retweeted

Google DeepMind@GoogleDeepMind·26 Feb

We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵

259

492

4.1K

1.3M

Krishna Mohan@KMohan2006·25 Feb

@auto_grad_ Great work bro

121

Ishaan@auto_grad_·25 Feb

Drop 3/4: NanoTok. The fastest BPE tokenizer, written in C++ and wrapped in Python for API reference. Up to 32x faster encode, 24x faster decode vs tiktoken & HF Tokenizers (varies for different model configs). Peaks at 33M tokens/sec encode, 69M tokens/sec decode. Benchmarked across GPT-2, CL100K, O200K, Qwen 2.5, Mistral & Gemma 2 on Wikipedia, Code, News, Math datasets. github.com/0xD4rky/nanotok

193

16.1K

Krishna Mohan retweeted

Philip Kiely@philipkiely·23 Feb

Inference Engineering launches today. baseten.com/inference-engi…

187

216

2.2K

1.3M
