Krishna Mohan

3.1K posts

@KMohan2006

Denoising the present to hopefully get a brighter future | loves diffusion models

GPU · Joined May 2024
378 Following · 2.5K Followers
Pinned Tweet
Krishna Mohan @KMohan2006
Flash Attention 1 forward-pass CUDA kernel
6 replies · 13 reposts · 238 likes · 15.3K views
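The core trick in Flash Attention's forward pass is tiling K/V and keeping a running softmax, so the full attention matrix never materializes. A minimal NumPy sketch of that online-softmax recurrence (an illustration of the idea only, not the CUDA kernel from the tweet):

```python
import numpy as np

def flash_attention_forward(Q, K, V, block_size=64):
    """Tiled attention forward pass with an online softmax,
    illustrating the idea behind Flash Attention 1.
    Q, K, V: (seq_len, d) float arrays."""
    seq_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(seq_len, -np.inf)   # running row-wise max
    l = np.zeros(seq_len)           # running softmax normalizer
    for start in range(0, seq_len, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                 # scores for this K/V tile
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])         # tile-local exp weights
        correction = np.exp(m - m_new)         # rescale earlier partial sums
        l = l * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]
```

Because each tile only rescales the running statistics, the result matches ordinary softmax attention while touching K/V one block at a time.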
Ishaan @auto_grad_
career update: i've joined @smallest_AI as a research engineer to work on improving and scaling small models. hoping to contribute a lot to the team and product!
58 replies · 5 reposts · 569 likes · 33.6K views
himanshu @himanshustwts
Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!
209 replies · 33 reposts · 1.9K likes · 63.6K views
Krishna Mohan retweeted
Kimi.ai @Kimi_Moonshot
Introducing Attention Residuals: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
330 replies · 2.1K reposts · 13.5K likes · 4.9M views
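The mechanism described above replaces the residual stream's fixed, uniform sum with learned attention over earlier layers' states. A rough sketch of what one such depth-wise aggregation step could look like (this is a generic illustration under assumptions, not Moonshot's implementation; the projection matrices and aggregation rule here are placeholders):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn_residual_aggregate(history, W_q, W_k):
    """Aggregate preceding layers' hidden states with learned attention
    instead of a plain residual sum.
    history: list of (d,) hidden states h_0 .. h_L from earlier layers.
    W_q, W_k: (d, d_k) projection matrices standing in for trained
    parameters (hypothetical names)."""
    H = np.stack(history)                  # (L+1, d): one row per layer
    q = H[-1] @ W_q                        # query from the current state
    K = H @ W_k                            # one key per preceding layer
    scores = K @ q / np.sqrt(K.shape[-1])
    weights = softmax(scores)              # input-dependent mixing weights
    return weights @ H                     # selective retrieval over depth
```

Compared with `sum(history)`, the input-dependent weights let the network emphasize or ignore particular layers, which is the "selective retrieval" the thread describes.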
Krishna Mohan retweeted
Neel Nanda @NeelNanda5
Out-of-context reasoning is one of the most fascinating developments in the science of how LLMs work. This primer by @OwainEvans_UK, one of the main discoverers of the phenomenon, is a great introduction.
29 replies · 53 reposts · 724 likes · 81.4K views
Krishna Mohan retweeted
Claude @claudeai
1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.
1.2K replies · 2K reposts · 25.2K likes · 5.6M views
Krishna Mohan @KMohan2006
Great blog on the future of diffusion LLMs
1 reply · 0 reposts · 11 likes · 349 views
Lazarz @Laz4rz
i got this baby running in 4K 120 Hz on a 5090, holy shit
5 replies · 0 reposts · 56 likes · 2.2K views
Krishna Mohan retweeted
Sebastian Raschka @rasbt
Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.
37 replies · 125 reposts · 785 likes · 35.8K views
Krishna Mohan @KMohan2006
Finally done with this: a simple diffusion LLM
2 replies · 0 reposts · 13 likes · 427 views
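For context on what a simple diffusion LLM does at inference time: masked (absorbing-state) discrete diffusion models repeatedly fill in masked tokens and re-mask the least confident ones until the sequence is complete. A hedged sketch of one denoising step, where `predict` stands in for a trained denoiser (the names and the confidence-based schedule are illustrative assumptions, not the code from the tweet):

```python
MASK = "<mask>"

def denoise_step(tokens, predict, frac_keep=0.5):
    """One reverse-diffusion step for a masked discrete diffusion LM:
    fill every masked position with the model's prediction, then keep
    only the most confident fraction unmasked.
    predict: maps a token list to a (token, confidence) pair per
    position; a stand-in for a trained denoiser."""
    preds = predict(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    # Unmask the highest-confidence predictions first.
    masked.sort(key=lambda i: preds[i][1], reverse=True)
    keep = masked[:max(1, int(len(masked) * frac_keep))]
    out = list(tokens)
    for i in keep:
        out[i] = preds[i][0]
    return out
```

Iterating this step from an all-mask sequence down to zero masks is the diffusion analogue of autoregressive decoding, but it fills positions in confidence order rather than left to right.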
Krishna Mohan @KMohan2006
the beauty of loss
1 reply · 0 reposts · 13 likes · 591 views
Krishna Mohan retweeted
NVIDIA HPC Developer @NVIDIAHPCDev
🎉 CUDA 13.2 just dropped, and GPU programming just got simpler. This release expands CUDA Tile support to Ampere and Ada GPUs while delivering a stronger CUDA Python stack for cluster-scale workloads. What's new:
✅ Install cuTile Python directly from PyPI: pip install cuda-tile
✅ Enhanced CUDA Python profiling and debugging across Numba-CUDA flows and Nsight tools
✅ Modern CUDA C++ and refreshed math libraries optimized for AI and HPC kernels
Ready to accelerate your workflows? 📝 Read the technical deep dive: nvda.ws/4rZtAq1
16 replies · 87 reposts · 842 likes · 53.4K views
Krishna Mohan retweeted
Pratyush Kumar @pratykumar
📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…
209 replies · 1.3K reposts · 6.9K likes · 728.2K views
Krishna Mohan retweeted
OpenAI @OpenAI
We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. openai.com/index/reasonin…
235 replies · 322 reposts · 2.8K likes · 342K views
Krishna Mohan retweeted
OpenAI @OpenAI
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
2K replies · 3.4K reposts · 23.7K likes · 6.8M views
Krishna Mohan retweeted
Google DeepMind @GoogleDeepMind
We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵
259 replies · 492 reposts · 4.1K likes · 1.3M views
Ishaan @auto_grad_
Drop 3/4: NanoTok. The fastest BPE tokenizer: written in C++ and wrapped in a Python API. Up to 32x faster encode, 24x faster decode vs tiktoken & HF Tokenizers (varies across model configs). Peaks at 33M tokens/sec encode, 69M tokens/sec decode. Benchmarked across GPT-2, CL100K, O200K, Qwen 2.5, Mistral & Gemma 2 on Wikipedia, code, news, and math datasets. github.com/0xD4rky/nanotok
18 replies · 28 reposts · 193 likes · 16.1K views
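For reference, the algorithm NanoTok accelerates is ordinary byte-pair encoding: repeatedly apply the earliest-learned merge in the word until none applies. A minimal Python sketch of that greedy loop (illustrative only; this is not NanoTok's API or its optimized C++ path):

```python
def bpe_encode(word, merge_ranks):
    """Greedy BPE encoding of a single word.
    merge_ranks: dict mapping a symbol pair to its merge priority,
    where a lower rank means the merge was learned earlier."""
    parts = list(word)  # start from individual characters
    while len(parts) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        pairs = [(merge_ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(parts, parts[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break               # no learnable merge applies
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts
```

Optimized tokenizers get their speedups from exactly this loop, e.g. by replacing the linear pair scan with priority queues, byte-level tables, and cache-friendly C++ data structures.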