

Krithik Ramesh

@KrithikTweets
AI + Math @MIT, compbio stuff @broadinstitute, prev: research @togethercompute




protein language models capture rich structural signals, but where that knowledge lives in the network is still unclear. we show that small subnetworks inside PLMs encode structural concepts, from residues to folds: journals.plos.org/ploscompbiol/a… @PLOSCompBiol work led by @riavinod_!
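For readers wondering what "small subnetworks inside PLMs" could look like mechanically: a common way to find them is mask-based subnetwork probing over frozen weights. The sketch below is a generic illustration of that idea, not the paper's method; the layer, mask logits, and threshold are all toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen PLM layer weights (toy stand-in for a real transformer layer).
W = rng.normal(size=(8, 8))

# Learnable mask logits; sigmoid > 0.5 keeps a weight, else drops it.
# (In practice the logits are trained, e.g. with a straight-through
# estimator, against a structural-concept probing loss; here they are
# just random placeholders.)
logits = rng.normal(size=W.shape)
mask = (1.0 / (1.0 + np.exp(-logits))) > 0.5

W_sub = W * mask          # the candidate subnetwork
x = rng.normal(size=8)
h_full = W @ x            # activation of the full layer
h_sub = W_sub @ x         # activation using only the subnetwork

sparsity = 1.0 - mask.mean()
print(f"subnetwork keeps {mask.mean():.0%} of weights")
```

If the masked layer alone still predicts a structural concept (secondary structure, fold class), that is evidence the concept lives in that subnetwork.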



Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

We bring back non-linear recurrence to language modeling and show it's been held back by small state sizes, not by non-linearity itself.

📄 Paper: arxiv.org/abs/2603.14360
💻 Code: github.com/open-lm-engine…
🤗 Models: huggingface.co/collections/op…
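The core idea, sketched below under stated assumptions: replace the usual length-d hidden vector with a d×d matrix state, so the same layer width carries d² memory. The outer-product write plus tanh is an illustrative recurrence of my choosing, not necessarily M²RNN's exact update.

```python
import numpy as np

def matrix_state_rnn_step(S, x, Wk, Wv, alpha=0.9):
    """One step of a toy non-linear RNN with a matrix-valued state S.

    k picks where to write, v is what to store; the tanh keeps the
    recurrence non-linear, unlike linear-attention-style updates.
    """
    k = Wk @ x                                 # key: write address
    v = Wv @ x                                 # value: content to store
    return np.tanh(alpha * S + np.outer(v, k)) # non-linear state update

d = 4
rng = np.random.default_rng(0)
Wk, Wv = rng.normal(size=(2, d, d)) * 0.1
S = np.zeros((d, d))
for t in range(5):                  # unroll over a short toy sequence
    S = matrix_state_rnn_step(S, rng.normal(size=d), Wk, Wv)

print(S.shape)   # matrix-valued state: (4, 4)
```

The tweet's claim is that with a state this large, non-linear recurrence scales; with a small vector state it appears to underperform for capacity reasons, not because of the non-linearity.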









many papers have reported Mamba results inconsistent with what we found internally. we finally traced the cause to incorrect initializations in very popular implementations (HF and FLA). the initialization makes a huge difference - see @MayankMish98's report!
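To see why an init can silently change results: the reference Mamba code initializes the dt-projection bias through an inverse softplus so the effective step size starts log-uniform in [dt_min, dt_max]. The sketch below is a hedged reconstruction of that one init, for illustration only; the linked report may identify different parameters as the actual culprit.

```python
import numpy as np

def reference_style_dt_init(d_inner, dt_min=1e-3, dt_max=0.1, seed=0):
    """Sketch of a Mamba-style dt-bias init (hedged reconstruction).

    dt is sampled log-uniformly in [dt_min, dt_max], then pushed through
    the inverse of softplus so that softplus(bias) lands back in range.
    A port that leaves this bias at zero starts with a very different
    effective step size, which can shift benchmark results.
    """
    rng = np.random.default_rng(seed)
    dt = np.exp(rng.uniform(np.log(dt_min), np.log(dt_max), size=d_inner))
    # inverse softplus: softplus(x) = log(1 + exp(x))  =>  x = dt + log(1 - exp(-dt))
    return dt + np.log(-np.expm1(-dt))

bias = reference_style_dt_init(16)
dt_eff = np.log1p(np.exp(bias))          # softplus recovers dt
print(dt_eff.min(), dt_eff.max())        # stays inside [1e-3, 0.1]
```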



Welcoming Claude, @AnthropicAI's frontier AI model, as the team's Official Thinking Partner! Through this partnership, Claude will be integrated across the entire Williams organisation, working alongside engineers and team strategists to support how the team thinks, plans, and performs. Read more about the partnership, and what it means for our mission to get back to the front of the grid, here: bit.ly/46sYJtg


Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.





Scaling scientific world models requires co-designing architectures, training objectives, and numerics. Today, we share the first posts in our series on low-precision pretraining, starting with NVIDIA's NVFP4 recipe for stable 4-bit training. Part 1: radicalnumerics.ai/blog/nvfp4-par… Part 2: radicalnumerics.ai/blog/nvfp4-par… We cover floating point fundamentals, heuristics, custom CUDA kernels, and stabilization techniques. Future entries will cover custom recipes and results on hybrid architectures.
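For context on what "NVFP4" means numerically: FP4 E2M1 has only eight representable magnitudes, so NVFP4 attaches a shared scale to each small block of elements (FP8 E4M3 in hardware, plus a tensor-level scale, both kept in plain float here for clarity). This is a didactic sketch of block-scaled FP4 quantization, not NVIDIA's kernel or the blog's recipe.

```python
import numpy as np

# The eight representable magnitudes of FP4 E2M1 (sign stored separately).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize_block(x):
    """Quantize one block to FP4 E2M1 with a shared per-block scale.

    The scale maps the block's max magnitude onto 6.0 (the largest E2M1
    code), so each block uses its own dynamic range.
    """
    scale = np.abs(x).max() / E2M1.max()
    if scale == 0.0:
        return x.copy(), 0.0
    # round each scaled magnitude to the nearest E2M1 code
    idx = np.abs(np.abs(x)[:, None] / scale - E2M1[None, :]).argmin(axis=1)
    return np.sign(x) * E2M1[idx] * scale, scale

rng = np.random.default_rng(0)
block = rng.normal(size=16).astype(np.float32)   # NVFP4 uses 16-element blocks
q, scale = fp4_quantize_block(block)
print(np.abs(block - q).max())   # quantization error, bounded by the grid
```

Stable 4-bit pretraining is then about keeping this per-block error from compounding across optimizer steps, which is what the stabilization techniques in the posts address.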

