Krithik Ramesh

2.6K posts

@KrithikTweets

AI + Math @MIT, compbio stuff @broadinstitute, prev: research @togethercompute

Joined February 2017
805 Following · 872 Followers
Pinned Tweet
Krithik Ramesh @KrithikTweets
🧬 Meet Lyra, a new paradigm for accessible, powerful modeling of biological sequences. Lyra is a lightweight SSM achieving SOTA performance across DNA, RNA, and protein tasks—yet up to 120,000x smaller than foundation models (ESM, Evo). Bonus: you can train it on your Mac. Read our paper here: arxiv.org/abs/2503.16351
[image]
18 replies · 145 reposts · 730 likes · 120K views
Krithik Ramesh @KrithikTweets
@DavidSHolz 2019 BME category, and I did something related to using Microsoft HoloLens as an AR navigation tool for spinal reconstruction surgery!
1 reply · 0 reposts · 0 likes · 64 views
David @DavidSHolz
@KrithikTweets oh when were u at ISEF? What was your project?
1 reply · 0 reposts · 2 likes · 211 views
David @DavidSHolz
any International Science and Engineering Fair (ISEF) alumni in the bay area? sponsoring an event for the org and we have a few extra spots!
22 replies · 3 reposts · 72 likes · 12.6K views
Krithik Ramesh retweeted
Nicholas Roberts @nick11roberts
That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it? As it turns out, we have a brand new scaling law for that! 🧵 [1/n]
[image]
11 replies · 53 reposts · 359 likes · 66.1K views
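For context on what threads like this fit: the standard parametric form from Hoffmann et al. models loss as L(N, D) = E + A/N^α + B/D^β, and "overtrained" checkpoints like a 350M model pushed to trillions of tokens sit at extreme D/N ratios. A sketch of that baseline form with the published constants; the thread's new scaling law itself is not reproduced here:

```python
import numpy as np

def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric fit L(N, D) = E + A/N^alpha + B/D^beta.
    Constants are the published Hoffmann et al. estimates, not the quoted
    thread's new law. Overtrained models live at very large D/N, where the
    N term dominates the residual loss."""
    return E + A / N**alpha + B / D**beta

# A 350M-parameter model at roughly Chinchilla-optimal vs. heavily
# overtrained token budgets (20x vs. 300x tokens per parameter):
print(chinchilla_loss(350e6, 7e9))    # ~2.91
print(chinchilla_loss(350e6, 105e9))  # ~2.53
```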
Krithik Ramesh @KrithikTweets
Work like this is genuinely so exciting! ESM ushered in a wave of PLMs, yet our understanding of what biological properties they learn is rather poor. The diversity and quality of evaluations in this work are refreshing.
Ava Amini @avapamini

protein language models capture rich structural signals, but where that knowledge lives in the network is still unclear. we show that small subnetworks inside PLMs encode structural concepts, from residues to folds: journals.plos.org/ploscompbiol/a… @PLOSCompBiol. work led by @riavinod_!

0 replies · 2 reposts · 33 likes · 5.1K views
Krithik Ramesh retweeted
Jack Zhang @jcz42
We made Muon run up to 2x faster for free! Introducing Gram Newton-Schulz: a mathematically equivalent but computationally faster Newton-Schulz algorithm for polar decomposition.

Gram Newton-Schulz rewrites Newton-Schulz so that instead of iterating on the expensive rectangular X matrix, we iterate on the small, square, symmetric XX^T Gram matrix to reduce FLOPs. This allows us to make more use of fast symmetric GEMM kernels on Hopper and Blackwell, halving the FLOPs of each of those GEMMs.

Gram Newton-Schulz is a drop-in replacement for Newton-Schulz in your Muon use case: we see validation perplexity preserved within 0.01, and share our (long!) journey stabilizing this algorithm and ensuring that training quality is preserved above all else.

This was a super fun project with @noahamsel, @berlinchen, and @tri_dao that spanned theory, numerical analysis, and ML systems! Blog and codebase linked below 🧵
[image]
17 replies · 164 reposts · 1K likes · 208.7K views
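To make the rewrite concrete: each Newton-Schulz step has the form X ← aX + (bG + cG²)X with G = XX^T, i.e. X ← q(G)X for a polynomial q, and q(G) commutes with G. Below is a minimal sketch of the Gram-side iteration built from that algebra, using the quintic coefficients from the public Muon optimizer; this is a reconstruction from the post's description, not the authors' released code (which, per the thread, needed real stabilization work):

```python
import torch

# Quintic Newton-Schulz coefficients used by the public Muon optimizer.
a, b, c = 3.4445, -4.7750, 2.0315

def gram_newton_schulz(X, steps=5, eps=1e-7):
    """Polar-factor approximation iterating on the Gram matrix G = X X^T.

    A standard NS step is X <- a X + (b G + c G^2) X, i.e. X <- q(G) X.
    Because q(G) commutes with G, the Gram matrix updates as
    G <- q(G)^2 G, so every iteration runs on the small square symmetric
    G; the rectangular X is touched only twice (once to form G, once to
    apply the accumulated polynomial Q)."""
    tall = X.shape[0] > X.shape[1]
    if tall:
        X = X.T                          # keep the Gram matrix on the short side
    X = X / (X.norm() + eps)             # NS converges for spectral norm <= 1
    m = X.shape[0]
    I = torch.eye(m, dtype=X.dtype, device=X.device)
    G = X @ X.T                          # small m x m symmetric Gram matrix
    Q = I.clone()
    for _ in range(steps):
        q = a * I + b * G + c * (G @ G)  # q(G), symmetric
        Q = q @ Q                        # accumulate the total polynomial
        G = (q @ q) @ G                  # q G q = q^2 G since q and G commute
    X = Q @ X                            # single rectangular GEMM at the end
    return X.T if tall else X
```

In exact arithmetic this matches iterating on X directly; the saving is that the per-step GEMMs are small, square, and symmetric, with only two rectangular GEMMs in total.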
Maximilian Beck @maxmbeck
👨‍🎓 Last week, I successfully defended my PhD thesis - an incredibly exciting and rewarding milestone after 3.5 years of work on xLSTM: Recurrent Neural Network Architectures for Scalable and Efficient Large Language Models
[image]
16 replies · 3 reposts · 139 likes · 8.6K views
Krithik Ramesh retweeted
Tri Dao @tri_dao
Nonlinear RNNs seem to do something genuinely different from attention and linear RNNs/SSMs. By themselves they already do quite well with the right parametrization, but just one nonlinear RNN layer substantially improves transformer-mamba/deltanet hybrids!
Mayank Mishra @MayankMish98

Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

We bring back non-linear recurrence to language modeling and show it's been held back by small state sizes, not by non-linearity itself.
📄 Paper: arxiv.org/abs/2603.14360
💻 Code: github.com/open-lm-engine…
🤗 Models: huggingface.co/collections/op…

4 replies · 45 reposts · 338 likes · 32.8K views
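The claim is that nonlinearity was never the problem, state size was. As a toy contrast only (the actual M²RNN update rule is defined in the paper, not here): a nonlinear recurrence whose state is a d×d matrix carries d² numbers per step instead of the d of a classic RNN, while a linear SSM/DeltaNet uses a similar outer-product state update without the elementwise nonlinearity:

```python
import torch

def toy_matrix_state_rnn(x, W_k, W_v, decay=0.9):
    """Toy nonlinear recurrence over a d x d matrix state. Purely
    illustrative of what 'matrix-valued state' means (d^2 state per step
    instead of d); this is not the M^2RNN architecture. Linear SSM and
    DeltaNet layers use a similar outer-product state update but drop
    the elementwise nonlinearity, which is what lets them parallelize
    over the time dimension."""
    T, d = x.shape
    S = torch.zeros(d, d)
    ys = []
    for t in range(T):
        k = x[t] @ W_k                                  # key-like projection
        v = x[t] @ W_v                                  # value-like projection
        S = torch.tanh(decay * S + torch.outer(v, k))   # nonlinear state update
        ys.append(S @ x[t])                             # read out via the state
    return torch.stack(ys)

# usage: a sequence of T=32 tokens with width d=64
x = torch.randn(32, 64)
W_k, W_v = torch.randn(64, 64) / 8, torch.randn(64, 64) / 8
y = toy_matrix_state_rnn(x, W_k, W_v)                   # shape (32, 64)
```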
Krithik Ramesh retweeted
Albert Gu @_albertgu
The newest model in the Mamba series is finally here 🐍

Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes.

This is the first Mamba that was student-led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
[image]
38 replies · 310 reposts · 1.6K likes · 433.6K views
Krithik Ramesh retweeted
Ava Amini @avapamini
designing substrates for enzymes like proteases is a combinatorial problem. tackling this, we built CleaveNet: a deep learning pipeline that designs peptide substrates with targeted efficiency & selectivity, validated end-to-end in the lab. nature.com/articles/s4146… @NatureComms
[image]
4 replies · 24 reposts · 126 likes · 17K views
Krithik Ramesh retweeted
Ted Zadouri @tedzadouri
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed!

joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
[image]
7 replies · 131 reposts · 783 likes · 225.1K views
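Why exp2 specifically: attention kernels rebase the softmax exponential to base 2 so it maps onto the GPU's special-function EX2 instruction, and once the tensor cores run at matmul speed, that SFU throughput is what's left to hide. A small illustration of the rebasing (standard FlashAttention-style arithmetic, not FA4 kernel code):

```python
import math
import numpy as np

LOG2E = math.log2(math.e)  # 1/ln(2); folds the base change into one multiply

def softmax_base2(scores):
    """Numerically safe softmax computed with exp2 instead of exp:
    exp(x - m) == 2**((x - m) * log2(e)). On GPU the 2**y becomes the
    hardware EX2 instruction, the unit the thread describes as the new
    bottleneck on Blackwell."""
    m = scores.max(axis=-1, keepdims=True)   # row max for numerical stability
    p = np.exp2((scores - m) * LOG2E)
    return p / p.sum(axis=-1, keepdims=True)

# matches ordinary softmax to floating-point tolerance:
s = np.random.randn(4, 128)
ref = np.exp(s - s.max(-1, keepdims=True))
assert np.allclose(softmax_base2(s), ref / ref.sum(-1, keepdims=True))
```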
Krithik Ramesh retweeted
Adam Zweiger @AdamZweiger
Fun fact: Back in 2014, Demis had a red line condition for any potential acquisition of DeepMind: "no technology coming out of DeepMind will be used for military or intelligence purposes." Google accepting this more eagerly was part of why Demis chose them over Facebook. This red line is even broader than Dario's (no mass surveillance or fully autonomous weapons), though it was quietly removed by Google 1 year ago.
7 replies · 38 reposts · 997 likes · 81.1K views
Krithik Ramesh @KrithikTweets
I’ve had my suspicions about this since looking at BioML papers where Mamba variants underperformed against certain long-conv models. The other issue I’ve seen commonly is not ensuring the A matrix stays in fp32. Super cool catch, and glad it’s been tracked down!
Albert Gu @_albertgu

many papers have reported Mamba results inconsistent with what we found internally. we finally traced down the cause, which comes from wrong initializations in very popular implementations (HF and FLA). the initialization makes a huge difference - see @MayankMish98's report!

0 replies · 1 repost · 19 likes · 3.4K views
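Both failure modes named here are checkable in a few lines. A sketch following the reference mamba_ssm convention for the state matrix A (S4D-real initialization, stored as a log, negated after an fp32 exp); deviating from either detail is exactly the kind of silent discrepancy the quoted post traced down:

```python
import torch
import torch.nn as nn

class MambaAInit(nn.Module):
    """Sketch of the reference mamba_ssm handling of the state matrix A.
    Two easy-to-lose details: (1) S4D-real initialization, A_n = -(n+1),
    stored as log|A|; (2) A_log lives in fp32 and is cast up before the
    exp, so mixed-precision training never runs the recurrence with a
    bf16/fp16 A."""
    def __init__(self, d_inner, d_state=16):
        super().__init__()
        A = torch.arange(1, d_state + 1, dtype=torch.float32)        # 1..d_state
        A = A.unsqueeze(0).expand(d_inner, d_state).contiguous()     # (d_inner, d_state)
        self.A_log = nn.Parameter(torch.log(A))                      # fp32 parameter

    def A(self):
        # negate after the fp32 exp: eigenvalues stay strictly negative,
        # giving a stable decaying recurrence
        return -torch.exp(self.A_log.float())
```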
Krithik Ramesh retweeted
Fred Zhangzhi Peng @pengzhangzhi1
🚨 New paper! We introduce a planner-aware training tweak to diffusion language models.
⚡ One-line-of-code change to the loss
💡 Fixes training–inference mismatch
📈 Strong gains in protein, text, and code generation
arxiv.org/abs/2509.23405 (1/n)
[image]
2 replies · 18 reposts · 98 likes · 19.5K views
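For context on where a one-line loss change lives: masked diffusion LMs train with a noise-level-weighted cross-entropy over masked positions. The scaffold below shows the generic form only; the paper's planner-aware modification is its own contribution and is not reproduced here (the 1/t weighting and the names are the usual convention, not the paper's code):

```python
import torch
import torch.nn.functional as F

def masked_diffusion_lm_loss(logits, targets, masked, t):
    """Generic masked-diffusion LM objective: cross-entropy on the
    positions masked at noise level t, weighted by 1/t as in standard
    masked-diffusion derivations. Illustrative scaffold only; the
    paper's planner-aware tweak is a one-line change to a loss of
    roughly this shape and is not shown here.

    logits: (B, T, V), targets: (B, T), masked: (B, T) bool, t: (B,)
    """
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    ce = ce * masked.float()                 # score only masked positions
    w = (1.0 / t.clamp_min(1e-3))[:, None]   # noise-level weighting
    return (w * ce).sum() / masked.sum().clamp_min(1)
```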
Krithik Ramesh @KrithikTweets
Surely we win the constructors now…
Atlassian Williams F1 Team @WilliamsF1

Welcoming Claude, @AnthropicAI's frontier AI model, as the team’s Official Thinking Partner! Through this partnership, Claude will be integrated across the entire Williams organisation, working alongside engineers and team strategists to support how the team thinks, plans, and performs. Read more about the partnership, and what it means for our mission to get back to the front of the grid, here: bit.ly/46sYJtg

0 replies · 0 reposts · 2 likes · 295 views
Krithik Ramesh retweeted
Ricursive Intelligence @RicursiveAI
Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com
49 replies · 150 reposts · 1.1K likes · 485.3K views
Krithik Ramesh @KrithikTweets
I love these blog posts. The intuition, the theory, and the implementation are provided so clearly. Highly recommend reading!
Radical Numerics @RadicalNumerics

Scaling scientific world models requires co-designing architectures, training objectives, and numerics. Today, we share the first posts in our series on low-precision pretraining, starting with NVIDIA's NVFP4 recipe for stable 4-bit training.

Part 1: radicalnumerics.ai/blog/nvfp4-par…
Part 2: radicalnumerics.ai/blog/nvfp4-par…

We cover floating point fundamentals, heuristics, custom CUDA kernels, and stabilization techniques. Future entries will cover custom recipes and results on hybrid architectures.

0 replies · 0 reposts · 10 likes · 1.5K views
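As a companion to those posts: NVFP4 quantizes to the FP4 (E2M1) value grid with a scale per 16-element block. A toy fake-quantizer under simplifying assumptions; the real format stores FP8 (E4M3) block scales plus a tensor-level fp32 scale and uses more careful rounding, while this sketch keeps all scales in fp32:

```python
import numpy as np

# Non-negative magnitudes representable in FP4 E2M1 -- the NVFP4 value grid.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])   # signed grid, 15 distinct points

def fake_quantize_nvfp4(x, block=16):
    """Toy NVFP4-style fake quantization: round each 16-element block to
    the E2M1 grid under a per-block absmax scale. Simplified relative to
    the real format (fp32 scales here instead of FP8 E4M3 block scales
    plus a tensor-level fp32 scale). Assumes x.size is a multiple of
    `block`."""
    shape = x.shape
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / E2M1[-1]
    scale = np.where(scale == 0, 1.0, scale)                 # avoid divide-by-zero
    idx = np.abs(xb[..., None] / scale[..., None] - GRID).argmin(axis=-1)
    return (GRID[idx] * scale).reshape(shape)                # dequantized values

# usage: quantization error is coarse but bounded by the block absmax scale
x = np.random.randn(4, 32).astype(np.float32)
print(np.abs(fake_quantize_nvfp4(x) - x).max())
```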
Krithik Ramesh retweeted
Albert Gu @_albertgu
a silly milestone but one nonetheless - 20k citations! although i don't think it's a very useful metric (i got my position at @CarnegieMellon with ~700 citations iirc), i'm mostly proud of the fact that it's predominantly first-author papers where i spent years pushing on a single thread 😎

now for christmas could santa help my X followers hit 20k too 🥹🎄
[image]
24 replies · 15 reposts · 675 likes · 52.2K views