

Krithik Ramesh

@KrithikTweets
AI + Math @MIT, compbio stuff @broadinstitute, prev: research @togethercompute




protein language models capture rich structural signals, but where that knowledge lives in the network is still unclear. we show that small subnetworks inside PLMs encode structural concepts, from residues to folds: journals.plos.org/ploscompbiol/a… @PLOSCompBiol work led by @riavinod_!
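For readers wondering what "small subnetworks inside PLMs" could look like mechanically: a common way to find them is mask-based subnetwork probing over frozen weights. The sketch below is a generic illustration of that idea, not the paper's method; the layer, mask logits, and threshold are all toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen PLM layer weights (toy stand-in for a real transformer layer).
W = rng.normal(size=(8, 8))

# Learnable mask logits; sigmoid > 0.5 keeps a weight, else drops it.
# (In practice the logits are trained, e.g. with a straight-through
# estimator, against a structural-concept probing loss; here they are
# just random placeholders.)
logits = rng.normal(size=W.shape)
mask = (1.0 / (1.0 + np.exp(-logits))) > 0.5

W_sub = W * mask          # the candidate subnetwork
x = rng.normal(size=8)
h_full = W @ x            # activation of the full layer
h_sub = W_sub @ x         # activation using only the subnetwork

sparsity = 1.0 - mask.mean()
print(f"subnetwork keeps {mask.mean():.0%} of weights")
```

If the masked layer alone still predicts a structural concept (secondary structure, fold class), that is evidence the concept lives in that subnetwork.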



Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

We bring back non-linear recurrence to language modeling and show it's been held back by small state sizes, not by non-linearity itself.

📄 Paper: arxiv.org/abs/2603.14360
💻 Code: github.com/open-lm-engine…
🤗 Models: huggingface.co/collections/op…
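The core idea, sketched below under stated assumptions: replace the usual length-d hidden vector with a d×d matrix state, so the same layer width carries d² memory. The outer-product write plus tanh is an illustrative recurrence of my choosing, not necessarily M²RNN's exact update.

```python
import numpy as np

def matrix_state_rnn_step(S, x, Wk, Wv, alpha=0.9):
    """One step of a toy non-linear RNN with a matrix-valued state S.

    k picks where to write, v is what to store; the tanh keeps the
    recurrence non-linear, unlike linear-attention-style updates.
    """
    k = Wk @ x                                 # key: write address
    v = Wv @ x                                 # value: content to store
    return np.tanh(alpha * S + np.outer(v, k)) # non-linear state update

d = 4
rng = np.random.default_rng(0)
Wk, Wv = rng.normal(size=(2, d, d)) * 0.1
S = np.zeros((d, d))
for t in range(5):                  # unroll over a short toy sequence
    S = matrix_state_rnn_step(S, rng.normal(size=d), Wk, Wv)

print(S.shape)   # matrix-valued state: (4, 4)
```

The tweet's claim is that with a state this large, non-linear recurrence scales; with a small vector state it appears to underperform for capacity reasons, not because of the non-linearity.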









many papers have reported Mamba results inconsistent with what we found internally. we finally traced the cause to incorrect initializations in very popular implementations (HF and FLA). the initialization makes a huge difference - see @MayankMish98's report!
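To see why an init can silently change results: the reference Mamba code initializes the dt-projection bias through an inverse softplus so the effective step size starts log-uniform in [dt_min, dt_max]. The sketch below is a hedged reconstruction of that one init, for illustration only; the linked report may identify different parameters as the actual culprit.

```python
import numpy as np

def reference_style_dt_init(d_inner, dt_min=1e-3, dt_max=0.1, seed=0):
    """Sketch of a Mamba-style dt-bias init (hedged reconstruction).

    dt is sampled log-uniformly in [dt_min, dt_max], then pushed through
    the inverse of softplus so that softplus(bias) lands back in range.
    A port that leaves this bias at zero starts with a very different
    effective step size, which can shift benchmark results.
    """
    rng = np.random.default_rng(seed)
    dt = np.exp(rng.uniform(np.log(dt_min), np.log(dt_max), size=d_inner))
    # inverse softplus: softplus(x) = log(1 + exp(x))  =>  x = dt + log(1 - exp(-dt))
    return dt + np.log(-np.expm1(-dt))

bias = reference_style_dt_init(16)
dt_eff = np.log1p(np.exp(bias))          # softplus recovers dt
print(dt_eff.min(), dt_eff.max())        # stays inside [1e-3, 0.1]
```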



Welcoming Claude, @AnthropicAI's frontier AI model, as the team's Official Thinking Partner! Through this partnership, Claude will be integrated across the entire Williams organisation, working alongside engineers and team strategists to support how the team thinks, plans, and performs. Read more about the partnership, and what it means for our mission to get back to the front of the grid, here: bit.ly/46sYJtg


Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.





Scaling scientific world models requires co-designing architectures, training objectives, and numerics. Today, we share the first posts in our series on low-precision pretraining, starting with NVIDIA's NVFP4 recipe for stable 4-bit training. Part 1: radicalnumerics.ai/blog/nvfp4-par… Part 2: radicalnumerics.ai/blog/nvfp4-par… We cover floating point fundamentals, heuristics, custom CUDA kernels, and stabilization techniques. Future entries will cover custom recipes and results on hybrid architectures.
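For context on what "NVFP4" means numerically: FP4 E2M1 has only eight representable magnitudes, so NVFP4 attaches a shared scale to each small block of elements (FP8 E4M3 in hardware, plus a tensor-level scale, both kept in plain float here for clarity). This is a didactic sketch of block-scaled FP4 quantization, not NVIDIA's kernel or the blog's recipe.

```python
import numpy as np

# The eight representable magnitudes of FP4 E2M1 (sign stored separately).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize_block(x):
    """Quantize one block to FP4 E2M1 with a shared per-block scale.

    The scale maps the block's max magnitude onto 6.0 (the largest E2M1
    code), so each block uses its own dynamic range.
    """
    scale = np.abs(x).max() / E2M1.max()
    if scale == 0.0:
        return x.copy(), 0.0
    # round each scaled magnitude to the nearest E2M1 code
    idx = np.abs(np.abs(x)[:, None] / scale - E2M1[None, :]).argmin(axis=1)
    return np.sign(x) * E2M1[idx] * scale, scale

rng = np.random.default_rng(0)
block = rng.normal(size=16).astype(np.float32)   # NVFP4 uses 16-element blocks
q, scale = fp4_quantize_block(block)
print(np.abs(block - q).max())   # quantization error, bounded by the grid
```

Stable 4-bit pretraining is then about keeping this per-block error from compounding across optimizer steps, which is what the stabilization techniques in the posts address.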

