Aarush Sah
2.6K posts

Aarush Sah
@aarush
Superintelligence @Meta. prev @NVIDIA LPU, @GroqInc



The Claude Mythos Preview system card is available here: anthropic.com/claude-mythos-…




1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵


“gpt2-large is too powerful to be publicly released” vibes





On the right: Vera Rubin. Middle: NVLink 6th Gen Left: The brand new Groq system


hmm I sort of disagree and I am bullish for TML. I think they really really have the top talents that I admire in the field, e.g. Jeremy and Sam for optimization, Songlin for Attn, Lia for MoE, Andrew for FSDPv2, and a bunch more folks it's just natural that it takes a while to publish good models: - dpsk starts to publish papers in 2023, even piblished dspkv2 (which I think is already amazing) in mid 2024 and nobody cares, until dpskv3 and r1 - msh took 10+ month to deliver a first not bad long ctx model in 2023 and be silent for the whole 2024 year, and starts to catch up gradually in 2025 - qwen starts to be a much better model than llama until qwen2.5, mid or late 2024, while the lab has been there forever it takes time to get infra and data done, but as long as you have good folks, and principled ways of doing science and experiments, some time or later, scaling laws will pay back










