Kyle Kastner

9.4K posts

@kastnerkyle

computers and music are fun

Out of the city, Massachusetts · Joined January 2011
3.2K Following · 3.3K Followers

Pinned Tweet
Kyle Kastner @kastnerkyle
Really enjoying Lyria 3 (via Gemini App). It's a fun model to create with! Glad to be at Google, and to see all the neat tracks people are making. Check out the prompt guide too if you are interested in trying it out deepmind.google/models/lyria/p…
Kyle Kastner retweeted
Elorian AI @ElorianAI
Excited to introduce Elorian. Together with an incredible team of researchers, we're building AI that natively understands the visual world — not by translating images into text, but by reasoning through them directly. elorian.ai youtu.be/YlvfNpOMeOY
Kyle Kastner retweeted
yamakatz @kyama0321
👀👂 > [2603.19195v1] How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation arxiv.org/abs/2603.19195
Kyle Kastner retweeted
Jason Weston @jaseweston
🔗Learning to Aggregate through Online RL🎯
ParaGator🔀🐊: strong parallel reasoning aggregation
Core claim: aggregation works best when training both stages together:
- LLM generator should produce diverse candidates
- LLM aggregator should synthesize into final answer
ParaGator trains candidate generation with pass@k, and aggregation with pass@1, on-policy and end-to-end. Stops mode collapse/off-policy mismatch. Improves math & scientific reasoning. 🚀🏆
Read more in the blog post: facebookresearch.github.io/RAM/blogs/para…
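The pass@k signal mentioned in the thread can be illustrated with the standard unbiased estimator (this is only the generic metric, not ParaGator's training code; `n` candidates with `c` correct are assumed):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n candidates (c of them
    correct) is correct, computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few incorrect candidates to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 candidates of which c=1 is correct, pass@1 is 0.25 while pass@4 is 1.0, which is why a pass@k reward pushes the generator toward diverse candidates rather than one repeated best guess.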
Kyle Kastner retweeted
Lucas Maes @lucasmaes_
JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: le-wm.github.io
Kyle Kastner retweeted
Phillip Isola @phillip_isola
A few clarifications to common q's about our thickets paper:
1. Is this just ensembling? Seed averaging? Bagging? ...
2. Is this just Qwen?
3. Is it K times slower inference?
4. RL is dead? Post-training is dead?
Phillip Isola @phillip_isola

Sharing “Neural Thickets”. We find: In large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works Paper: arxiv.org/abs/2603.12228 Web: thickets.mit.edu 1/
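The "even random guessing works" claim can be caricatured as a toy random search in weight space (a hypothetical sketch; `loss_fn`, `sigma`, and the loop are illustration only, not the paper's experimental setup):

```python
import random

def random_guess_posttrain(weights, loss_fn, trials=200, sigma=0.05, seed=0):
    """Toy 'random guessing' post-training: sample Gaussian perturbations
    of the pretrained weights and keep any that lower the task loss."""
    rng = random.Random(seed)
    best, best_loss = list(weights), loss_fn(weights)
    for _ in range(trials):
        cand = [w + rng.gauss(0.0, sigma) for w in best]
        cand_loss = loss_fn(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss
```

In the dense-solutions regime the paper describes, even this undirected search finds improving points near the starting weights; in a sparse regime it would stall.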

Kyle Kastner retweeted
Eric W. Tramel @fujikanaeda
Nemotron 3 Super 120B-A12B is good, fast, open, and it’s out :) Awesome demonstration of Nvidia's commitment to accelerating AI everywhere. It’s been really cool to work up and down the stack on Nemotron. I wanted to share some of my favorite parts from our tech report 🧵
Kyle Kastner retweeted
Emre Can Acikgoz @emrecanacikgoz
Can LLMs self-evolve into general-purpose tool-calling agents without any external data? Yes. Introducing "Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data"! ♾️ Initialized from the same LLM, Tool-R0 co-evolves a Generator and Solver through self-play RL; training general-purpose tool-calling agents entirely from scratch with zero data. (1/n)
Kyle Kastner retweeted
templar @tplr_ai
We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n
Kyle Kastner retweeted
Suhas Kotha @kothasuhas
to improve fine-tuning data efficiency, replay generic pre-training data
not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training
(w/ @percyliang)
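A minimal sketch of the replay idea (the mixing ratio, sampling scheme, and helper names are assumptions; the thread does not specify them):

```python
import random

def mixed_batches(finetune_data, pretrain_data, replay_frac=0.25,
                  batch_size=8, num_batches=100, seed=0):
    """Yield fine-tuning batches where roughly `replay_frac` of the
    examples are replayed from generic pre-training data, so the model
    keeps seeing its original distribution while it adapts."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.choice(pretrain_data) if rng.random() < replay_frac
               else rng.choice(finetune_data)
               for _ in range(batch_size)]
```

Per-example Bernoulli mixing like this keeps the expected replay fraction fixed without needing the two datasets to be the same size.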
Kyle Kastner retweeted
t.toda @Trtd6Trtd
arxiv.org/abs/2603.01683 A method for addressing catastrophic forgetting, a known problem in LLM post-training. The interesting part for me was the finding that DPO has an "Elastic Tether": a regularization-like effect that makes the model less likely to forget things it already knows.
Kyle Kastner retweeted
Davis Blalock @davisblalock
🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]
Kyle Kastner retweeted
Tanishq Kumar @tanishqkumar07
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
Kyle Kastner retweeted
NotebookLM @NotebookLM
Introducing Cinematic Video Overviews, the next evolution of the NotebookLM Studio. Unlike standard templates, these are powered by a novel combination of our most advanced models to create bespoke, immersive videos from your sources. Rolling out now for Ultra users in English!
Kyle Kastner retweeted
Yuma Koizumi @yuma_koizumi
GDM Tokyo is hiring🗼 Want to build the future of Gemini Audio with us?
🔥 Beyond the Paper: Research that scales to billions.
👀 The Focus: Solving multilingual & APAC AI nuances.
🏯 The Life: Tokyo as a "force multiplier" for your creativity
job-boards.greenhouse.io/deepmind/jobs/…
Kyle Kastner retweeted
Yuma Koizumi @yuma_koizumi
🎙️ Google DeepMind Tokyo is hiring! Want to build world-class AI with your own hands, from Japan? 🗼🌏 My team is recruiting Research Scientists to advance the core spoken-dialogue technology behind Gemini Live and related products, and multilingual, multicultural LLM research that leverages our APAC base. Apply here: job-boards.greenhouse.io/deepmind/jobs/…
Kyle Kastner retweeted
StepFun @StepFun_ai
"can we get the base model?" sure. here's two. "can we get the code?" sure. here's SteptronOSS. "what about the SFT data?" coming soon. maximum sincerity, minimum barriers. - Step 3.5 Flash Base — pretrained foundation - Step 3.5 Flash Base-Midtrain — code, agents & long-context - SteptronOSS — open-sourced, ready for your custom workflows - SFT Data — coming soon for reference not just the final checkpoint — a customizable pipeline. 🤗 huggingface.co/stepfun-ai/Ste… 🤗 huggingface.co/stepfun-ai/Ste… 💻 github.com/stepfun-ai/Ste…
Kyle Kastner retweeted
Max Li 李赵硕 @mli0603
I've been debugging RoPE recently and kept getting tripped up by details that most explanations gloss over. So I wrote a deep dive.
"Understanding RoPE: From Rotary Embeddings to Context Extension"
mli0603.notion.site/Understanding-…
The blog covers:
• Full RoPE derivation from rotation matrices
• A clean proof of why RoPE's attention decays with distance (and when it breaks)
• The π boundary (RoPE's Nyquist limit)
• NTK-aware scaling derivation
• Dynamic NTK
• YaRN's frequency ramp + attention scaling
• Reference PyTorch code
Hope it helps! Feedback welcome!
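The core rotary mechanism the post derives can be sketched in a few lines of plain Python (a minimal illustration, not the blog's reference code): each consecutive pair of dimensions is rotated by a position-dependent angle, so query-key dot products depend only on relative position.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x of even length d:
    each pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s,
                x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))
```

Shifting both positions by the same offset leaves the score unchanged: `dot(rope(q, 5), rope(k, 2))` matches `dot(rope(q, 105), rope(k, 102))` up to floating-point error, because the paired rotations compose to a rotation by the position difference; this relative-position property is what the blog's derivation formalizes.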
Kyle Kastner retweeted
Gokul Swamy @g_k_swamy
It took a few years of deep thinking, but I'm super excited to finally share PROSPER: a beautiful, regression-based algorithm for RL from *rubric rewards* that robustly handles the *inconsistent feedback* that LLM judges provide. Let's go Back to Black(well)! 🧵(1/n)