Kyle Kastner

9.4K posts

@kastnerkyle

computers and music are fun

Out of the city, Massachusetts · Joined January 2011
3.2K Following · 3.3K Followers

Pinned Tweet
Kyle Kastner @kastnerkyle
Really enjoying Lyria 3 (via Gemini App). It's a fun model to create with! Glad to be at Google, and to see all the neat tracks people are making. Check out the prompt guide too if you are interested in trying it out deepmind.google/models/lyria/p…
Kyle Kastner retweeted
Elorian AI @ElorianAI
Excited to introduce Elorian. Together with an incredible team of researchers, we're building AI that natively understands the visual world — not by translating images into text, but by reasoning through them directly. elorian.ai youtu.be/YlvfNpOMeOY
Kyle Kastner retweeted
yamakatz @kyama0321
👀👂 > [2603.19195v1] How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation arxiv.org/abs/2603.19195
Kyle Kastner retweeted
Jason Weston @jaseweston
🔗Learning to Aggregate through Online RL🎯
ParaGator🔀🐊: strong parallel reasoning aggregation
Core claim: aggregation works best when training both stages together:
- LLM generator should produce diverse candidates
- LLM aggregator should synthesize into final answer
ParaGator trains candidate generation with pass@k, and aggregation with pass@1, on-policy and end-to-end. Stops mode collapse/off-policy mismatch. Improves math & scientific reasoning. 🚀🏆
Read more in the blog post: facebookresearch.github.io/RAM/blogs/para…
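The pass@k signal mentioned in the thread can be illustrated with the standard unbiased estimator (this is only the generic metric, not ParaGator's training code; `n` candidates with `c` correct are assumed):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n candidates (c of them
    correct) is correct, computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few incorrect candidates to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 candidates of which c=1 is correct, pass@1 is 0.25 while pass@4 is 1.0, which is why a pass@k reward pushes the generator toward diverse candidates rather than one repeated best guess.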
Kyle Kastner retweeted
Lucas Maes @lucasmaes_
JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: le-wm.github.io
Kyle Kastner retweeted
Phillip Isola @phillip_isola
A few clarifications to common q's about our thickets paper:
1. Is this just ensembling? Seed averaging? Bagging? ...
2. Is this just Qwen?
3. Is it K times slower inference?
4. RL is dead? Post-training is dead?
Phillip Isola @phillip_isola

Sharing “Neural Thickets”. We find: In large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works Paper: arxiv.org/abs/2603.12228 Web: thickets.mit.edu 1/
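The "even random guessing works" claim can be caricatured as a toy random search in weight space (a hypothetical sketch; `loss_fn`, `sigma`, and the loop are illustration only, not the paper's experimental setup):

```python
import random

def random_guess_posttrain(weights, loss_fn, trials=200, sigma=0.05, seed=0):
    """Toy 'random guessing' post-training: sample Gaussian perturbations
    of the pretrained weights and keep any that lower the task loss."""
    rng = random.Random(seed)
    best, best_loss = list(weights), loss_fn(weights)
    for _ in range(trials):
        cand = [w + rng.gauss(0.0, sigma) for w in best]
        cand_loss = loss_fn(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss
```

In the dense-solutions regime the paper describes, even this undirected search finds improving points near the starting weights; in a sparse regime it would stall.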

Kyle Kastner retweeted
Eric W. Tramel @fujikanaeda
Nemotron 3 Super 120B-A12B is good, fast, open, and it’s out :) Awesome demonstration of Nvidia's commitment to accelerating AI everywhere. It’s been really cool to work up and down the stack on Nemotron. I wanted to share some of my favorite parts from our tech report 🧵
Kyle Kastner retweeted
Emre Can Acikgoz @emrecanacikgoz
Can LLMs self-evolve into general-purpose tool-calling agents without any external data? Yes. Introducing "Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data"! ♾️ Initialized from the same LLM, Tool-R0 co-evolves a Generator and Solver through self-play RL; training general-purpose tool-calling agents entirely from scratch with zero data. (1/n)
Kyle Kastner retweeted
templar @tplr_ai
We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n
Kyle Kastner retweeted
Suhas Kotha @kothasuhas
to improve fine-tuning data efficiency, replay generic pre-training data
not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training
(w/ @percyliang)
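A minimal sketch of the replay idea (the mixing ratio, sampling scheme, and helper names are assumptions; the thread does not specify them):

```python
import random

def mixed_batches(finetune_data, pretrain_data, replay_frac=0.25,
                  batch_size=8, num_batches=100, seed=0):
    """Yield fine-tuning batches where roughly `replay_frac` of the
    examples are replayed from generic pre-training data, so the model
    keeps seeing its original distribution while it adapts."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.choice(pretrain_data) if rng.random() < replay_frac
               else rng.choice(finetune_data)
               for _ in range(batch_size)]
```

Per-example Bernoulli mixing like this keeps the expected replay fraction fixed without needing the two datasets to be the same size.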
Kyle Kastner retweeted
t.toda @Trtd6Trtd
arxiv.org/abs/2603.01683 A method for addressing catastrophic forgetting, a known problem in LLM post-training. The interesting part for me was the finding that DPO has an "Elastic Tether": a regularization-like effect that makes the model less likely to forget things it already knows.
Kyle Kastner retweeted
Davis Blalock @davisblalock
🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]
Kyle Kastner retweeted
Tanishq Kumar @tanishqkumar07
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
Kyle Kastner retweeted
NotebookLM @NotebookLM
Introducing Cinematic Video Overviews, the next evolution of the NotebookLM Studio. Unlike standard templates, these are powered by a novel combination of our most advanced models to create bespoke, immersive videos from your sources. Rolling out now for Ultra users in English!
Kyle Kastner retweeted
Yuma Koizumi @yuma_koizumi
GDM Tokyo is hiring🗼 Want to build the future of Gemini Audio with us?
🔥 Beyond the Paper: Research that scales to billions.
👀 The Focus: Solving multilingual & APAC AI nuances.
🏯 The Life: Tokyo as a "force multiplier" for your creativity
job-boards.greenhouse.io/deepmind/jobs/…
Kyle Kastner retweeted
Yuma Koizumi @yuma_koizumi
🎙️ Google DeepMind Tokyo is hiring! Want to build world-class AI with your own hands, from Japan? 🗼🌏 My team is recruiting Research Scientists to advance the core spoken-dialogue technology behind Gemini Live and related products, and multilingual, multicultural LLM research that leverages our APAC base. Apply here: job-boards.greenhouse.io/deepmind/jobs/…
Kyle Kastner retweeted
StepFun @StepFun_ai
"can we get the base model?" sure. here's two. "can we get the code?" sure. here's SteptronOSS. "what about the SFT data?" coming soon. maximum sincerity, minimum barriers. - Step 3.5 Flash Base — pretrained foundation - Step 3.5 Flash Base-Midtrain — code, agents & long-context - SteptronOSS — open-sourced, ready for your custom workflows - SFT Data — coming soon for reference not just the final checkpoint — a customizable pipeline. 🤗 huggingface.co/stepfun-ai/Ste… 🤗 huggingface.co/stepfun-ai/Ste… 💻 github.com/stepfun-ai/Ste…
Kyle Kastner retweeted
Max Li 李赵硕 @mli0603
I've been debugging RoPE recently and kept getting tripped up by details that most explanations gloss over. So I wrote a deep dive.
"Understanding RoPE: From Rotary Embeddings to Context Extension"
mli0603.notion.site/Understanding-…
The blog covers:
• Full RoPE derivation from rotation matrices
• A clean proof of why RoPE's attention decays with distance (and when it breaks)
• The π boundary (RoPE's Nyquist limit)
• NTK-aware scaling derivation
• Dynamic NTK
• YaRN's frequency ramp + attention scaling
• Reference PyTorch code
Hope it helps! Feedback welcome!
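The core rotary mechanism the post derives can be sketched in a few lines of plain Python (a minimal illustration, not the blog's reference code): each consecutive pair of dimensions is rotated by a position-dependent angle, so query-key dot products depend only on relative position.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x of even length d:
    each pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s,
                x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))
```

Shifting both positions by the same offset leaves the score unchanged: `dot(rope(q, 5), rope(k, 2))` matches `dot(rope(q, 105), rope(k, 102))` up to floating-point error, because the paired rotations compose to a rotation by the position difference; this relative-position property is what the blog's derivation formalizes.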
Kyle Kastner retweeted
Gokul Swamy @g_k_swamy
It took a few years of deep thinking, but I'm super excited to finally share PROSPER: a beautiful, regression-based algorithm for RL from *rubric rewards* that robustly handles the *inconsistent feedback* that LLM judges provide. Let's go Back to Black(well)! 🧵(1/n)