

Boffins detail new algorithms to losslessly boost AI perf by up to 2.8x dlvr.it/TLyjVv
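
Speedups like this typically come from speculative (draft-and-verify) decoding: a small draft model guesses several tokens cheaply, the large target model checks them all in one forward pass, and only tokens the target would have produced itself are kept, so output quality is unchanged. Below is a toy greedy-only sketch of that loop; the gpt2/distilgpt2 pairing is an illustrative assumption (they share a vocabulary), not the models from the article, and real implementations also reuse KV caches and handle sampling via rejection sampling.

```python
# Toy greedy speculative decoding: draft proposes k tokens, target verifies
# them in a single forward pass, longest agreeing prefix is accepted.
# Output is identical to plain greedy decoding with the target model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("distilgpt2")  # shares gpt2's vocab

@torch.no_grad()
def speculative_generate(prompt, max_new_tokens=40, k=4):
    ids = tok(prompt, return_tensors="pt").input_ids
    produced = 0
    while produced < max_new_tokens:
        # 1) Draft: the cheap model greedily proposes k tokens.
        draft_ids = ids
        for _ in range(k):
            logits = draft(draft_ids).logits[:, -1, :]
            draft_ids = torch.cat([draft_ids, logits.argmax(-1, keepdim=True)], dim=-1)
        guesses = draft_ids[:, ids.shape[1]:]
        # 2) Verify: one target forward pass scores all k drafted positions.
        tgt_logits = target(draft_ids).logits
        verify = tgt_logits[:, ids.shape[1] - 1:-1, :].argmax(-1)
        # 3) Accept the longest prefix where draft and target agree.
        match = int((guesses == verify).long().cumprod(-1).sum())
        if match < k:
            bonus = verify[:, match:match + 1]  # target's own correction token
        else:
            bonus = tgt_logits[:, -1, :].argmax(-1, keepdim=True)  # free extra token
        ids = torch.cat([ids, guesses[:, :match], bonus], dim=-1)
        produced += match + 1
    return tok.decode(ids[0])

print(speculative_generate("Speculative decoding works because"))
```

The win comes from the verify step: the target model scores up to k positions per forward pass instead of one, so when the draft's acceptance rate is high, wall-clock time drops without changing what the target would have generated.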

Speculative decoding has shown a lot of promise, though broader adoption has taken time due to the complexity of building production-ready tooling and high-quality draft models. We’re releasing SpecBundle, a collection of large-scale EAGLE-3 draft models trained with SpecForge v0.2. This release brings major system improvements, including refactored training pipelines, multi-backend support with SGLang and @huggingface, and better usability at scale. We also built a performance dashboard to make real end-to-end speedups visible across models and settings. See the dashboard and blog in the thread 👇
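
As a rough sketch of what plugging an EAGLE-3 draft into SGLang looks like via its offline engine: the draft-model path below is a placeholder, and the flag values are illustrative defaults, not tuned settings; check the SpecForge/SGLang docs for your version.

```python
# Minimal sketch: serve a verifier with an EAGLE-3 draft model in SGLang.
# Argument names mirror SGLang's speculative-decoding server args.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",  # target (verifier) model
    speculative_algorithm="EAGLE3",                 # EAGLE-3 drafting
    speculative_draft_model_path="<path-to-specbundle-eagle3-draft>",  # placeholder
    speculative_num_steps=5,         # autoregressive draft steps per round
    speculative_eagle_topk=8,        # candidate branches expanded per step
    speculative_num_draft_tokens=32, # total draft tokens verified per round
)

out = llm.generate("Speculative decoding is", {"temperature": 0, "max_new_tokens": 64})
print(out["text"])
llm.shutdown()
```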


Speculative decoding is a powerful way to improve inference performance, but in practice it has been hard to adopt. Training a unique draft model per LLM is time-consuming, and production-ready training utilities that work cleanly with vLLM have been limited. Speculators v0.3.0 closes this gap with end-to-end training support for Eagle3 draft models that run seamlessly with vLLM. The release adds offline data generation using vLLM and training support for single- and multi-layer draft models, across both MoE and non-MoE verifiers. Here's a 🧵 on speculative decoding and how to get started today in @vllm_project (1/8):
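
To make "get started today" concrete, here is a minimal sketch of loading an Eagle3 draft in vLLM. The draft checkpoint path is a placeholder standing in for a Speculators-trained model, and the speculative_config keys follow recent vLLM releases; verify against your version's docs.

```python
# Minimal sketch: run a verifier with an Eagle3 draft model in vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # verifier
    speculative_config={
        "method": "eagle3",
        "model": "<path-to-speculators-eagle3-draft>",  # placeholder
        "num_speculative_tokens": 4,  # draft tokens proposed per step
    },
)

outputs = llm.generate(
    ["Why does speculative decoding preserve output quality?"],
    SamplingParams(temperature=0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```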

inference is perhaps the most valuable emerging software category. as models get smarter and more economically valuable, compute will increasingly be spent drawing samples from the models. if you'd like to work on inference at openai, reach out — gdb@openai.com. include a description of an exceptional team you've been a part of, and your contribution towards that team's goals. also indicate any experience in inference, large-scale system optimization, or other areas where you've built up domain expertise. lots of exciting problems to work on, ranging from deeply understanding the model forward pass (including simulating/finding creative opportunities for optimization); to system-level efficiencies such as speculative decoding or kv offloading or workload-aware load balancing; to managing and making observable a massive fleet at scale.


Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in @TechCrunch. techcrunch.com/2025/11/06/inc…
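
A toy cost model (not an implementation of any real dLLM) makes the sequential-vs-parallel contrast concrete: autoregressive decoding needs one forward pass per token, each waiting on the last, while diffusion-style decoding refines all positions together over a fixed number of passes. The step count of 20 below is an illustrative assumption.

```python
# Toy comparison of sequential model calls needed per generated sequence.
def autoregressive_passes(num_tokens: int) -> int:
    # One model call per token, strictly sequential.
    return num_tokens

def diffusion_passes(num_tokens: int, denoise_steps: int = 20) -> int:
    # A fixed number of parallel refinement passes, independent of length.
    return denoise_steps

for n in (128, 512, 2048):
    print(f"{n} tokens: autoregressive={autoregressive_passes(n)} passes, "
          f"dLLM={diffusion_passes(n)} passes")
```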

NYC open-source AI infra contributors — we’ve launched a community research hub above Grand Central where GPUs go brrr 🔥🗽 A place to hack, benchmark, and collaborate — vLLM, SGLang, kernels, inference optimizations all welcome. Open space. Open source. Weekends too. Huge thanks to @Company for supporting this initiative 🙌 𝐋𝐢𝐦𝐢𝐭𝐞𝐝 𝐬𝐞𝐚𝐭𝐬. 𝐃𝐫𝐨𝐩 𝐲𝐨𝐮𝐫 𝐏𝐑𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐜𝐨𝐦𝐦𝐞𝐧𝐭𝐬 𝐭𝐨 𝐣𝐨𝐢𝐧 𝐭𝐡𝐞 𝐧𝐞𝐱𝐭 𝐬𝐩𝐫𝐢𝐧𝐭!