Tim Dettmers

3.8K posts

Tim Dettmers

@Tim_Dettmers

Creator of bitsandbytes. Professor @CarnegieMellon and Research Scientist @allen_ai . I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.

Pittsburgh, PA Katılım Ekim 2012

899 Takip Edilen44.9K Takipçiler

Sabitlenmiş Tweet

Tim Dettmers@Tim_Dettmers·30 Tem

After 7 months on the job market, I am happy to announce: - I joined @allen_ai - Professor at @CarnegieMellon from Fall 2025 - New bitsandbytes maintainer @Titus_vK My main focus will be to strengthen open-source for real-world problems and bring the best AI to laptops 🧵

English

155

2.4K

255K

Tim Dettmers@Tim_Dettmers·4d

If these sizes are true, that is pretty devastating for closed-sourced labs. Training very large-scale models is difficult due to unexplained, sudden, rogue loss spikes. But once you manage that, it is mostly spending more compute on your model.

Bojie Li@bojie_li

Closed labs hide model sizes. They can't hide what their models know, and what a model knows is an indicator on how big it is. Reasoning compresses. Factual knowledge doesn't. So you can size a frontier model from black-box API calls alone, and across releases you can literally watch a single fact arrive in the parameters over time. For three years, my friends Jiyan He and Zihan Zheng have been asking frontier LLMs the same question: "what do you know about USTC Hackergame?", a CTF contest. May 2024: GPT-4o invented fake titles. Feb 2025: Claude 3.7 Sonnet listed 19 verified 2023 challenges. By April 2026, frontier models recall specific challenges across consecutive years. After DeepSeek-V4 dropped, I instructed my agent to spend four days autonomously turning that habit into Incompressible Knowledge Probes (IKP) — 1,400 questions, 7 tiers of obscurity, 188 models, 27 vendors. Three findings: 1/ You can approximately size any black-box LLM from factual accuracy alone. Penalized accuracy is log-linear in log(params), R² = 0.917 on 89 open-weight models from 135M to 1.6T params. Project closed APIs onto the curve → GPT-5.5 ~9T, Claude Opus 4.7 ~4T, GPT-5.4 ~2.2T, Claude Sonnet 4.6 ~1.7T, Gemini 2.5 Pro ~1.2T (90% CI: 0.3-3x size). 2/ Citation count and h-index don't predict whether a frontier model recognizes a researcher. Two researchers with similar citation profiles get very different responses. Models memorize impact — work that shaped a field, not many incremental papers. 3/ Factual capacity doesn't compress over time. Across 96 open-weight models across 3 years, the IKP time coefficient is statistically zero, rejecting the Densing-Law prediction of +0.0117/month at p<10⁻¹⁵. Reasoning benchmarks saturate; factual capacity keeps scaling with parameters. Website: 01.me/research/ikp/ Paper: arxiv.org/pdf/2604.24827

English

296

75.9K

Tim Dettmers retweetledi

Sean Welleck@wellecks·4d

Amazing work by Weihua Du (@StigLidu) and the team @JingmingZhuo @yi_xin_dong @Andre3035858461 @sunweiwei12 @regunivers Manupa Karunaratne, Ivan Fox @Tim_Dettmers @tqchenml Yiming Yang! github.com/StigLidu/AdaEx… arxiv.org/abs/2604.16625 stiglidu.github.io/AdaExplore/

Indonesia

3.2K

Tim Dettmers retweetledi

Reiner Pope@reinerpope·20 Nis

Intelligence per picojoule, with @itsclivetime and @dylan522p (0:00) Intro (1:22) What is codesign? (2:49) Codesign example: Swish vs ReLU (4:22) Are DeepSeek papers codesign? (6:45) Predicting where ML research will go (8:06) Should researchers hate your chips? (9:34) Can you codesign too much? (13:23) Picking the right grain size for specialization (16:22) How much hardware flexibility for The Age of Research? (20:05) Did reasoning and RL disrupt hardware roadmaps? (23:09) Cerebras/Groq: unexpected wins on reasoning and RL (25:34) Disaggregating MLP and attention (29:06) The right metrics for quantization and codesign papers

English

601

140.3K

Tim Dettmers@Tim_Dettmers·17 Nis

So cool to see that open-source, with open experimentation (and with the help of someone posting blog posts about their personal research), can yield a very robust method for MoE balancing. This method seems more elegant than all other methods I have seen. Open source is Awesome!

Percy Liang@percyliang

Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.

English

18.9K

Tim Dettmers@Tim_Dettmers·16 Nis

Something is about to drop 🔥

English

6.4K

Tim Dettmers retweetledi

Qwen@Alibaba_Qwen·16 Nis

⚡ Meet Qwen3.6-35B-A3B：Now Open-Source！🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. 🔥 Agentic coding on par with models 10x its active size 📷 Strong multimodal perception and reasoning ability 🧠 Multimodal thinking + non-thinking modes Efficient. Powerful. Versatile. Try it now👇 Blog：qwen.ai/blog?id=qwen3.… Qwen Studio：chat.qwen.ai HuggingFace：huggingface.co/Qwen/Qwen3.6-3… ModelScope：modelscope.cn/models/Qwen/Qw… API（‘Qwen3.6-Flash’ on Model Studio）：Coming soon～ Stay tuned

English

449

1.7K

11.6K

2.7M

Tim Dettmers retweetledi

Zhihao Jia@JiaZhihao·13 Nis

The #MLSys2026 program is out, and it is awesome! 📄 107 research papers + 28 industry papers spanning the full AI systems stack 🏆 Three exciting contests: AWS Trainium programming, Google graph scheduling, and NVIDIA AI kernel generation 🎤 Keynotes from an outstanding lineup: Amin Vahdat (Google) on infra; @LukeZettlemoyer (UW & Meta) on models; @kozyraki (Stanford & NVIDIA) on architecture; Lidong Zhou (Microsoft) on systems; and @marksaroufim (GPUMode) on GPUs and kernels. Join us in Bellevue, WA in a month! Early registration ends April 19 — don’t miss it: mlsys.org.

English

104

31.4K

Tim Dettmers retweetledi

Liang Chen@liangchen5518·12 Nis

GLM 5.1 from @Zai_org ranks as the top open model on the newly released Monthly-SWEBench by @UniPat_AI—second only to Claude-Opus-4.6. Congrats to the team! 🚀Explore the benchmark: unipat.ai/benchmarks/Mon…

English

135

15.3K

Tim Dettmers retweetledi

Graham Neubig@gneubig·9 Nis

Everyone's talking about Anthropic's new model discovering new security vulnerabilities. What people aren't talking about is the millions of KNOWN vulnerabilities remaining unfixed due to lack time, interest, etc. e.g. OpenClaw has 67 CVEs right now, including 4 critical ones.

English

145

11.6K

Tim Dettmers@Tim_Dettmers·9 Nis

@sahandsharif Sorry about that! I realized that later, too.

English

181

Sahand Sharifzadeh@sahandsharif·8 Nis

@Tim_Dettmers Tbh I always found it hurtful that ConvE had no mention of RESCAL too Tim.

English

225

Tim Dettmers@Tim_Dettmers·7 Nis

I was going crazy because I could not replicate TurboQuant. Turns out the community also had issues. The community quickly made adjustments to "make it work", but what they did not realize is that they reimplemented (most of) HIGGS in the process (full HIGGS would be even better)

English

847

97.5K

Tim Dettmers retweetledi

Graham Neubig@gneubig·8 Nis

In this new work, we create 10,000 training/eval environments from 200 unique software packages, spanning tasks from every major sector of the economy. Fully open source, and ready to scale to even more 😃

Pranjal Aggarwal ✈️ ICLR'26@PranjalAggarw16

What if computer-use agents could do real work? We built Gym-Anything: a framework that turns any software into a computer-use agent environment. We used it to create CUA-World: 200+ real software, 10,000+ tasks and environments, across all major occupation groups, from medical imaging to financial trading. 🧵

English

192

33.8K

Tim Dettmers@Tim_Dettmers·8 Nis

@YouJiacheng There is more discussion below that comment. It is all a bit complicated and a bit evidence back and forth. But there seems to be pretty big evidence that QJL does not help. It certainly does not help in my own benchmarks

English

145

You Jiacheng@YouJiacheng·8 Nis

@Tim_Dettmers #discussioncomment-16449050" target="_blank" rel="nofollow noopener">github.com/ggml-org/llama… this comment says QJL is worth keeping?

English

202

Tim Dettmers retweetledi

Nicholas Boffi@nmboffi·7 Nis

🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation. read about it in the blog: one-step-lm.github.io/blog/ full details in the paper: arxiv.org/abs/2602.16813 we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed! v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion — but without the factorization error that kills discrete methods at few steps. beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale. we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time — and it works even at 1-2 steps with our flow map model. simple, well-understood techniques from continuous flows just work incredibly well in practice for language. we’re extremely excited about the future of this class of models. stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀

English

472

72.5K

Tim Dettmers retweetledi

Sam Bowman@sleepinyourhat·7 Nis

Mythos Preview seems to be the best-aligned model out there on basically every measure we have. But it also likely poses more misalignment risk than any model we’ve used: Its new capabilities significantly increase the risk from any bad behavior. 🧵

English

190

1.4K

978.4K

Tim Dettmers retweetledi

Z.ai@Zai_org·7 Nis

Introducing GLM-5.1: The Next Level of Open Source - Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo. - Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations. Blog: z.ai/blog/glm-5.1 Weights: huggingface.co/zai-org/GLM-5.1 API: docs.z.ai/guides/llm/glm… Coding Plan: z.ai/subscribe Coming to chat.z.ai in the next few days.

English

550

1.3K

10.9K

4.3M

Tim Dettmers@Tim_Dettmers·7 Nis

@turbo_xo_ @yacineMTB For weights, too. HIGGS is flexible. Difficult to improve for data-free quantization methods.

English

2.6K

Greer@turbo_xo_·7 Nis

@Tim_Dettmers @yacineMTB Is this for weights too or just KV cache

English

3.1K

Tim Dettmers@Tim_Dettmers·7 Nis

@waltuuuhr Ooff. Thank you so much! Totally missed this

English

2.9K

Forward Unemployed Engineer@waltuuuhr·7 Nis

@Tim_Dettmers FYI you made a typo tagging Carnegie Mellon in your bio and it links to a scam crypto token

English

3.7K

Tim Dettmers@Tim_Dettmers·7 Nis

We in the quantization community could quickly see this and were flabbergastered by the response to TurboQuant. Whenever I saw TurboQuant on my timeline, I found it hurtful, because the work of other academics who worked so hard was discounted.

English

234

19.2K

Tim Dettmers@Tim_Dettmers·7 Nis

QJL hurts performance: github.com/ggml-org/llama… github.com/ggml-org/llama… github.com/TheTom/turboqu…

English

11.8K

Keşfet

@StigLidu @JingmingZhuo @yi_xin_dong @Andre3035858461 @sunweiwei12 @regunivers @tqchenml @itsclivetime