Tim Messerschmidt

25.7K posts

@SeraAndroid

DevRel Ecosystems Lead EMEA at Google. Proud dad, happy husband, and feminist. O'Reilly author. I ♥️ home automation. Opinions stated here are my own.

Berlin, Germany • he/him • Joined January 2010
1.4K Following · 6.3K Followers
Tim Messerschmidt reposted
Shengzhe (@shengzheyao):
Antigravity CLI 1.0.1 is out. Key updates:
- Fixed OAuth not persisting in some environments.
- Enhanced the visual experience on Windows.
- Added the new "proceed in sandbox" permission control.
Restart agy to auto update or run "agy update". See the full changelog for details: github.com/google-antigra…
59
43
473
44.4K
Tim Messerschmidt reposted
Logan Kilpatrick (@OfficialLoganK):
We just 3xed the rate limits across all tiers in Antigravity so that you can put 3.5 Flash through its paces even more. Enjoy, and keep the feedback coming! :)

Quoting Varun Mohan (@_mohansolo):
An update: we’re 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone’s Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!
252
152
2.4K
287.1K
Tim Messerschmidt reposted
Google (@Google):
Gemini 3.5 Flash is built to help you execute complex, agentic workflows. 3.5 Flash rivals flagship models to deliver frontier performance for agents and coding, at the lightning speeds you expect from the Flash series.
Google tweet media
78
182
2.3K
962.6K
Tim Messerschmidt reposted
Google (@Google):
Meet Gemini 3.5 Flash — our strongest agentic and coding model yet. It delivers frontier-level performance at 4x the speed of comparable frontier models — often at less than half the cost. Generally available, starting today. 🧵 #GoogleIO
Google tweet media
393
945
9.5K
860.7K
Tim Messerschmidt (@SeraAndroid):
Damn. 1491 TPS. Love the keynote already #GoogleIO. Bonus points for the awesome 80s movie training montage with the TPUs 💪
Tim Messerschmidt tweet media
0
0
1
108
Tim Messerschmidt (@SeraAndroid):
For founders and teams running locally — for cost, sovereignty, or latency — quantization turns "barely usable" into "actually useful." The right quant on the right hardware changes the math entirely.
0
0
0
47
Tim Messerschmidt (@SeraAndroid):
The catch: not all quants are equal. Compress too aggressively and coherence degrades. Kullback-Leibler Divergence (KLD) measures how far the quantized model's output distribution drifts from the original's. A good quant gets the speed-up without meaningful drift. That's the craft.
1
0
0
144
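To make the KLD idea concrete, here is a minimal sketch of how drift between an original and a quantized model could be scored. It assumes you already have next-token logits from both models for the same positions; the function names and the toy logits are hypothetical, and real evaluations run over a full vocabulary and many thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(ref_logits, quant_logits):
    """Average per-position KL divergence: reference model vs. quantized model."""
    per_token = [
        kl_divergence(softmax(r), softmax(q))
        for r, q in zip(ref_logits, quant_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy logits over a 3-token vocabulary at 2 positions (made-up numbers).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]      # original model
mild = [[1.9, 1.1, 0.1], [0.5, 2.4, 0.4]]     # hypothetical careful quant
harsh = [[0.5, 1.5, 1.0], [2.0, 0.5, 0.3]]    # hypothetical aggressive quant

print("mild quant KLD: ", mean_kld(ref, mild))
print("harsh quant KLD:", mean_kld(ref, harsh))
```

A lower mean KLD means the quantized model's token probabilities stay closer to the original's; which threshold counts as "meaningful drift" depends on the model and the task.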
Tim Messerschmidt (@SeraAndroid):
The biggest bottleneck for local LLM inference isn't compute — it's memory bandwidth. During decode, the GPU streams the entire model from memory for every single token. At batch size 1, it spends most of its time waiting for data, not doing math.
1
0
0
113
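The bandwidth argument can be turned into a back-of-the-envelope ceiling: if every weight is read once per generated token, decode speed cannot exceed memory bandwidth divided by model size in bytes. The sketch below assumes a dense model at batch size 1 and ignores KV cache traffic, activations, and any overlap tricks. The 270 GB/s figure is the DGX Spark bandwidth quoted elsewhere in this feed; the 27B parameter count and the bit widths are illustrative choices, not measurements.

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s, params_billion, bits_per_weight):
    """Upper bound on decode speed when each token streams all weights once."""
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Illustrative dense 27B model on a 270 GB/s system.
for bits in (16, 8, 4):
    ceiling = decode_tokens_per_second_ceiling(270, 27, bits)
    print(f"{bits:>2}-bit weights: at most {ceiling:.1f} tokens/s")
```

Halving the bits per weight doubles the ceiling, which is why quantization matters so much for bandwidth-bound local inference. Real throughput lands below these bounds.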
Tim Messerschmidt (@SeraAndroid):
@antirez I did some work on this myself. This is a Dual Node DGX Spark cluster and I am sure I am leaving some performance on the table. I may need another day of optimization before submitting a PR. I work on this fork/branch: github.com/SeraphimSerapi…
Tim Messerschmidt tweet media
0
0
0
67
Tim Messerschmidt (@SeraAndroid):
@antirez I'd be happy to help test drive if you need somebody with a Spark cluster (2 nodes). Tensor parallelism could help speed things up!
1
0
1
442
antirez (@antirez):
DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec; memory bandwidth is limited on this system, at 270GB/sec. But prefill is way more aligned to M3 Max at ~200 t/s. I'll release when more mature, but it is almost certain that it will get merged.
49
73
786
82.2K
Tim Messerschmidt (@SeraAndroid):
@spark_arena Much love ❤️ Glad to be a part of the extended DGX ecosystem and happy to contribute where I can.
0
0
0
41
sparkarena (@spark_arena):
Tool Eval Bench by @SeraAndroid is one of the best tools in the NVIDIA Developer Forum. We're working with Tim to get it integrated with SparkRun and Spark Arena as a first-class citizen. If you have the Spark(s), you should definitely take a look at it. github.com/SeraphimSerapi…
sparkarena tweet media
2
3
19
1K
Tim Messerschmidt (@SeraAndroid):
@mervenoyann I could literally host a script on GitHub and achieve the same effect. Installing skills/plugins without checking the code they execute is the problem. Sure, hosting platforms can do some work on their end, but I'm not sure why @huggingface gets the blame here.
0
0
1
95
Tim Messerschmidt reposted
Google AI Studio (@GoogleAIStudio):
gemini 3.1 flash-lite is here

it's our most cost-efficient model, optimized for high-volume agentic tasks, translation, and simple data processing
217
430
4.3K
588.4K
Tim Messerschmidt (@SeraAndroid):
🤯 I love how well Gemma 4's draft models work. Here is Intel's AutoRound quant of the 31B model at 256K context length in action on my Dual Node DGX Spark system with 6 speculative tokens. These draft models are pretty special given they power KV cache sharing.
Tim Messerschmidt tweet media
0
0
0
142
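For a rough feel of what 6 speculative tokens can buy, the sketch below uses the standard simplified model from the speculative decoding literature: each drafted token is accepted independently with the same probability, and the target model always contributes one token per verification pass. The acceptance rates are hypothetical, and the estimate ignores the cost of running the draft model.

```python
def expected_tokens_per_verify_step(acceptance_rate, num_draft_tokens):
    """Expected tokens produced per target-model pass under i.i.d. acceptance.

    Closed form of 1 + a + a^2 + ... + a^k for acceptance rate a and k drafts.
    """
    a, k = acceptance_rate, num_draft_tokens
    if a >= 1.0:
        return float(k + 1)  # every draft accepted, plus the target's own token
    return (1 - a ** (k + 1)) / (1 - a)

# k = 6 matches the setup in the post; the acceptance rates are made up.
for rate in (0.5, 0.8, 0.9):
    tokens = expected_tokens_per_verify_step(rate, 6)
    print(f"acceptance {rate:.1f}: ~{tokens:.2f} tokens per target pass")
```

Since decode is bandwidth-bound, each target pass costs roughly the same whether it verifies one token or seven, so tokens per pass is a reasonable first proxy for speed-up.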
Haku (@DKP85D):
Hi @sudoingX, how did you manage to run the test, since loading two 27B models doesn't seem possible? I am assuming the agent uses a local LLM on the DGX, but to test it needs to load the model while keeping the main 27B running. Or is it just testing individual parts of the kernel (like you mentioned, markel, etc.)?
1
0
3
60
Sudo su (@sudoingX):
do you understand what's happening here? if this doesn't excite you about local ai nothing will. my dgx spark is writing custom CUDA kernels to optimize its own inference. the agent studied the triton-proven algorithm, understood the dispatch chain, and is now writing a native CUDA kernel as a fast path for Q8 matmul decode. this is a machine improving itself. autonomously. powered by hermes agent /goal running qwen 27B locally. no human wrote this. no api was called. just local silicon teaching itself to run faster.
Sudo su tweet media
Quoting Sudo su (@sudoingX):

my dgx spark is writing custom CUDA kernels to make itself faster. let that sink in. hermes agent running qwen 3.6 27B Q8 autonomously decided to port its own triton kernel to native CUDA C++ for llama.cpp integration. it understood the dispatch chain. studied the mmq kernel structure. now it's writing the port itself. this machine is literally optimizing its own inference pipeline. no human in the loop. i set a /goal last night and woke up to a 12.91x speedup on SSM and 9.66x on Q8 matmul. now it wants another 2-3x through FP8 tensor cores. local ai. autonomous agents. self-improving inference. this is not science fiction. this is my friday.

15
15
182
14.6K
Tim Messerschmidt (@SeraAndroid):
@sudoingX That's pretty neat. How did you invoke that? I run a Dual Node DGX Spark cluster and would love to play with it. Also: FP8 model or a different quant for 27B? I found that quantized models often get stuck in thinking loops and struggle to get out of these.
0
0
0
103
Felipe Sztutman (@sztlink):
I’m building an evaluation lens I call Action-Trace Fidelity. If model, task, prompt, seed, and decoding stay fixed — but the inference apparatus changes — does the operational trace survive? Not just “is the answer right?” Which tools, in what order, with what args?
Felipe Sztutman tweet media
1
0
1
114
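One way to picture the trace comparison described above is a small checker that asks the three questions in order: same tools, same order, same arguments. This is only an illustrative sketch and not the author's implementation; the trace format (a list of tool name and argument dict pairs), the function names, and the example tools are all assumptions.

```python
import json

def canonical(call):
    """Normalize a (tool_name, args_dict) pair so argument key order is ignored."""
    name, args = call
    return name, json.dumps(args, sort_keys=True)

def trace_fidelity(reference, candidate):
    """Compare two tool-call traces recorded under different inference setups."""
    ref = [canonical(c) for c in reference]
    cand = [canonical(c) for c in candidate]
    ref_names = [name for name, _ in ref]
    cand_names = [name for name, _ in cand]
    return {
        "same_tools": sorted(ref_names) == sorted(cand_names),  # which tools
        "same_order": ref_names == cand_names,                   # in what order
        "same_args": ref == cand,                                # with what args
    }

# Hypothetical traces: same model and prompt, two different inference backends.
baseline = [("search", {"query": "dgx spark"}), ("read_file", {"path": "a.txt"})]
other = [("read_file", {"path": "a.txt"}), ("search", {"query": "dgx spark"})]

print(trace_fidelity(baseline, baseline))
print(trace_fidelity(baseline, other))
```

In this toy case the second comparison reports the same tools but a different order, which is exactly the kind of divergence a final-answer check would miss.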
Tim Messerschmidt (@SeraAndroid):
@vr8vr8 How does Intel's AutoRound compare? It would be interesting to see the difference between 8, 5.5, and 4 bit, also in terms of t/s.
0
0
0
15
vr8vr8 (@vr8vr8):
PrismaQuant wins on quality on every prompt I threw at it, and it's faster per token. Pretty wild that a 5.5-bit community quant outscores the official FP8 release — the usual assumption is that vendor quants are the ceiling.
vr8vr8 tweet media
1
0
1
156
vr8vr8 (@vr8vr8):
🤯 PrismaQuant-5.5b beats official FP8 on Qwen3.6-27B: community quant outperforms the vendor build.

Been running a head-to-head between the official Qwen/Qwen3.6-27B [FP8 MTP k=3] and the community-built Qwen/Qwen3.6-27B [PrismaQuant-5.5b MTP k=3] using my usual visual-coding battery (solar system, analog clock, etc.). Blind A/B, 6 rounds.
vr8vr8 tweet media
8
2
9
1.1K