
Tim Messerschmidt
@SeraAndroid
DevRel Ecosystems Lead EMEA at Google. Proud dad, happy husband, and feminist. O'Reilly author. I ♥️ home automation. Opinions stated here are my own.


An update: we’re 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone’s Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!

Do not download random models from Hugging Face. 2026 is the most dangerous year.


Hi @sudoingX, how did you manage to have it run the tests, since loading two 27B models does not seem possible? I am assuming the agent uses the local LLM on the DGX, but to test a change it would need to load a second copy while keeping the main 27B running. Or is it just testing individual parts of the kernel (like the matmul you mentioned, etc.)?


my dgx spark is writing custom CUDA kernels to make itself faster. let that sink in. hermes agent running qwen 3.6 27B Q8 autonomously decided to port its own triton kernel to native CUDA C++ for llama.cpp integration. it understood the dispatch chain. studied the mmq kernel structure. now it's writing the port itself. this machine is literally optimizing its own inference pipeline. no human in the loop. i set a /goal last night and woke up to a 12.91x speedup on SSM and 9.66x on Q8 matmul. now it wants another 2-3x through FP8 tensor cores. local ai. autonomous agents. self-improving inference. this is not science fiction. this is my friday.



do you understand what's happening here? if this doesn't excite you about local ai nothing will. my dgx spark is writing custom CUDA kernels to optimize its own inference. the agent studied the triton-proven algorithm, understood the dispatch chain, and is now writing a native CUDA kernel as a fast path for Q8 matmul decode. this is a machine improving itself. autonomously. powered by hermes agent /goal running qwen 27B locally. no human wrote this. no api was called. just local silicon teaching itself to run faster.

