
Mattia Verasani






"I was definitely the first prompt engineer at Anthropic. Might have been the first in the world." Alex Albert just spent 35 minutes explaining how they train Claude's personality from the inside. 35 minutes. free. by the person who invented the role. most people think Claude's character is a system prompt. it's not. you'll never look at Claude the same way.







In Oct last year, Representation Autoencoders provided an elegant solution to unified tokenization for understanding and generation. Today we make them a bit more simple. a bit more general. Result: >10x faster convergence, better reconstruction, better generation. And yes we test them on T2I and world models :) Introducing RAEv2

Our Gemini Vision team @GoogleDeepMind is hiring in MTV/SF. Join us to push the frontiers of visual perception, reasoning and generation, and contribute to Gemini, Nano Banana and Omni. Also get to do cool research such as Vision Banana 🍌: deepmind.google/research/publi…. Job posting below. It's one of the best times to be working on Vision as the frontier is moving rapidly, come join us!

If you're a software engineer who wants to upskill in system design, read these 14 articles (links below):


Introducing: Cohere Command A+. We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

It was an honor to give the keynote at MLSys. I covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto. The recording should be out soon; in the meantime, slides


🎉 Congrats to the VeRL-Omni team on the pre-release of a general RL post-training framework for multimodal generative models. Built on verl + vllm-omni. vLLM-Omni handles the multimodal rollout with step-wise continuous batching and embedding caching; vLLM serves the VLM-as-judge / OCR reward model, overlapped with rollout and training. In the Qwen-Image OCR demo, moving the reward to its own GPU cuts per-step wall-clock by ~14%. Released: Qwen-Image with FlowGRPO / MixGRPO / GRPO-Guard. BAGEL and Qwen3-Omni-Thinker PR-ready. Excited to push multimodal generative RL forward together with VeRL-Omni and the broader community. 🙌 📖 vllm.ai/blog/2026-05-1… 🔗 github.com/verl-project/v…


Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

This is good code. Those asserts make any comment superfluous and stop execution if something's wrong
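
The code being praised is not included in the post. As a hypothetical illustration of the point, here is a small Python sketch in which asserts document the preconditions and postconditions and halt execution when one is violated. The function `normalize` and its messages are invented for this example.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (hypothetical example)."""
    # Preconditions: each assert states an assumption a comment would otherwise describe.
    assert len(weights) > 0, "weights must be non-empty"
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    result = [w / total for w in weights]

    # Postcondition: execution stops here if the result is not a valid distribution.
    assert abs(sum(result) - 1.0) < 1e-9, "normalized weights must sum to 1"
    return result


if __name__ == "__main__":
    print(normalize([1, 1, 2]))
```

One caveat: Python strips `assert` statements when run with the `-O` flag, so asserts of this kind suit internal invariants rather than validation of untrusted input.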

Sparse attention mechanisms are finally moving beyond academic benchmarks into production systems, including DeepSeek Sparse Attention and, recently, @NousResearch's Lighthouse Attention. BLASST by NVIDIA, from the paper Dynamic Blocked Attention Sparsity via Softmax Thresholding, attempts to sparsify attention in a different way, leveraging a rescale-factor threshold idea similar to the one in Flash Attention 4. We expect to see more interesting sparse attention techniques in the future. arxiv.org/abs/2512.12087 (2/4)
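
A rough sketch of the thresholding idea, as read from the post rather than taken from the paper or from NVIDIA's code: during blocked online-softmax attention, a key/value block is skipped when its largest score sits so far below the running maximum that its softmax weight would fall under a threshold. The function name, the single-query setup, and the default values are assumptions made for illustration. A real kernel works on GPU tiles and differs in detail.

```python
import math


def blocked_attention(q, keys, values, block_size=2, threshold=None):
    """Single-query attention using online softmax over key/value blocks.

    If `threshold` is set, a block is skipped when
    exp(block_max - running_max) < threshold, i.e. when every weight in the
    block would be negligible relative to what has already been accumulated.
    Returns (output_vector, number_of_skipped_blocks).
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    skipped = 0

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        block_max = max(scores)

        # Assumed skip rule: compare the block's best score to the running max.
        if threshold is not None and block_max - running_max < math.log(threshold):
            skipped += 1
            continue

        # Standard online-softmax update: rescale old sums to the new maximum.
        new_max = max(running_max, block_max)
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        denom *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max

    return [a / denom for a in acc], skipped


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[10.0, 0.0], [9.0, 0.0], [-10.0, 0.0], [-9.0, 0.0]]
    values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(blocked_attention(q, keys, values))
    print(blocked_attention(q, keys, values, threshold=1e-3))
```

In this toy input the second block's scores are far below the first block's, so the thresholded run skips it and still returns nearly the same output as the dense run.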


Please stop flushing the KV cache in Claude Code after every x hrs of being idle. When I wake up and go back to a session that was running through the night but stalled for whatever reason, Claude is noticeably far worse than when I resume before the cache is flushed. Also, I hate hearing "I’m absolutely right" when I’m not. :) It has significantly reduced my trust in the model.
