Mattia Verasani

6K posts

@MatRazor

Joined December 2017
320 Following, 75 Followers
Mattia Verasani retweeted
PyTorch (@PyTorch)
Model Optimization and Post-Training Quantization

Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices. By lowering computational and memory requirements while preserving model quality, quantization helps AI models run more efficiently in resource-constrained environments. This post walks through how to use NVIDIA Model Optimizer to quantize a CLIP model in FP8 format with the post-training quantization (PTQ) method, including an example workflow exporting a PyTorch checkpoint. Read the complete blog post: developer.nvidia.com/blog/model-qua…
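As a rough illustration of what FP8 post-training quantization does numerically, here is a minimal pure-Python sketch of per-tensor E4M3 "fake quantization": calibrate a scale from the absolute maximum of some sample values, divide by it, round to the nearest E4M3 value (largest finite value 448), and multiply back. This is not the NVIDIA Model Optimizer API used in the post; the function names are hypothetical, and real PTQ tools operate on tensors, may use per-channel scales, and may use other calibration methods.

```python
import math
from typing import List

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest FP8 E4M3 value, saturating at +/-448 (hypothetical helper)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), E4M3_MAX)  # saturate instead of overflowing
    if a < 2.0 ** -6:
        # Subnormal range: fixed spacing of 2^-9.
        step = 2.0 ** -9
    else:
        # Normal range: 3 mantissa bits give 8 steps per power-of-two interval.
        _, e = math.frexp(a)  # a = m * 2**e with 0.5 <= m < 1
        step = math.ldexp(1.0, e - 4)
    # Python's round() is round-half-to-even, matching IEEE-style rounding here.
    return sign * min(round(a / step) * step, E4M3_MAX)


def calibrate_scale(calib: List[float]) -> float:
    """Per-tensor absmax calibration: map the largest observed magnitude to 448."""
    amax = max(abs(x) for x in calib)
    return amax / E4M3_MAX if amax > 0 else 1.0


def fake_quant(xs: List[float], scale: float) -> List[float]:
    """Quantize to E4M3 and dequantize back, exposing the rounding error."""
    return [round_to_e4m3(x / scale) * scale for x in xs]
```

For values in the normal range the relative rounding error is at most 1/16, which is the kind of loss a PTQ workflow then checks against model-quality metrics.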
Mattia Verasani retweeted
Rosinality (@rosinality)
arxiv.org/abs/2605.24326 Meta's experience on multi-datacenter training. They used a pipeline-parallel (PP) schedule called Doraemon PP, which allows integration with ZeRO-2/3.
Mattia Verasani retweeted
Raytar (@Raytar)
he tested 5760 architectures at Google for a full year. the winner was the original Transformer from 2017. Hyung Won Chung told that story at MIT with a small smile. then went to OpenAI and trained o1. 1 hour. free. by one of the few people on earth who actually moves the frontier. meanwhile your feed is full of guys writing architecture threads who have never trained a model anyone uses. he just told MIT that 99% of AI research is theater. your AI worldview was built by men who read his papers. badly. now you can read him directly. you will rewatch this. save it now.
Raytar (@Raytar)

"I was definitely the first prompt engineer at Anthropic. Might have been the first in the world." Alex Albert just spent 35 minutes explaining how they train Claude's personality from the inside. 35 minutes. free. by the person who invented the role. most people think Claude's character is a system prompt. it's not. you'll never look at Claude the same way.

Mattia Verasani retweeted
Rosinality (@rosinality)
arxiv.org/abs/2605.23857 Could it be useful to distill from a smaller model? I think that, beyond distillation, we could get some signal from the loss difference across model scales.
Mattia Verasani retweeted
zhyncs (@zhyncs42)
Correctness is critical for LLM inference engines. Recently, I found TRT-LLM’s work on Hypothesis Testing Methodology to be extremely professional. github.com/NVIDIA/TensorR…
Mattia Verasani retweeted
Greg Brockman
Self-improvement prompt for Codex
Vaibhav (VB) Srivastav (@reach_vb)

UPDATE: Came up with an even better version of this prompt after the feedback.

Ask Codex to look across your sessions, Memories, and Chronicle, identify patterns, reuse what already exists, and only create the smallest useful skill, subagent, or automation.

"Look back over my recent work from the last 30 days, or all available history if shorter, and identify repeated manual workflows worth packaging.

Use available evidence in this order:
- Recent Codex sessions and task summaries.
- Codex Memories and rollout summaries to find patterns repeated across sessions.
- Chronicle, if enabled, to spot repeated work outside Codex. Use Chronicle for discovery only; confirm important details in the relevant source system when possible.
- Existing skills, custom agents, and automations, so you reuse or extend what already exists instead of duplicating it.

Look broadly for work that is repeated, time-consuming, error-prone, context-heavy, or benefits from a consistent process. Include workflows across coding, research, writing, planning, communication, operations, analysis, and personal administration.

Only act on a candidate when it:
- occurred at least twice, or is clearly likely to recur and costly to repeat;
- has stable inputs, a repeatable procedure, and a clear output or stopping condition;
- would materially improve speed, quality, consistency, or reliability;
- is not already adequately covered.

Choose the smallest appropriate form:
- Skill: a reusable workflow or playbook.
- Custom subagent: a bounded specialist role or investigation task suitable for delegation.
- Automation: a scheduled or recurring check, report, reminder, or monitor.
- Skip: work that is too one-off, ambiguous, sensitive, or poorly evidenced to package.

First produce a compact shortlist with:
- repeated workflow
- supporting evidence and dates
- frequency/confidence
- recommended form: skill, subagent, automation, extend existing, or skip
- why it is or is not worth creating

Then create only the high-confidence missing items. Keep them narrow, practical, source-aware, and easy to validate. Do not create speculative, overlapping, or overly broad assets.

Finish with:
- what you created or extended
- what you deliberately skipped
- what needs more evidence before packaging"

Mattia Verasani retweeted
Saining Xie (@sainingxie)
Check out RAEv2, led by Jas. Through extensive experiments, we found some really intriguing behaviors showing why strong representation encoders are key for pixel decoders. Spoiler: it’s not about hill-climbing FID; new metrics like ep@fid-k/fdr^k show there’s a lot more left to explore!
Jaskirat Singh (@1jaskiratsingh)

In Oct last year, Representation Autoencoders provided an elegant solution to unified tokenization for understanding and generation. Today we make them a bit more simple. a bit more general. Result: >10x faster convergence, better reconstruction, better generation. And yes we test them on T2I and world models :) Introducing RAEv2

Mattia Verasani retweeted
Gabriele Berton (@gabriberton)
Apply here to join the frontier of computer vision!
Nithish Kannen (@NithishKannen)

Our Gemini Vision team @GoogleDeepMind is hiring in MTV/SF. Join us to push the frontiers of visual perception, reasoning and generation, and contribute to Gemini, Nano Banana and Omni. Also get to do cool research such as Vision Banana 🍌: deepmind.google/research/publi…. Job posting below. It's one of the best times to be working on Vision as the frontier is moving rapidly, come join us!

Mattia Verasani retweeted
Song Han (@songhan_mit)
Explore our kernel design agents:
Mattia Verasani retweeted
Swaroop Mishra (@Swarooprm7)
Apply to join the Gemini vision team! Highly Recommend!
Nithish Kannen (@NithishKannen)

Our Gemini Vision team @GoogleDeepMind is hiring in MTV/SF. Join us to push the frontiers of visual perception, reasoning and generation, and contribute to Gemini, Nano Banana and Omni. Also get to do cool research such as Vision Banana 🍌: deepmind.google/research/publi…. Job posting below. It's one of the best times to be working on Vision as the frontier is moving rapidly, come join us!

Mattia Verasani retweeted
Sebastian Raschka
It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅 An interesting tidbit is the parallel block design. From the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."
Cohere (@cohere)

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

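The tweet doesn't spell out the block structure. A common "parallel" formulation (used, for example, in GPT-J and PaLM) feeds the same normalized input to attention and the MLP and adds both outputs to the residual, instead of running the MLP on the attention sublayer's output; whether Command A's block matches this exactly is an assumption here. A toy pure-Python sketch, with stand-in `attn` and `mlp` callables and a shared layer norm without learned gain or bias (all names hypothetical):

```python
from typing import Callable, List

Vec = List[float]
Sublayer = Callable[[Vec], Vec]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    """Plain layer norm without learned gain/bias (simplification)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def add(*vecs: Vec) -> Vec:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vecs)]


def sequential_block(x: Vec, attn: Sublayer, mlp: Sublayer) -> Vec:
    # Vanilla pre-LN block: the MLP depends on the attention sublayer's output.
    h = add(x, attn(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))


def parallel_block(x: Vec, attn: Sublayer, mlp: Sublayer) -> Vec:
    # Parallel block: both sublayers read the same normalized input,
    # so they are independent and could run concurrently or be fused.
    n = layer_norm(x)
    return add(x, attn(n), mlp(n))
```

Because the two sublayers no longer depend on each other, their input projections can be fused or overlapped, which is where the throughput gain of this kind of design typically comes from; whether quality holds up is an empirical question, which is what the quoted tech-report claim addresses.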
Mattia Verasani retweeted
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
Interesting slides. Using LLMs to generate code easily leads to reward hacking; see slides 41-49. A very similar phenomenon is at play in the Kaggle neurogolf competition, where the host has to fix the evaluator every week to catch new reward-hacking tricks. There is much more in the presentation; have a look.
Mark Saroufim (@marksaroufim)

It was an honor to give the keynote at MLSys. Covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto. Recording should be out soon; in the meantime, slides:

Mattia Verasani retweeted
vLLM (@vllm_project)
KV cache shouldn't disappear every time vLLM restarts.

With @novita_labs, we're sharing PegaFlow, a production-grade external KV cache service that plugs into vLLM through the external KV connector interface.

PegaFlow runs as a standalone Rust daemon owning the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC + gRPC, and cache survives engine crashes, upgrades, and model switches.

In production-oriented evaluations:
🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool
📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache
⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank)
🌐 194 GB/s average remote-read throughput across nodes

Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path, with no vLLM source changes.

📖 vllm.ai/blog/2026-05-1…
Mattia Verasani retweeted
Noam Brown (@polynoamial)
Andrej @karpathy is back in the game! I would have loved for him to rejoin @OpenAI, but I'm happy he's at any frontier lab pushing the field forward. It’s easy to frame this as zero-sum among the labs, but in truth we’re collectively advancing the most important tech of our era.
Andrej Karpathy (@karpathy)

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Mattia Verasani retweeted
finbarr (@finbarrtimbers)
This is an elegant paper; hope to try it out soon.
SemiAnalysis (@SemiAnalysis_)

Sparse attention mechanisms are finally moving beyond academic benchmarks into production systems, including DeepSeek Sparse Attention and, recently, @NousResearch's Lighthouse Attention. BLASST by NVIDIA, from the paper Dynamic Blocked Attention Sparsity via Softmax Thresholding, attempts to sparsify attention in a different way, leveraging a rescale-factor threshold idea similar to the one in Flash Attention 4. We expect to see more interesting sparse attention techniques in the future. arxiv.org/abs/2512.12087 (2/4)

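To make "softmax thresholding" a little more concrete, here is a simplified single-query sketch in pure Python, written from the tweet's description rather than from the paper's kernel. Attention is computed block by block with an online (running-max) softmax, as in FlashAttention, and a key block is skipped, with no exponentials and no value accumulation, when its largest score sits more than |log_tau| below the running maximum; every softmax weight in that block would then be smaller than exp(log_tau) relative to the current maximum weight. The block size, the `log_tau` parameter, and this exact skip rule are assumptions for illustration; BLASST's actual criterion and GPU implementation may differ.

```python
import math
from typing import List, Optional, Tuple

Vec = List[float]


def blocked_attention(
    q: Vec,
    keys: List[Vec],
    values: List[Vec],
    block: int = 2,
    log_tau: Optional[float] = None,
) -> Tuple[Vec, int]:
    """Single-query attention with an online softmax over key blocks.

    If log_tau (a negative number) is given, a block is skipped when
    max(block scores) - running_max < log_tau. Returns (output, skipped_blocks).
    """
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                  # running max of the scores processed so far
    denom = 0.0                    # running softmax denominator, relative to m
    acc = [0.0] * len(values[0])   # running weighted sum of values, relative to m
    skipped = 0
    for start in range(0, len(keys), block):
        ks, vs = keys[start:start + block], values[start:start + block]
        # Scores still have to be computed to evaluate the threshold.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in ks]
        bmax = max(scores)
        # Every weight in this block would be < exp(log_tau) relative to the
        # current max, so skip exp() and the value update (approximation).
        if log_tau is not None and bmax - m < log_tau:
            skipped += 1
            continue
        m_new = max(m, bmax)
        corr = math.exp(m - m_new)  # rescales earlier partial sums; 0.0 on the first block
        weights = [math.exp(s - m_new) for s in scores]
        denom = denom * corr + sum(weights)
        acc = [
            a * corr + sum(w * v[j] for w, v in zip(weights, vs))
            for j, a in enumerate(acc)
        ]
        m = m_new
    return [a / denom for a in acc], skipped
```

Since the running maximum can only grow, a skipped block's true weights end up even smaller than when the test was made; the approximation error grows with the total mass of the skipped weights, which is what the threshold trades against saved work.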
Mattia Verasani retweeted