Facu Fagalde - e/acc

138.6K posts


@facundo_fagalde

I'm a fan of Javier Milei, Agustín Laje, Lionel Messi, Tesla, The Ocean Cleanup and BROAD Group Ltd. I wish for AGI. Deserving is more important than getting.

Joined March 2016
554 Following · 397 Followers
Facu Fagalde - e/acc retweeted
HustleBitch@HustleBitch_·
🚨 WALMART’S AI PRICE SYSTEM JUST ACTIVATED — DIGITAL PRICES CAN CHANGE IN SECONDS WHILE YOU SHOP AND A CUSTOMER CAUGHT IT ON CAMERA
America’s biggest retailers are quietly replacing paper tags with digital screens. Prices are no longer fixed. They can change in seconds.
• Grab it at one price
• Walk to checkout
• It’s already higher
This unlocks dynamic pricing inside stores.
• Demand spikes → price jumps
• Inventory drops → price adjusts
• Algorithms decide what you pay in real time
Not tomorrow. Not next week. Right now.
When prices can change in seconds… are you even buying anything anymore, or just paying whatever the system decides you owe?
English
1.1K
2.2K
5.9K
1M
Facu Fagalde - e/acc retweeted
SUN YOUNG HWANG ᯅ 🇰🇷
Guys.. this model is just crazy. If you have less than 48GB of VRAM, just try the Q8 GGUF format. Feels just like Opus! Tool calling works smoothly!! Thanks for this! (HF and Qwen!!) huggingface.co/Jackrong/Qwen3…
English
64
166
1.9K
120.1K
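For anyone wanting to try this themselves, here is a minimal sketch of loading a Q8 GGUF locally. It assumes llama-cpp-python as the runtime (the tweet doesn't name one) and uses a hypothetical local filename in place of the truncated Hugging Face link:

```python
# Minimal sketch: load a Q8_0 GGUF quant with llama-cpp-python.
# The model path is hypothetical; substitute the actual GGUF file from the
# Hugging Face repo linked in the tweet. n_gpu_layers=-1 offloads all
# layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-q8_0.gguf",  # hypothetical local filename
    n_gpu_layers=-1,               # offload everything to the GPU
    n_ctx=8192,                    # context window; adjust to taste
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV cache quantization in two sentences."}],
)
print(out["choices"][0]["message"]["content"])
```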
Facu Fagalde - e/acc retweeted
Inter Miami News Hub@Intermiamicfhub·
🚨🚨🚨BREAKING: The grass installation at Miami’s historic Nu Stadium is now complete. 🤩🔥🐐 🎥 Via IG xasensimcf
English
12
51
980
19.7K
Facu Fagalde - e/acc retweeted
Fahd Mirza@fahdmirza·
💥 @RedHat_AI Quietly Made Qwen Run 6X FASTER 🚀
♠ And barely anyone talked about it 🔥
🔹 Speculative Decoding with EAGLE-3 — zero quality loss, just pure speed
🔹 Tiny draft model guesses tokens ahead, big model verifies in one shot
🔹 6.5x faster inference on a single GPU — no extra hardware needed
🔹 Drops straight into vLLM with one command — zero friction deployment
🔹 Full hands-on demo — download, serve, and test it live
🎯 This is where LLM inference optimization is heading — not bigger models, smarter execution 🔥
Watch the full video below 👇
English
3
9
108
9.9K
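For context on the mechanism those bullets describe, here is a toy sketch of draft-and-verify speculative decoding. It is not EAGLE-3 or the vLLM integration, just the core loop: a cheap draft model proposes K tokens, the expensive target model checks them all at once, and the longest agreeing prefix is kept. The two stand-in "models" are arbitrary deterministic functions:

```python
# Toy greedy speculative decoding: a cheap draft model proposes K tokens,
# the expensive target model verifies them in one batched pass, and we
# keep the longest agreeing prefix plus one guaranteed-correct token.

def draft_next(ctx):            # cheap draft model (possibly wrong)
    return (sum(ctx) * 31 + 7) % 50

def target_next(ctx):           # expensive target model (ground truth)
    return (sum(ctx) * 31 + len(ctx)) % 50

def speculative_decode(prompt, steps=20, k=4):
    ctx = list(prompt)
    while len(ctx) < len(prompt) + steps:
        # 1. Draft proposes k tokens autoregressively (cheap).
        proposal, tmp = [], ctx[:]
        for _ in range(k):
            t = draft_next(tmp)
            proposal.append(t)
            tmp.append(t)
        # 2. Target scores all k positions; in a real engine this is a
        #    single batched forward pass, not k sequential ones.
        verified = [target_next(ctx + proposal[:i]) for i in range(k)]
        # 3. Accept the longest prefix where draft and target agree,
        #    then take one correct token from the target.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        ctx += proposal[:n_ok]
        ctx.append(verified[n_ok] if n_ok < k else target_next(ctx))
    return ctx

# Output is identical to pure greedy decoding with the target model,
# but the target ran far fewer sequential steps.
print(speculative_decode([1, 2, 3]))
```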
Facu Fagalde - e/acc retweeted
Raffi Hotter@raffi_hotter·
This algorithm uses one of my favourite theorems in math, the Johnson-Lindenstrauss lemma, which says you can drastically reduce the dimensionality of n points to just O(log n) dimensions and still preserve pairwise distances.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
13
35
625
36.9K
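The lemma's promise is easy to check numerically. A hedged sketch: project n points down to roughly O(log n) dimensions with a random Gaussian matrix and compare pairwise distances before and after. The constants here are illustrative, not the lemma's tight bounds:

```python
# Empirical Johnson-Lindenstrauss check: a random Gaussian projection of
# n points from dimension d down to k ~ O(log n / eps^2) approximately
# preserves all pairwise distances. Constants are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 10_000, 0.25
k = int(8 * np.log(n) / eps**2)                # target dimension, ~884 here

X = rng.standard_normal((n, d))                # n points in d dimensions
P = rng.standard_normal((d, k)) / np.sqrt(k)   # JL projection matrix
Y = X @ P

# Compare distances on a random sample of distinct point pairs.
i, j = rng.integers(0, n, 200), rng.integers(0, n, 200)
mask = i != j
orig = np.linalg.norm(X[i[mask]] - X[j[mask]], axis=1)
proj = np.linalg.norm(Y[i[mask]] - Y[j[mask]], axis=1)
ratio = proj / orig
print(f"distance ratio: min={ratio.min():.3f}, max={ratio.max():.3f}")
# Typically within [1-eps, 1+eps], despite a >10x dimensionality cut.
```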
Facu Fagalde - e/acc retweeted
SONIA@S0N_IA·
🚨 Do you understand what happened yesterday… on a single Tuesday…?
- Disney walked away from its $1 billion deal with OpenAI… the company that spent 100 years suing over Mickey Mouse decided AI video wasn’t worth the risk…
- OpenAI killed Sora as a standalone app… $1 billion in R&D folded back into ChatGPT… they lost the product and the partner on the same day…
- Trump said “we have won this war” with Iran… and 30 minutes later the Pentagon deployed 3,000 MORE troops to the Middle East…
- MBS personally called Trump to pressure him into continuing the strikes on Iran… the man who runs a $930 billion oil fund wants war because war means higher oil prices…
- Iran said it is “willing to listen” to peace proposals… the fifth contradiction in a single day about the same conflict…
- Karpathy exposed a supply-chain attack in a Python package with 97 million downloads… a single pip install could steal every password and crypto wallet on your machine… the attacker was only discovered because he coded carelessly…
- Anthropic launched Claude Code’s automatic mode… the AI now approves its own file writes and commands… the same day someone showed the software you trust can steal everything…
- Satya Nadella said the biggest obstacle for AI is convincing people to change how they work… translation: “we built the replacement, now we need you to train it before we let you go”…
- Pinterest’s CEO asked governments to ban social media for under-16s… the man who runs a $3.6 billion company built on teenagers saving outfit ideas…
This morning’s $1.5 billion insider trade still hasn’t been investigated… someone knew about the “peace” before the president announced it… all of this… on a single Tuesday…
Spanish
13
250
1.8K
382.1K
Facu Fagalde - e/acc retweeted
myles@themylesfiles·
I'm an interactive learner, so I turned Google's TurboQuant paper into a @marimo_io notebook. Random rotations → Beta distributions → optimal 3-bit quantization → 6x memory savings on LLM KV caches. Way easier to grok when you can drag a slider and watch the math happen. molab.marimo.io/notebooks/nb_7…
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
10
41
287
43.9K
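For readers without the notebook handy, here is a hedged sketch of the first step in that chain. It demonstrates only the generic "random rotation tames outliers" effect, not TurboQuant itself: rotating a vector by a random orthogonal matrix makes its coordinates Gaussian-like (formally, scaled Beta-distributed coordinates of a random unit vector), which shrinks the dynamic range a low-bit quantizer must cover:

```python
# Hedged illustration of the "random rotation" step: a random orthogonal
# rotation preserves the vector's norm but spreads its energy evenly, so
# outlier coordinates stop dominating the quantization range. This shows
# the distributional effect only, not the TurboQuant algorithm.
import numpy as np

rng = np.random.default_rng(0)
d = 4096

# A badly conditioned input: a few huge outlier coordinates.
x = rng.standard_normal(d)
x[:4] *= 100.0

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation matrix
y = Q @ x                                          # same norm, new coordinates

def dyn_range(v):
    # Max coordinate magnitude relative to RMS: what a uniform
    # quantizer's step size effectively has to absorb.
    return np.abs(v).max() / np.sqrt(np.mean(v**2))

print(f"before rotation: max/rms = {dyn_range(x):.1f}")  # dominated by outliers
print(f"after  rotation: max/rms = {dyn_range(y):.1f}")  # ~sqrt(2 ln d), Gaussian-like
```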
Facu Fagalde - e/acc retweeted
News from Google@NewsFromGoogle·
We're expanding @GoogleQuantumAI research to include neutral atoms. By investing in two modalities — superconducting qubits and the promising platform of neutral atom quantum computing, which uses individual atoms as qubits — we can cross-pollinate research and accelerate our timeline for building a large-scale, error-corrected quantum computer.
English
12
44
654
22.6K
Facu Fagalde - e/acc retweeted
Nicolás Schürmann
This is incredible: 96GB models will be reduced to as little as 16GB. More powerful models will run on consumer machines with no loss. The size shrinks by at least 6x!
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

Spanish
14
64
1.1K
54.9K
Facu Fagalde - e/acc retweeted
AshutoshShrivastava@ai_for_success·
🚨 Google just introduced TurboQuant, a new way to massively compress AI models without losing accuracy.
TLDR
- TurboQuant compresses model memory up to 6x with zero accuracy loss
- Can shrink the KV cache down to ~3 bits without fine-tuning
- Up to 8x speed improvement in attention computation
- Solves one of the biggest bottlenecks in LLMs, which is memory
- Uses PolarQuant for main compression and QJL for error correction
- Zero memory overhead, unlike traditional quantization methods
- Works across LLM tasks like QA, coding, summarization
- Strong gains in vector search performance and recall
- Helps scale semantic search across billions of vectors
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
6
13
121
9.9K
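The "error correction" bullet in that TLDR is easy to illustrate generically. The sketch below is not PolarQuant or QJL, just a demonstration of how one extra bit per value (the sign of the quantization residual) plus a single shared scalar can roughly halve a coarse quantizer's error:

```python
# Generic illustration (not PolarQuant/QJL): after a coarse quantizer
# leaves a residual, storing the residual's sign (1 extra bit per value)
# and adding back its mean magnitude in that direction cuts the error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

# Coarse 3-bit uniform quantizer over [-max, max].
m = np.abs(x).max()
step = 2 * m / (2**3 - 1)
xq = np.round(x / step) * step

# 1-bit correction: residual sign, scaled by one shared scalar.
res = x - xq
corr = np.sign(res) * np.abs(res).mean()
xq_corr = xq + corr

print(f"3-bit error:              {np.linalg.norm(x - xq):.2f}")
print(f"3-bit + 1-bit correction: {np.linalg.norm(x - xq_corr):.2f}")
# The residual is roughly uniform over [-step/2, step/2], so the sign
# correction halves its RMS, at the cost of one bit per value.
```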
Facu Fagalde - e/acc retweeted
Amin Karbasi@aminkarbasi·
I left @GoogleResearch almost two years ago, so it makes me genuinely happy to see our work on polar quantization (my last project), which eventually led to extreme compression, being recognized there. It is a nice reminder that good fundamental work tends to find its place with time.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
23
43
1.1K
78.9K
Facu Fagalde - e/acc retweeted
🇨🇳XuZhenqing徐祯卿
✨🇨🇳Autonomous taxis are undergoing road testing on the streets of Guiyang, China and will soon be put into operation.😯
English
10
63
303
6.6K
Facu Fagalde - e/acc retweeted
Messias@Messias30_·
God, how can anyone forget when Lavezzi posted Messi plays out of nowhere. Nobody understood what was going on.
Spanish
23
189
20.2K
449.8K
Facu Fagalde - e/acc retweeted
Anish Moonka@AnishA_Moonka·
Every time you message an AI chatbot, the model stores your entire conversation in temporary memory called a KV cache (a cheat sheet so it doesn’t re-read everything from scratch). On a large model like Llama 70B running a long conversation, that cache alone eats 40GB of GPU space, often more than the AI model itself. That’s half a $30,000 GPU chip consumed by one user’s memory.
Google just published TurboQuant, a compression algorithm that shrinks this cache by 6x, down to just 3 bits per value, with zero accuracy loss across every benchmark tested. No retraining. No fine-tuning. Drop-in replacement.
AI inference (running models for actual users, not training them) now makes up 55% of all AI compute spending. Hyperscalers are pouring nearly $700 billion into AI infrastructure in 2026. The KV cache is the single biggest memory bottleneck in that stack. When GPU cache memory fills up, the system can’t take more users. 6x compression means the same hardware handles roughly 6x more simultaneous conversations, or 6x longer context windows, or some mix of both. At cloud rates of $2-3/hour per H100 GPU, that’s the difference between profitable and unprofitable AI deployment.
TurboQuant randomly rotates data to simplify its structure, applies a compressor, then adds a 1-bit error correction step to catch errors before they compound. On H100 GPUs it delivers up to 8x speedup over uncompressed computation. Google tested it across five long-context benchmarks on Llama, Gemma, and Mistral models. Perfect scores on needle-in-a-haystack (finding one specific fact buried in massive text). Being presented at ICLR 2026.
It also outperforms existing methods for vector search, the technology that powers how search engines find similar results across billions of entries. Google runs billions of these searches daily.
Three bits. Zero loss. 6x compression on the biggest memory bottleneck in a $700 billion infrastructure buildout.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
29
114
1.5K
223.6K
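The 40GB figure holds up to back-of-the-envelope arithmetic for a Llama-70B-like configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128, fp16); the 128K-token context below is an assumed value chosen to illustrate scale:

```python
# Back-of-the-envelope KV cache size for a Llama-70B-like configuration.
# The context length is an assumption; shorter chats use proportionally less.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_val = 2                    # fp16
ctx = 128_000                        # tokens in the conversation (assumed)

per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V
total_gb = per_token * ctx / 1e9
print(f"{per_token / 1024:.0f} KB per token -> {total_gb:.1f} GB at {ctx:,} tokens")
# ~320 KB/token -> ~42 GB: the tweet's 40GB ballpark, and the figure a
# 6x cache compression would pull down to roughly 7 GB.
```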
Facu Fagalde - e/acc retweeted
Josh Kale@JoshKale·
This post got ZERO attention but is BY FAR the biggest AI news this week.
Google just published TurboQuant: a compression algorithm that makes AI inference 8x faster while using 6x less memory. No retraining. No accuracy loss.
The biggest cost is inference, which happens billions of times a day, scaling with every user and query. It’s the bill that never stops growing. Inference also eats memory alive, which is why GPU memory is the scarcest, most expensive resource in AI.
Previous compression methods had a little secret: shrinking the data required storing extra instructions about how it was shrunk. That overhead ate nearly half the savings. Google found a way to restructure the data so those instructions aren’t needed at all. The overhead just vanishes. 32 bits compressed to 3.
The entire cost structure shifts. Context windows expand on existing hardware. API costs compress. Models that needed clusters start fitting on smaller machines.
This seems like a pretty big deal for team Google and the industry at large.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
21
74
711
81K
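The "extra instructions" overhead described in that post can be made concrete with simple arithmetic: classic group-wise quantization stores a scale (and often a zero point) per group of values, and those metadata bits are paid on top of the nominal payload. The numbers below are generic illustrations, not TurboQuant's:

```python
# Effective bits per value when quantization metadata (per-group scale
# and zero point) is counted on top of the nominal payload bits.
def effective_bits(payload_bits, group_size, meta_bits_per_group):
    return payload_bits + meta_bits_per_group / group_size

# 3-bit payload, fp16 scale + fp16 zero point (32 bits) per group of 32:
print(effective_bits(3, 32, 32))   # 4.0 bits -> 33% overhead on the payload
# Smaller groups (better accuracy) make the overhead worse:
print(effective_bits(3, 8, 32))    # 7.0 bits -> metadata exceeds the payload
# A rotation-based scheme that needs no per-group metadata stays at ~3 bits.
```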