Facu Fagalde - e/acc

138.6K posts


@facundo_fagalde

I'm a fan of Javier Milei, Agustín Laje, Lionel Messi, Tesla, The Ocean Cleanup and BROAD Group Ltd. I wish for AGI. Deserving is more important than getting.

Joined March 2016
554 Following · 397 Followers
Facu Fagalde - e/acc retweeted
HustleBitch@HustleBitch_·
🚨 WALMART’S AI PRICE SYSTEM JUST ACTIVATED — DIGITAL PRICES CAN CHANGE IN SECONDS WHILE YOU SHOP AND A CUSTOMER CAUGHT IT ON CAMERA
America’s biggest retailers are quietly replacing paper tags with digital screens. Prices are no longer fixed. They can change in seconds.
• Grab it at one price
• Walk to checkout
• It’s already higher
This unlocks dynamic pricing inside stores.
• Demand spikes → price jumps
• Inventory drops → price adjusts
• Algorithms decide what you pay in real time
Not tomorrow. Not next week. Right now.
When prices can change in seconds… are you even buying anything anymore, or just paying whatever the system decides you owe?
English
1.1K
2.2K
5.9K
1M
Facu Fagalde - e/acc retweeted
SUN YOUNG HWANG ᯅ 🇰🇷
Guys.. this model is just crazy. If you have less than 48GB of VRAM, just try the Q8 GGUF format. Feels just like Opus! Tool calling works smoothly!! Thanks for this! (HF and Qwen!!) huggingface.co/Jackrong/Qwen3…
English
64
166
1.9K
120.1K
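For anyone wanting to try this themselves, here is a minimal sketch of loading a Q8 GGUF locally. It assumes llama-cpp-python as the runtime (the tweet doesn't name one) and uses a hypothetical local filename in place of the truncated Hugging Face link:

```python
# Minimal sketch: load a Q8_0 GGUF quant with llama-cpp-python.
# The model path is hypothetical; substitute the actual GGUF file from the
# Hugging Face repo linked in the tweet. n_gpu_layers=-1 offloads all
# layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-q8_0.gguf",  # hypothetical local filename
    n_gpu_layers=-1,               # offload everything to the GPU
    n_ctx=8192,                    # context window; adjust to taste
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV cache quantization in two sentences."}],
)
print(out["choices"][0]["message"]["content"])
```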
Facu Fagalde - e/acc retweeted
Inter Miami News Hub@Intermiamicfhub·
🚨🚨🚨BREAKING: The grass installation at Miami’s historic Nu Stadium is now complete. 🤩🔥🐐 🎥 Via IG xasensimcf
English
12
51
980
19.7K
Facu Fagalde - e/acc retweeted
Fahd Mirza@fahdmirza·
💥 @RedHat_AI Quietly Made Qwen Run 6X FASTER 🚀
♠ And barely anyone talked about it 🔥
🔹 Speculative Decoding with EAGLE-3 — zero quality loss, just pure speed
🔹 Tiny draft model guesses tokens ahead, big model verifies in one shot
🔹 6.5x faster inference on a single GPU — no extra hardware needed
🔹 Drops straight into vLLM with one command — zero friction deployment
🔹 Full hands-on demo — download, serve, and test it live
🎯 This is where LLM inference optimization is heading — not bigger models, smarter execution 🔥
Watch the full video below 👇
English
3
9
108
9.9K
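For context on the mechanism those bullets describe, here is a toy sketch of draft-and-verify speculative decoding. It is not EAGLE-3 or the vLLM integration, just the core loop: a cheap draft model proposes K tokens, the expensive target model checks them all at once, and the longest agreeing prefix is kept. The two stand-in "models" are arbitrary deterministic functions:

```python
# Toy greedy speculative decoding: a cheap draft model proposes K tokens,
# the expensive target model verifies them in one batched pass, and we
# keep the longest agreeing prefix plus one guaranteed-correct token.

def draft_next(ctx):            # cheap draft model (possibly wrong)
    return (sum(ctx) * 31 + 7) % 50

def target_next(ctx):           # expensive target model (ground truth)
    return (sum(ctx) * 31 + len(ctx)) % 50

def speculative_decode(prompt, steps=20, k=4):
    ctx = list(prompt)
    while len(ctx) < len(prompt) + steps:
        # 1. Draft proposes k tokens autoregressively (cheap).
        proposal, tmp = [], ctx[:]
        for _ in range(k):
            t = draft_next(tmp)
            proposal.append(t)
            tmp.append(t)
        # 2. Target scores all k positions; in a real engine this is a
        #    single batched forward pass, not k sequential ones.
        verified = [target_next(ctx + proposal[:i]) for i in range(k)]
        # 3. Accept the longest prefix where draft and target agree,
        #    then take one correct token from the target.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        ctx += proposal[:n_ok]
        ctx.append(verified[n_ok] if n_ok < k else target_next(ctx))
    return ctx

# Output is identical to pure greedy decoding with the target model,
# but the target ran far fewer sequential steps.
print(speculative_decode([1, 2, 3]))
```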
Facu Fagalde - e/acc retweeted
Raffi Hotter@raffi_hotter·
This algorithm uses one of my favourite theorems in math, the Johnson-Lindenstrauss lemma, which says you can drastically reduce the dimensionality of n points to just O(log n) dimensions and still preserve pairwise distances.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
13
35
625
36.9K
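The lemma's promise is easy to check numerically. A hedged sketch: project n points down to roughly O(log n) dimensions with a random Gaussian matrix and compare pairwise distances before and after. The constants here are illustrative, not the lemma's tight bounds:

```python
# Empirical Johnson-Lindenstrauss check: a random Gaussian projection of
# n points from dimension d down to k ~ O(log n / eps^2) approximately
# preserves all pairwise distances. Constants are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 10_000, 0.25
k = int(8 * np.log(n) / eps**2)                # target dimension, ~884 here

X = rng.standard_normal((n, d))                # n points in d dimensions
P = rng.standard_normal((d, k)) / np.sqrt(k)   # JL projection matrix
Y = X @ P

# Compare distances on a random sample of distinct point pairs.
i, j = rng.integers(0, n, 200), rng.integers(0, n, 200)
mask = i != j
orig = np.linalg.norm(X[i[mask]] - X[j[mask]], axis=1)
proj = np.linalg.norm(Y[i[mask]] - Y[j[mask]], axis=1)
ratio = proj / orig
print(f"distance ratio: min={ratio.min():.3f}, max={ratio.max():.3f}")
# Typically within [1-eps, 1+eps], despite a >10x dimensionality cut.
```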
Facu Fagalde - e/acc retweeted
SONIA@S0N_IA·
🚨 Do you understand what happened yesterday… on a single Tuesday…?
- Disney walked away from its $1 billion deal with OpenAI… the company that spent 100 years suing over Mickey Mouse decided AI video wasn’t worth the risk…
- OpenAI killed Sora as a standalone app… $1 billion in R&D folded back into ChatGPT… they lost the product and the partner on the same day…
- Trump said “we have won this war” with Iran… and 30 minutes later the Pentagon deployed 3,000 MORE troops to the Middle East…
- MBS personally called Trump to pressure him into continuing the strikes on Iran… the man who runs a $930 billion oil fund wants war because war means higher oil prices…
- Iran said it is “willing to listen” to peace proposals… the fifth contradiction in a single day about the same conflict…
- Karpathy exposed a supply-chain attack in a Python package with 97 million downloads… a single pip install could steal every password and crypto wallet on your machine… the attacker was only discovered because he coded carelessly…
- Anthropic launched Claude Code’s automatic mode… the AI now approves its own file writes and commands… the same day someone showed the software you trust can steal everything…
- Satya Nadella said the biggest obstacle for AI is convincing people to change how they work… translation: “we built the replacement, now we need you to train it before we let you go”…
- Pinterest’s CEO asked governments to ban social media for under-16s… the man who runs a $3.6 billion company built on teenagers saving outfit ideas…
This morning’s $1.5 billion insider trade still hasn’t been investigated… someone knew about the “peace” before the president announced it… all of this… on a single Tuesday…
Spanish
13
250
1.8K
382.1K
Facu Fagalde - e/acc retweeted
myles@themylesfiles·
I'm an interactive learner, so I turned Google's TurboQuant paper into a @marimo_io notebook. Random rotations → Beta distributions → optimal 3-bit quantization → 6x memory savings on LLM KV caches. Way easier to grok when you can drag a slider and watch the math happen. molab.marimo.io/notebooks/nb_7…
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
10
41
287
43.9K
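For readers without the notebook handy, here is a hedged sketch of the first step in that chain. It demonstrates only the generic "random rotation tames outliers" effect, not TurboQuant itself: rotating a vector by a random orthogonal matrix makes its coordinates Gaussian-like (formally, scaled Beta-distributed coordinates of a random unit vector), which shrinks the dynamic range a low-bit quantizer must cover:

```python
# Hedged illustration of the "random rotation" step: a random orthogonal
# rotation preserves the vector's norm but spreads its energy evenly, so
# outlier coordinates stop dominating the quantization range. This shows
# the distributional effect only, not the TurboQuant algorithm.
import numpy as np

rng = np.random.default_rng(0)
d = 4096

# A badly conditioned input: a few huge outlier coordinates.
x = rng.standard_normal(d)
x[:4] *= 100.0

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation matrix
y = Q @ x                                          # same norm, new coordinates

def dyn_range(v):
    # Max coordinate magnitude relative to RMS: what a uniform
    # quantizer's step size effectively has to absorb.
    return np.abs(v).max() / np.sqrt(np.mean(v**2))

print(f"before rotation: max/rms = {dyn_range(x):.1f}")  # dominated by outliers
print(f"after  rotation: max/rms = {dyn_range(y):.1f}")  # ~sqrt(2 ln d), Gaussian-like
```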
Facu Fagalde - e/acc retweeted
News from Google@NewsFromGoogle·
We're expanding @GoogleQuantumAI research to include neutral atoms. By investing in two modalities — superconducting qubits and the promising platform of neutral atom quantum computing, which uses individual atoms as qubits — we can cross-pollinate research and accelerate our timeline for building a large-scale, error-corrected quantum computer.
English
12
44
654
22.6K
Facu Fagalde - e/acc retweeted
Nicolás Schürmann
This is incredible: 96GB models will be reduced to as little as 16GB. More powerful models will run on consumer machines with no loss. The size shrinks by at least 6x!
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

Spanish
14
64
1.1K
54.9K
Facu Fagalde - e/acc retweeted
AshutoshShrivastava@ai_for_success·
🚨 Google just introduced TurboQuant, a new way to massively compress AI models without losing accuracy.
TLDR
- TurboQuant compresses model memory up to 6x with zero accuracy loss
- Can shrink the KV cache down to ~3 bits without fine-tuning
- Up to 8x speed improvement in attention computation
- Solves one of the biggest bottlenecks in LLMs, which is memory
- Uses PolarQuant for main compression and QJL for error correction
- Zero memory overhead, unlike traditional quantization methods
- Works across LLM tasks like QA, coding, summarization
- Strong gains in vector search performance and recall
- Helps scale semantic search across billions of vectors
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
6
13
121
9.9K
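The "error correction" bullet in that TLDR is easy to illustrate generically. The sketch below is not PolarQuant or QJL, just a demonstration of how one extra bit per value (the sign of the quantization residual) plus a single shared scalar can roughly halve a coarse quantizer's error:

```python
# Generic illustration (not PolarQuant/QJL): after a coarse quantizer
# leaves a residual, storing the residual's sign (1 extra bit per value)
# and adding back its mean magnitude in that direction cuts the error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

# Coarse 3-bit uniform quantizer over [-max, max].
m = np.abs(x).max()
step = 2 * m / (2**3 - 1)
xq = np.round(x / step) * step

# 1-bit correction: residual sign, scaled by one shared scalar.
res = x - xq
corr = np.sign(res) * np.abs(res).mean()
xq_corr = xq + corr

print(f"3-bit error:              {np.linalg.norm(x - xq):.2f}")
print(f"3-bit + 1-bit correction: {np.linalg.norm(x - xq_corr):.2f}")
# The residual is roughly uniform over [-step/2, step/2], so the sign
# correction halves its RMS, at the cost of one bit per value.
```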
Facu Fagalde - e/acc retweeted
Amin Karbasi@aminkarbasi·
I left @GoogleResearch almost two years ago, so it makes me genuinely happy to see our work on polar quantization (my last project), which eventually led to extreme compression, being recognized there. It is a nice reminder that good fundamental work tends to find its place with time.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
23
43
1.1K
78.9K
Facu Fagalde - e/acc retweeted
🇨🇳XuZhenqing徐祯卿
✨🇨🇳Autonomous taxis are undergoing road testing on the streets of Guiyang, China and will soon be put into operation.😯
English
10
63
303
6.6K
Facu Fagalde - e/acc retweeted
Messias@Messias30_·
God, how can anyone forget when Lavezzi posted Messi plays out of nowhere. Nobody understood what was going on.
Spanish
23
189
20.2K
449.8K
Facu Fagalde - e/acc retweeted
Anish Moonka@AnishA_Moonka·
Every time you message an AI chatbot, the model stores your entire conversation in temporary memory called a KV cache (a cheat sheet so it doesn’t re-read everything from scratch). On a large model like Llama 70B running a long conversation, that cache alone eats 40GB of GPU space, often more than the AI model itself. That’s half a $30,000 GPU chip consumed by one user’s memory.
Google just published TurboQuant, a compression algorithm that shrinks this cache by 6x, down to just 3 bits per value, with zero accuracy loss across every benchmark tested. No retraining. No fine-tuning. Drop-in replacement.
AI inference (running models for actual users, not training them) now makes up 55% of all AI compute spending. Hyperscalers are pouring nearly $700 billion into AI infrastructure in 2026. The KV cache is the single biggest memory bottleneck in that stack. When GPU cache memory fills up, the system can’t take more users. 6x compression means the same hardware handles roughly 6x more simultaneous conversations, or 6x longer context windows, or some mix of both. At cloud rates of $2-3/hour per H100 GPU, that’s the difference between profitable and unprofitable AI deployment.
TurboQuant randomly rotates data to simplify its structure, applies a compressor, then adds a 1-bit error correction step to catch errors before they compound. On H100 GPUs it delivers up to 8x speedup over uncompressed computation. Google tested it across five long-context benchmarks on Llama, Gemma, and Mistral models. Perfect scores on needle-in-a-haystack (finding one specific fact buried in massive text). Being presented at ICLR 2026.
It also outperforms existing methods for vector search, the technology that powers how search engines find similar results across billions of entries. Google runs billions of these searches daily.
Three bits. Zero loss. 6x compression on the biggest memory bottleneck in a $700 billion infrastructure buildout.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
29
114
1.5K
223.6K
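The 40GB figure holds up to back-of-the-envelope arithmetic for a Llama-70B-like configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128, fp16); the 128K-token context below is an assumed value chosen to illustrate scale:

```python
# Back-of-the-envelope KV cache size for a Llama-70B-like configuration.
# The context length is an assumption; shorter chats use proportionally less.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_val = 2                    # fp16
ctx = 128_000                        # tokens in the conversation (assumed)

per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V
total_gb = per_token * ctx / 1e9
print(f"{per_token / 1024:.0f} KB per token -> {total_gb:.1f} GB at {ctx:,} tokens")
# ~320 KB/token -> ~42 GB: the tweet's 40GB ballpark, and the figure a
# 6x cache compression would pull down to roughly 7 GB.
```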
Facu Fagalde - e/acc retweeted
Josh Kale@JoshKale·
This post got ZERO attention but is BY FAR the biggest AI news this week.
Google just published TurboQuant: a compression algorithm that makes AI inference 8x faster while using 6x less memory. No retraining. No accuracy loss.
The biggest cost is inference, which happens billions of times a day, scaling with every user and query. It’s the bill that never stops growing. Inference also eats memory alive, which is why GPU memory is the scarcest, most expensive resource in AI.
Previous compression methods had a little secret: shrinking the data required storing extra instructions about how it was shrunk. That overhead ate nearly half the savings. Google found a way to restructure the data so those instructions aren’t needed at all. The overhead just vanishes. 32 bits compressed to 3.
The entire cost structure shifts. Context windows expand on existing hardware. API costs compress. Models that needed clusters start fitting on smaller machines.
This seems like a pretty big deal for team Google and the industry at large.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
21
74
711
81K
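The "extra instructions" overhead described in that post can be made concrete with simple arithmetic: classic group-wise quantization stores a scale (and often a zero point) per group of values, and those metadata bits are paid on top of the nominal payload. The numbers below are generic illustrations, not TurboQuant's:

```python
# Effective bits per value when quantization metadata (per-group scale
# and zero point) is counted on top of the nominal payload bits.
def effective_bits(payload_bits, group_size, meta_bits_per_group):
    return payload_bits + meta_bits_per_group / group_size

# 3-bit payload, fp16 scale + fp16 zero point (32 bits) per group of 32:
print(effective_bits(3, 32, 32))   # 4.0 bits -> 33% overhead on the payload
# Smaller groups (better accuracy) make the overhead worse:
print(effective_bits(3, 8, 32))    # 7.0 bits -> metadata exceeds the payload
# A rotation-based scheme that needs no per-group metadata stays at ~3 bits.
```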