Chase Fagen

1.2K posts


@chasef07

Lifestyle Engineer x Apple Silicon Inference Engineer

Joined June 2014
225 Following · 35 Followers
Demis Hassabis@demishassabis·
Gemini 3.1 Flash Live is our highest quality audio & voice model yet - and a big leap towards building next-gen voice-first agents. Lower latency, better precision, more natural interactions... try it now with Gemini Live in the @GeminiApp or build with it in @GoogleAIStudio!
Google DeepMind@GoogleDeepMind

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵

99 replies · 131 reposts · 1.5K likes · 243.3K views
Chase Fagen reposted
Google Research@GoogleResearch·
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
954 replies · 5.6K reposts · 38K likes · 18.5M views
Artificial Analysis@ArtificialAnlys·
Inworld, ElevenLabs, and MiniMax continue to lead our Text to Speech leaderboard for most preferred models.

Recent checkpoints from each of the labs continue to push the frontier of TTS quality, with 4 out of the top 5 models being released this year. Leading TTS models are increasingly realistic, particularly on relatively straightforward text, with preference differences increasingly coming down to affinity for different voices.

Latest results also reflect stronger bot vote filtering, confirmed via triangulation against third-party evaluators. We've also added rank ranges based on each model's 95% confidence interval, showing where a model could land based on its Elo score range.

Key results:
➤ Most preferred: Current top 5 per our TTS leaderboard: 1. Inworld TTS 1.5 Max (Elo of 1,238); 2. ElevenLabs Eleven v3 (1,197); 3. Inworld TTS 1 Max (1,183); 4. Inworld TTS 1.5 Mini (1,182); 5. MiniMax Speech 2.8 HD (1,175)
➤ Price: Kokoro 82M v1.0 (Replicate) leads at $0.65 per 1M characters, followed by Inworld TTS 1 and 1.5 Mini at $5, and AsyncFlow V2 at $8.33
➤ Speed: WaveNet leads for batch generation at 419 characters processed per second, followed by Kokoro 82M v1.0 (Replicate) at 235, and Inworld TTS 1.5 Mini at 214

See below for further detail ⬇️
8 replies · 7 reposts · 113 likes · 11.2K views
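For a sense of what those Elo gaps mean head-to-head, the standard Elo expected-score formula converts a rating difference into a preference probability (assuming conventional Elo; Artificial Analysis's exact methodology may differ):

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Standard Elo expected score: P(model A preferred over model B)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Top-two gap from the leaderboard: Inworld 1,238 vs ElevenLabs 1,197.
p = elo_win_prob(1238, 1197)
print(f"Inworld TTS 1.5 Max preferred ~{p:.1%} of the time")  # ~55.9%
```

A 41-point gap is only about a 56/44 split in pairwise preference, which is why the 95% confidence-interval rank ranges they mention matter.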
Chase Fagen reposted
Prince Canuma@Prince_Canuma·
Just implemented Google's TurboQuant in MLX and the results are wild!

Needle-in-a-haystack using Qwen3.5-35B-A3B across 8.5K, 32.7K, and 64.2K context lengths:
→ 6/6 exact match at every quant level
→ TurboQuant 2.5-bit: 4.9x smaller KV cache
→ TurboQuant 3.5-bit: 3.8x smaller KV cache

The best part: Zero accuracy loss compared to full KV cache.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

146 replies · 405 reposts · 5.2K likes · 704.6K views
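A needle-in-a-haystack run like this is easy to reproduce against any inference stack; a minimal sketch, where the generate callable, needle, and filler text are all placeholders rather than Prince's actual harness:

```python
NEEDLE = "The magic number for project Aurora is 7421."
QUESTION = "What is the magic number for project Aurora? Answer with the number only."
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(n_chars: int, depth: float) -> str:
    """Bury the needle at a relative depth inside n_chars of filler."""
    body = (FILLER * (n_chars // len(FILLER) + 1))[:n_chars]
    pos = int(len(body) * depth)
    return body[:pos] + " " + NEEDLE + " " + body[pos:]

def niah_eval(generate, context_chars: int = 32_000,
              depths=(0.1, 0.5, 0.9)) -> int:
    """Return how many retrievals were exact matches.

    `generate(prompt) -> str` is whatever stack you're testing,
    e.g. an mlx-lm model running with a quantized KV cache.
    """
    hits = 0
    for d in depths:
        prompt = build_haystack(context_chars, d) + "\n\n" + QUESTION
        hits += "7421" in generate(prompt)
    return hits

# e.g.: print(niah_eval(my_model_generate), "/ 3 exact matches")
```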
Chase Fagen reposted
The All-In Podcast@theallinpod·
🚨INTERVIEW SPECIAL!: Four CEOs on the Future of AI: CoreWeave, Perplexity, Mistral, and IREN

(0:00) Intro live from Nvidia GTC
(0:37) CoreWeave CEO, Michael Intrator
(32:58) Perplexity CEO, Aravind Srinivas
(1:07:11) Mistral CEO, Arthur Mensch
(1:18:57) IREN CEO, Daniel Roberts

--------------------------------------

Our episode is sponsored by the New York Stock Exchange - a modern marketplace and exchange for building the future. It all happens at the NYSE - nyse.com
62 replies · 158 reposts · 1.3K likes · 290K views
Chase Fagen reposted
Lex Fridman@lexfridman·
Here's my conversation with Jensen Huang, CEO of NVIDIA, the most valuable & one of the most influential companies in the history of human civilization. It is the engine powering the AI revolution.

This was a fascinating & inspiring conversation, in parts super-technical on engineering of every part of the AI stack, memory, power, supply chain (TSMC, ASML, etc), in parts about leadership & psychology, and in parts personal & philosophical about life, consciousness, mortality, and human nature.

It's here on X in full and is up everywhere else (see comment).

Timestamps:
0:00 - Introduction
0:33 - Extreme co-design and rack-scale engineering
3:18 - How Jensen runs NVIDIA
22:40 - AI scaling laws
37:40 - Biggest blockers to AI scaling laws
39:23 - Supply chain
41:18 - Memory
47:24 - Power
52:43 - Elon and Colossus
56:11 - Jensen's approach to engineering and leadership
1:01:37 - China
1:09:50 - TSMC and Taiwan
1:15:04 - NVIDIA's moat
1:20:41 - AI data centers in space
1:24:30 - Will NVIDIA be worth $10 trillion?
1:34:39 - Leadership under pressure
1:48:25 - Video games
1:55:16 - AGI timeline
1:57:29 - Future of programming
2:11:01 - Consciousness
2:17:22 - Mortality
785 replies · 2.8K reposts · 12.2K likes · 2.3M views
Chase Fagen@chasef07·
Benchmarked Qwen3.5-9B on my M4 Pro Mac Mini:
- 51 tok/s generation
- 362 tok/s prompt processing
- 187ms time to first token
- 14.9W GPU power (100% utilization)
- 3.4 tokens/sec/watt

A 4090 needs 300W+ for this. Apple Silicon does it at 15W. Wild.
1 reply · 0 reposts · 0 likes · 83 views
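The efficiency number is just generation throughput divided by measured power; a quick check of the arithmetic in the tweet:

```python
# Derived metrics from the benchmark numbers above.
gen_tok_s = 51.0        # generation throughput
gpu_watts = 14.9        # measured GPU power

tok_per_sec_per_watt = gen_tok_s / gpu_watts
print(f"{tok_per_sec_per_watt:.2f} tok/s/W")   # ~3.42, matching the quoted 3.4

# The 300 W comparison point from the tweet: a GPU at that power draw
# would need ~1,027 tok/s to match this efficiency.
print(f"break-even throughput at 300 W: {300 * tok_per_sec_per_watt:.0f} tok/s")
```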
Chase Fagen@chasef07·
I'm still confused: do people prefer MLX or llama.cpp GGUF for Mac mini inference?
0 replies · 0 reposts · 0 likes · 31 views
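For anyone weighing the same choice, the two stacks differ mostly in packaging: MLX loads models converted for Apple's array framework, while llama.cpp consumes GGUF files. A minimal sketch of both paths (model IDs and paths are placeholders, and both API shapes are from memory, so verify against the current mlx-lm and llama-cpp-python docs):

```python
# Option A: MLX (Apple-Silicon-native; models from the mlx-community hub)
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SomeModel-4bit")   # placeholder repo id
print(generate(model, tokenizer, prompt="Hello", max_tokens=50))

# Option B: llama.cpp via its Python bindings, using a GGUF file on disk
from llama_cpp import Llama

llm = Llama(model_path="some-model.Q4_K_M.gguf", n_ctx=4096)  # placeholder path
out = llm("Hello", max_tokens=50)
print(out["choices"][0]["text"])
```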
Chase Fagen@chasef07·
People are like, New York's awesome because they have 12 dollar coffee to cope with how shit the weather and people are
2 replies · 0 reposts · 1 like · 36 views
Chase Fagen@chasef07·
EVERY TIME I step foot here I feel subhuman
0 replies · 0 reposts · 1 like · 19 views
Chase Fagen@chasef07·
A lot of the great minds in AI are talking about new memory and tool-use ideas for long-running, hyper-intelligent agents. For realtime agents, where actions must be decided in milliseconds, it's way different.
0 replies · 0 reposts · 0 likes · 20 views
Chase Fagen@chasef07·
@trq212 @bcherny Interesting, the design considerations for realtime agents are way different than for long-running ones
0 replies · 0 reposts · 0 likes · 68 views
Thariq@trq212·
I put a lot of heart into my technical writing, I hope it's useful to you all. 📌 Here's a pinned thread of everything I've written. (much of this will be posted on the Claude blog soon as well)
228 replies · 752 reposts · 7.2K likes · 963.4K views
Chase Fagen@chasef07·
@trq212 Yeah, prompt caching is so important. Does the Claude API give a flag for when you hit cache?
0 replies · 0 reposts · 0 likes · 51 views
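As far as I know it does: when you mark a block with cache_control, the Messages API reports cache activity in the response's usage object. A minimal sketch from memory of the Anthropic Python SDK (model ID and prompt are placeholders; verify against the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SHARED_PREFIX = "...your big, stable system prompt..."  # placeholder

msg = client.messages.create(
    model="claude-sonnet-4-5",              # placeholder model id
    max_tokens=256,
    system=[{
        "type": "text",
        "text": LONG_SHARED_PREFIX,
        "cache_control": {"type": "ephemeral"},  # opt this block into caching
    }],
    messages=[{"role": "user", "content": "What changed since yesterday?"}],
)

# The usage block is the "flag": nonzero cache_read_input_tokens means a hit;
# cache_creation_input_tokens means the prefix was written to the cache.
print(msg.usage.cache_read_input_tokens, msg.usage.cache_creation_input_tokens)
```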
Chase Fagen reposted
Aaron Levie@levie·
It is quite ridiculous how agile you have to be with your AI agent stack right now. Whatever you spent 6 months perfecting 12 months ago is probably already out of date, and you're better off doing a reset than trying to resuscitate it architecturally.

And what's interesting is that for every jump in progress that eliminates one part of the stack, generally a new capability becomes possible that you need to build new scaffolding for. For instance, lots of RAG pipelines have probably had to adjust because context windows have improved dramatically and you can now just use agentic search thanks to improved tool use. But that same improved tool use means you probably need to be supporting code execution with sandboxes so the agent can handle more complex work. So one capability gets bitter-lessoned, and a new one opens up altogether.

This is the cycle we're going to be in for years. If you don't have the speed and agility to deal with it, you're probably going to be in a tough spot.
Matt Carey@mattzcarey

every new model generation you see the pinch of the bitter lesson. harnesses, pipelines, rules which previously felt important now hold you back from innovating. what took months of grind for you is now just a prompt away at ½ the cost. look for it and you will see. Both large and small companies are re-evaluating. Company directions change before your eyes. it's a wild moment for our industry

63 replies · 38 reposts · 460 likes · 109.6K views
Chase Fagen@chasef07·
Halved my prompt today
0 replies · 0 reposts · 0 likes · 12 views