Chase Fagen

1.2K posts


@chasef07

Lifestyle Engineer x Apple Silicon Inference Engineer

Joined June 2014
225 Following · 35 Followers
Demis Hassabis@demishassabis·
Gemini 3.1 Flash Live is our highest quality audio & voice model yet - and a big leap towards building next-gen voice-first agents. Lower latency, better precision, more natural interactions... try it now with Gemini Live in the @GeminiApp or build with it in @GoogleAIStudio!
Google DeepMind@GoogleDeepMind

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵

99 replies · 131 reposts · 1.5K likes · 243.3K views
Chase Fagen reposted
Google Research@GoogleResearch·
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
954 replies · 5.6K reposts · 38K likes · 18.5M views
Artificial Analysis@ArtificialAnlys·
Inworld, ElevenLabs, and MiniMax continue to lead our Text to Speech leaderboard for most preferred models.

Recent checkpoints from each of the labs continue to push the frontier of TTS quality, with 4 out of the top 5 models being released this year. Leading TTS models are increasingly realistic, particularly on relatively straightforward text, with preference differences increasingly coming down to affinity for different voices.

Latest results also reflect stronger bot vote filtering, confirmed via triangulation against third-party evaluators. We've also added rank ranges based on each model's 95% confidence interval, showing where a model could land based on its Elo score range.

Key results:
➤ Most preferred: Current top 5 per our TTS leaderboard: 1. Inworld TTS 1.5 Max (Elo of 1,238); 2. ElevenLabs Eleven v3 (1,197); 3. Inworld TTS 1 Max (1,183); 4. Inworld TTS 1.5 Mini (1,182); 5. MiniMax Speech 2.8 HD (1,175)
➤ Price: Kokoro 82M v1.0 (Replicate) leads at $0.65 per 1M characters, followed by Inworld TTS 1 and 1.5 Mini at $5, and AsyncFlow V2 at $8.33
➤ Speed: WaveNet leads for batch generation at 419 characters processed per second, followed by Kokoro 82M v1.0 (Replicate) at 235, and Inworld TTS 1.5 Mini at 214

See below for further detail ⬇️
8 replies · 7 reposts · 113 likes · 11.2K views
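For a sense of what those Elo gaps mean head-to-head, the standard Elo expected-score formula converts a rating difference into a preference probability (assuming conventional Elo; Artificial Analysis's exact methodology may differ):

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Standard Elo expected score: P(model A preferred over model B)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Top-two gap from the leaderboard: Inworld 1,238 vs ElevenLabs 1,197.
p = elo_win_prob(1238, 1197)
print(f"Inworld TTS 1.5 Max preferred ~{p:.1%} of the time")  # ~55.9%
```

A 41-point gap is only about a 56/44 split in pairwise preference, which is why the 95% confidence-interval rank ranges they mention matter.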
Chase Fagen reposted
Prince Canuma@Prince_Canuma·
Just implemented Google's TurboQuant in MLX and the results are wild!

Needle-in-a-haystack using Qwen3.5-35B-A3B across 8.5K, 32.7K, and 64.2K context lengths:
→ 6/6 exact match at every quant level
→ TurboQuant 2.5-bit: 4.9x smaller KV cache
→ TurboQuant 3.5-bit: 3.8x smaller KV cache

The best part: Zero accuracy loss compared to full KV cache.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

146 replies · 405 reposts · 5.2K likes · 704.6K views
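A needle-in-a-haystack run like this is easy to reproduce against any inference stack; a minimal sketch, where the generate callable, needle, and filler text are all placeholders rather than Prince's actual harness:

```python
NEEDLE = "The magic number for project Aurora is 7421."
QUESTION = "What is the magic number for project Aurora? Answer with the number only."
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(n_chars: int, depth: float) -> str:
    """Bury the needle at a relative depth inside n_chars of filler."""
    body = (FILLER * (n_chars // len(FILLER) + 1))[:n_chars]
    pos = int(len(body) * depth)
    return body[:pos] + " " + NEEDLE + " " + body[pos:]

def niah_eval(generate, context_chars: int = 32_000,
              depths=(0.1, 0.5, 0.9)) -> int:
    """Return how many retrievals were exact matches.

    `generate(prompt) -> str` is whatever stack you're testing,
    e.g. an mlx-lm model running with a quantized KV cache.
    """
    hits = 0
    for d in depths:
        prompt = build_haystack(context_chars, d) + "\n\n" + QUESTION
        hits += "7421" in generate(prompt)
    return hits

# e.g.: print(niah_eval(my_model_generate), "/ 3 exact matches")
```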
Chase Fagen reposted
The All-In Podcast@theallinpod·
🚨INTERVIEW SPECIAL!: Four CEOs on the Future of AI: CoreWeave, Perplexity, Mistral, and IREN

(0:00) Intro live from Nvidia GTC
(0:37) CoreWeave CEO, Michael Intrator
(32:58) Perplexity CEO, Aravind Srinivas
(1:07:11) Mistral CEO, Arthur Mensch
(1:18:57) IREN CEO, Daniel Roberts

--------------------------------------

Our episode is sponsored by the New York Stock Exchange - a modern marketplace and exchange for building the future. It all happens at the NYSE - nyse.com
62 replies · 158 reposts · 1.3K likes · 290K views
Chase Fagen reposted
Lex Fridman@lexfridman·
Here's my conversation with Jensen Huang, CEO of NVIDIA, the most valuable & one of the most influential companies in the history of human civilization. It is the engine powering the AI revolution.

This was a fascinating & inspiring conversation, in parts super-technical on engineering of every part of the AI stack, memory, power, supply chain (TSMC, ASML, etc), in parts about leadership & psychology, and in parts personal & philosophical about life, consciousness, mortality, and human nature.

It's here on X in full and is up everywhere else (see comment).

Timestamps:
0:00 - Introduction
0:33 - Extreme co-design and rack-scale engineering
3:18 - How Jensen runs NVIDIA
22:40 - AI scaling laws
37:40 - Biggest blockers to AI scaling laws
39:23 - Supply chain
41:18 - Memory
47:24 - Power
52:43 - Elon and Colossus
56:11 - Jensen's approach to engineering and leadership
1:01:37 - China
1:09:50 - TSMC and Taiwan
1:15:04 - NVIDIA's moat
1:20:41 - AI data centers in space
1:24:30 - Will NVIDIA be worth $10 trillion?
1:34:39 - Leadership under pressure
1:48:25 - Video games
1:55:16 - AGI timeline
1:57:29 - Future of programming
2:11:01 - Consciousness
2:17:22 - Mortality
785 replies · 2.8K reposts · 12.2K likes · 2.3M views
Chase Fagen@chasef07·
Benchmarked Qwen3.5-9B on my M4 Pro Mac Mini:
- 51 tok/s generation
- 362 tok/s prompt processing
- 187ms time to first token
- 14.9W GPU power (100% utilization)
- 3.4 tokens/sec/watt

A 4090 needs 300W+ for this. Apple Silicon does it at 15W. Wild.
1 reply · 0 reposts · 0 likes · 83 views
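The efficiency number is just generation throughput divided by measured power; a quick check of the arithmetic in the tweet:

```python
# Derived metrics from the benchmark numbers above.
gen_tok_s = 51.0        # generation throughput
gpu_watts = 14.9        # measured GPU power

tok_per_sec_per_watt = gen_tok_s / gpu_watts
print(f"{tok_per_sec_per_watt:.2f} tok/s/W")   # ~3.42, matching the quoted 3.4

# The 300 W comparison point from the tweet: a GPU at that power draw
# would need ~1,027 tok/s to match this efficiency.
print(f"break-even throughput at 300 W: {300 * tok_per_sec_per_watt:.0f} tok/s")
```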
Chase Fagen@chasef07·
I'm still confused: do people prefer MLX or llama.cpp GGUF for Mac mini inference?
0 replies · 0 reposts · 0 likes · 31 views
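For anyone weighing the same choice, the two stacks differ mostly in packaging: MLX loads models converted for Apple's array framework, while llama.cpp consumes GGUF files. A minimal sketch of both paths (model IDs and paths are placeholders, and both API shapes are from memory, so verify against the current mlx-lm and llama-cpp-python docs):

```python
# Option A: MLX (Apple-Silicon-native; models from the mlx-community hub)
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SomeModel-4bit")   # placeholder repo id
print(generate(model, tokenizer, prompt="Hello", max_tokens=50))

# Option B: llama.cpp via its Python bindings, using a GGUF file on disk
from llama_cpp import Llama

llm = Llama(model_path="some-model.Q4_K_M.gguf", n_ctx=4096)  # placeholder path
out = llm("Hello", max_tokens=50)
print(out["choices"][0]["text"])
```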
Chase Fagen@chasef07·
People are like, New York's awesome because they have 12 dollar coffee to cope with how shit the weather and people are
2 replies · 0 reposts · 1 like · 36 views
Chase Fagen@chasef07·
EVERY TIME I step foot here I feel subhuman
0 replies · 0 reposts · 1 like · 19 views
Chase Fagen@chasef07·
A lot of the great minds in AI are talking about new memory and tool-use ideas for long-running, hyper-intelligent agents. For realtime agents, where actions must be decided in milliseconds, it's way different.
0 replies · 0 reposts · 0 likes · 20 views
Chase Fagen@chasef07·
@trq212 @bcherny Interesting, the design considerations for realtime agents are way different than for long-running ones
0 replies · 0 reposts · 0 likes · 68 views
Thariq@trq212·
I put a lot of heart into my technical writing, I hope it's useful to you all. 📌 Here's a pinned thread of everything I've written. (much of this will be posted on the Claude blog soon as well)
228 replies · 752 reposts · 7.2K likes · 963.4K views
Chase Fagen@chasef07·
@trq212 Yeah, prompt caching is so important. Does the Claude API give a flag for when you hit cache?
0 replies · 0 reposts · 0 likes · 51 views
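As far as I know it does: when you mark a block with cache_control, the Messages API reports cache activity in the response's usage object. A minimal sketch from memory of the Anthropic Python SDK (model ID and prompt are placeholders; verify against the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SHARED_PREFIX = "...your big, stable system prompt..."  # placeholder

msg = client.messages.create(
    model="claude-sonnet-4-5",              # placeholder model id
    max_tokens=256,
    system=[{
        "type": "text",
        "text": LONG_SHARED_PREFIX,
        "cache_control": {"type": "ephemeral"},  # opt this block into caching
    }],
    messages=[{"role": "user", "content": "What changed since yesterday?"}],
)

# The usage block is the "flag": nonzero cache_read_input_tokens means a hit;
# cache_creation_input_tokens means the prefix was written to the cache.
print(msg.usage.cache_read_input_tokens, msg.usage.cache_creation_input_tokens)
```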
Chase Fagen reposted
Aaron Levie@levie·
It is quite ridiculous how agile you have to be with your AI agent stack right now. Whatever you spent 6 months perfecting 12 months ago is probably already out of date, and you're better off doing a reset than trying to resuscitate it architecturally.

And what's interesting is that for every jump in progress that eliminates one part of the stack, generally a new capability becomes possible that you need to build new scaffolding for. For instance, lots of RAG pipelines have probably had to adjust because context windows have improved dramatically and you can now just use agentic search thanks to improved tool use. But that same improved tool use means you probably need to be supporting code execution with sandboxes so the agent can handle more complex work. So one capability gets bitter-lessoned, and a new one opens up altogether.

This is the cycle we're going to be in for years. If you don't have the speed and agility to deal with it, you're probably going to be in a tough spot.
Matt Carey@mattzcarey

every new model generation you see the pinch of the bitter lesson. harnesses, pipelines, rules which previously felt important now hold you back from innovating. what took months of grind for you is now just a prompt away at ½ the cost. look for it and you will see. Both large and small companies are re-evaluating. Company directions change before your eyes. it's a wild moment for our industry

63 replies · 38 reposts · 460 likes · 109.6K views
Chase Fagen@chasef07·
Halved my prompt today
0 replies · 0 reposts · 0 likes · 12 views