Thomas Ip
@_thomasip
Writing about AI, tech and startups. Building AI companion app, follow for road to $10M ARR.

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

@kwindla @dan_jenkins what is a good speech-to-speech model to fine-tune on top of? Does this exist, e.g. the way we fine-tune Llama 3.1 8B?

LLM KV Cache Showdown: DeepSeek Takes the Absolute Lead

Insights from Zhihu contributor 苏迟但到

After running 20,000 controlled LLM inference experiments 🧪 over 3 straight days, I've fully uncovered how groundbreaking DeepSeek's KV Cache optimization really is. Many users have been sharing call logs since the DeepSeek V4 release — its exceptional cache hit rate drastically cuts token and inference costs 💰.

KV Cache hit rate isn't luck: it can be quantified by repeating identical prompts at variable time intervals and parsing the backend's cache-match signals. System load always introduces random cache-TTL volatility, so small-scale tests are meaningless. That's why I built a dedicated server for strict benchmarking (see the probe sketch after this post):
• 5 models tested: DeepSeek, Kimi, Zhipu, MiniMax, OpenAI (via OpenRouter)
• Periodic batch requests with a 1min–720min recall window to record real KV Cache retention and hit ratio

📊 Core Technical Benchmark Findings
✅ DeepSeek: 100% KV Cache hit rate, consistently, across peak and off-peak hours. Cache state is still fully retained after 12+ hours — industry-leading persistent cache scheduling and eviction strategy.
🥈 MiniMax: 90% hit rate off-peak, dropping to ~70% under high traffic. Abnormal early cache misses within the first minute imply flawed internal cache indexing and lookup logic.
📉 Tier order afterward: Kimi > OpenAI > GLM
❌ GLM: terrible KV Cache performance. 2min: 80% hit rate; 3min: 50%; 5min: only 25%. Hardly any cache survives beyond 15 minutes.

🔬 Technical Root Cause Analysis for GLM
• Infra architecture defect: unable to offload KV Cache to low-cost disk storage, so it is strictly limited to on-board VRAM. The small cache pool forces aggressive LRU eviction.
• Extreme traffic throughput far exceeds the cache's capacity, accelerating invalidation of historical KV sequences.

⚠️ Key Industry Insider Notes
• OpenAI metrics are probed via the OpenRouter relay, not native official KV Cache performance.
• Qwen / Seed / Mimo have no automatic KV Cache mechanism — they require manual cache initialization and charge extra for it. With no natural TTL retention, regular users pay hidden redundant inference costs.

#AI #LLM #DeepSeek #LLMInference #AIEngineering #Tech
🔗 Full article: zhuanlan.zhihu.com/p/203573772695…
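To make the methodology above concrete, here is a minimal probe sketch in Python. It is not the post's actual harness: it assumes an OpenAI-compatible chat completions endpoint and reads DeepSeek's documented `prompt_cache_hit_tokens` usage field; other providers report cache hits under different names (or not at all), so the lookup falls back to zero. The endpoint URL, API key, model name, prompt, and interval list are illustrative placeholders.

```python
import time
from openai import OpenAI  # pip install openai; any OpenAI-compatible client works

# Placeholder endpoint and key -- point these at the provider under test.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

# A long fixed prefix so the prompt exceeds the provider's minimum cacheable size.
PREFIX = "You are a cache benchmark probe. Ignore this filler text. " * 100
RECALL_WINDOWS_MIN = [1, 2, 3, 5, 15, 60, 720]  # recall windows from the post

def cache_hit_ratio(model: str, minutes: int) -> float:
    """Send an identical prompt twice, `minutes` apart, and report what
    fraction of the second prompt's tokens were served from the KV cache."""
    messages = [{"role": "user", "content": PREFIX + "Reply with OK."}]

    # First request warms the cache.
    client.chat.completions.create(model=model, messages=messages, max_tokens=5)
    time.sleep(minutes * 60)

    # Second request re-sends the identical prompt and inspects the usage block.
    resp = client.chat.completions.create(model=model, messages=messages, max_tokens=5)
    usage = resp.usage
    # DeepSeek exposes prompt_cache_hit_tokens in usage; fall back to 0
    # if the provider reports no cache-match signal at all.
    hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    return hit / max(usage.prompt_tokens, 1)

if __name__ == "__main__":
    for window in RECALL_WINDOWS_MIN:
        ratio = cache_hit_ratio("deepseek-chat", window)
        print(f"deepseek-chat @ {window:>3} min: {ratio:.0%} of prompt tokens cached")
```

Run periodically from a dedicated machine, as the post describes, repeated probes like this let you separate genuine TTL eviction from load-induced volatility; the real benchmark would batch many such probes per window rather than sleeping sequentially.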