Martin Gale

774 posts

@finstratege

taming the beasts 🪽🦞

SF · Joined March 2024

407 Following · 114 Followers

Martin Gale@finstratege·22h

@TheDavidaGinter @polsia I agree, it’s annoying that they exaggerate their “””arr””” so much though.. doesn’t inspire confidence 😮‍💨

Davida Ginter@TheDavidaGinter·1d

Everyone’s talking about @polsia raising $30M on a product that looks… meh. This is actually the strongest validation yet of what the market is thirsty for: AI that works for you, instead of you working for it. Polsia’s product is far from perfect (tried. stopped). But the promise is interesting: Can you run a business while AI handles the boring operational work for you?

Martin Gale@finstratege·1d

@siddsax Please make it a show!!! So amazing 🤩

Siddhartha Saxena@siddsax·1d

Anthropic onboarding day: Michael Scott introducing Karpathy like he just signed Wemby in free agency.

Martin Gale@finstratege·1d

omg this is so amazing

Martin Gale@finstratege·5d

@kwindla it’s just that it gave us great speed, and the agent seems very responsive, which is good for the customer experience

kwindla@kwindla·5d

@finstratege Interesting. I didn't love the results when I tested 3.6 sparse, but if it's working well for you I should spend more time with it. Can you talk about the use cases where that model is doing well, for you?

kwindla@kwindla·5d

Gemini 3.5 Flash is out today. Here are numbers from my main voice and task agent benchmarks. Some notes:

All the Gemini 3 models so far are too slow to work well for voice agents. Gemini 2.5 Flash was a *great* model for voice agents, when it was SOTA. It was fast and good at instruction following. Its big weakness was tool calling. It was quite difficult to prompt Gemini 2.5 Flash to perform tool calling reliably in long context, multi-turn use cases.

With Gemini 3, Google improved the tool calling issues a lot. But time to first token is ~1s. We really need TTFT down below 700ms. Google isn't alone in this. All the SOTA models released this year have been reasoning models that aren't optimized for low latency. Claude Haiku 4.5 (released last October) remains the best-performing model with a TTFT under 700ms.

Gemini 3.5 Flash is the first Flash model in the 3 family to be released as "generally available." It's quite different from gemini-3-flash-preview, which was released last December. That model actually scored a bit better on my voice agent benchmark.

This new model is the new overall top scorer on my task agent benchmark. This benchmark tests a multi-turn task, requiring that models achieve a P50 turn execution time faster than four seconds. Gemini 3.5 Flash with a "high" thinking budget scores significantly better than any other model I've tested. So even though the TTFT isn't what we'd like to see from this model, the overall generation speed makes up for it, and allows us to use the "high" thinking budget and still achieve a per-turn P50 under two seconds. Very impressive.

This performance costs money, though. I had become accustomed to thinking of Gemini models as aggressively priced. But Gemini 3.5 Flash is actually more expensive than GPT-5.4 and Claude Sonnet 4.6 on this benchmark. Also note that lower reasoning settings don't always save money. Gemini 3.5 Flash "minimal" costs more, on this benchmark, than "high," because it makes more mistakes, so it uses more tokens to complete the task.

Please note that performance of this model on your benchmarks might be very different. My voice agent and task agent results are often wildly out of line with the reported results on standard benchmarks in the model cards and release notes. The voice agent benchmark is 30 turns, and heavily tests tool calling in a long-context scenario. The task agent benchmark injects large streams of structured data events into the context, all tool calls are asynchronous, and the test task takes at least 32 turns to complete. (My motto for evals is "30 turns or it didn't happen.")

Make your own benchmarks! (And post the source code and the results for different models, if you can.)
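A rough sketch of the two checks the post above describes: the per-turn P50 latency gate, and why a lower reasoning setting can still cost more. All function names, timings, token counts, and prices here are hypothetical and purely illustrative; they are not taken from kwindla's benchmark or from any provider's price list.

```python
from statistics import median


def p50_turn_time(turn_seconds):
    """Return the median (P50) of per-turn execution times, in seconds."""
    return median(turn_seconds)


def passes_latency_budget(turn_seconds, budget_seconds):
    """True if the P50 turn time is strictly under the given budget."""
    return p50_turn_time(turn_seconds) < budget_seconds


def run_cost(total_tokens, price_per_million_tokens):
    """Cost of one benchmark run for a flat, hypothetical per-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens


# Made-up per-turn timings (seconds) for a single run.
turns = [1.2, 1.8, 1.5, 2.4, 1.6]
print(p50_turn_time(turns), passes_latency_budget(turns, 2.0))

# Made-up token totals: a setting that makes more mistakes needs extra
# turns, so it can burn more tokens (and money) at the same unit price.
cost_minimal = run_cost(900_000, 2.0)
cost_high = run_cost(600_000, 2.0)
print(cost_minimal > cost_high)
```

The point of the second half is only that total cost depends on tokens actually consumed to finish the task, not on the reasoning setting's nominal budget.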

Martin Gale@finstratege·5d

@kwindla for my cases I find it hard to justify maintaining anything else..

kwindla@kwindla·5d

@finstratege That’s pretty good. 27b dense or 35B sparse? The 27b version seems to me like it performs a lot better. But it’s more expensive to serve at scale.

Martin Gale@finstratege·6d

micromanaging | /ˌmī-krō-ˈma-ni-jiŋ/ | noun - the practice of reviewing and approving every action your AI agent takes, i.e. not running it in YOLO bypass-permissions mode. "instead of trusting the agent, he kept micromanaging it, hand-approving each command with cmd + enter."

Martin Gale@finstratege·6d

lol @claudeai 🧢🧢🧢🧢🧢

Martin Gale@finstratege·6d

@caydengineer @a_israelov @MentraGlass bravo 👏👏

cayden 凯登@caydengineer·18 May

Launching Mentra Live open-source smart glasses. Deploy smart glasses for real world work. We already shipped thousands. Now, they're generally available. Build apps that leave the screen. Let your AI step into the real world.

San Francisco, CA 🇺🇸

Martin Gale@finstratege·18 May

[🔮 vision tweet] you won't manage a knowledge base for your agents. your computer / workspace / server IS the knowledge base

Martin Gale@finstratege·15 May

@egocgp There's a rotten apple in the bunch and it spoiled the whole basket lol

Léo Bachelot@egocgp·15 May

The downfall / glowdown of the entrepreneurs from Qui Veut Être Mon Associé is blowing my mind. They're all into shady schemes: investment clubs, training webinars advertised on Insta and YouTube to teach you the growth method. Even though they all have very respectable backgrounds.

Martin Gale@finstratege·15 May

@Racem @LegalPlace Bravo! Cocorico 🐔

Racem Flazi@Racem·15 May

A very interesting new chapter is starting for @LegalPlace. I'll talk about it soon in an episode of the podcast.

Les Echos@LesEchos

French Tech: start-up LegalPlace takes over its long-standing rival trib.al/vRJNFzJ

Martin Gale@finstratege·12 May

Just closed our biggest account to date… $18K ARR. Fuck yeahhh!!!!!!!!!!!!!!!

Martin Gale@finstratege·12 May

@thinkymachines a $2B voice agent with few tools and video stream… 💀

Thinking Machines@thinkymachines·11 May

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Martin Gale@finstratege·9 May

@daedalium @NanoCorpHQ Is this a paid partnership, or do you actually find value in it? Are you paying? Just curious 🧐

Oussama Ammar@daedalium·9 May

I just launched my autonomous AI company "Dwell HQ" on @NanoCorpHQ Verification: bask-Mw3A

Martin Gale@finstratege·8 May

@parisbayarea vive la France

parisbayarea@parisbayarea·8 May

bro I have rewatched this video so many times - I still can't get over the fact that I look like a LITERAL GOBLIN

Don@donatelli2026

the american mind cannot understand this

Martin Gale@finstratege·8 May

@kai_brokering What case is better for speech-to-speech?

Kai Brokering@kai_brokering·8 May

These speech-to-speech models are getting insanely good, but the pricing still makes most real-world apps unrealistic. I have so many ideas for this tech, but at current costs I’d need to charge ~$200/user just to make it work.

OpenAI@OpenAI

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.

Martin Gale@finstratege·8 May

@kwindla Is it really? Would you use it in a customer-facing app over STT-to-TTS?

kwindla@kwindla·8 May

OpenAI shipped a new speech-to-speech model today: gpt-realtime-2. This is the first speech-to-speech model good enough to use in my voice agents that do "real work." Or real play, for that matter.

Here's gpt-realtime-2 as the brain of the ship AI in Gradient Bang. The voice-to-voice response and tool calling times here are unedited, so you can see exactly what the interaction with the model is like in an agent with a very complex system instruction and frequent tool calls. (I did clip out the subagent task execution segments, after gpt-realtime-2 starts a subagent via a tool call. Subagents in this config used gpt-5.2 "medium" effort.)
