Alexandre Momeni

476 posts

@AlexandreMomeni

_investor (@GeneralCatalyst) _alumni(@NablaTech, @GoldmanSachs, @Stanford, @Polytechnique, @HECParis, @LSEEcon). Health & Bio, Machine Intelligence, Infra

London, England · Joined September 2018

900 Following · 400 Followers

Alexandre Momeni reposted

General Catalyst@generalcatalyst·4d

“The digital world lacks physical information. The physical world lacks dense digital information. Games perfectly merge these two together, and we believe that’s just the next phase of pretraining.”
Every few weeks @gen_intuition ships new emergent capabilities that are orthogonal to anything you see in the LLM world. Watch GC’s @max_rimpel in conversation with @PimDeWitte.
Chapters
00:00 - Introduction
00:10 - The World's Biggest Private RuneScape Server
03:09 - From RuneScape to Ebola
07:41 - Mapping the Unmappable
09:47 - Why LLMs Can't See the World
13:45 - The Accidental Foundation of General Intuition
19:10 - Turning Down a Life-Changing Acquisition Offer
21:12 - One Foot in Front of the Other
24:04 - Atoms to Atoms
27:33 - The Talent Flywheel
30:20 - Protecting the Last Weird Corner of the Internet

18.6K

Alexandre Momeni reposted

Siavash@siavashg·14 May

x.com/i/article/2054…

723K

Alexandre Momeni reposted

Patrick Collison@patrickc·17 Apr

I'm lucky enough to have a great doctor and access to excellent Bay Area medical care. I've taken lots of standard screening tests over the years and have tried lots of "health tech" devices and tools. With all this said, by far the most useful preventative medical advice that I've ever received has come from unleashing coding agents on my genome, having them investigate my specific mutations, and having them recommend specific follow-on tests and treatments. Population averages are population averages, but we ourselves are not averages. For example, it turns out that I probably have a 30x(!) higher-than-average predisposition to melanoma. Fortunately, there are both specific supplements that help counteract the particular mutations I have, and of course I can significantly dial up my screening frequency. So, this is very useful to know. I don't know exactly how much the analysis cost, but probably less than $100. Sequencing my genome cost a few hundred dollars. (One often sees papers and articles claiming that models aren't very good at medical reasoning. These analyses are usually based on employing several-year-old models, which is a kind of ludicrous malpractice. It is true that you still have to carefully monitor the agents' reasoning, and they do on occasion jump to conclusions or skip steps, requiring some nudging and re-steering. But, overall, they are almost literally infinitely better for this kind of work than what one can otherwise obtain today.) There are still lots of questions about how this will diffuse and get adopted, but it seems very clear that medical practice is about to improve enormously. Exciting times!

488

642

9.6K

4.1M

Alexandre Momeni reposted

Atila@atiorh·14 Apr

This is what context does to your speech-to-text system! Our new paper studies the impact of contextual information on the accuracy of leading open-source and proprietary systems.

2.3K

Alexandre Momeni reposted

Atila@atiorh·10 Apr

localhost Ep. 2
Bryan Catanzaro (@ctnzr) on @NVIDIAAI's open models and risky bets
(00:20) Who is Bryan?
(07:38) Getting Nvidia to care about Deep Learning
(14:13) Why did Bryan leave Nvidia right when Deep Learning was taking off
(18:02) Leadership: Aligning a village of researchers
(24:12) Will the frontier flip back to open?
(32:16) Nvidia's models: Side project or core business?
(38:19) Efficiency leads to edge inference: Does Apple capture inference?
(42:43) Nvidia’s risky bets: Fewer and fewer bits
(47:19) Nvidia's misstep with Volta
(52:30) Every model is already obsolete as soon as you stop training it

1.5K

Alexandre Momeni@AlexandreMomeni·17 Mar

@AElkrief Nice

Alex Elkrief@AElkrief·17 Mar

AI agents in digital asset management are moving from concept to production. Continuous market monitoring, cross-protocol rebalancing, automated strategy execution: the operational advantages are clear.
The underexplored risk is hallucination. LLM-powered agents generate outputs probabilistically. They can misinterpret oracle data, fabricate a protocol address, or route funds into a pool with no liquidity, all with high confidence. On-chain, where transactions are irreversible, this class of error is uniquely costly.
Traditional finance mitigates this through pre-trade checks, approval chains, and segregation of duties. Most on-chain vault infrastructure has no equivalent: the agent gets signing authority, and that authority is unconstrained.
Upshift vaults, with their on-chain policy engine, provide an elegant solution here. The architecture enforces deterministic constraints at the smart contract level, independent of the agent's reasoning:
Pre-execution:
- Role-based access control scopes each agent to a defined set of operations
- Granular permissions validate every external call down to the function selector
- Asset and protocol whitelists bound the universe of allowable interactions
During execution:
- Balance checks before and after every call verify expected outcomes; mismatches trigger a revert
- NAV growth rate caps limit portfolio-level impact per unit of time
- A module discovery pattern separates intent declaration from execution: the agent proposes, the contract validates before committing
Post-execution:
- Timelocked governance enforces delays on critical changes
- Emergency pause allows security providers to halt operations immediately
Smart contracts are deterministic: they enforce exactly the boundaries they were programmed with, regardless of the calling agent's confidence level. As autonomous agents take on a larger role in on-chain asset management, the policy layer underneath them becomes the critical infrastructure.
That's what we're building.

184
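The pre-execution and during-execution checks listed in the post above can be sketched in a few lines of Python. This is only an illustration of the general pattern (deterministic checks wrapped around an untrusted caller), not Upshift's contract code: the names `Policy`, `Call`, `execute`, and `PolicyViolation` are hypothetical, the addresses and selector are placeholders, and a single asset balance stands in for NAV as a simplification.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a proposed call breaks a configured constraint."""


@dataclass(frozen=True)
class Call:
    agent: str             # identity of the calling agent
    target: str            # protocol address being called (placeholder strings here)
    selector: str          # function selector as a hex string
    asset: str             # asset the call moves
    expected_delta: float  # balance change the agent declares up front


@dataclass
class Policy:
    permissions: dict      # agent -> set of allowed (target, selector) pairs
    asset_whitelist: set   # assets the vault may touch
    max_nav_growth: float  # cap on fractional growth per call (simplified NAV cap)

    def pre_check(self, call: Call) -> None:
        # Role-scoped permission check, down to the function selector.
        allowed = self.permissions.get(call.agent, set())
        if (call.target, call.selector) not in allowed:
            raise PolicyViolation("call not permitted for this agent")
        if call.asset not in self.asset_whitelist:
            raise PolicyViolation("asset not whitelisted")

    def post_check(self, call: Call, before: float, after: float) -> None:
        # The observed balance change must match what the agent declared.
        if abs((after - before) - call.expected_delta) > 1e-9:
            raise PolicyViolation("balance mismatch")
        if before > 0 and (after - before) / before > self.max_nav_growth:
            raise PolicyViolation("growth cap exceeded")


def execute(policy: Policy, call: Call, balances: dict, action) -> dict:
    """Validate, run the action on a copy, validate again, then return the new state.

    Any PolicyViolation leaves the caller's `balances` untouched, which plays
    the role of a revert in this sketch.
    """
    policy.pre_check(call)
    staged = dict(balances)
    before = staged.get(call.asset, 0.0)
    action(staged)
    policy.post_check(call, before, staged.get(call.asset, 0.0))
    return staged


policy = Policy(
    permissions={"agent-1": {("0xPOOL", "0xdeadbeef")}},
    asset_whitelist={"USDC"},
    max_nav_growth=0.05,
)
balances = {"USDC": 1000.0}
call = Call("agent-1", "0xPOOL", "0xdeadbeef", "USDC", expected_delta=10.0)


def deposit_yield(state: dict) -> None:
    state["USDC"] += 10.0


new_balances = execute(policy, call, balances, deposit_yield)
```

The point of the design is that the checks never consult the agent's reasoning: a call with an unlisted selector, or an action whose balance change differs from the declared `expected_delta`, raises `PolicyViolation` no matter how confident the caller was.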

Alexandre Momeni reposted

HackerNewsTop5@hackernewstop5·17 Mar

Mistral Releases Leanstral #HackerNews mistral.ai/news/leanstral

Alexandre Momeni@AlexandreMomeni·4 Mar

@Alfred_Lin 😂😂😂

Alfred Lin@Alfred_Lin·4 Mar

PI's robot can now make a grilled cheese without burning it. It has thus passed the Alfred Test, a higher bar than the Turing Test, because I still cannot do that reliably.

Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

463

53K

Alexandre Momeni reposted

Atila@atiorh·28 Feb

Why is the 100 ms barrier for Qwen3-TTS (1.7b) this important?👇
Nvidia GPUs scale up amazingly, but they don't scale down well to serving a single user with sub-3b Transformers. They are throughput-maximizers, not latency-minimizers.
@Alibaba_Qwen's Qwen3-TTS paper showed that an optimized vLLM implementation on Nvidia GPUs achieved 101 ms time-to-first-byte latency under idealized conditions: no concurrency and no network round-trip latency.
Argmax TTSKit achieves as low as 70 ms on Apple Silicon Macs in the post below, but the takeaway is not 70 vs 101 ms here. The takeaway is that, when we move from idealized conditions to the real world:
- Mac will actually serve a single user without an internet round-trip, and the user will experience sub-100ms latency as-is
- Nvidia GPUs will serve many users concurrently in the cloud, resulting in at least 3-5x higher latency. Most importantly, latency will have high variance.
Real-time streaming inference for sub-3b Transformers is where on-device inference is differentiated from cloud, and companies pay the premium for this today. This is the only commercially relevant market segment where the broadly repeated but rarely substantiated claim of "on-device is faster" actually holds, not running 1T LLMs on 2 Mac Studios.

argmax@argmax

TTSKit now achieves sub-100ms time-to-first-byte for Qwen3-TTS 1.7b on Apple Silicon! Link to the code repo and details in comments.

139

23.2K
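The latency argument in the post above can be made concrete with a little arithmetic. This is a rough sketch, not a benchmark: the 70 ms and 101 ms figures and the 3-5x multiplier come from the post, while the function name, the choice to apply the multiplier to the 101 ms idealized figure, and the 40 ms network round trip are illustrative assumptions.

```python
ON_DEVICE_TTFB_MS = 70.0     # Argmax TTSKit on Apple Silicon, as quoted in the post
CLOUD_IDEAL_TTFB_MS = 101.0  # Qwen3-TTS paper figure: no concurrency, no network


def cloud_ttfb_ms(ideal_ms: float, concurrency_factor: float, network_rtt_ms: float) -> float:
    """Estimate real-world cloud time-to-first-byte.

    Assumption: concurrency scales the idealized latency by a constant factor
    and the network round trip is added on top.
    """
    return ideal_ms * concurrency_factor + network_rtt_ms


ASSUMED_RTT_MS = 40.0  # hypothetical round trip; not a number from the post

for factor in (3.0, 5.0):
    estimate = cloud_ttfb_ms(CLOUD_IDEAL_TTFB_MS, factor, ASSUMED_RTT_MS)
    print(f"{factor:.0f}x concurrency: ~{estimate:.0f} ms cloud vs {ON_DEVICE_TTFB_MS:.0f} ms on-device")
```

Under these assumptions the cloud path lands in the hundreds of milliseconds even before accounting for variance, which is the gap the post is pointing at.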

Alexandre Momeni reposted

Atila@atiorh·25 Feb

WhisperKit is at 5M! Up 5x in 35 days
2026 is the year of on-device inference❤️

argmax@argmax

We are thrilled that WhisperKit reached 1 million monthly on @huggingface!
- First ever Apple Silicon-only model to reach 1M
- Usage grew 10x in 2025
- Free, MIT open-source and pure-Swift

105

12.7K

Alexandre Momeni reposted

Atila@atiorh·24 Feb

Real-time Transcription with Speakers is now generally available!

argmax@argmax

Real-time Transcription with Speakers is now generally available on iOS and macOS! Details for installing or simply testing Argmax SDK 2 are in the comments.

2.7K

Alexandre Momeni reposted

Atila@atiorh·23 Feb

Ultra low-latency real-time speech-to-text in Superwhisper is out!

superwhisper@superwhisper

✨ Realtime speech to text in superwhisper v2.10

9.1K

Alexandre Momeni@AlexandreMomeni·21 Feb

Beyond @GoogleDeepMind and @IsomorphicLabs, @demishassabis’s legacy may be the generation of founders he’s inspired - @MistralAI @orbitalmaterials @latentlabs and many more.

James Dacombe@jamesdacombe

Two observations:
1. @demishassabis has done more for the UK by demanding DeepMind remain headquartered in London than arguably any Briton in recent decades (never mind all of his other achievements for the world). His actions will single-handedly account for the majority of the UK’s future growth, if the politicians can manage to stay out of the way. What a legend.
2. Sequoia appear to be back and playing aggressively again.

245

Alexandre Momeni reposted

argmax@argmax·19 Feb

We are open-sourcing TTSKit! Run state-of-the-art text-to-speech models on your Mac and iPhone. The launch version supports @Alibaba_Qwen Qwen3-TTS and generates audio faster than real-time playback with sub-200 ms time-to-first-byte. Voice cloning and advanced speed optimizations will be in the next version. Link to the GitHub repo and models on @huggingface in comments.

388

62K

Alexandre Momeni reposted

Atila@atiorh·11 Feb

Pro tip: When using @superwhisper for AI meeting notes, select Parakeet (voice to text) + Sonnet 4.5 (text to summary) and put all of your company jargon in Vocabulary. Thank me later.

438

Alexandre Momeni@AlexandreMomeni·5 Feb

@MistralAI and @argmax are going to be a fire combo

Mistral AI@MistralAI

Introducing Voxtral Transcribe 2, next-gen speech-to-text models by @MistralAI. State-of-the-art transcription, speaker diarization, sub-200ms real-time latency. Details in 🧵

Alexandre Momeni@AlexandreMomeni·16 Jan

Great piece from my partner @AlexaLiautaud. Devs are only ~1% of the workforce, but code runs the economy. This new era of software developer products treats the remaining ~99% as first-class citizens, and it’s going to put consumers back at the center of value creation

Alexa Liautaud@AlexaLiautaud

x.com/i/article/2008…

Alexandre Momeni reposted

Mistral AI@MistralAI·9 Sep

We’ve raised €1.7B to accelerate technological progress with AI! This Series C funding round, led by @ASMLcompany, fuels Mistral AI scientific research to keep pushing the frontier of AI to tackle the most critical technological challenges faced by strategic industries.

213

419

3.8K

561.2K

Alexandre Momeni reposted

Sahaj Garg@SahajGarg6·18 Aug

Latency is the most underrated product feature. 500ms feels instant. 1s feels broken. 2s and you’ve lost the user completely. At Wispr Flow we’ve had to rethink infra from the ground up just to hit sub-500ms LLM inference worldwide. If you like sweating the milliseconds, we’re hiring ML + infra engineers @WisprFlow 👉 wisprflow.ai/jobs

4.5K

Alexandre Momeni reposted

Atila@atiorh·22 Aug

@argmax BTW - the transcription is instant under the hood but Apple is rate limiting the Dynamic Island UI refresh rate (as they should 😇). The actual speed:

argmax@argmax

Introducing Real-time Transcription with Nvidia Parakeet
- Same top accuracy as file transcription
- Best-in-market 160 ms lips-to-screen latency
- 744x more cost-efficient compared to cloud APIs
- Available in Argmax Pro SDK starting today!
Link in comments

410
