Michael Pappas

210 posts

@mpappas74

@MIT alum || Former Bridgewater Assoc. || Founder/CEO of @Modulate_ai

Joined May 2014
143 Following · 180 Followers

Michael Pappas @mpappas74
Voice fraud isn’t just a security problem. It’s a massive, ongoing cost center.

In this video, I break down what it’s *actually* costing businesses today - and it’s more than most people realize 👀

There are two layers to it:

1. Direct losses $$$
When voice fraud hits, the damage can be immediate - and in some cases, reach hundreds of millions. Often unrecoverable.

2. The cost of trying to prevent it $$$$
Even if you’re never breached, you’re still paying:
- Added authentication friction that slows down users
- Frustrated customers - and lost revenue
- Teams tied up auditing calls, running investigations, and handling compliance

All of that adds up to tens of millions in ongoing operational cost.

So the real question isn’t “what happens if we get hit?” It’s “how much are we already spending because this risk exists?”
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 12

Michael Pappas @mpappas74
So here’s what I’m curious about: Are you betting on generalist AI to handle critical workflows? Or are you moving toward more specialized systems you can actually control and rely on? I share what I think in the video - and why we’ve taken a different approach at @modulate_ai
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 12

Michael Pappas @mpappas74
“Can’t my LLM provider just solve this too?” I hear this all the time - and I think it’s the wrong question. Because in practice, the more general a system tries to be, the harder it is to trust for any specific task. We’ve seen this before with software. Specialization wins when reliability matters 🧵
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 10

Michael Pappas retweeted
Modulate @modulate_ai
AI regulation is solving the wrong problem.

Right now, most policies are built around generative AI: models like ChatGPT that create content (and yes, can hallucinate).

But that’s only half the picture. There’s another category: analytic AI. Systems designed to understand what’s happening and return fixed, verifiable answers - no guessing, no hallucinations.

In this clip, our CEO @mpappas74 breaks down why treating both the same is a mistake - and how current regulations are unintentionally slowing down tools that don’t carry the same risks.

At Modulate, this distinction is core to how we build. Because not all AI should be regulated like it makes things up 👀
Replies: 0 · Reposts: 1 · Likes: 3 · Views: 377

OpenAI @OpenAI
Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.
Replies: 694 · Reposts: 1.4K · Likes: 14.8K · Views: 3.6M

Michael Pappas @mpappas74
@OpenAI curious @sama. how much of the remaining gap here is model intelligence vs realtime signal handling?
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 1.1K

Michael Pappas @mpappas74
@OpenAI this is the right direction. but the interesting challenge in voice isn’t just adding more reasoning. it’s reasoning while handling messy human conversation in real time:
- interruptions
- overlap
- emotion shifts
- ambiguity

that’s where voice stops being “LLM + audio” and becomes a completely different systems problem. feels like the industry is finally converging on that.
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 451

Michael Pappas retweeted
Modulate @modulate_ai
The AI playbook says: more data, more compute, bigger models. We don’t buy it.

At Modulate, this is how we think + how we build. In this clip, our CEO @mpappas74 breaks down why focused data + real insight beats brute force.

We’re a team of ~40, and that approach has led to:
- Transcription models outperforming @OpenAI on accuracy
- Deepfake detection models topping the @huggingface speech arena leaderboard

Not by hoovering the internet. By using the right data. Because better > bigger.
Replies: 0 · Reposts: 2 · Likes: 2 · Views: 150

Astrocade @PlayAstrocade
We raised $56M to help build the next era of interactive entertainment. Series B led by @sequoia, Series A led by Sea. Astrocade lets anyone create games with AI, play them with friends, and share them with millions. But this isn’t about replacing creativity. It’s about giving more people the tool to bring their taste, humor, stories, and craft to life. Today, the fun goes public.
Replies: 111 · Reposts: 102 · Likes: 1.2K · Views: 701.4K

Michael Pappas @mpappas74
you’re right on the interface shift - voice + agents changes behavior. but this only works if it’s reliable under pressure.

speaking to your computer is easy. trusting it to execute is harder. the moment it:
- mishears a command
- loses context mid-task
- acts on the wrong intent

it breaks the loop.

we’ve seen this with voice - sounding human is solved. not breaking on real, messy input isn’t. that’s the gap between “cool demo” and “default way of working.”
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 524

VraserX e/acc @VraserX
ChatGPT’s new voice mode will be one of the biggest releases of the year. It will listen and talk at the same time. It will sound fully human. It will run on GPT-5.5 instant-level intelligence. And once it is integrated into Codex, everything changes. You won’t just type prompts anymore. You’ll speak to your computer, and it will code, navigate, execute, debug, research, organize, and operate interfaces for you through computer use. People are massively underestimating this.
Replies: 80 · Reposts: 55 · Likes: 801 · Views: 62.2K

Michael Pappas @mpappas74
BiDi is a big step - but it’s not the unlock people think it is. talking while listening is table stakes for feeling human. the hard part is doing that without breaking:
- overlap without mishearing
- reacting in real time without drifting
- actually understanding intent, not just back channeling “yeah”

we’ve seen this: you can make it feel 100x better… and still be wrong.

voice isn’t gated by model size or personality anymore. it’s gated by how well it holds up on messy, real conversations. that’s the bar. and most systems still don’t clear it.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 189

Flowers ☾ @flowersslop
New voice mode:
- powered by GPT-5.5 instant instead of 4o as it is right now (possibly the biggest single update in chatgpts history)
- BiDi: it can talk while listening and listen while talking. You can talk over each other, the conversations wont be turn based anymore, and this makes it 100x better and more natural and more intelligent. Imagine you are yapping something and it says "yeah" or "oh I see" while you talk, or when it yaps you can say or ask something and it dynamically reacts to it, basically just how human-human conversations are
- even normal people who dont care much about AI will recognize that this is a huge upgrade on all 3 axes, personality, intelligence and immersion
- probably comes within the next two weeks, possibly tomorrow
- I guess this will be one of the top 3 moments in AI of 2026 for me, possibly bigger than images v2 which i waited a year for
- it might be able to do not just talk, but also modulate its voice to whisper or yell, or to express different feelings, laugh, sing and so on, 4o could do all that
- hope it wont get nerfed again
Replies: 46 · Reposts: 58 · Likes: 1.2K · Views: 57.4K

Michael Pappas @mpappas74
Excited to share this! @hackernoon
Modulate @modulate_ai

We’re @hackernoon Company of the Week

Voice AI breaks down when things get real: messy audio, emotion, overlap, intent.

So we built Velma - the first Ensemble Listening Model, trained on 550M hours of real-world audio, designed to understand speech as it actually happens (not sanitized benchmarks).

And ToxMod - real-time voice moderation that detects how something is said, not just the words.

This isn’t research. It’s deployed at scale across Fortune 500 platforms today 🌐

Voice is the hardest problem in AI. It’s also the most human. We’re building the infrastructure to make it work - safely, accurately, in real time.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 50

Michael Pappas @mpappas74
@sama we’re not waiting on voice models to get great. @modulate_ai's velma already is the greatest voice AI model in the world - trained on 550M+ hours of messy, real-world conversations. handles the stuff most systems avoid. benchmarks are public here: modulate.ai
Replies: 0 · Reposts: 0 · Likes: 3 · Views: 399

Sam Altman @sama
pretty excited for voice models to get great

its interesting to watch how people are already starting to change the way they interface with AI
Replies: 928 · Reposts: 239 · Likes: 6.3K · Views: 657.2K

Michael Pappas @mpappas74
mostly agree - but latency + naturalness aren’t the last mile. you can make it fast and sound human… and it still breaks if it mishears, loses context, or can’t handle messy audio. the bar isn’t “feels natural” - it’s “gets it right in real conditions.” that’s what decides default.
Replies: 1 · Reposts: 0 · Likes: 2 · Views: 9

AI Mastery Guide @aiseomastery
@testingcatalog Voice is the interface most people actually want. Once latency and naturalness are solved it becomes the default for a lot of use cases.
Replies: 2 · Reposts: 0 · Likes: 0 · Views: 228

Michael Pappas @mpappas74
most of them break the moment things get even slightly real 🙂 car noise, people talking over each other, bad mics… that’s why we built Velma - to make voice systems actually hold up on messy, real-world audio, not just demos. curious what’s worked for you outside ideal conditions?
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 267

Michael Pappas @mpappas74
@OCTAMEM @sama that’s not a voice problem - that’s a memory problem. voice just makes it more obvious because you notice the break immediately. the real shift isn’t voice… it’s persistent context. until that’s solved, every interface will feel forgetful.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 12

OCTAMEM @OCTAMEM
@sama Voice changes everything about how you interact with AI. Except the part where it forgets what you said. That stays the same.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 692

Mark Kretschmann @mark_k
A new “voice mode” is being prepared for release by @OpenAI. The upgraded voice mode is based on the omnimodal GPT-5.5, making it substantially smarter and more expressive than the current version. It will also support full-duplex conversations, meaning it can listen and speak at the same time. That should make conversations feel much more natural and fluid.
Replies: 94 · Reposts: 73 · Likes: 1.6K · Views: 86.5K

Michael Pappas @mpappas74
@AgorithmAg that’s a very real concern. voice isn’t just output quality - it’s regulation. model updates can improve capability but still break:
- consistency of tone
- predictability
- that “safe to think with” feeling

especially for voice-first workflows.

the tricky part is most systems optimize for expressiveness, not stability. and for a lot of people, stability > realism.

curious: would you rather have a slightly less capable model that stays consistent, or a more expressive one that shifts over time?
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 52

Agata Sliwinska (artist) @AgorithmAg
This sounds promising, but as an AI artist working with voice-first interaction, I’m cautious. Voice isn’t cosmetic, it shapes flow, continuity, focus, and nervous-system safety. My ADHD works best with a lower, warm, professional tone. After so many model changes + the Standard Voice scare, I’m not especially excited yet.
Replies: 5 · Reposts: 0 · Likes: 14 · Views: 2K

Michael Pappas @mpappas74
@PaulGugAI @mark_k @OpenAI imo @OpenAI can make it “smarter”, but still worse to talk to because voice isn’t just intelligence, it’s signal handling. overlap, latency, tone - those don’t scale with params.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 24

GooGZ AI @PaulGugAI
the scenario I'm thinking of is when you're having a conversation with some other people and then you bring up a voice agent to settle something that you're deliberating, and it starts responding but then stops because you're relaying some of the things it's saying back to the other people etc.. those little things.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 330