Manav
@manavslab
1.7K posts

🌟 Chief Experiment Officer @manavslab · Hitting above the weight · First Engineer @smallest_AI

Joined September 2018
1.2K Following · 206 Followers

Pinned Tweet
Manav @manavslab ·
This week I landed in Bangalore after shutting down my company and moving into a new role. The last three years of running Atomalabs and @CrestXRHQ were a wild ride. Raw thoughts 👇
1 reply · 0 reposts · 7 likes · 897 views
Manav retweeted
NASA Earth @NASAEarth ·
That's us! 🌍 The Artemis II crew captured beautiful, high-resolution images of our home planet during their journey to the Moon. As @Astro_Christina put it: "You guys look great."
[image]
2.4K replies · 36.3K reposts · 186.8K likes · 4.5M views
Manav retweeted
smallest.ai @smallest_AI ·
Lightning V3, our new SOTA TTS model, just launched on Product Hunt.

> 3.33 overall naturalness - beating ElevenLabs (3.2), Cartesia (3.25), OpenAI (3.14)
> ~76% win rate vs gpt-4o-mini-tts on naturalness
> 3.89 MOS - highest in conversational TTS

Play our audio against OpenAI's. Side by side. We'll wait.

Oh, and the Voice AI industry's TTS eval process is completely wrong. Check out our Product Hunt page and our research. Link is in the comments.
4 replies · 16 reposts · 27 likes · 1.7K views
Manav retweeted
Runtime @RuntimeBRT ·
🚨 Bengaluru and SF-based @smallest_AI has launched its SOTA TTS model, Lightning V3.
2 replies · 22 reposts · 123 likes · 14.8K views
Manav retweeted
smallest.ai @smallest_AI ·
51% of people have abandoned a business entirely because of how the AI voice sounded. Lightning v3 covers 15 languages, 71% of the global population, and outperforms OpenAI on naturalness 76% of the time. Let that sink in.

The entire voice industry has been solving the wrong problem - making voices that read text well instead of voices that can hold a conversation. Those are two completely different things. Reading text is clean. Predictable. Easy to benchmark. Conversation is messy. It has rhythm, hesitation, breath. Your pacing changes when you're thinking.

Most TTS models fall apart the moment you put them in a real back-and-forth. They sound great in a scripted demo and robotic on a live call.

We built Lightning v3 from scratch for the hard version of this problem. It sounds like it's thinking. It switches between languages mid-sentence the way a real bilingual person does. It clones your voice from a 5-second clip across all 15 languages.

Want to try it? Link is in the comments.
5 replies · 11 reposts · 32 likes · 2.4K views
Manav retweeted
Rishabh @dahalerishabh1 ·
Just built a murder mystery game using Lightning V3 + Pulse STT. The model nailed the voice acting. Me solving the actual mystery? Not so much 😅 Either way — really enjoyed how it turned out! @smallest_AI @kamath_sutra let's put this on our platform
1 reply · 5 reposts · 10 likes · 825 views
Manav @manavslab ·
Always so much to learn from the OGs 🔥🔥
Quoting Nityanand Mathur @nityanandmathur:

Where does pronunciation live in a large language model (LLM) based text-to-speech (TTS) system, and how can we surgically modify it for specific texts while preserving all other model behavior? To answer this question, we introduced SonoEdit at @CPALconf yesterday. Our core hypothesis is that pronunciation errors aren't global; they live in localized internal representations. If you find them precisely, you can fix them precisely.
0 replies · 0 reposts · 2 likes · 67 views
Manav retweeted
Nityanand Mathur @nityanandmathur ·
Where does pronunciation live in a large language model (LLM) based text-to-speech (TTS) system, and how can we surgically modify it for specific texts while preserving all other model behavior? To answer this question, we introduced SonoEdit at @CPALconf yesterday. Our core hypothesis is that pronunciation errors aren't global; they live in localized internal representations. If you find them precisely, you can fix them precisely.
[two images]
6 replies · 13 reposts · 59 likes · 5.4K views
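The "locate precisely, fix precisely" hypothesis behind SonoEdit can be illustrated with a toy rank-one weight edit, in the spirit of locate-then-edit methods from the model-editing literature. This is NOT SonoEdit's actual algorithm, just a minimal sketch of the general idea: change a model's output for one specific input direction while leaving orthogonal directions untouched.

```python
# Toy rank-one model edit: adjust W so that W @ key maps to `target`,
# while any input orthogonal to `key` is unaffected.
# Pure Python (small dense matrices as lists of rows) for clarity.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_edit(W, key, target):
    """Return W' = W + u * key^T, with u chosen so W' @ key == target.
    Directions orthogonal to `key` pass through unchanged."""
    cur = matvec(W, key)
    k2 = sum(k * k for k in key)            # ||key||^2
    u = [(t - c) / k2 for t, c in zip(target, cur)]
    return [[w + ui * kj for w, kj in zip(row, key)]
            for row, ui in zip(W, u)]

W = [[1.0, 0.0],
     [0.0, 1.0]]
key = [1.0, 0.0]        # the "mislearned" internal direction we located
target = [0.0, 2.0]     # the corrected output we want for that direction
W2 = rank_one_edit(W, key, target)
# matvec(W2, key) -> [0.0, 2.0]; matvec(W2, [0.0, 1.0]) stays [0.0, 1.0]
```

The same shape of intervention, applied to a located layer inside a TTS model's LLM backbone, is what lets an edit be surgical: one input pattern changes, everything else is preserved.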
Manav retweeted
Sudarshan Kamath @kamath_sutra ·
Introducing Lightning V3 - it beats every model we tested against. ElevenLabs, Cartesia, OpenAI. Lightning sets a new SOTA with V3 in conversational text-to-speech.

→ Highest MOS score for conversational TTS at 3.9
→ ~76% win rate vs gpt-4o-mini-tts on naturalness
→ 15 languages with mid-sentence code-switching
→ Built from scratch for voice agents, not read-aloud

Every TTS model sounds clean in a demo. You type a sentence and you get beautiful audio. Voice agents don't work that way. They stream. They're generating audio in real-time chunks with half the context missing. That's where everything breaks.

A great reading voice and a great conversational voice are fundamentally different things. A conversational voice has to sound like it's thinking - with the pauses, the rhythm shifts, the reactions. It has to handle the way real people actually talk, including switching languages mid-sentence. That's what V3 does.

V3.1 also ships voice cloning. 5 to 15 seconds of audio, no fine-tuning, production-grade clone across 15 languages. Blog link in the comments.
10 replies · 30 reposts · 154 likes · 69.5K views
Manav retweeted
smallest.ai @smallest_AI ·
Introducing Lightning V3, which beats every model we tested against - ElevenLabs, Cartesia, OpenAI - on LLM-as-judge evaluation for naturalness, intonation, and prosody. It's state of the art in conversational text-to-speech.

We've achieved the highest MOS score across platforms for conversational TTS at 3.9. ~76% win rate vs gpt-4o-mini-tts. 15 languages with mid-sentence code-switching. Built from scratch for voice agents, not read-aloud.

Here's what we shipped, how we moved from a TTS that reads text well to one that can actually hold a conversation, and where evals fall short in measuring the difference. 🧵
Quoting Sudarshan Kamath @kamath_sutra:
x.com/i/article/2036…
1 reply · 17 reposts · 48 likes · 9.6K views
Manav @manavslab ·
@tanaaym7 @abhitwt POV: you know the codebase well enough that you can do precise prompting
0 replies · 0 reposts · 2 likes · 15 views
Abhishek B R @abhitwt ·
POV: you’re a developer in 2026😂
146 replies · 375 reposts · 4.3K likes · 1.8M views
Manav retweeted
smallest.ai @smallest_AI ·
Last weekend was our first conference appearance in SF at the AI+ Renaissance Conference as the title sponsor. @kamath_sutra took the stage at the Voice AI panel, and we launched Hydra - our async thinking multimodal LLM - live in front of the room.

This is the statement we opened with: "We are not close to passing the Turing test in voice. Not even for a single speaker, in a single language, in a single use case. And that's exactly the problem we're here to solve."

The gap between AI voice agents and human conversation isn't subtle. Today's agents listen, then think, then respond. Humans do something fundamentally different - they think while listening, act while listening, and respond with contextual emotion. That's not a feature gap. That's an architectural gap. And offline LLMs can't be retrofitted to close it.

That's the conviction behind everything we build at smallest.ai. Small, real-time models - built from the ground up for async inference, partial context, and sub-500ms multimodal response - are the path to human-level voice intelligence. Not bigger models. Faster ones.

Hydra is our step in that direction: an async thinking speech-to-speech model that listens and reasons in parallel, with ~50ms latency. Paired with our Lightning TTS, Lightning ASR, and Electron SLM (which outperforms GPT-4.1 on realtime conversational tasks), the full stack is finally coming together.

A massive thank you to Joshua and @lynn_aisv for building @Aiplus__ into the kind of event where everyone can have meaningful conversations and learn from those around them. And to @Sky9Capital and @Topify_AI for co-organizing the afterparty with us - 300+ signups speaks for itself. That kind of momentum doesn't happen without people who care about the ecosystem as much as the technology.

We're just getting started. The question we left the room with: attention is all you need - but attention on what?
1 reply · 8 reposts · 30 likes · 1.9K views
Manav @manavslab ·
I mass-produced a 3Blue1Brown-style explainer in 30 minutes. The stack: Claude Code + Manim + @smallest_AI voices. Educational content creation is about to get wild. Best time to be a self-learner.
3 replies · 3 reposts · 13 likes · 371 views
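The Claude Code + Manim + TTS stack above can be sketched as a small orchestration script: an LLM drafts a Manim scene plus a narration script, Manim renders the video, and each narration paragraph goes to a TTS call. Everything here (the scene file name, `FourierScene`, the chunking helper) is illustrative, not the author's actual pipeline; the CLI flags are real Manim Community Edition options.

```python
# Minimal orchestration sketch for an LLM + Manim + TTS explainer pipeline.

def render_cmd(scene_file, scene_class, draft=True):
    """Build the Manim CE CLI invocation for one scene.
    -ql renders a fast low-quality draft; -qh renders high quality."""
    quality = "-ql" if draft else "-qh"
    return ["manim", quality, scene_file, scene_class]

def narration_chunks(script):
    """Split an LLM-drafted narration script into per-section chunks
    (one paragraph per animated section) to send to a TTS API."""
    return [p.strip() for p in script.split("\n\n") if p.strip()]

script = "Intro: what is a Fourier series?\n\nStep one: draw the circles."
cmd = render_cmd("explainer.py", "FourierScene")
# cmd == ["manim", "-ql", "explainer.py", "FourierScene"]
```

From there, the rendered video and the synthesized narration clips would be muxed together with any video tool (e.g. ffmpeg); the interesting part is that the LLM writes both the scene code and the script in one pass.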
Manav retweeted
Hardik Kamboj @HdKamboj ·
Just wanted to put it out there! This video is not about content; it's about the experiments you run with your life, and the risks you take that will give you disproportionate returns. The way I live my life is by experiments.
3 replies · 1 repost · 11 likes · 516 views