Sohan Show

63 posts

@Sohan_Show

Building Gen AI Agents @meta Superintelligence | Ex Lead Dev: Agents @playaiofficial | Eng @TechStars and @ycombinator

USA · Joined May 2017
59 Following · 98 Followers
Pinned Tweet
Sohan Show @Sohan_Show
Thank you @stevejang , you guys are awesome. Looking forward to building tech that people love.
steve jang @stevejang

Congratulations to @felfel @HammadH4 @keikumata and the entire @PlayAIOfficial team on their acquisition by @Meta Superintelligence Lab! All of us @kindredventures are thankful and honored to be part of their journey as their lead seed investor last year. I'm stoked for them to get all the GPUs and gigawatts they want now at Meta and continue their incredible speech model and voice agent work with Mark, Alex, Nat, and team. :) More below on their story: kindredventures.com/announcement/p…

English
1
0
5
442
Sohan Show retweeted
kei @keikumata
proud to finally share this with the world! ive had so much fun building this with our team from the ground up. for me, pulling up Reels content via voice is quite magical. found so many good places for food on the go. excited to hear how everyone ends up using voice mode!
Meta Newsroom @MetaNewsroom

Today we’re introducing Meta AI Voice Conversations powered by Muse Spark that let you talk naturally to Meta AI (interrupt, switch topics, or swap languages), and as you talk, Meta AI can generate images and pull up recommendations from Reels, maps, and more. We’re also bringing live AI to the app, so you can point your camera at the world and ask about what you’re seeing in real time. about.fb.com/news/2026/04/i…

English
0
4
13
382
Elon Musk @elonmusk
Grok Voice is #1!
Artificial Analysis @ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios.

Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost.

About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass@1 trials.

We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains:
➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints
➤ Retail (114 scenarios): e.g., disputing a charge, processing a return
➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem

Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator.

Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min).

Speech to Speech is a fast evolving modality and we expect movement in rankings as we continue to add new models with these capabilities, and model robustness improves.

Congratulations @xAI @elonmusk! See below for further detail ⬇️

English
2.4K
5.6K
25.4K
8.4M
Sohan Show @Sohan_Show
Tons of agentic capabilities on the horizon. Things that will blow your mind. Keep an eye on @Meta AI.
English
0
0
1
37
Meta Newsroom @MetaNewsroom
Today we’re introducing Meta AI Voice Conversations powered by Muse Spark that let you talk naturally to Meta AI (interrupt, switch topics, or swap languages), and as you talk, Meta AI can generate images and pull up recommendations from Reels, maps, and more. We’re also bringing live AI to the app, so you can point your camera at the world and ask about what you’re seeing in real time. about.fb.com/news/2026/04/i…
English
94
160
1.1K
245.6K
Sohan Show retweeted
kache @yacineMTB
you can outsource your thinking but you cannot outsource your understanding
English
256
3.8K
16.7K
2.4M
Bryan Johnson @bryan_johnson
go to bed right now
i know the build is almost finished
the eval can wait til morning
the agent will still be failing tomorrow
you won't figure out why it's hallucinating
yes your coworker ships on 4 hrs of sleep
they also hallucinate a lot
off you go
English
448
344
7.6K
377.4K
Garry Tan @garrytan
It’s official, Gemini Live 2.5 voice agent is the best
It’s smart, it’s fast, it has large enough context
Coming to GBrain Voice shortly
English
106
71
1.9K
119.7K
Sohan Show @Sohan_Show
@garrytan Wait until we get to show the Voice AI agents stuff. It's gonna blow people's minds.
English
0
0
0
242
Sohan Show retweeted
Alexandr Wang @alexandr_wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
English
728
1.2K
10.4K
4.5M
Sohan Show @Sohan_Show
@bysardr LMAO. Learn how to not lie first. Don't worry, incompetency shows. Thanks for proving my point.
English
0
0
0
51
sardor rahmatulloev @bysardr
> quit @alexandr_wang’s team at meta and move to sf
> join a hacker house @powelldotst
> ship multiple products and struggle to get pmf
> liquidate all assets to extend runway
> win my first ever hackathon at yc
> get an interview w/ @aaron_epstein
> 11pm call from aaron “do you wanna do yc?” “let’s do yc”
excited to be building @twolabsai with danyal
English
26
5
168
11K
Sohan Show @Sohan_Show
@bysardr I see. Funny that your Functional team history never shows you worked on any MSL team or made any significant diffs. All I see is RecSys - Training from Aug 25 to Nov 26 haha. All good bro. I get it.
English
1
0
0
74
sardor rahmatulloev @bysardr
@Sohan_Show thanks, team reorged 3-4 times in the past 8 months when i was there. joined msl first, then under aidi pillar, and then quit at the end. Msl was under alexandr wang
English
1
0
1
184
Sohan Show @Sohan_Show
@Tech_girlll The best way to learn is to take up a ridiculously ambitious project and build it out end to end. It might humble you, but the learning you get from it is unparalleled.
English
0
0
0
49
Mari @Tech_girlll
Why do newbies in tech go into frontend first before backend?
English
214
41
1.1K
72.7K
Orman Clark @ormanclark
make a designer cry in 3 words
English
474
11
388
150.1K