sarick

287 posts

@sarick

cofounder @psdnai, prev lead ai eng @storyprotocol

Palo Alto, CA · Joined February 2025
382 Following · 801 Followers
Pinned Tweet
sarick@sarick·
I've spent a decade building AI systems in telco, logistics, finance, and healthcare. Each time, the issues trace back to the same problem: data. Training data is the most under-valued, under-coordinated input in the entire AI stack. It's fragmented, hard to make compliant, and the people who create it often see none of the upside.

Here is our take on the current landscape:
- Compute is centralized and priced in (see: Nvidia at $4T, AMD at $255B).
- Models are open-sourcing, and the competitive advantage of releasing new architectures is shrinking rapidly (see: OpenAI, Anthropic, xAI, worth $500B+ combined).
- The only frontier left unsolved and unpriced? Data. Meta's recent $14B investment in Scale AI validates this and leaves a huge gap for IP-cleared training data.

At @storyprotocol, I led research on influence functions, tackling the core problem of data attribution: measuring which datapoints were actually responsible for a model's outputs. It was my first step toward rethinking how we value data.

Earlier this year, @SPChinchali and I started sketching a solution. What if contributors got recurring upside? What if every reuse paid forward? What if data worked like IP? That idea turned into @psdnai. Working at @StoryProtocol with @WhatTheLJW, a master of operations and strategy, and @storysylee, a visionary leader with true outside-the-box thinking, helped shape this vision.

Our initial focus is physical AI, robotics, and audiovisual data, but Poseidon is designed to extend to healthcare, biometrics, sensor data, and beyond. Because of the volume of data we are handling for the world's leading AI companies (yes, in the works), Poseidon would not be possible without @StoryProtocol's IP licensing infrastructure, where registration is streamlined and royalties and derivatives are automatically tracked. If the data can't be scraped, we're building the stack to coordinate and license it.

This mission is personal. It comes from a fundamental tension I've witnessed my entire career, from academic labs to industry. I saw medical AI learn from deeply personal patient data. I built models for telecom, finance, and logistics on the digital footprints and real-world actions of millions. The pattern was always the same: the data was the core asset, but it was never treated or priced as such. This is the market we're going after. More to come.
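The influence-function idea above (measuring which training datapoints were responsible for a model's outputs) can be sketched with a gradient dot product. This is a TracIn-style approximation for illustration only, not the exact method used at Story:

```python
# Sketch of gradient-based data attribution: the influence of a training
# example on a test prediction is approximated by the dot product of
# their loss gradients. Illustrative only (logistic regression toy).
import numpy as np

def grad_logloss(w, x, y):
    """Gradient of the logistic loss for one example (x, y), y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def influence(w, x_train, y_train, x_test, y_test):
    """Positive score: the training point pushes the test loss down (helpful)."""
    return grad_logloss(w, x_test, y_test) @ grad_logloss(w, x_train, y_train)

w = np.array([1.0, -0.5])
x_test, y_test = np.array([1.0, 0.1]), 1

# A training point similar to the test point (same label) scores positive;
# a dissimilar one scores negative.
helpful = influence(w, np.array([1.0, 0.0]), 1, x_test, y_test)
harmful = influence(w, np.array([-1.0, 0.0]), 1, x_test, y_test)
```

Ranking a corpus by this score is one way to decide which contributors' data actually mattered for a given output, which is the precondition for paying them for it.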
Poseidon@psdnai

AI is moving beyond the browser and into the real world. The bottleneck? Data. Today we’re announcing a $15M seed round led by @a16zcrypto to build infra that collects, curates, and licenses high-quality data for physical AI. Incubated by and built on @StoryProtocol.

39 replies · 8 reposts · 135 likes · 10K views
sarick@sarick·
what happens when your eng team brags about only using CC and hasn't coded in months lol
Alex Volkov@altryne

PSA: If you've been running out of Claude session quotas on Max tier, you're not alone. Read this.

Some insane Redditor reverse-engineered the Claude binaries with MITM to find 2 bugs that could have caused cache invalidation. Tokens that aren't cached are 10x-20x more expensive and are killing your quota. If you're using your API keys with Claude this is even worse. This is also likely why it isn't uniform: over 500 folks replied to me and said "me too", but many (including me) didn't see this issue.

There are 2 issues compounded here (per the Redditor; I haven't independently confirmed this):

1. The first bug he found is a string-replacement bug in bun that invalidates the cache. Apparently this has to do with the custom @bunjavascript binary that ships with the standalone Claude CLI. The workaround is to run Claude with `npx @anthropic-ai/claude-code`.
2. The second bug is worse: he claims that --resume always breaks the cache, and there doesn't seem to be a workaround, except pinning to a very old version (which misses tons of features). This bug is also documented on GitHub and confirmed by other folks.

I won't entertain the conspiracy theories that Anthropic "chooses" to ignore these bugs because it gets them more $$$; they actively benefit from everyone hitting as many cached tokens as possible, so this is absolutely a great find, and it does align with my thoughts earlier. The very sudden spike in reporting and the non-uniform nature (some folks are completely fine, some hit quotas after saying "hey") definitely point to a bug. cc @trq212 @bcherny @_catwu for visibility in case this helps all of us.
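The 10x-20x figure above translates directly into quota burn. A back-of-the-envelope sketch, using illustrative prices and token counts (not Anthropic's actual rates):

```python
# Why a cache-busting bug hurts: the long conversation prefix is re-sent
# on every turn, and uncached input tokens cost ~10x cached ones here.
# All numbers are illustrative, not real pricing.
PRICE_UNCACHED = 3.00 / 1_000_000   # $ per input token (example rate)
PRICE_CACHED   = 0.30 / 1_000_000   # $ per cached input token (10x cheaper)

prefix_tokens = 80_000   # conversation history resent on every turn
turns = 50               # turns in one coding session

cost_cached   = turns * prefix_tokens * PRICE_CACHED
cost_uncached = turns * prefix_tokens * PRICE_UNCACHED
print(f"cached: ${cost_cached:.2f}  cache-busted: ${cost_uncached:.2f}")
```

Same session, same tokens: only the cache hit rate changes, and the bill (or the quota drawdown) moves by an order of magnitude.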

0 replies · 0 reposts · 5 likes · 99 views
sarick@sarick·
voice is increasing the surface area of software. coding went from keyboard → editor → compile to speak → intent → execution. everyone is hyper-focused on the intent/execution agents, but completely ignoring the ground truth layer. the interface can change overnight, but the capability is hard-capped by the diversity of the audio data underneath it. you can't fake phonetic nuances with synthetic data.
Thariq@trq212

Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!

1 reply · 0 reposts · 0 likes · 117 views
sarick@sarick·
this fortune piece frames it as a data problem. i'd push further: it's a coordination problem. the data exists. dashcams, surgical suites, warehouses. the issue is nobody has built the incentive layer to unlock it at scale. that's the actual bottleneck. fortune.com/2026/03/06/ai-…
0 replies · 0 reposts · 2 likes · 58 views
sarick@sarick·
your eval pipeline might have a major blind spot. when you isolate a speaker's channel in a dual-speaker recording, silence fills the gaps where the other person was talking. that silence is structural, but most ASR scoring metrics treat it as a transcription failure. we found this while building the poseidon score – a 15-point gap between single and dual-speaker scores that our human reviewers couldn't hear. the audio itself was fine. the metric was reacting to format, not quality. best-of-n preprocessing across six strategies closed the gap. full breakdown on this and more in our latest blog ↓
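The format-vs-quality mismatch described above is easy to reproduce with a toy word-error-rate (WER) calculation. This is an illustration of the failure mode, not the Poseidon Score itself: a perfectly transcribed single-speaker channel, scored against the full two-speaker reference, gets penalized because the other speaker's words count as deletions.

```python
# Toy WER via Levenshtein edit distance over words. The isolated channel
# is transcribed perfectly, yet scores badly against the full-conversation
# reference: the metric reacts to format, not quality.
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(r)

full_ref  = "hi there how are you i am fine thanks"  # both speakers
channel_a = "hi there i am fine thanks"              # perfect ASR of speaker A only

print(wer(full_ref, channel_a))  # ~0.33: speaker B's words become "deletions"
print(wer("hi there i am fine thanks", channel_a))  # per-channel reference: 0.0
```

The fix is on the reference/preprocessing side (score each channel against its own speaker's transcript, or merge channels before scoring), not on the audio.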
Poseidon@psdnai

Most teams collecting voice data optimize for volume over quality, partly because they’re measuring quality wrong. To help evaluate quality we created the Poseidon Score. When applied, single-speaker audio scored well while multi-speaker conversations scored worse. Why? ↓

5 replies · 11 reposts · 31 likes · 2.1K views
sarick@sarick·
everyone wants to build the “android of robotics”, but the constraint isn’t the sdk. you can't simulate your way out of messy factories, bad lighting, and unpredictable humans. physical ai expands the surface area of software, which means it exposes the limits of synthetic data. the real world is the only training ground that matters, and the race to collect long-tail data is going to be a full contact sport.
1 reply · 2 reposts · 8 likes · 200 views
sarick@sarick·
dt and elevenlabs putting agents directly into live phone calls is a massive signal. when ai sits mid-conversation to translate or pull data on the fly, every human conversation becomes a programmable surface area. but the final boss here is latency and dialect recognition. the moment this actually works flawlessly for low-resource languages, global transaction costs collapse.
WIRED@WIRED

Deutsche Telekom, the German cell provider—which holds a majority stake in T-Mobile—is partnering with ElevenLabs to enable an AI assistant on all of its network’s calls in Germany. No app required. wired.com/story/deutsche…

0 replies · 0 reposts · 6 likes · 287 views
sarick@sarick·
digital twins simulate the system. edge models run the operation. robots execute. the industries with the highest physical friction are the ones with the biggest incentives to adopt agentic workflows. once the physical-ai-to-edge loop actually works in warehouses and factories, an entire layer of enterprise ops consulting is just going to evaporate overnight.
1 reply · 1 repost · 5 likes · 140 views
sarick@sarick·
cool system but let's be real: voice cloning from 5 seconds of audio is a parlor trick if the underlying model is trained on 99% english. you clone a voice but it still sounds like it's speaking through a wall when you ask it to do anything in a language outside the training distribution. the hard part nobody's solving: building models that actually understand phonetics and prosody for languages that aren't english. that's where the data moat lives.
0 replies · 0 reposts · 1 like · 50 views
Vaishnavi@_vmlops·
This open-source AI can literally clone your voice. Just discovered VoxCPM by OpenBMB & it’s wild. It’s a tokenizer-free Text-to-Speech model that can generate natural, expressive speech & even clone a voice from a short audio clip.

What makes it interesting:
◾️ Context-aware speech – understands the meaning of text & adjusts tone & prosody automatically
◾️ Zero-shot voice cloning – replicate someone’s voice from a short reference clip
◾️ Real-time streaming TTS – fast enough for real applications
◾️ Diffusion + LLM architecture for more realistic audio generation

Even crazier: it was trained on 1.8M+ hours of bilingual speech data to improve realism & expressiveness.

This could power things like:
• AI voice assistants
• Audiobook generation
• Content dubbing
• Personalized AI avatars

Repo - github.com/OpenBMB/VoxCPM
6 replies · 19 reposts · 90 likes · 3.9K views
sarick@sarick·
memory isn't the bottleneck. everyone's obsessed with stateful agents but they're all building on top of models trained 98% on english internet. you can't synthetic your way out of that. try building a voice agent for swahili or marathi and watch it fall apart. the real problem nobody wants to admit: you need actual human data from actual humans speaking those languages. that costs money. labs would rather pretend it doesn't exist.
0 replies · 0 reposts · 0 likes · 29 views
Dhravya Shah@DhravyaShah·
I built the best memory for voice agents while sponsoring the @YCombinator hackathon. Introducing: Voice AI + Memory! We built a deep @supermemory integration for @pipecat_ai, so now even your agents have great memory. With user profiles, it's almost-instant latency as well :)
42 replies · 40 reposts · 841 likes · 82.8K views
sarick@sarick·
physical AI runs on data. way more data than people realize. everyone talks about models and synthetic generation, but those pipelines still start with real-world signals. if your base dataset is thin, your system breaks the moment it leaves the lab. the real bottleneck isn’t models. it’s high-quality data that actually reflects how the world works. forbes.com/sites/sabbirra…
0 replies · 0 reposts · 5 likes · 136 views
sarick@sarick·
“He said 64 languages were spoken in his country of 15 million people and the government agency was building AIs for the public that fuse both American and Chinese AI technologies and their own large language datasets.” if countries invest in high-quality, rights-cleared datasets across regional languages, agriculture, and public health, they shape model behavior at the training layer. ai sovereignty doesn’t come from choosing a us or china api. it comes from controlling the data your models are trained on. theguardian.com/politics/2026/…
0 replies · 1 repost · 6 likes · 203 views
sarick@sarick·
heard a great breakdown of the voice AI stack at an event with @Fin_ai and @cartesia. voice ai is becoming one of the biggest real deployments in ai. customer service alone is a $300B+ market and still massively under-automated. some takeaways:
1. customer service is the core market. buyers care about resolution rate, handle time, deflection, csat.
2. multilingual matters but it has to work in messy environments. cars, restaurants, call centers, interruptions.
3. voice systems are judged by their worst turn. one bad answer in a long call breaks the whole experience.
4. speech-to-speech is coming, but ~90% of production systems today still run cascade (speech → llm → speech).
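The cascade in point 4 can be sketched as three swappable stages. The stage functions below are hypothetical placeholders, not any vendor's real API:

```python
# Minimal cascade voice-agent turn (speech -> LLM -> speech).
# transcribe / complete / synthesize are hypothetical stand-ins for
# whatever ASR, LLM, and TTS providers a real stack would plug in.
from typing import Callable

def cascade_turn(audio_in: bytes,
                 transcribe: Callable[[bytes], str],
                 complete: Callable[[str], str],
                 synthesize: Callable[[str], bytes]) -> bytes:
    text = transcribe(audio_in)   # ASR: speech -> text
    reply = complete(text)        # LLM: text -> text
    return synthesize(reply)      # TTS: text -> speech

# Stub stages so the loop runs end to end.
out = cascade_turn(
    b"<pcm audio>",
    transcribe=lambda a: "what are your hours",
    complete=lambda t: f"you asked: {t}",
    synthesize=lambda t: t.encode(),
)
print(out)  # b'you asked: what are your hours'
```

Each stage boundary adds latency and a chance to mishear, which is why point 3 (judged by the worst turn) bites cascade systems hardest, and why speech-to-speech models are attractive despite being less mature.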
0 replies · 1 repost · 12 likes · 273 views
sarick@sarick·
Frontier APIs became the fastest source of high-quality training data, so competitors treated them that way. In a world where outputs are data and data is the moat, access control and provenance become core infrastructure. All roads lead to AI’s biggest bottleneck: Data.
Anthropic@AnthropicAI

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

0 replies · 1 repost · 5 likes · 198 views
sarick@sarick·
@GeminiApp the fact that i need a third-party chrome extension just to organize my chats is insanity. do better @OfficialLoganK
0 replies · 0 reposts · 1 like · 368 views
Google Gemini@GeminiApp·
The latest installment of Gemini Drops is here! Here’s a look at everything we shipped in February ↓
164 replies · 169 reposts · 1.4K likes · 4.4M views
sarick@sarick·
Voice is quickly becoming the default UI in growth markets. @theinformation's AI Agenda calls out the language bottleneck and includes Poseidon’s work building multilingual, production-ready audio datasets. For product teams, this means structured specs, verified contributors, automated QA, and data that can move straight into training and evaluation pipelines. theinformation.com/newsletters/ai…
0 replies · 0 reposts · 10 likes · 255 views