sarick

287 posts

@sarick

cofounder @psdnai, prev lead ai eng @storyprotocol

Palo Alto, CA · Joined February 2025
382 Following · 801 Followers
Pinned Tweet
sarick@sarick·
I've spent a decade building AI systems in telco, logistics, finance, and healthcare. Each time, the issues trace back to the same problem: data. Training data is the most under-valued, under-coordinated input in the entire AI stack. It's fragmented, hard to make compliant, and the people who create it often see none of the upside.

Here is our take on the current landscape:
- Compute is centralized and priced in (see: Nvidia at $4T, AMD at $255B).
- Models are open-sourcing, and the competitive advantage of releasing new architectures is shrinking rapidly (see: OpenAI, Anthropic, xAI, worth $500B+ combined).
- The only frontier left unsolved and unpriced? Data. Meta's recent $14B investment in Scale AI validates this and leaves a huge gap for IP-cleared training data.

At @storyprotocol, I led research on influence functions, tackling the core problem of data attribution: measuring which datapoints were actually responsible for a model's outputs. It was my first step toward rethinking how we value data.

Earlier this year, @SPChinchali and I started sketching a solution. What if contributors got recurring upside? What if every reuse paid forward? What if data worked like IP? That idea turned into @psdnai. Working at @StoryProtocol with @WhatTheLJW, a master of operations and strategy, and @storysylee, a visionary leader with true outside-the-box thinking, helped shape this vision.

Our initial focus is physical AI, robotics, and audiovisual data, but Poseidon is designed to extend to healthcare, biometrics, sensor data, and beyond. Because of the volume of data we are handling for the world's leading AI companies (yes, in the works), Poseidon would not be possible without @StoryProtocol's IP licensing infrastructure, where registration is streamlined and royalties and derivatives are automatically tracked. If the data can't be scraped, we're building the stack to coordinate and license it.

This mission is personal. It comes from a fundamental tension I've witnessed my entire career, from academic labs to industry. I saw medical AI learn from deeply personal patient data. I built models for telecom, finance, and logistics on the digital footprints and real-world actions of millions. The pattern was always the same: the data was the core asset, but it was never treated or priced as such. This is the market we're going after. More to come.
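The influence-function idea above (measuring which training datapoints were responsible for a model's outputs) can be sketched with a gradient dot product. This is a TracIn-style approximation for illustration only, not the exact method used at Story:

```python
# Sketch of gradient-based data attribution: the influence of a training
# example on a test prediction is approximated by the dot product of
# their loss gradients. Illustrative only (logistic regression toy).
import numpy as np

def grad_logloss(w, x, y):
    """Gradient of the logistic loss for one example (x, y), y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def influence(w, x_train, y_train, x_test, y_test):
    """Positive score: the training point pushes the test loss down (helpful)."""
    return grad_logloss(w, x_test, y_test) @ grad_logloss(w, x_train, y_train)

w = np.array([1.0, -0.5])
x_test, y_test = np.array([1.0, 0.1]), 1

# A training point similar to the test point (same label) scores positive;
# a dissimilar one scores negative.
helpful = influence(w, np.array([1.0, 0.0]), 1, x_test, y_test)
harmful = influence(w, np.array([-1.0, 0.0]), 1, x_test, y_test)
```

Ranking a corpus by this score is one way to decide which contributors' data actually mattered for a given output, which is the precondition for paying them for it.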
Poseidon@psdnai

AI is moving beyond the browser and into the real world. The bottleneck? Data. Today we’re announcing a $15M seed round led by @a16zcrypto to build infra that collects, curates, and licenses high-quality data for physical AI. Incubated by and built on @StoryProtocol.

39 replies · 8 reposts · 135 likes · 10K views
sarick@sarick·
what happens when your eng team brags about only using CC and hasn't coded in months lol
Alex Volkov@altryne

PSA: If you've been running out of Claude session quotas on Max tier, you're not alone. Read this.

Some insane Redditor reverse-engineered the Claude binaries with MITM to find 2 bugs that could have caused cache invalidation. Tokens that aren't cached are 10x-20x more expensive and are killing your quota. If you're using your API keys with Claude this is even worse. This is also likely why it isn't uniform: over 500 folks replied to me and said "me too", but many (including me) didn't see this issue.

There are 2 issues compounded here (per the Redditor; I haven't independently confirmed this):

1. The first bug he found is a string-replacement bug in bun that invalidates the cache. Apparently this has to do with the custom @bunjavascript binary that ships with the standalone Claude CLI. The workaround is to run Claude with `npx @anthropic-ai/claude-code`.
2. The second bug is worse: he claims that --resume always breaks the cache, and there doesn't seem to be a workaround, except pinning to a very old version (which misses tons of features). This bug is also documented on GitHub and confirmed by other folks.

I won't entertain the conspiracy theories that Anthropic "chooses" to ignore these bugs because it gets them more $$$; they actively benefit from everyone hitting as many cached tokens as possible, so this is absolutely a great find, and it does align with my thoughts earlier. The very sudden spike in reporting and the non-uniform nature (some folks are completely fine, some hit quotas after saying "hey") definitely point to a bug. cc @trq212 @bcherny @_catwu for visibility in case this helps all of us.
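The 10x-20x figure above translates directly into quota burn. A back-of-the-envelope sketch, using illustrative prices and token counts (not Anthropic's actual rates):

```python
# Why a cache-busting bug hurts: the long conversation prefix is re-sent
# on every turn, and uncached input tokens cost ~10x cached ones here.
# All numbers are illustrative, not real pricing.
PRICE_UNCACHED = 3.00 / 1_000_000   # $ per input token (example rate)
PRICE_CACHED   = 0.30 / 1_000_000   # $ per cached input token (10x cheaper)

prefix_tokens = 80_000   # conversation history resent on every turn
turns = 50               # turns in one coding session

cost_cached   = turns * prefix_tokens * PRICE_CACHED
cost_uncached = turns * prefix_tokens * PRICE_UNCACHED
print(f"cached: ${cost_cached:.2f}  cache-busted: ${cost_uncached:.2f}")
```

Same session, same tokens: only the cache hit rate changes, and the bill (or the quota drawdown) moves by an order of magnitude.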

0 replies · 0 reposts · 5 likes · 99 views
sarick@sarick·
voice is increasing the surface area of software. coding went from keyboard → editor → compile to speak → intent → execution. everyone is hyper-focused on the intent/execution agents, but completely ignoring the ground truth layer. the interface can change overnight, but the capability is hard-capped by the diversity of the audio data underneath it. you can't fake phonetic nuances with synthetic data.
Thariq@trq212

Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!

1 reply · 0 reposts · 0 likes · 117 views
sarick@sarick·
this fortune piece frames it as a data problem. i'd push further: it's a coordination problem. the data exists. dashcams, surgical suites, warehouses. the issue is nobody has built the incentive layer to unlock it at scale. that's the actual bottleneck. fortune.com/2026/03/06/ai-…
0 replies · 0 reposts · 2 likes · 58 views
sarick@sarick·
your eval pipeline might have a major blind spot. when you isolate a speaker's channel in a dual-speaker recording, silence fills the gaps where the other person was talking. that silence is structural, but most ASR scoring metrics treat it as a transcription failure. we found this while building the poseidon score – a 15-point gap between single and dual-speaker scores that our human reviewers couldn't hear. the audio itself was fine. the metric was reacting to format, not quality. best-of-n preprocessing across six strategies closed the gap. full breakdown on this and more in our latest blog ↓
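The format-vs-quality mismatch described above is easy to reproduce with a toy word-error-rate (WER) calculation. This is an illustration of the failure mode, not the Poseidon Score itself: a perfectly transcribed single-speaker channel, scored against the full two-speaker reference, gets penalized because the other speaker's words count as deletions.

```python
# Toy WER via Levenshtein edit distance over words. The isolated channel
# is transcribed perfectly, yet scores badly against the full-conversation
# reference: the metric reacts to format, not quality.
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(r)

full_ref  = "hi there how are you i am fine thanks"  # both speakers
channel_a = "hi there i am fine thanks"              # perfect ASR of speaker A only

print(wer(full_ref, channel_a))  # ~0.33: speaker B's words become "deletions"
print(wer("hi there i am fine thanks", channel_a))  # per-channel reference: 0.0
```

The fix is on the reference/preprocessing side (score each channel against its own speaker's transcript, or merge channels before scoring), not on the audio.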
Poseidon@psdnai

Most teams collecting voice data optimize for volume over quality, partly because they’re measuring quality wrong. To help evaluate quality we created the Poseidon Score. When applied, single-speaker audio scored well while multi-speaker conversations scored worse. Why? ↓

5 replies · 11 reposts · 31 likes · 2.1K views
sarick@sarick·
everyone wants to build the “android of robotics”, but the constraint isn’t the sdk. you can't simulate your way out of messy factories, bad lighting, and unpredictable humans. physical ai expands the surface area of software, which means it exposes the limits of synthetic data. the real world is the only training ground that matters, and the race to collect long-tail data is going to be a full contact sport.
1 reply · 2 reposts · 8 likes · 200 views
sarick@sarick·
dt and elevenlabs putting agents directly into live phone calls is a massive signal. when ai sits mid-conversation to translate or pull data on the fly, every human conversation becomes a programmable surface area. but the final boss here is latency and dialect recognition. the moment this actually works flawlessly for low-resource languages, global transaction costs collapse.
WIRED@WIRED

Deutsche Telekom, the German cell provider—which holds a majority stake in T-Mobile—is partnering with ElevenLabs to enable an AI assistant on all of its network’s calls in Germany. No app required. wired.com/story/deutsche…

0 replies · 0 reposts · 6 likes · 287 views
sarick@sarick·
digital twins simulate the system. edge models run the operation. robots execute. the industries with the highest physical friction are the ones with the biggest incentives to adopt agentic workflows. once the physical-ai-to-edge loop actually works in warehouses and factories, an entire layer of enterprise ops consulting is just going to evaporate overnight.
1 reply · 1 repost · 5 likes · 140 views
sarick@sarick·
cool system but let's be real: voice cloning from 5 seconds of audio is a parlor trick if the underlying model is trained on 99% english. you clone a voice but it still sounds like it's speaking through a wall when you ask it to do anything in a language outside the training distribution. the hard part nobody's solving: building models that actually understand phonetics and prosody for languages that aren't english. that's where the data moat lives.
0 replies · 0 reposts · 1 like · 50 views
Vaishnavi@_vmlops·
This open-source AI can literally clone your voice. Just discovered VoxCPM by OpenBMB & it’s wild. It’s a tokenizer-free Text-to-Speech model that can generate natural, expressive speech & even clone a voice from a short audio clip.

What makes it interesting:
◾️ Context-aware speech – understands the meaning of text & adjusts tone & prosody automatically
◾️ Zero-shot voice cloning – replicate someone’s voice from a short reference clip
◾️ Real-time streaming TTS – fast enough for real applications
◾️ Diffusion + LLM architecture for more realistic audio generation

Even crazier: it was trained on 1.8M+ hours of bilingual speech data to improve realism & expressiveness.

This could power things like:
• AI voice assistants
• Audiobook generation
• Content dubbing
• Personalized AI avatars

Repo - github.com/OpenBMB/VoxCPM
6 replies · 19 reposts · 90 likes · 3.9K views
sarick@sarick·
memory isn't the bottleneck. everyone's obsessed with stateful agents but they're all building on top of models trained 98% on english internet. you can't synthetic your way out of that. try building a voice agent for swahili or marathi and watch it fall apart. the real problem nobody wants to admit: you need actual human data from actual humans speaking those languages. that costs money. labs would rather pretend it doesn't exist.
0 replies · 0 reposts · 0 likes · 29 views
Dhravya Shah@DhravyaShah·
I built the best memory for voice agents while sponsoring the @YCombinator hackathon. Introducing: Voice AI + Memory! We built a deep @supermemory integration for @pipecat_ai, so now even your agents have great memory. With user profiles, it's almost-instant latency as well :)
42 replies · 40 reposts · 841 likes · 82.8K views
sarick@sarick·
physical AI runs on data. way more data than people realize. everyone talks about models and synthetic generation, but those pipelines still start with real-world signals. if your base dataset is thin, your system breaks the moment it leaves the lab. the real bottleneck isn’t models. it’s high-quality data that actually reflects how the world works. forbes.com/sites/sabbirra…
0 replies · 0 reposts · 5 likes · 136 views
sarick@sarick·
“He said 64 languages were spoken in his country of 15 million people and the government agency was building AIs for the public that fuse both American and Chinese AI technologies and their own large language datasets.” if countries invest in high-quality, rights-cleared datasets across regional languages, agriculture, and public health, they shape model behavior at the training layer. ai sovereignty doesn’t come from choosing a us or china api. it comes from controlling the data your models are trained on. theguardian.com/politics/2026/…
0 replies · 1 repost · 6 likes · 203 views
sarick@sarick·
heard a great breakdown of the voice AI stack at an event with @Fin_ai and @cartesia. voice ai is becoming one of the biggest real deployments in ai. customer service alone is a $300B+ market and still massively under-automated. some takeaways:
1. customer service is the core market. buyers care about resolution rate, handle time, deflection, csat.
2. multilingual matters but it has to work in messy environments. cars, restaurants, call centers, interruptions.
3. voice systems are judged by their worst turn. one bad answer in a long call breaks the whole experience.
4. speech-to-speech is coming, but ~90% of production systems today still run cascade (speech → llm → speech).
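The cascade in point 4 can be sketched as three swappable stages. The stage functions below are hypothetical placeholders, not any vendor's real API:

```python
# Minimal cascade voice-agent turn (speech -> LLM -> speech).
# transcribe / complete / synthesize are hypothetical stand-ins for
# whatever ASR, LLM, and TTS providers a real stack would plug in.
from typing import Callable

def cascade_turn(audio_in: bytes,
                 transcribe: Callable[[bytes], str],
                 complete: Callable[[str], str],
                 synthesize: Callable[[str], bytes]) -> bytes:
    text = transcribe(audio_in)   # ASR: speech -> text
    reply = complete(text)        # LLM: text -> text
    return synthesize(reply)      # TTS: text -> speech

# Stub stages so the loop runs end to end.
out = cascade_turn(
    b"<pcm audio>",
    transcribe=lambda a: "what are your hours",
    complete=lambda t: f"you asked: {t}",
    synthesize=lambda t: t.encode(),
)
print(out)  # b'you asked: what are your hours'
```

Each stage boundary adds latency and a chance to mishear, which is why point 3 (judged by the worst turn) bites cascade systems hardest, and why speech-to-speech models are attractive despite being less mature.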
0 replies · 1 repost · 12 likes · 273 views
sarick@sarick·
Frontier APIs became the fastest source of high-quality training data, so competitors treated them that way. In a world where outputs are data and data is the moat, access control and provenance become core infrastructure. All roads lead to AI’s biggest bottleneck: Data.
Anthropic@AnthropicAI

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

0 replies · 1 repost · 5 likes · 198 views
sarick@sarick·
@GeminiApp the fact that i need a third-party chrome extension just to organize my chats is insanity. do better @OfficialLoganK
0 replies · 0 reposts · 1 like · 368 views
Google Gemini@GeminiApp·
The latest installment of Gemini Drops is here! Here’s a look at everything we shipped in February ↓
164 replies · 169 reposts · 1.4K likes · 4.4M views
sarick@sarick·
Voice is quickly becoming the default UI in growth markets. @theinformation's AI Agenda calls out the language bottleneck and includes Poseidon’s work building multilingual, production-ready audio datasets. For product teams, this means structured specs, verified contributors, automated QA, and data that can move straight into training and evaluation pipelines. theinformation.com/newsletters/ai…
0 replies · 0 reposts · 10 likes · 255 views