Pratik Karki

1.5K posts

@ai_evals

🇳🇵🇺🇸 co-founder https://t.co/Vz65cBoJXl | ex @google AI | building next-gen scalable oversight for AI systems | https://t.co/2EFb7GMpjC

San Francisco · Joined May 2025
720 Following · 493 Followers
Pinned Tweet
Pratik Karki
Pratik Karki@ai_evals·
Lot of new faces here this week, so a quick recap of who I am and what we actually do.

I'm Pratik. We build human data infrastructure for frontier AI labs at Anthromind, the training and eval data the best models in the world are made of.

The most surprising thing I've learned doing this is that the labs can't get enough good data, no matter what they're willing to pay. @seanZCai (the data guy) puts it plainly - he has never once watched a lab turn down genuinely good data over budget. The money is sitting in a pile, waiting. Budget was never the constraint.

So why can't the best funded companies on earth get the data they actually need? Because great data was never a scale problem. They have the capital, the researchers, the compute. None of it produces a great dataset on its own.

Quality comes from somewhere money can't reach. Iteration, feedback over months, and an almost unreasonable amount of rigor on every single item. Especially on the hard sets, where there's no answer key and the model is already failing most of them.

And rigor is the first thing scale usually kills. A frontier lab is a machine built to scale, so the one input that makes data great is the exact thing the machine can't produce. The biggest data vendors hit the same wall from the other side. They grew into marketplaces of a million contractors, and the founder-level care behind their early work didn't survive the headcount.

We're building Anthromind to scale the thing everyone else gives up. The rigor itself. Right now that still means us, the founders, are in the data, hands on every hard call. I can't tell you how many passes one item takes before it ships, because at some point we stopped counting.

The whole bet is that this obsession lives in the infrastructure we're building, so it gets sharper as we grow instead of breaking. Everyone else is racing to spend more and label faster. We're building the thing that makes great data repeatable. That gap in the market is wide open right now.
Pratik Karki tweet media
English
2
0
5
546
Shrushti Raut
Shrushti Raut@codewithsushi·
Fitbit or whoop? Which one is better?
English
17
0
5
2.3K
Pratik Karki
Pratik Karki@ai_evals·
@Devi__Devs Still, government regulations historically tend to harm technological progress. Several economic models show that government intervention is to the detriment of net social good.
English
0
0
0
1
Devi Devs
Devi Devs@Devi__Devs·
@ai_evals That delay is the DMA and data rules, not the AI Act. Apple held live translation in the EU over interoperability and on device data questions, not a high risk classification. Worth separating the two, since conflating them makes the AI Act sound broader than it actually is.
English
1
0
0
13
Pratik Karki
Pratik Karki@ai_evals·
Currently in Europe and learning about the fact that you cannot have AirPods that have live translation available because of the EU "anti-AI act". All of these bills and laws to halt progress are the most ridiculous things I've ever heard about. I pray that the U.S. does not go down the same path.
Garry Tan@garrytan

Bernie Sanders introduced a bill to seize 50% of any AI startup that crosses $200M in revenue. The same anti-prosperity bloc spent the year trying to ban startup acquisitions, blocking the only exit 85% of founders ever get. This is a war on building startups in America.

English
1
0
0
67
Pratik Karki retweeted
Midjourney
Midjourney@midjourney·
We're gonna do a Midjourney Medical AMA (ask me anything) right here all afternoon. Post your questions below and we'll try to answer as many as we can! ❤️
English
833
169
4.2K
271.6K
Nurbol Sakenov
Nurbol Sakenov@nurbol_sakenov·
@ai_evals Enjoy! Check out Lake Bled if you’re staying there longer
English
1
0
1
15
Pratik Karki
Pratik Karki@ai_evals·
First vacation in forever! The last international trip I took was to London 2 years ago for an AI conference. Feels good to finally unwind. P.S. Ljubljana, Slovenia is great. Reminds me of Kathmandu with the surrounding mountains. They should be sister cities!
Pratik Karki tweet media
English
1
0
3
59
Pratik Karki
Pratik Karki@ai_evals·
My first Midjourney scan is here! Guess I really got the dog in me
Pratik Karki tweet media
English
0
0
0
23
Pratik Karki
Pratik Karki@ai_evals·
@nickfloats I remember working with the Midjourney team whilst I was at Google. They are 1 of 1. Crazy launch!
English
0
0
0
202
Nick St. Pierre
Nick St. Pierre@nickfloats·
What Midjourney is:
- No investors, fully community-funded research lab
- Revenue from image generation product funds all R&D
- ~$100M in first 9 months, $200M by month 12, still growing
- 8 active projects: 4 hardware, 4 software
---- 2 hardware products coming to market soon (consumer-purchasable)
---- 2 are large-scale machines

DAVID HOLZ: Background and Philosophy
- Grew up in Florida, parents in medicine, dad had a dental office on a sailboat
- Physics and math background: drawn to the tension between predicting reality vs. absolute truth
- Core thesis: the interaction between humans and technology is the biggest limitation, not compute power
- Founded Leap Motion at 22: $10M in pre-orders in 48 hours from a website (not Kickstarter)
- Built hand-tracking VR: 600M-parameter mixture-of-experts model, 2015, CPU cluster, pre-TensorFlow
- Also shipped Northstar, an open-source AR headset
- Left Leap Motion wanting a “home,” not a 100x return
- Mentor Bill Warner told him he could bootstrap; he listened this time
- Started Midjourney with ~$200K, called Google for 10,000 GPUs on trust alone

THE SCANNER: Full Body Ultrasonic CT
- First new whole-body medical imaging modality in ~50 years
- Concept: “as powerful as an MRI, as casual as a trip to the spa”
- No radiation, no magnets, no x-rays; safe for unlimited scans

How it works:
- 40 rings, each with 8,960 transducers (200 microns wide), totaling 358,000 elements
- Fires ultrasonic waves at 100M times/second; sound travels through water at 1,481 m/s
- Sensors resolve motion down to picometers (sub-atomic range)
- Captures 17 GB/second of raw data; 806 TB per full scan reconstruction
- 21 on-site servers, 2 petaflops of compute
- Patient lowers into water at 4 cm/second; ~60 seconds for several hundred body slices
- Produces sub-millimeter 3D maps of internal tissue

Already outperforming MRI in some tissue boundary and muscle fiber detail on DAY ONE. 10x cheaper and 60x faster than MRI machines; scan cost effectively near zero. Gen 2 scanner planned by end of 2026; Gen 3 will use custom silicon.

SCANNER vs. MRI: Key Differences
- MRI: 60-minute tube, loud, requires sedation for children, expensive, radiation-adjacent.
- This scanner: water immersion, 30-60 seconds, no sedation, no radiation, repeatable daily.
- Current limitation: not yet FDA-cleared beyond body composition; no AI layer yet applied.
- Already better than MRI in certain muscle/fiber/vein boundary resolution at day one

THE MIDJOURNEY SPA:
- First location: Union Square, San Francisco
- 25,000 sq ft, 4 floors
- Amenities: hot tubs, saunas, cold plunges, European spa features, gym
- 9-10 full scanners on-site
- Goal: open by end of 2027

Target:
- 50,000 scanners globally, capable of 1 billion scans/month
- 5,000 spa locations needed at ~10 scanners each
- Estimated $20B capex to scale; Midjourney self-funds the first location
- Payback period modeled at ~6 months per location

ROADMAP & REGULATORY PATH:
- FDA discussions already started; body composition on a clear path
- Ascending approval ladder planned:
- Body composition (near-term, easy)
- Sharing data with physicians
- Doppler / blood flow imaging
- Pregnancy / fetal imaging (ultrasound already approved; this is a natural extension)
- Therapeutic applications (tendon/muscle healing, eventually incisionless surgery)

AI not yet applied to imaging; planned as a layer once data volume grows

LONG-TERM VISION:
- Flag anomalies automatically, substitute some blood tests, enable daily health tracking

PRICING:
- No firm numbers yet; likely spa memberships plus walk-in and scan-only tiers; cost of scan itself is near zero
- Data analysis: day one is body composition only; physician sharing gated on FDA progression
- Form factors: current design is throughput-optimized (up/down elevator); bathtub and gym-sized variants possible later
- Blood test substitution: sub-millimeter daily differentials with AI may eventually replace some tests; acknowledged as frontier science
- Cancer destruction via focused ultrasound: technically possible, not on near-term roadmap

NEXT STEPS:
- Sign up for Midjourney Medical email list for research trial scan invitations
- Visit midjourney.com/medical for jobs and updates (page now live)

Gen 2 scanner presentation planned before end of 2026
More secret projects to be announced soon.
Nick St. Pierre tweet media
English
93
394
3.6K
245.5K
Pratik Karki
Pratik Karki@ai_evals·
It's so unbelievable that we're alive during the same timeline in which the greatest book series EVER, The Lord of the Rings, also managed to get the greatest film adaptation ever as well. I love watching the behind the scenes from the box set occasionally to lift up my spirits.
English
0
0
0
7
Pratik Karki
Pratik Karki@ai_evals·
Yann LeCun (Meta’s fmr Chief AI Scientist and Turing Award winner) just called BS on the entire AGI dream.

He and his co-authors argue that our obsession with Artificial General Intelligence rests on a myth: the idea that humans possess general intelligence. We don’t. Humans are highly specialized biological machines, shaped by evolution for a narrow slice of survival tasks. We only imagine our intelligence is general because we’re blind to everything we’re terrible at. Magnus Carlsen isn’t objectively great at chess. He’s just the best of a species that’s fundamentally bad at it.

So, instead of chasing human-like generality, the field should pursue Superhuman Adaptable Intelligence (SAI): systems that adapt faster than any human, master economically important tasks at superhuman levels, and fill the enormous gaps where humans simply cannot perform.

This hits foundation model labs especially hard. You can ship models that look flawless on benchmarks and still watch them fall apart in production. The missing piece is rarely more scale. It’s the specialized data and rigorous evaluation needed for the tasks that actually matter.

That is the exact problem we solve at Anthromind. If you’re a foundation model lab and not a customer yet, let’s chat. We work with PhDs and domain specialists at the top of their fields to create the high-quality training and evaluation data that turns promising models into reliable, production-grade systems.

Link to the paper in the comments. It’s one of the more important ones I’ve read this year!
Pratik Karki tweet media
English
1
1
4
331
Tanay Kothari
Tanay Kothari@tankots·
Calling all haters of @WisprFlow - give me your biggest issue with Wispr. Yes I will personally read through each and every comment and have our team right some wrongs.
English
815
19
920
264.7K
Pratik Karki
Pratik Karki@ai_evals·
Anything interesting happen during the 13.5 hours I was in the air? Seems like a whole country might go under due to a new report.
English
0
0
0
49
Pratik Karki
Pratik Karki@ai_evals·
First international trip in forever. Also, thank you for live in-flight sports TV!
Pratik Karki tweet media
English
0
0
1
25
Pratik Karki
Pratik Karki@ai_evals·
We create the data your models train and get evaluated on, built by PhDs and specialists at the top of their field, so your team doesn't have to. Start here: anthromind.com/sign-up
English
0
0
0
43
Pratik Karki
Pratik Karki@ai_evals·
A customer team we work with used AI to generate data another team is already closing deals on. Nobody checked if it was right.

This is one of the most common things I'm seeing right now, and almost nobody is talking about it. A team gets a deadline, stands up an AI system to generate a few hundred records, and it works beautifully. Ships on time, fills every field, looks completely done.

But here's the part that quietly turns into a problem. The team that generated the data owns it, and a completely different team downstream is already making real decisions on it. Only one of those two teams knows it came out of a rushed AI build, and it's not the one closing the deals.

This is the most underrated risk in the whole AI rollout conversation. Because generating the data and being able to stand behind the data are two completely different jobs. AI is incredible at the first one. Under a deadline, almost nobody does the second one.

So that second job is what we came in to do. And the first thing we told them is, "you can't validate all of it with one process, because the data doesn't have one single kind of truth."

Here are the types of data we validated:

1/ Some of it has a hard, objective answer sitting right there in the source. You read it straight off and verify it automatically, instantly, at full scale.

2/ Some has a knowable answer you can look up. Names, dates, identifiers. You cross-reference it automatically against a trusted source of record.

3/ And then there's the most DANGEROUS bucket of all. The summaries, the classifications, the judgment and risk calls. There's no single right answer to check against here, only judgment, and this is the one place a real human expert actually belongs.

That one split is the whole solution. Everything with a knowable answer gets checked automatically, across every record, in seconds. Your expert humans only ever touch the judgment calls, which is exactly where they're worth it.

Run it that way and the whole thing flips. The data stops being a silent liability in someone else's pipeline, the team that owns it can finally stand behind every record, and the team downstream can close deals on it without ever wondering if it's right.

That's what trustworthy AI at scale actually looks like. Just the perfect amount of human oversight makes all the difference.
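
As a rough illustration of the three-way split described in this post, here is a minimal Python sketch. It is not Anthromind's actual pipeline: the field names (`invoice_total`, `company_id`, `risk_summary`), the `validate_record` function, and the `source` and `registry` inputs are all hypothetical and exist only for this example. Fields with an answer in the source are compared directly, fields with a look-up answer are checked against a trusted registry, and everything else is queued for a human expert.

```python
from dataclasses import dataclass, field

# Hypothetical field groupings; a real schema would define its own.
SOURCE_FIELDS = {"invoice_total"}  # bucket 1: the answer sits in the source document
LOOKUP_FIELDS = {"company_id"}     # bucket 2: the answer can be looked up in a trusted record


@dataclass
class ValidationReport:
    passed: list = field(default_factory=list)
    failed: list = field(default_factory=list)
    needs_expert: list = field(default_factory=list)


def validate_record(record, source, registry):
    """Route each field of one generated record to the check that fits its kind of truth."""
    report = ValidationReport()
    for name, value in record.items():
        if name in SOURCE_FIELDS:
            # Bucket 1: compare directly against the value read from the source.
            ok = source.get(name) == value
        elif name in LOOKUP_FIELDS:
            # Bucket 2: cross-reference against the set of known-good values.
            ok = value in registry.get(name, set())
        else:
            # Bucket 3: summaries, classifications, risk calls (and any
            # unrecognized field) go to a human expert, not an automatic check.
            report.needs_expert.append(name)
            continue
        (report.passed if ok else report.failed).append(name)
    return report


# Example: one record with one field from each bucket.
record = {"invoice_total": 120.0, "company_id": "C-42", "risk_summary": "Low risk"}
source = {"invoice_total": 120.0}
registry = {"company_id": {"C-7"}}
report = validate_record(record, source, registry)
print(report)
```

In this sketch the automatic checks run on every field they can cover, and only `risk_summary` lands in `needs_expert`, which mirrors the idea that experts should only touch the judgment calls.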
Pratik Karki tweet media
English
1
0
0
67
Pratik Karki
Pratik Karki@ai_evals·
Grok is pretty f'ing fantastic. Lightning quick and does research on the fly. @nikitabier only recommendation is the context memory kinda sucks. I know a great foundation model company for long-term AI memory you should chat with. DM me if interested.
English
0
0
0
30
Pratik Karki
Pratik Karki@ai_evals·
@pchamal Good faith arguments do not register for people who are so ideologically captured! I'm happy to change my position when presented with a compelling case, but looking at the comments there are none.
English
1
0
1
178
Pukar C. Hamal 🏔🗽 🌁
This post made a lot of people who do not use their brain to think critically very very angry. Epithets, F bombs, Strawmanning, Bait and Switch: a kitchen sink of woefully weak virtue signaling parrots repeating words they have heard but have never applied reasoning tokens towards. But alas not a single person could respond credibly to my argument: Do not work for a company if you cannot sit there and listen to the CEO of that company saying kind, supportive, well wishes to you during a graduation ceremony. It is a signal of moral bankruptcy, hypocrisy, and even theft of the opportunity from someone else who does not pretend to be a virtue signaling hack.
Pukar C. Hamal 🏔🗽 🌁@pchamal

If you are a @Stanford student who will be working at @Google or any of the Alphabet companies and also walked out of @sundarpichai’s speech, you should 100% have your offer rescinded. As a @Google shareholder, I do not want individuals who are actively working against value creation to be at the company

English
4
0
7
1.8K
Pratik Karki
Pratik Karki@ai_evals·
Me and my 35 SpaceX IPO shares
GIF
English
0
0
0
34