matt turk

1.4K posts

@TurkMatthew

ML researcher @withprotegeai prev: ML @cleanlabAI @goodwatercap, Quant @coinbase & @goldmansachs, EECS @ucberkeley

New York, NY · Joined March 2012

2.1K Following · 706 Followers

Pinned Tweet

matt turk@TurkMatthew·17 Mar

Excited to share that I’ve joined @withprotegeai as a Senior Machine Learning Researcher on the DataLab team.

After 2 years at @CleanlabAI, working with the team was an incredibly formative experience. I’m deeply grateful for the chance I had to learn from them to work on data-centric AI with such thoughtful researchers and builders, and to contribute during a period that ultimately led to Cleanlab being acquired into @joinHandshake AI. I learned a tremendous amount about the importance of data quality, evaluation, and trustworthiness in modern AI systems to make them more accurate and reliable. Throughout my time there, my conviction only grew that the next major advances in AI will come not just from better models or more compute, but from better data.

At DataLab, our goal is to treat the data layer of AI with the same scientific rigor that model labs apply to algorithms by building a dedicated research institution for AI data: designing high-fidelity datasets and multimodal benchmarks grounded in real-world scenarios, working closely with frontier labs on their hardest data challenges, and developing standardized ways, including “FICO scores for AI data”, to measure dataset quality, contamination, and benchmark reliability.

Another important piece of this work is understanding how different kinds of data support different parts of the AI training stack. Reinforcement learning (RL) environments are a powerful form of training data that generate structured training tuples like (state, action, reward, next state) and are extremely useful for post-training optimization when the world can be simulated. But many of the highest-value domains for AI, including healthcare, enterprise workflows, and complex multimodal reasoning, cannot be faithfully simulated. Advancing models in these areas requires real-world datasets, carefully designed benchmarks, and domain-specific data for pre-training and mid-training adaptation.

The idea behind DataLab is simple but important: every major leap in AI capability has historically followed a breakthrough in data (from ImageNet to large-scale web corpora). As models and compute continue to advance rapidly, closing the data gap, the gap between the data that AI systems need and the data that actually exists in usable form, may be one of the most important challenges for the field.

Here is more info on some of the work the team has done so far: datalab.withprotege.ai

Bobby Samuels@BobbySamuels

x.com/i/article/2030…

868

matt turk@TurkMatthew·2d

Top-tier read calling out performative grind culture. “Great work has always demanded sacrifice and often brutal hours and I'm not disputing this. What I'm disputing is the direction. These people, many of them friends, have more economic freedom than any class in history and they've chosen, freely, to simulate the conditions of a Chinese assembly line and call it virtue.”

Will Manidis@WillManidis

x.com/i/article/2056…

106

matt turk retweeted

OpenAI@OpenAI·2d

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

986

3.7K

26K

12.7M

matt turk@TurkMatthew·2d

@4JimLee @bevedoni @karpathy Karpathy will not lead the business side he will do what he does best and lead R&D (specifically accelerating pre-training research)

JL@4JimLee·2d

@bevedoni @karpathy To outflank Dario for the leadership role or use Elon's rentable compute without anyone capable of controlling Karpathy?... Ha ha ha, but honestly, at some point someone needs to lead the business side better than Dario.

3.4K

Doniyor@bevedoni·2d

maybe dumb question, but why didn't @karpathy join xAI?

158

502

593K

matt turk@TurkMatthew·3d

@vijaythirumalai This is the most unsurprising choice

Vijay Thirumalai@vijaythirumalai·3d

I simply cannot fathom this move Dario doesn’t look that charismatic to me atleast from his interviews Andrej could have joined 1/ Elon 2/ Zuck 3/ Google 4/ Apple 5/ MSFT Everyone would have given him a blank check and much bigger role and platform How was Dario able to hire him ?

Crypto India@CryptooIndia

BREAKING: OpenAI co-founder Andrej Karpathy joins Anthropic.

412

965

586.8K

matt turk retweeted

Bleacher Report@BleacherReport·3d

STILL NOT OVER THIS SHOT 😭 WHO WAS WEMBY FEELING LIKE??

14.4K

177.3K

matt turk retweeted

Goldie@dezgoldie·3d

okc/spurs could be the best playoff series since 2016 finals

294

25.3K

matt turk retweeted

Everything Price Sufferer (but especially eggs)@agraybee·4d

I didn't know Helen of Troy could generate so much conflict.

1.4K

9.8K

82.5K

2.8M

matt turk@TurkMatthew·15 May

@phoebeyao Completely agree!

141

Phoebe Yao@phoebeyao·14 May

training data is starting to look like a zero knowledge proof problem. labs have to judge quality without seeing the full dataset or the QC pipeline behind it. vendors proxy quality with multi-rollout pass rates, small-model ablations, and downstream eval gains. but compute and iteration costs explode as environments and trajectories grow more complex. quality has no ceiling, and the best data is often the hardest to capture in a metric or explain in a writeup. huge alpha in making data quality more legible.

404

58.3K

matt turk retweeted

ak0@annanay·14 May

UPDATE turns out that yes, nobody has heard of tmux.

Ishaan Sehgal@ishaansehgal

Your agents die when you close your laptop. We fixed that. Omnara Cloud Sandboxing is live. Close your laptop, the session keeps running in the cloud. Open it back up, you're right where you left off. Close the lid. Keep building.

103

3.4K

318.5K

matt turk retweeted

Little Kevin 5, cpa@pootsobotka·13 May

Hey man sorry I’m busy this weekend i can’t chill - Friday night i gotta wait in the pizza line in the west village from 6-9 and then wait on line to get into the Spaniard from 9-12 - Saturday morning gotta wait on the blank st coffee line from 8-12 -Saturday afternoon New York or nowhere line from 12-4 - Saturday night i have the frozen yogurt line from 4-12 If you wanna hang Sunday morning I’m waiting on a bagel line from 9-1 let me know

101

189.2K

matt turk@TurkMatthew·12 May

@himanshustwts Agreed. Mid-training is also a newer term than pre/post training and is pretty much just smaller scale, more curated and capability targeted pre-training

himanshu@himanshustwts·12 May

reflecting on this again. "mid-training, pre-training, post-training -- these are all made-up words. there were perfectly fine words before them like "supervised learning" instead of "SFT." when a technology has a profound impact on society and people start asking questions that don't have well-defined answers, we make up names. there are definitions for these things, and people who can articulate them precisely are valued. but there's also value in zooming out and recognizing there's a lot of ways to skin a cat." cc @groundzero_twt / pod with Curtis from Handshake AI.

rohan anil@_arohan_

There is no pre-training, post-training, or test-time training. There are only priors, updates, constraints, and compute budgets. There is only TRAINING. Last several years we shipped the org chart to fundamental optimization science.

6.8K

matt turk@TurkMatthew·7 May

@CoachDanGo In the U.S. I would say this is not true (unfortunately)

107

Dan Go@CoachDanGo·7 May

Eating healthy is actually eating normally but most people think it’s dieting.

153

929

10.4K

202.8K

matt turk@TurkMatthew·6 May

@alex_whedon Congrats on the launch @alex_whedon

Alexander Whedon@alex_whedon·5 May

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

1.5K

2.9K

23K

12.7M

matt turk retweeted

Adithya S K@adithya_s_k·5 May

Excited to release the Ultimate guide to RL environments! Definitions of RL environments differ wildly in the LLM era, so we spent the last month building several RL environments across 6 different frameworks, domains and complexities to map out which are easiest to build with and which can be scaled to 1000s.

158

1.2K

222.3K

matt turk retweeted

Mads@europemaxxed·4 May

Americans see this and want coworking spaces and gyms

Alex Recouso@recouso

Okay guys, had a few cultural shocks in Spain: > Go to the gym, opens 10am on a Sunday > Go to work from a coworking, closed > Go to a coffee shop, no wifi Absolutely unthinkable in a barely productive economy like the US, yet alone UAE. Europe is a daylight museum.

353

7.2K

168.2K

matt turk@TurkMatthew·30 Apr

@oracles Lmao

André@oracles·29 Apr

Had a Jane Street interview in 2019. Round 8. Interviewer texts: 'Equinox Brookfield. 6 AM. Bring a calculator you won't use.' I show up. He's on the StairMaster reading a printout of the CBOE VIX term structure. Doesn't get off. Nods at the machine next to him. 'You see that guy on the rower? Goldman MD. Comes here every morning at 6:04. Leaves at 6:38. What's the implied vol on his arrival time?' 'I don't know his variance.' 'Sample size of one year, 250 sessions. Standard deviation is 90 seconds. Annualize it.' I do the math in my head. '90 seconds times sqrt(250). About 24 minutes annualized.' 'Wrong. You annualized like it's a return. Time-of-arrival doesn't compound. It's a Poisson process with drift. The correct answer is his arrival is more punctual than the 6 train. Now price me an option on whether he shows up tomorrow.' I think for a second. 'If he's been here 250 days in a row, base rate is 99.6%. But you have to adjust for his vacation schedule and probability of injury, call it 96%.' 'Strike?' '$10 if he shows, $0 if he doesn't.' 'I'll sell you that option for $9.40.' I think about it. 'No. Expected value is $9.60. You're underpricing by 20 cents.' 'Correct. Now why am I selling it to you?' I freeze. 'Because I just saw him limp on the way in. You're buying my information for 20 cents. You overpaid.' We get off the machines. Walk to the smoothie bar. He orders a $19 smoothie, doesn't drink it. 'Last question. The girl behind the counter makes 200 smoothies per morning. She has perfect information on who's actually here and who's faking it. Citadel guys leak their attendance to her every day for the price of a tip. If I gave you $50,000 to set up a market on which Citadel PM gets fired this quarter, what's your bid-ask?' 'Insider trading.' 'Wrong answer. There's no public security. Try again.' I think. 'I'd quote 8 to 12 percent on any given PM. Spread of 4 points to cover adverse selection. Tighten the spread for PMs I have data on.' 
'Where do you get the data?' 'The smoothie girl.' 'Good. How much do you pay her?' '10% of P&L.' 'Wrong. You pay her a flat $200 a week. If you pay her on P&L she becomes your counterparty. Right now she's your data source. Don't conflate edges.' He hands me the untouched smoothie. 'Throw this out on Vesey Street, not in the building. The staff knows what gets wasted. Outside, it was consumed. Same smoothie, different signal.' I do it. Thursday I get the email. 'Offer rescinded. Your bid-ask on the PM market was too tight. 4 points doesn't cover the tail. The girl is a single point of failure and you didn't price her counterparty risk. Also you held the smoothie in your right hand. Right-handers throw with their right hand. Camera saw the hesitation.'

Deedy@deedydas

Jane Street made ~$40B in 2025 with 3,500 employees, a ~2x from the year before. At ~65-70% profit margin, that's $8M profit / employee, the highest for a 1000+ ppl company. High-frequency trading continues to be the most efficient money making engine. I want to share an old story about my Jane Street interview in 2014. Jane Street was known for hiring a lot of math, physics and CS olympiad winners from top universities and putting them through many rounds - including, for trading roles, a gauntlet of mental math. It was my 6th interview and my final round and I recall being asked "What is the next day after today in DD/MM/YYYY where all the digits are unique?" They'd toy with you and say "You can use a pencil and paper, if you want" but you knew that was an instant no. Painstakingly and as quickly as I could, I came to an answer. "How confident are you that this is correct on a 0-1 probability scale?" the interviewer said. "0.95", I blurted out, not fully knowing how to answer that. "Are you sure?" After thinking harder for a few more seconds, I realized I could've flipped the digits around to get a closer date. I gave the interviewer my answer. It was correct. "0.95 huh?" he chuckled. That's when I knew I failed. Note: fwiw, other companies that come close in efficiency are - Tether ($90M+ profit/emp) - Hyperliquid ($80M+ profit/emp) and on revenue: - Valve ($50M/emp) - OnlyFans ($37M/emp) - Craigslist ($14M/emp) - Anthropic ($12M/emp, run rate) - OpenAI ($8M/emp, run rate) For comparison, Nvidia is very efficient at scale and is $4.4M/emp.

120

2.8K

762.7K

matt turk retweeted

Tristan@Tristan0x·27 Apr

Had a Jane Street interview in 2014 End of round 5. Interviewer says "round 6 will find you." Three weeks. Nothing. I'm in line at the Trader Joe's on 14th. Line snakes past the frozen aisle, the way it always does. I'm holding a bag of orange chicken and a four-pack of cold brew concentrate. Guy in front of me is in a Patagonia vest over a quarter-zip. Not turning around. Inching his cart forward every 30 seconds. Picks a box of something out of his own cart. Holds it up over his shoulder without turning around. "Ant on a corner of this box. Walks to the opposite corner along edges. How many shortest paths." "Six." "Now in 4 dimensions." I think. Tesseract. One edge per dimension, any order. "4!. Twenty-four." "Confidence." "0.85." "Correct on both counts." Offer Monday. $69k base (which I'm told is a 'cultural fit discount'). No bonus. No equity. No relocation. They will, however, allow me to name one (1) colocated server in their NY4 cage. I take my time. This is the only thing I'm being given. I submit "Steve." Already a Steve. I submit "Steve2." Discouraged naming pattern. I submit "Steven." Confusing with Steve. I submit "Big Steve." Big Steve exists in NY5. I submit "my Steve." Approved. Two weeks into onboarding, IT pings the eng channel: "my Steve is down." Three engineers respond at once asking which Steve. The thread spirals. Someone clarifies: "it's named 'my Steve.'" Someone replies "yes but whose." A VP joins the thread: "is this a possessive or a proper noun." Nobody knows. A meeting is called. The meeting is titled "re: my Steve." Nine engineers attend. The first ten minutes are spent establishing whether the meeting title refers to the server or to a Steve belonging to the meeting organizer. The meeting organizer is named Devesh. Devesh does not know a Steve. Offer rescinded Friday. Reason listed in the email as: "Introduced ambiguity into production naming taxonomy. Unrecoverable." I'm no longer allowed in the building. 
I am also, somehow, no longer allowed in the Trader Joe's on 14th. The doors don't open for me. I have tested this six times. The orange chicken is still in there. Three bags deep on the shelf, frozen, waiting. I think about it every day.

Deedy@deedydas

2.2K

366.2K

matt turk@TurkMatthew·27 Apr

@HAHazony @Cernovich It’s also related to the amount of novel experiences you experience per day. This rate is way higher as a kid when the world is not saturated to you yet - and a reason why purposely introducing novelty as an adult makes your life feel longer.

6.4K

H.A. Hazony@HAHazony·26 Apr

There is a psychological reason for this: When you are young, life seems relatively immense, because you have lived only a fraction of it. At 5, life is still 20x larger than you. So your brain projects that life is going to be LOOOONG. And every new experience makes it seem like life is infinite, rich, and full of novelty. As you get older, life stops seeming infinite, because it becomes proportionally not so large compared to you. When you pass 40, you understand the true size of life: e.g. it is twice as big as I am today. That is something graspable. This is so horrifying a lot of people freak out ("Midlife-Crisis"). Now, when you are older, nothing seems new, it's just the same old experiences you've had. Day in, day out. And indeed, time itself is passing faster, because a "day" or a "year" is comparatively a much shorter span of time for a 50 y/o than a 5 y/o. In sum, the older you get, the faster time passes, the smaller life seems, and the less meaningful it feels.

577

135.8K

Cernovich@Cernovich·26 Apr

Extremely concerning how quickly time moves the older you get.

474

1.1K

16.9K

1.6M

matt turk retweeted

RCsWorld@RCsWrld·25 Apr

These 2 just went on a 10-0 run in a playoff game

140

9.7K

106.6K

2.3M

matt turk@TurkMatthew·21 Apr

@EvanLuthra Very old video

Evan Luthra@EvanLuthra·20 Apr

The Head of Claude Code at Anthropic said he hasn’t written code by hand in months. In 2 days he shipped 49 full features. All written 100% by AI. He just dropped a 30 min talk on exactly how he does it. Worth more than any $500 vibe coding course. Bookmark it:

183

779

9.2K

1.6M
