matt turk

1.4K posts

@TurkMatthew

ML researcher @withprotegeai prev: ML @cleanlabAI @goodwatercap, Quant @coinbase & @goldmansachs, EECS @ucberkeley

New York, NY · Joined March 2012
2.1K Following · 706 Followers
Pinned Tweet
matt turk @TurkMatthew
Excited to share that I’ve joined @withprotegeai as a Senior Machine Learning Researcher on the DataLab team.
My 2 years at @CleanlabAI, working with the team, were an incredibly formative experience. I’m deeply grateful for the chance to learn from them, to work on data-centric AI with such thoughtful researchers and builders, and to contribute during a period that ultimately led to Cleanlab being acquired into @joinHandshake AI. I learned a tremendous amount about the importance of data quality, evaluation, and trustworthiness in making modern AI systems more accurate and reliable. Throughout my time there, my conviction only grew that the next major advances in AI will come not just from better models or more compute, but from better data.
At DataLab, our goal is to treat the data layer of AI with the same scientific rigor that model labs apply to algorithms by building a dedicated research institution for AI data: designing high-fidelity datasets and multimodal benchmarks grounded in real-world scenarios, working closely with frontier labs on their hardest data challenges, and developing standardized ways, including “FICO scores for AI data”, to measure dataset quality, contamination, and benchmark reliability.
Another important piece of this work is understanding how different kinds of data support different parts of the AI training stack. Reinforcement learning (RL) environments are a powerful source of training data: they generate structured training tuples like (state, action, reward, next state) and are extremely useful for post-training optimization when the world can be simulated. But many of the highest-value domains for AI, including healthcare, enterprise workflows, and complex multimodal reasoning, cannot be faithfully simulated. Advancing models in these areas requires real-world datasets, carefully designed benchmarks, and domain-specific data for pre-training and mid-training adaptation.
The idea behind DataLab is simple but important: every major leap in AI capability has historically followed a breakthrough in data (from ImageNet to large-scale web corpora). As models and compute continue to advance rapidly, closing the data gap, the gap between the data that AI systems need and the data that actually exists in usable form, may be one of the most important challenges for the field. Here is more info on some of the work the team has done so far: datalab.withprotege.ai
Bobby Samuels@BobbySamuels

x.com/i/article/2030…

Replies 1 · Reposts 0 · Likes 12 · Views 868
matt turk @TurkMatthew
Top-tier read calling out performative grind culture. “Great work has always demanded sacrifice and often brutal hours and I'm not disputing this. What I'm disputing is the direction. These people, many of them friends, have more economic freedom than any class in history and they've chosen, freely, to simulate the conditions of a Chinese assembly line and call it virtue.”
Will Manidis@WillManidis

x.com/i/article/2056…

Replies 0 · Reposts 0 · Likes 1 · Views 106
matt turk retweeted
OpenAI @OpenAI
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
Replies 986 · Reposts 3.7K · Likes 26K · Views 12.7M
matt turk @TurkMatthew
@4JimLee @bevedoni @karpathy Karpathy will not lead the business side; he will do what he does best and lead R&D (specifically, accelerating pre-training research).
Replies 0 · Reposts 0 · Likes 0 · Views 22
JL @4JimLee
@bevedoni @karpathy To outflank Dario for the leadership role or use Elon's rentable compute without anyone capable of controlling Karpathy?... Ha ha ha, but honestly, at some point someone needs to lead the business side better than Dario.
Replies 1 · Reposts 0 · Likes 6 · Views 3.4K
Doniyor @bevedoni
maybe dumb question, but why didn't @karpathy join xAI?
Replies 158 · Reposts 3 · Likes 502 · Views 593K
matt turk retweeted
Bleacher Report @BleacherReport
STILL NOT OVER THIS SHOT 😭 WHO WAS WEMBY FEELING LIKE??
Replies 2K · Reposts 14.4K · Likes 177.3K · Views 9M
matt turk retweeted
Goldie @dezgoldie
okc/spurs could be the best playoff series since 2016 finals
Replies 12 · Reposts 12 · Likes 294 · Views 25.3K
Phoebe Yao @phoebeyao
training data is starting to look like a zero knowledge proof problem. labs have to judge quality without seeing the full dataset or the QC pipeline behind it. vendors proxy quality with multi-rollout pass rates, small-model ablations, and downstream eval gains. but compute and iteration costs explode as environments and trajectories grow more complex. quality has no ceiling, and the best data is often the hardest to capture in a metric or explain in a writeup. huge alpha in making data quality more legible.
Replies 25 · Reposts 19 · Likes 404 · Views 58.3K
matt turk retweeted
Little Kevin 5, cpa @pootsobotka
Hey man sorry I’m busy this weekend i can’t chill
- Friday night i gotta wait in the pizza line in the west village from 6-9 and then wait on line to get into the Spaniard from 9-12
- Saturday morning gotta wait on the blank st coffee line from 8-12
- Saturday afternoon New York or nowhere line from 12-4
- Saturday night i have the frozen yogurt line from 4-12
If you wanna hang Sunday morning I’m waiting on a bagel line from 9-1 let me know
Replies 27 · Reposts 101 · Likes 3K · Views 189.2K
matt turk @TurkMatthew
@himanshustwts Agreed. Mid-training is also a newer term than pre-/post-training and is pretty much just smaller-scale, more curated, capability-targeted pre-training.
Replies 0 · Reposts 0 · Likes 1 · Views 82
himanshu @himanshustwts
reflecting on this again. "mid-training, pre-training, post-training -- these are all made-up words. there were perfectly fine words before them like "supervised learning" instead of "SFT." when a technology has a profound impact on society and people start asking questions that don't have well-defined answers, we make up names. there are definitions for these things, and people who can articulate them precisely are valued. but there's also value in zooming out and recognizing there's a lot of ways to skin a cat." cc @groundzero_twt / pod with Curtis from Handshake AI.
rohan anil@_arohan_

There is no pre-training, post-training, or test-time training. There are only priors, updates, constraints, and compute budgets. There is only TRAINING. Last several years we shipped the org chart to fundamental optimization science.

Replies 4 · Reposts 5 · Likes 76 · Views 6.8K
matt turk @TurkMatthew
@CoachDanGo In the U.S. I would say this is not true (unfortunately)
Replies 0 · Reposts 0 · Likes 0 · Views 107
Dan Go @CoachDanGo
Eating healthy is actually eating normally but most people think it’s dieting.
Replies 153 · Reposts 929 · Likes 10.4K · Views 202.8K
Alexander Whedon @alex_whedon
Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
Replies 1.5K · Reposts 2.9K · Likes 23K · Views 12.7M
matt turk retweeted
Adithya S K @adithya_s_k
Excited to release the Ultimate guide to RL environments! Definitions of RL environments differ wildly in the LLM era, so we spent the last month building several RL environments across 6 different frameworks, domains and complexities to map out which are easiest to build with and which can be scaled to 1000s.
Replies 51 · Reposts 158 · Likes 1.2K · Views 222.3K
André @oracles
Had a Jane Street interview in 2019. Round 8. Interviewer texts: 'Equinox Brookfield. 6 AM. Bring a calculator you won't use.' I show up. He's on the StairMaster reading a printout of the CBOE VIX term structure. Doesn't get off. Nods at the machine next to him. 'You see that guy on the rower? Goldman MD. Comes here every morning at 6:04. Leaves at 6:38. What's the implied vol on his arrival time?' 'I don't know his variance.' 'Sample size of one year, 250 sessions. Standard deviation is 90 seconds. Annualize it.' I do the math in my head. '90 seconds times sqrt(250). About 24 minutes annualized.' 'Wrong. You annualized like it's a return. Time-of-arrival doesn't compound. It's a Poisson process with drift. The correct answer is his arrival is more punctual than the 6 train. Now price me an option on whether he shows up tomorrow.' I think for a second. 'If he's been here 250 days in a row, base rate is 99.6%. But you have to adjust for his vacation schedule and probability of injury, call it 96%.' 'Strike?' '$10 if he shows, $0 if he doesn't.' 'I'll sell you that option for $9.40.' I think about it. 'No. Expected value is $9.60. You're underpricing by 20 cents.' 'Correct. Now why am I selling it to you?' I freeze. 'Because I just saw him limp on the way in. You're buying my information for 20 cents. You overpaid.' We get off the machines. Walk to the smoothie bar. He orders a $19 smoothie, doesn't drink it. 'Last question. The girl behind the counter makes 200 smoothies per morning. She has perfect information on who's actually here and who's faking it. Citadel guys leak their attendance to her every day for the price of a tip. If I gave you $50,000 to set up a market on which Citadel PM gets fired this quarter, what's your bid-ask?' 'Insider trading.' 'Wrong answer. There's no public security. Try again.' I think. 'I'd quote 8 to 12 percent on any given PM. Spread of 4 points to cover adverse selection. Tighten the spread for PMs I have data on.' 
'Where do you get the data?' 'The smoothie girl.' 'Good. How much do you pay her?' '10% of P&L.' 'Wrong. You pay her a flat $200 a week. If you pay her on P&L she becomes your counterparty. Right now she's your data source. Don't conflate edges.' He hands me the untouched smoothie. 'Throw this out on Vesey Street, not in the building. The staff knows what gets wasted. Outside, it was consumed. Same smoothie, different signal.' I do it. Thursday I get the email. 'Offer rescinded. Your bid-ask on the PM market was too tight. 4 points doesn't cover the tail. The girl is a single point of failure and you didn't price her counterparty risk. Also you held the smoothie in your right hand. Right-handers throw with their right hand. Camera saw the hesitation.'
Deedy@deedydas

Jane Street made ~$40B in 2025 with 3,500 employees, a ~2x from the year before. At ~65-70% profit margin, that's $8M profit / employee, the highest for a 1000+ ppl company. High-frequency trading continues to be the most efficient money making engine. I want to share an old story about my Jane Street interview in 2014. Jane Street was known for hiring a lot of math, physics and CS olympiad winners from top universities and putting them through many rounds - including, for trading roles, a gauntlet of mental math. It was my 6th interview and my final round and I recall being asked "What is the next day after today in DD/MM/YYYY where all the digits are unique?" They'd toy with you and say "You can use a pencil and paper, if you want" but you knew that was an instant no. Painstakingly and as quickly as I could, I came to an answer. "How confident are you that this is correct on a 0-1 probability scale?" the interviewer said. "0.95", I blurted out, not fully knowing how to answer that. "Are you sure?" After thinking harder for a few more seconds, I realized I could've flipped the digits around to get a closer date. I gave the interviewer my answer. It was correct. "0.95 huh?" he chuckled. That's when I knew I failed. Note: fwiw, other companies that come close in efficiency are - Tether ($90M+ profit/emp) - Hyperliquid ($80M+ profit/emp) and on revenue: - Valve ($50M/emp) - OnlyFans ($37M/emp) - Craigslist ($14M/emp) - Anthropic ($12M/emp, run rate) - OpenAI ($8M/emp, run rate) For comparison, Nvidia is very efficient at scale and is $4.4M/emp.

Replies 85 · Reposts 120 · Likes 2.8K · Views 762.7K
matt turk retweeted
Tristan @Tristan0x
Had a Jane Street interview in 2014 End of round 5. Interviewer says "round 6 will find you." Three weeks. Nothing. I'm in line at the Trader Joe's on 14th. Line snakes past the frozen aisle, the way it always does. I'm holding a bag of orange chicken and a four-pack of cold brew concentrate. Guy in front of me is in a Patagonia vest over a quarter-zip. Not turning around. Inching his cart forward every 30 seconds. Picks a box of something out of his own cart. Holds it up over his shoulder without turning around. "Ant on a corner of this box. Walks to the opposite corner along edges. How many shortest paths." "Six." "Now in 4 dimensions." I think. Tesseract. One edge per dimension, any order. "4!. Twenty-four." "Confidence." "0.85." "Correct on both counts." Offer Monday. $69k base (which I'm told is a 'cultural fit discount'). No bonus. No equity. No relocation. They will, however, allow me to name one (1) colocated server in their NY4 cage. I take my time. This is the only thing I'm being given. I submit "Steve." Already a Steve. I submit "Steve2." Discouraged naming pattern. I submit "Steven." Confusing with Steve. I submit "Big Steve." Big Steve exists in NY5. I submit "my Steve." Approved. Two weeks into onboarding, IT pings the eng channel: "my Steve is down." Three engineers respond at once asking which Steve. The thread spirals. Someone clarifies: "it's named 'my Steve.'" Someone replies "yes but whose." A VP joins the thread: "is this a possessive or a proper noun." Nobody knows. A meeting is called. The meeting is titled "re: my Steve." Nine engineers attend. The first ten minutes are spent establishing whether the meeting title refers to the server or to a Steve belonging to the meeting organizer. The meeting organizer is named Devesh. Devesh does not know a Steve. Offer rescinded Friday. Reason listed in the email as: "Introduced ambiguity into production naming taxonomy. Unrecoverable." I'm no longer allowed in the building. 
I am also, somehow, no longer allowed in the Trader Joe's on 14th. The doors don't open for me. I have tested this six times. The orange chicken is still in there. Three bags deep on the shelf, frozen, waiting. I think about it every day.
Deedy@deedydas

Jane Street made ~$40B in 2025 with 3,500 employees, a ~2x from the year before. At ~65-70% profit margin, that's $8M profit / employee, the highest for a 1000+ ppl company. High-frequency trading continues to be the most efficient money making engine. I want to share an old story about my Jane Street interview in 2014. Jane Street was known for hiring a lot of math, physics and CS olympiad winners from top universities and putting them through many rounds - including, for trading roles, a gauntlet of mental math. It was my 6th interview and my final round and I recall being asked "What is the next day after today in DD/MM/YYYY where all the digits are unique?" They'd toy with you and say "You can use a pencil and paper, if you want" but you knew that was an instant no. Painstakingly and as quickly as I could, I came to an answer. "How confident are you that this is correct on a 0-1 probability scale?" the interviewer said. "0.95", I blurted out, not fully knowing how to answer that. "Are you sure?" After thinking harder for a few more seconds, I realized I could've flipped the digits around to get a closer date. I gave the interviewer my answer. It was correct. "0.95 huh?" he chuckled. That's when I knew I failed. Note: fwiw, other companies that come close in efficiency are - Tether ($90M+ profit/emp) - Hyperliquid ($80M+ profit/emp) and on revenue: - Valve ($50M/emp) - OnlyFans ($37M/emp) - Craigslist ($14M/emp) - Anthropic ($12M/emp, run rate) - OpenAI ($8M/emp, run rate) For comparison, Nvidia is very efficient at scale and is $4.4M/emp.

Replies 51 · Reposts 85 · Likes 2.2K · Views 366.2K
matt turk @TurkMatthew
@HAHazony @Cernovich It’s also related to the number of novel experiences you have per day. This rate is way higher as a kid, when the world is not saturated to you yet, and is a reason why purposely introducing novelty as an adult makes your life feel longer.
Replies 0 · Reposts 2 · Likes 48 · Views 6.4K
H.A. Hazony @HAHazony
There is a psychological reason for this: When you are young, life seems relatively immense, because you have lived only a fraction of it. At 5, life is still 20x larger than you. So your brain projects that life is going to be LOOOONG. And every new experience makes it seem like life is infinite, rich, and full of novelty. As you get older, life stops seeming infinite, because it becomes proportionally not so large compared to you. When you pass 40, you understand the true size of life: e.g., it is twice as big as I am today. That is something graspable. This is so horrifying a lot of people freak out ("Midlife-Crisis"). Now, when you are older, nothing seems new, it's just the same old experiences you've had. Day in day out. And indeed, time itself is passing faster, because a "day" or a "year" is comparatively a much shorter span of time for a 50 y/o than a 5 y/o. In sum, the older you get, the faster time passes, the smaller life seems, and the less meaningful it feels.
Replies 17 · Reposts 33 · Likes 577 · Views 135.8K
Cernovich @Cernovich
Extremely concerning how quickly time moves the older you get.
Replies 474 · Reposts 1.1K · Likes 16.9K · Views 1.6M
matt turk retweeted
RCsWorld @RCsWrld
These 2 just went on a 10-0 run in a playoff game
[Image: RCsWorld tweet media]
Replies 140 · Reposts 9.7K · Likes 106.6K · Views 2.3M
Evan Luthra @EvanLuthra
The Head of Claude Code at Anthropic said he hasn’t written code by hand in months. In 2 days he shipped 49 full features. All written 100% by AI. He just dropped a 30 min talk on exactly how he does it. Worth more than any $500 vibe coding course. Bookmark it:
Replies 183 · Reposts 779 · Likes 9.2K · Views 1.6M