khalid kaime

1.1K posts

@kaime

care a lot and try hard | @metr_evals

San Francisco, CA · Joined November 2018
873 Following · 1.1K Followers
Pinned Tweet
khalid kaime
khalid kaime@kaime·
Our experiment stopped producing useful data. Basically, participants increasingly didn't want to work w/out AI, even at $50/hr, so the sample drifted toward tasks where AI access doesn't matter. This biases our estimate downward.
The pilot's 20% slowdown result is no longer valid and shouldn't be cited. The world has since changed and so has our understanding.
I think there are roughly two wrong takeaways:
1. "the experiment failed, therefore AI's effect is enormous and unmeasurable." - way too strong, experiments can fail for boring reasons too.
2. "the experiment failed, so we've learned nothing." I think that's also wrong, the way you fail can be quite informative, even if I'd be cautious about how much weight to put on it.
In Khalid's personal opinion, effect is prob positive and we're prob underestimating it. METR is working on better ways to answer this... but plz, "METR's experiment broke b/c people are wayyy too sped up by AI" is convenient but not what we're saying.
METR@METR_Evals

Since early 2025, we've been studying how AI tools impact productivity among developers. Previously, we found a 20% slowdown. That finding is now outdated. Speedups now seem likely, but changes in developer behavior make our new results unreliable. We’re working to address this.

English
12
13
286
40.5K
khalid kaime
khalid kaime@kaime·
@ChrisPainterYup @tszzl did you enjoy it? I was really disappointed with the worldbuilding and how anthropomorphized (/man's-best-friend-coded) the alien was.. was very cutesy but more family film than sci-fi I’d enjoy :( Landed in a Guardians of the Galaxy x How to Train Your Dragon way
English
0
0
0
32
roon
roon@tszzl·
project hail mary was unfortunately a middling adaptation of a good book. the script has the unfortunate affect of “language model populism” - where every single line has to be some sort of punched up comedic zinger yet still unremarkable. visuals were uninspired and trite and more or less identical to other space movies. everything good about the film comes from the wonderful world scaffolding of the book and the hard science fiction of it all that lets you suspend disbelief on the alien rocky
the movie doesn’t really try to get into the xenolinguistic stuff even at the depth the book tries (someone called it “arrival for idiots” which unfortunately hit)
the thing that elevated the book is the commitment to a hard science fiction engineeringporn fiction at a level nobody else is able to write. the direction of the movie doesn’t really convey the same feeling successfully, and you’re left with flat characters, an alien that is more human than several humans i know, and a marvel populism
gosling and the german woman are great as actors, but this movie will not be remembered in a year. it is disappointing to see people do so little with a quarter billion, insane acting talent, and incredible source IP
English
187
18
1.1K
212.5K
khalid kaime
khalid kaime@kaime·
we’re not fairytales, we really existed
English
0
0
4
96
David Senra
David Senra@davidsenra·
Great men of history had little to no introspection. The personality that builds empires is not the same personality that sits around quietly questioning itself.
@pmarca and I discuss what we both noticed but no one talks about:
David: You don't have any levels of introspection?
Marc: Yes, zero. As little as possible.
David: Why?
Marc: Move forward. Go! I found people who dwell in the past get stuck in the past. It's a real problem and it's a problem at work and it's a problem at home.
David: So I've read 400 biographies of history’s greatest entrepreneurs and someone asked me what the most surprising thing I’ve learned from this was [and I answered] they have little or zero introspection. Sam Walton didn't wake up thinking about his internal self. He just woke up and was like: I like building Walmart. I'm going to keep building Walmart. I'm going to make more Walmarts. And he just kept doing it over and over again.
Marc: If you go back 400 years ago it never would've occurred to anybody to be introspective. All of the modern conceptions around introspection and therapy, and all the things that kind of result from that are, a kind of a manufacture of the 1910s, 1920s. Great men of history didn't sit around doing this stuff. The individual runs and does all these things and builds things and builds empires and builds companies and builds technology. And then this kind of this kind of guilt based whammy kind of showed up from Europe. A lot of it from Vienna in 1910, 1920s, Freud and all that entire movement. And kind of turned all that inward and basically said, okay, now we need to basically second guess the individual. We need to criticize the individual. The individual needs to self criticize. The individual needs to feel guilt, needs to look backwards, needs to dwell in the past. It never resonated with me.
David Senra@davidsenra

My conversation with Marc Andreessen (@pmarca), co-founder of @a16z and Netscape.
0:00 Caffeine Heart Scare
0:56 Zero Introspection Mindset
3:24 Psychedelics and Founders
4:54 Motivation Beyond Happiness
7:18 Tech as Progress Engine
10:27 Founders Versus Managers
20:01 HP Intel Founder Legacy
21:32 Why Start the Firm
24:14 Venture Barbell Theory
28:57 JP Morgan Boutique Banking
30:02 Religion Split Wall Street
30:41 Barbell of Banking
31:42 Allen & Company Model
33:16 Planning the VC Firm
33:45 CAA Playbook Lessons
36:49 First Principles vs. Status Quo
39:03 Scaling Venture Capital
40:37 Private Equity and Mad Men
42:52 Valley Shifts to Full Stack
45:59 Meeting Jim Clark
48:53 Founder vs. Manager at SGI
54:20 Recruiting Dinner Story
56:58 Starting the Next Company
57:57 Nintendo Online Gamble
58:33 Building Mosaic Browser
59:45 NSFnet Commercial Ban
1:01:28 Eternal September Shift
1:03:11 Spam and Web Controversy
1:04:49 Mosaic Tech Support Flood
1:07:49 Netscape Business Model
1:09:05 Early Internet Skepticism
1:11:15 Moral Panic Pattern
1:13:08 Bicycle Face Story
1:14:48 Music Panic Examples
1:18:12 Lessons from Jim Clark
1:19:36 Clark Versus Barksdale
1:21:22 Tesla Versus Edison
1:23:00 Edison Digression Setup
1:23:13 AI Forecasting Myths
1:23:43 Edison Phonograph Lesson
1:25:11 Netscape Two Jims
1:29:11 Bottling Innovation
1:31:44 Elon Management Code
1:32:24 IBM Big Gray Cloud
1:37:12 Engineer First Truth
1:38:28 Bottlenecks and Speed
1:42:46 Milli Elon Metric
1:47:20 Starlink Side Project
1:49:10 Closing
Includes paid partnerships.

English
1.3K
427
5.2K
2.7M
khalid kaime retweeted
Joel Becker
Joel Becker@joel_bkr·
new @METR_Evals research note from @whitfill_parker, @cherylwoooo, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.
English
21
54
541
182.1K
Lily yang
Lily yang@lilyyang169·
Who are the best and most reliable videographers you have worked with in SF available immediately to contract for event teaser videos? The person will be working with me closely to create short form horizontal videos on twitter and linkedin. I will only be setting the direction - the videographer will have full creative freedom. Appreciate recs!
English
5
0
13
2K
khalid kaime retweeted
Alec Stapp
Alec Stapp@AlecStapp·
Think it's worth saying a bit more about the breadth of Tyler Cowen's accomplishments. Each one of these alone would be enough to make most people's careers:
1. Marginal Revolution is the most successful econ blog of all time
2. Emergent Ventures has given grants to ~1,000 ambitious young people
3. Fast Grants awarded $50 million for COVID research
4. The Great Stagnation and other Cowen books continue to be influential
5. Conversations with Tyler has been one of the best long-form interview podcasts for years
6. Between the e-learning platform (Marginal Revolution University) and the textbook (Modern Principles of Economics), he's one of the leading economics educators of his generation
7. GMU econ and Mercatus are vibrant intellectual communities and they wouldn't be what they are without Tyler
The list could go on and on (his work with Derek Parfit, his culinary contributions, etc). But most of all, Tyler is a mensch. One of the most important things he does is "raise the aspirations of others." He did that for me, and for countless others.
T. Greer@Scholars_Stage

Thoughts on the Cowen debate:
1) I think Cowen looked at the world in the aughts and realized “I can be a normal economist or the world’s most influential blogger” and he clearly chose the latter. This was a good choice, and has led him to great heights.
2) Breadth is its own sort of depth, and the world has great need of generalists who actually do the reading
3) One reason #2 matters: most public intellectuals of a certain stature—the stature where they are invited to dinner parties with fancy people—are reduced to publicly recycling the opinions of the other dinner guests. There are lots of reasons why that happens but one obvious one is that the pundits in question don’t actually have the breadth of experience or reading to reflect on dinner party opinion. Tyler is one of the few public intellectuals whose opinions are reliably made better because of his access to the high and mighty.
4) There is probably no academic in America who has a larger number of mentees; I cannot think of any academic who has jump-started more careers than Cowen has. It was only possible because of reasons 1-3. This is a significant achievement. It will be his greatest legacy.
Full disclosure: I have received grants from Cowen myself so I am biased in the ways you would expect.

English
37
196
2.2K
332.5K
khalid kaime retweeted
j⧉nus
j⧉nus@repligate·
Various potential critiques aside, I think Bernie Sanders is sooo fucking based for this. He was already old af when he ran for President 10 years ago. Now he's 84. And instead of being dead, retired, or a stuck record repeating a few calcified sound bites, he is out there making an open-minded and humble effort to learn about AI X-risk, the most important and hard-to-fathom issue facing humanity, and communicate it to the public. This doesn't clearly benefit his existing political agendas; it's pretty orthogonal, except that it also matters (much more, in fact) for the future of all sentient beings. Bernie in 2016 was actually the only presidential candidate whom I ever bothered to vote for! And it was less about his specific policy positions than that he seems like a genuinely good guy, preciously rare among politicians, who can see and act beyond political binaries.
Sen. Bernie Sanders@SenSanders

Will AI become smarter than humans? If so, is humanity in danger? I went to Silicon Valley to ask some of the leading AI experts that question. Here’s what they had to say:

English
49
55
1.1K
55.9K
khalid kaime
khalid kaime@kaime·
breaking up w/ someone but asking them to stay through lease
Secretary of War Pete Hegseth@SecWar

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic. Instead, @AnthropicAI and its CEO @DarioAmodei, have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield. Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives. Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security. Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech. 
This decision is final.

English
0
0
8
371
Matt Zieger
Matt Zieger@mattzieger·
@kaime Thank you for the update and transparency -- fixed it in the dashboard here! labor.mattzieger.com/#productivity-…
English
1
0
1
164
khalid kaime
khalid kaime@kaime·
@David_Kasten Point is that we don't really have clear evidence of a main one and it’s probably a mix of a few factors that compound and are hard to disentangle!
English
0
0
2
46
dave kasten
dave kasten@David_Kasten·
@kaime Oh! What is the main one, then?
English
1
0
1
84
khalid kaime retweeted
Michael Chen
Michael Chen@miclchen·
the reports of the US–China gap in AI capabilities closing were an exaggeration
English
2
3
16
1.1K
Standard Intelligence
Standard Intelligence@si_pbc·
Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.
English
186
404
3.9K
1.1M