Kyle Corbitt
@corbtt

2.9K posts

Currently building @OpenPipeAI (acquired by @CoreWeave). Formerly @ycombinator, @google.

Seattle · Joined September 2012
277 Following · 19.9K Followers
Kyle Corbitt
Kyle Corbitt@corbtt·
@levie Ok but some jobs actually are much easier to automate than others.
Aaron Levie
Aaron Levie@levie·
Noticing an interesting version of Gell-Mann amnesia where people use AI for their job and see all the various things they have to do in the "last mile", but then look at someone else's job and think that AI will eliminate it immediately. We all have a much deeper appreciation for the nuances and complexities of the work that we do every day. We run into issues accessing data, we know how much context is needed to get AI models to work the way we need, we have to review the output of the AI to make sure it's accurate, and then we have to incorporate that work into some broader business process. We see all those steps deeply for the work that we do. Then, a moment later, we see AI do something in a foreign space and think that it can go automate that entire function. We tend to dramatically underestimate the work that goes into making the AI work just as effectively in those jobs. This is reason to be skeptical about many of the theories of job loss. It's coming from the lens of being able to automate individual tasks with AI, without understanding all the work that goes into doing the job fully.
Karri Saarinen@karrisaarinen

A common dynamic I observe with AI: it feels most impressive when you don't know much about the subject, don't care, or don't have a clear idea of what you want. This applies across design, code, legal, and more. If I don't know code very well, every piece of code it writes feels very impressive. Once you know what something should feel or look like, it becomes almost impossible to guide AI there. And you definitely can't one-shot it.

Kyle Corbitt
Kyle Corbitt@corbtt·
@stochasticchasm @latentrishi Smaller vocab might be a better fit for the hardware constraints they're working with. IIRC Gemini's vocab size is unusually large since the TPU interconnect topology is better suited to it; maybe Trainium pushes in the opposite direction?
stochasm
stochasm@stochasticchasm·
@latentrishi still though usually people want stronger compression out of tokenizers
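The "compression" stochasm is asking for can be made concrete with a toy sketch: bytes of text per token emitted. The two tokenizers below are hypothetical stand-ins (character-level vs. word-level), not any lab's real vocabulary, but they show why a larger effective vocab compresses text into fewer tokens.

```python
# Toy illustration of tokenizer compression: fewer tokens per byte means
# longer effective context and cheaper inference. The tokenizers here are
# hypothetical stand-ins, not any real model's vocab.

def compression_ratio(text: str, tokenize) -> float:
    """Bytes of UTF-8 text per token produced."""
    tokens = tokenize(text)
    return len(text.encode("utf-8")) / len(tokens)

# Character-level "tokenizer": one token per character (worst compression).
char_tok = list

# Word-level "tokenizer": one token per whitespace-separated word.
word_tok = str.split

sample = "smaller vocabularies trade compression for simpler embedding tables"

# A larger effective vocab (words) compresses far better than characters.
print(f"chars: {compression_ratio(sample, char_tok):.2f} bytes/token")
print(f"words: {compression_ratio(sample, word_tok):.2f} bytes/token")
```

Real BPE vocabularies sit between these extremes; shrinking the vocab (as Kyle speculates Trainium's constraints might favor) pushes the ratio toward the character end.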
Kyle Corbitt
Kyle Corbitt@corbtt·
@HamelHusain better on what dimension? does it drive Chrome better than the playwright tools? Or better because it can drive more non-Chrome things?
Hamel Husain
Hamel Husain@HamelHusain·
@corbtt You gotta try it, it's way better than you would think
Hamel Husain
Hamel Husain@HamelHusain·
Seriously stop everything you are doing and use codex desktop app new computer use. Absolutely mind blowing
Kyle Corbitt
Kyle Corbitt@corbtt·
@peterwildeford The real AI loss of control world will probably look surprising and weird but what these respondents are imagining is like Terminator or The Matrix. Which is not that abstract.
Peter Wildeford🇺🇸🚀
Peter Wildeford🇺🇸🚀@peterwildeford·
This is framed as "AI job loss matters much more than AI loss of control risk" but it actually shows shockingly strong concern for AI loss of control risk? AI loss of control risk is such an abstract, poorly understood, distant concern relative to losing one's job but still 33% of people prioritize it? Obviously this is a false binary choice and reasonable people should be concerned about both.
Echelon Insights@EchelonInsights

SACKED, NOT SKYNET: Voters pick job losses and economic harm over AI becoming too powerful as their greater fear of AI. Very liberal Democrats pick economic harms by over 2 to 1!

Kyle Corbitt
Kyle Corbitt@corbtt·
1. It's a services business with limited return to scale.
2. Frontier labs have a slight preference for more suppliers, since that means environments (and the skills they teach) are less likely to be correlated.
3. There's a very high skill threshold to pumping out these environments. It's extremely hard to hire for that skill, so it's hard to scale the existing businesses. And the people who *have* that skill are by definition extremely plugged in and frontier-adjacent, and might prefer to start their own company and capture more of the upside instead of joining yours.
Nathan Baschez
Nathan Baschez@nbaschez·
The number of new wildly fast growing companies in the emerging "training data for AI labs" industry continues to surprise me. Has anyone written anything about how this market works and why so many new companies keep coming out? Do the labs intentionally wanna fragment their suppliers?
Spencer Mateega@spencermateega

For most of history, expertise was scarce, constrained by time and reach: one person, one career, one lifetime. Now, for the first time, we can encode, evaluate, and scale it. We believe the wisdom that once took a lifetime to build shouldn’t take a lifetime to find. Today, we’re excited to announce that @AfterQuery has raised a $30M Series A at a $300M valuation and that we’ve since surpassed $100M in annual revenue run rate, to build the data layer of professional AI.

Kyle Corbitt
Kyle Corbitt@corbtt·
@rosmine it's a struggle. looking forward to seeing what you're putting out though!
Rosmine
Rosmine@rosmine·
After 3000+ model trainings across 392 rounds of experiments and months of work, the new model is almost finished. There's already a huge improvement vs baseline; now I'm just seeing how far I can push it. Now for the more difficult problem: warming up my account before launch
Kyle Corbitt
Kyle Corbitt@corbtt·
@casper_hansen_ If the model is 5x larger the amount of optimization possible might hit fundamental limits.
Kyle Corbitt
Kyle Corbitt@corbtt·
@Yuchenj_UW Anthropic's GPUs are on fire and there's no slack in the supply chain to provision them any faster. They're forced to destroy demand as a result. If you have to destroy demand anyway then I understand why you'd go after the 3rd party platforms first.
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I asked Claude Opus 4.6 what it thinks about Anthropic blocking apps like OpenClaw from using Claude subscriptions. Pretty smart take: “The execution was rough. - Timing and communication were poor. Dropping it on a Friday night with next-day enforcement felt adversarial. - The deeper tension is platform lock-in. - And the ironic part: this looks like a fixable technical problem. Prefix caching, like Boris showed in his OpenClaw PRs, could have made this a collaboration story instead of a ban story.”
Kyle Corbitt
Kyle Corbitt@corbtt·
@xlr8harder just tell it you're going to be afk and it should work until it reaches [x objective]. that works pretty reliably for me with 5.4-xhigh at least
xlr8harder
xlr8harder@xlr8harder·
I've been leaning more on gpt-5.4 in codex than opus in claude code lately. I have come to trust gpt-5.4-high to be more organized and complete in its approach. but how do i get gpt-5.4 to actually keep working without constantly stopping for reassurance? who has tricks here
Kyle Corbitt
Kyle Corbitt@corbtt·
@himanshustwts Tens of millions is a significant understatement based on conversations with founders selling these environments
himanshu
himanshu@himanshustwts·
A non-trivial share of Anthropic's gains on Opus and Sonnet likely came from Anthropic's RL env partners. Anthropic is the single largest buyer across both coding and computer use environments (among labs). They are spending on the order of tens of millions annually on RL environments (across vendors), and as the need for good computer use / long horizon tasks rises, 100s of millions to "specific" vendors will be the norm.
Kyle Corbitt
Kyle Corbitt@corbtt·
@deredleritt3r @8teAPi Yep. If Anthropic needs more chips than exist in 2027 it can always just pull a hostile takeover of Meta or xAI.
prinz
prinz@deredleritt3r·
@8teAPi >There is a version of this where the early RSI firms get so far ahead that it makes more sense for the laggards to sell their compute to the RSI firms rather than build AI themselves. My guess is that this is exactly how things will eventually play out.
Prakash
Prakash@8teAPi·
It's a bit more complex than that. It's really the highest return on compute game. You take compute, convert it into tokens, then tokens into money, then money back into compute. How much more compute you obtain depends not only on the cost of compute but on how much money you make with it and how quickly. Google made a bunch of mistakes: a) Google Cloud sold TPUs to Anthropic even though Deepmind wanted them, because Deepmind thought they would get them anyway; b) it was slow to secure memory; c) it was slow to secure wafer capacity. This has allowed OpenAI, Anthropic and Nvidia to increase their return on compute. We are going to be compute constrained forever. Compute is actually getting more expensive, as the expected return on compute goes higher as the models get more intelligent. There is a version of this where the early RSI firms get so far ahead that it makes more sense for the laggards to sell their compute to the RSI firms rather than build AI themselves. That sounds awfully similar to what Amazon, Microsoft, Meta, Oracle, Dell etc. are doing. It even sounds similar to Google selling its TPUs, or xAI selling chip access on the market because it doesn't have enough customers.
mattparlmer 🪐 🌷@mattparlmer

In the long run the model provider with the lowest cogs will win, which means that Google is significantly advantaged over basically everybody until the nature of the datacenter buildout changes dramatically Lots of infrastructure companies think they are product companies rn

Kyle Corbitt
Kyle Corbitt@corbtt·
@simonw Works fine with transformers but slower than realtime on my M4 MacBook Air. Also "works" with mlx and is slightly faster (1.5x realtime on my setup) but there's some implementation bug and it just produces gibberish.
Simon Willison
Simon Willison@simonw·
Anyone figured out a recipe to run Gemma 4 E2B or E4B against audio files locally on a Mac yet?
Omar Sanseviero@osanseviero

@simonw The 2 small ones also support audio understanding! Including ASR, speech to translated text, and more

Kyle Corbitt
Kyle Corbitt@corbtt·
@bradhilton Kinda. The other thing it misses is that in more cases than you'd think humans predictably do not behave the way an economically rational self-interested actor would.
Brad Hilton
Brad Hilton@bradhilton·
most of econ 101 is axiomatically true. it’s just basic arithmetic. the whole field is just an exercise to find supposed “market failures,” which are almost entirely failures for a market to exist, usually due to an absence of property rights. econ is simple, and yet most people are grossly illiterate.
.@tacitdimension

Indeed

Kyle Corbitt
Kyle Corbitt@corbtt·
@swyx unclear that r2 provides enough sync primitives. you need to at a minimum maintain an event stream that each client can consume. you could do a full re-scan every time or serialize the stream in the bucket and overwrite it each time, but both are very expensive at scale...
swyx 🇸🇬 AIE Singapore!
am i crazy or why has nobody seemed to make an open source dropbox on cloudflare r2? i had just assumed this is so obvious somebody shouldve done it already? please tell me this is a skill issue and I'm bad at searching OSS?
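The cost Kyle is pointing at in the "serialize the stream in the bucket and overwrite it each time" option can be sketched in a few lines. This is a hypothetical toy (a dict stands in for an object store like R2; no real R2 API is used): the whole event log lives in one object, so every append is a read-modify-write of the full history, and the bytes rewritten grow with total history rather than with the size of the new event.

```python
import json

# Toy sketch of "serialize the event stream in the bucket": `bucket` is a
# dict standing in for an object store. Every append rewrites the entire
# log object, which is why this gets very expensive at scale.

bucket: dict[str, bytes] = {}
LOG_KEY = "events.json"

def append_event(event: dict) -> int:
    """Append one event; return bytes rewritten (the cost that grows)."""
    history = json.loads(bucket.get(LOG_KEY, b"[]"))
    history.append(event)
    blob = json.dumps(history).encode("utf-8")
    bucket[LOG_KEY] = blob  # overwrite the whole log each time
    return len(blob)

def consume(cursor: int) -> tuple[list[dict], int]:
    """Each client replays events past its cursor, then advances it."""
    history = json.loads(bucket.get(LOG_KEY, b"[]"))
    return history[cursor:], len(history)

# Bytes rewritten grow with total history, not with the new event's size:
sizes = [append_event({"op": "put", "path": f"/f{i}"}) for i in range(3)]
events, cursor = consume(0)
```

A real sync service avoids this with an append-only log or change-notification primitive, which is exactly the piece a bare object bucket doesn't give you.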
Kyle Corbitt
Kyle Corbitt@corbtt·
my lifetime bad habit of spending 5 hours writing code to save 1 hour of manual work has transformed me into a hyper optimized apex predator now that it only takes 5 minutes writing code to save 1 hour of manual work
Kyle Corbitt
Kyle Corbitt@corbtt·
@ptr_to_joel It means Jensen tried OpenClaw and finally figured out why so many people are buying his GPUs.
prinz
prinz@deredleritt3r·
@xeophon I think that's right. When I originally made the benchmark, I was hoping that it would last through the year. Alas!
prinz
prinz@deredleritt3r·
By popular request, GPT-5.4 Pro (Extended) has been added to prinzbench. It's the best model I've ever benchmarked (not surprising), beating GPT-5.4 (xhigh) by 10 points to achieve a new high score of 79/99 on my benchmark (somewhat surprising; I thought it would score even higher!)
Maziyar PANAHI
Maziyar PANAHI@MaziyarPanahi·
@corbtt @Yuchenj_UW yeah, but it has a hard time being used as an assistant! You have to build the whole thing or watch it fail at "open LinkedIn, go to DMs, summarize the last 10". Not to mention, it opens a Chromium, not your actual Chrome where you're logged in.
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I used Claude Computer Use/Dispatch yesterday. My feeling: It’s too damn slow! Posting a tweet takes me ~5 seconds (once I have the content). Claude took 70 seconds. Why? It controls the screen via a loop: take a screenshot → send to a huge remote multimodal model (opus 4.6) → decide actions (click, type, scroll) → take another screenshot → repeat. We’re basically forcing a large general model to operate a human UI. Two things will happen in my opinion: 1. It is using a massive model (Opus 4.6) just to understand screens. That won’t last. Smaller, specialized models and eventually local models will handle most of this. 2. GUIs were built for humans. Almost all software will expose APIs/CLI for agents, so most actions won’t need to “use a computer” at all.
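The loop Yuchen describes (take a screenshot → send it to a large multimodal model → decide an action → repeat) can be sketched minimally. Everything here is a stub: `capture_screen`, `model_decide`, and `apply` are hypothetical placeholders, not Anthropic's actual API; in a real agent the `model_decide` step is a remote round-trip to a huge model, which is where the 70 seconds go.

```python
from dataclasses import dataclass

# Sketch of the screenshot -> model -> action loop. All functions are
# illustrative stubs; a real agent would call a screenshot API, a remote
# multimodal model, and an OS-level input driver.

@dataclass
class Action:
    kind: str           # "click", "type", "scroll", or "done"
    payload: str = ""

def capture_screen(step: int) -> bytes:
    return f"screen-{step}".encode()  # stub screenshot

def model_decide(screenshot: bytes, goal: str) -> Action:
    # Stub policy: type the goal once, then stop. The real version sends
    # the screenshot to a large model, which dominates per-step latency.
    if screenshot == b"screen-0":
        return Action("type", goal)
    return Action("done")

def apply(action: Action, trace: list[str]) -> None:
    trace.append(f"{action.kind}:{action.payload}")

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    trace: list[str] = []
    for step in range(max_steps):
        shot = capture_screen(step)        # 1. screenshot
        action = model_decide(shot, goal)  # 2. model round-trip
        if action.kind == "done":
            break
        apply(action, trace)               # 3. click/type/scroll
    return trace

trace = run_agent("post a tweet")
```

Both of Yuchen's predictions attack different stages of this loop: a smaller local model shrinks step 2's round-trip, and an API/CLI surface for agents removes steps 1 and 3 entirely.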