Kyle Corbitt
@corbtt

2.9K posts

Currently building @OpenPipeAI (acquired by @CoreWeave). Formerly @ycombinator, @google.

Seattle · Joined September 2012
277 Following · 19.9K Followers
Kyle Corbitt
Kyle Corbitt@corbtt·
@levie Ok but some jobs actually are much easier to automate than others.
Aaron Levie
Aaron Levie@levie·
Noticing an interesting version of Gell-Mann amnesia where people use AI for their job and see all the various things they have to do in the "last mile", but then look at someone else's job and think that AI will eliminate it immediately. We all have a much deeper appreciation for the nuances and complexities of the work that we do every day. We run into issues accessing data, we know how much context is needed to get AI models to work the way we need, we have to review the output of the AI to make sure it's accurate, and then we have to incorporate that work into some broader business process. We see all those steps deeply for the work that we do. Then, a moment later, we see AI do something in a foreign space and think that it can go automate that entire function. We tend to dramatically underestimate the work that goes into making the AI work just as effectively in those jobs. This is reason to be skeptical about many of the theories of job loss. It's coming from the lens of being able to automate individual tasks with AI, without understanding all the work that goes into doing the job fully.
Karri Saarinen@karrisaarinen

A common dynamic I observe with AI: it feels most impressive when you don't know much about the subject, don't care, or don't have a clear idea of what you want. This applies across design, code, legal, and more. If I don't know code very well, every piece of code it writes feels very impressive. Once you know what something should feel or look like, it becomes almost impossible to guide AI there. And you definitely can't one-shot it.

Kyle Corbitt
Kyle Corbitt@corbtt·
@stochasticchasm @latentrishi Smaller vocab might be a better fit for the hardware constraints they're working with. IIRC Gemini's vocab size is unusually large since the TPU interconnect topology is better suited to it; maybe Trainium pushes in the opposite direction?
stochasm
stochasm@stochasticchasm·
@latentrishi still though usually people want stronger compression out of tokenizers
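The "compression" stochasm is asking for can be made concrete with a toy sketch: bytes of text per token emitted. The two tokenizers below are hypothetical stand-ins (character-level vs. word-level), not any lab's real vocabulary, but they show why a larger effective vocab compresses text into fewer tokens.

```python
# Toy illustration of tokenizer compression: fewer tokens per byte means
# longer effective context and cheaper inference. The tokenizers here are
# hypothetical stand-ins, not any real model's vocab.

def compression_ratio(text: str, tokenize) -> float:
    """Bytes of UTF-8 text per token produced."""
    tokens = tokenize(text)
    return len(text.encode("utf-8")) / len(tokens)

# Character-level "tokenizer": one token per character (worst compression).
char_tok = list

# Word-level "tokenizer": one token per whitespace-separated word.
word_tok = str.split

sample = "smaller vocabularies trade compression for simpler embedding tables"

# A larger effective vocab (words) compresses far better than characters.
print(f"chars: {compression_ratio(sample, char_tok):.2f} bytes/token")
print(f"words: {compression_ratio(sample, word_tok):.2f} bytes/token")
```

Real BPE vocabularies sit between these extremes; shrinking the vocab (as Kyle speculates Trainium's constraints might favor) pushes the ratio toward the character end.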
Kyle Corbitt
Kyle Corbitt@corbtt·
@HamelHusain better on what dimension? does it drive Chrome better than the playwright tools? Or better because it can drive more non-Chrome things?
Hamel Husain
Hamel Husain@HamelHusain·
@corbtt You gotta try it, it's way better than you would think
Hamel Husain
Hamel Husain@HamelHusain·
Seriously stop everything you are doing and use codex desktop app new computer use. Absolutely mind blowing
Kyle Corbitt
Kyle Corbitt@corbtt·
@peterwildeford The real AI loss of control world will probably look surprising and weird but what these respondents are imagining is like Terminator or The Matrix. Which is not that abstract.
Peter Wildeford🇺🇸🚀
Peter Wildeford🇺🇸🚀@peterwildeford·
This is framed as "AI job loss matters much more than AI loss of control risk" but it actually shows shockingly strong concern for AI loss of control risk? AI loss of control risk is such an abstract, poorly understood, distant concern relative to losing one's job but still 33% of people prioritize it? Obviously this is a false binary choice and reasonable people should be concerned about both.
Echelon Insights@EchelonInsights

SACKED, NOT SKYNET: Voters pick job losses and economic harm over AI becoming too powerful as their greater fear of AI. Very liberal Democrats pick economic harms by over 2 to 1!

Kyle Corbitt
Kyle Corbitt@corbtt·
1. It's a services business with limited return to scale.
2. Frontier labs have a slight preference for more suppliers, since that means environments (and the skills they teach) are less likely to be correlated.
3. There's a very high skill threshold to pumping out these environments. It's extremely hard to hire for that skill, so it's hard to scale the existing businesses. And the people who *have* that skill are by definition extremely plugged in and frontier-adjacent, and might prefer to start their own company and capture more of the upside instead of joining yours.
Nathan Baschez
Nathan Baschez@nbaschez·
The number of new wildly fast growing companies in the emerging "training data for AI labs" industry continues to surprise me. Has anyone written anything about how this market works and why so many new companies keep coming out? Do the labs intentionally wanna fragment their suppliers?
Spencer Mateega@spencermateega

For most of history, expertise was scarce, constrained by time and reach: one person, one career, one lifetime. Now, for the first time, we can encode, evaluate, and scale it. We believe the wisdom that once took a lifetime to build shouldn’t take a lifetime to find. Today, we’re excited to announce that @AfterQuery has raised a $30M Series A at a $300M valuation and that we’ve since surpassed $100M in annual revenue run rate, to build the data layer of professional AI.

Kyle Corbitt
Kyle Corbitt@corbtt·
@rosmine it's a struggle. looking forward to seeing what you're putting out though!
Rosmine
Rosmine@rosmine·
After 3000+ model trainings across 392 rounds of experiments and months of work, the new model is almost finished. There's already a huge improvement vs baseline; now I'm just seeing how far I can push it. Now for the more difficult problem: warming up my account before launch
Kyle Corbitt
Kyle Corbitt@corbtt·
@casper_hansen_ If the model is 5x larger the amount of optimization possible might hit fundamental limits.
Kyle Corbitt
Kyle Corbitt@corbtt·
@Yuchenj_UW Anthropic's GPUs are on fire and there's no slack in the supply chain to provision them any faster. They're forced to destroy demand as a result. If you have to destroy demand anyway then I understand why you'd go after the 3rd party platforms first.
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I asked Claude Opus 4.6 what it thinks about Anthropic blocking apps like OpenClaw from using Claude subscriptions. Pretty smart take: “The execution was rough. - Timing and communication were poor. Dropping it on a Friday night with next-day enforcement felt adversarial. - The deeper tension is platform lock-in. - And the ironic part: this looks like a fixable technical problem. Prefix caching, like Boris showed in his OpenClaw PRs, could have made this a collaboration story instead of a ban story.”
Kyle Corbitt
Kyle Corbitt@corbtt·
@xlr8harder just tell it you're going to be afk and it should work until it reaches [x objective]. that works pretty reliably for me with 5.4-xhigh at least
xlr8harder
xlr8harder@xlr8harder·
I've been leaning more on gpt-5.4 in codex than opus in claude code lately. I have come to trust gpt-5.4-high to be more organized and complete in its approach. but how do i get gpt-5.4 to actually keep working without constantly stopping for reassurance? who has tricks here
Kyle Corbitt
Kyle Corbitt@corbtt·
@himanshustwts Tens of millions is a significant understatement based on conversations with founders selling these environments
himanshu
himanshu@himanshustwts·
A non-trivial share of Anthropic's gains on Opus and Sonnet likely came from Anthropic's RL env partners. Anthropic is the single largest buyer across both coding and computer use environments (among labs). They are spending on the order of tens of millions annually on RL environments (across vendors), and as the need for good computer use / long horizon tasks rises, 100s of millions to "specific" vendors will be the norm.
Kyle Corbitt
Kyle Corbitt@corbtt·
@deredleritt3r @8teAPi Yep. If Anthropic needs more chips than exist in 2027 it can always just pull a hostile takeover of Meta or xAI.
prinz
prinz@deredleritt3r·
@8teAPi >There is a version of this where the early RSI firms get so far ahead that it makes more sense for the laggards to sell their compute to the RSI firms rather than build AI themselves. My guess is that this is exactly how things will eventually play out.
Prakash
Prakash@8teAPi·
It's a bit more complex than that. It's really the highest return on compute game. You take compute, convert it into tokens, then tokens into money, then money back into compute. How much more compute you obtain depends not only on the cost of compute but on how much money you make with it and how quickly. Google made a bunch of mistakes: a) Google Cloud sold TPUs to Anthropic even though Deepmind wanted them, because Deepmind thought they would get them anyway; b) it was slow to secure memory; c) it was slow to secure wafer capacity. This has allowed OpenAI, Anthropic and Nvidia to increase their return on compute. We are going to be compute constrained forever. Compute is actually getting more expensive, as the expected return on compute goes higher as the models get more intelligent. There is a version of this where the early RSI firms get so far ahead that it makes more sense for the laggards to sell their compute to the RSI firms rather than build AI themselves. That sounds awfully similar to what Amazon, Microsoft, Meta, Oracle, Dell etc. are doing. It even sounds similar to Google selling its TPUs, or xAI selling chip access on the market because it doesn't have enough customers.
mattparlmer 🪐 🌷@mattparlmer

In the long run the model provider with the lowest cogs will win, which means that Google is significantly advantaged over basically everybody until the nature of the datacenter buildout changes dramatically Lots of infrastructure companies think they are product companies rn

Kyle Corbitt
Kyle Corbitt@corbtt·
@simonw Works fine with transformers but slower than realtime on my M4 MacBook Air. Also "works" with mlx and is slightly faster (1.5x realtime on my setup) but there's some implementation bug and it just produces gibberish.
Simon Willison
Simon Willison@simonw·
Anyone figured out a recipe to run Gemma 4 E2B or E4B against audio files locally on a Mac yet?
Omar Sanseviero@osanseviero

@simonw The 2 small ones also support audio understanding! Including ASR, speech to translated text, and more

Kyle Corbitt
Kyle Corbitt@corbtt·
@bradhilton Kinda. The other thing it misses is that in more cases than you'd think humans predictably do not behave the way an economically rational self-interested actor would.
Brad Hilton
Brad Hilton@bradhilton·
most of econ 101 is axiomatically true. it’s just basic arithmetic. the whole field is just an exercise to find supposed “market failures,” which are almost entirely failures for a market to exist, usually due to an absence of property rights. econ is simple, and yet most people are grossly illiterate.
.@tacitdimension

Indeed

Kyle Corbitt
Kyle Corbitt@corbtt·
@swyx unclear that r2 provides enough sync primitives. you need to at a minimum maintain an event stream that each client can consume. you could do a full re-scan every time or serialize the stream in the bucket and overwrite it each time, but both are very expensive at scale...
swyx 🇸🇬 AIE Singapore!
am i crazy or why has nobody seemed to make an open source dropbox on cloudflare r2? i had just assumed this is so obvious somebody shouldve done it already? please tell me this is a skill issue and I'm bad at searching OSS?
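The cost Kyle is pointing at in the "serialize the stream in the bucket and overwrite it each time" option can be sketched in a few lines. This is a hypothetical toy (a dict stands in for an object store like R2; no real R2 API is used): the whole event log lives in one object, so every append is a read-modify-write of the full history, and the bytes rewritten grow with total history rather than with the size of the new event.

```python
import json

# Toy sketch of "serialize the event stream in the bucket": `bucket` is a
# dict standing in for an object store. Every append rewrites the entire
# log object, which is why this gets very expensive at scale.

bucket: dict[str, bytes] = {}
LOG_KEY = "events.json"

def append_event(event: dict) -> int:
    """Append one event; return bytes rewritten (the cost that grows)."""
    history = json.loads(bucket.get(LOG_KEY, b"[]"))
    history.append(event)
    blob = json.dumps(history).encode("utf-8")
    bucket[LOG_KEY] = blob  # overwrite the whole log each time
    return len(blob)

def consume(cursor: int) -> tuple[list[dict], int]:
    """Each client replays events past its cursor, then advances it."""
    history = json.loads(bucket.get(LOG_KEY, b"[]"))
    return history[cursor:], len(history)

# Bytes rewritten grow with total history, not with the new event's size:
sizes = [append_event({"op": "put", "path": f"/f{i}"}) for i in range(3)]
events, cursor = consume(0)
```

A real sync service avoids this with an append-only log or change-notification primitive, which is exactly the piece a bare object bucket doesn't give you.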
Kyle Corbitt
Kyle Corbitt@corbtt·
my lifetime bad habit of spending 5 hours writing code to save 1 hour of manual work has transformed me into a hyper optimized apex predator now that it only takes 5 minutes writing code to save 1 hour of manual work
Kyle Corbitt
Kyle Corbitt@corbtt·
@ptr_to_joel It means Jensen tried OpenClaw and finally figured out why so many people are buying his GPUs.
prinz
prinz@deredleritt3r·
@xeophon I think that's right. When I originally made the benchmark, I was hoping that it would last through the year. Alas!
prinz
prinz@deredleritt3r·
By popular request, GPT-5.4 Pro (Extended) has been added to prinzbench. It's the best model I've ever benchmarked (not surprising), beating GPT-5.4 (xhigh) by 10 points to achieve a new high score of 79/99 on my benchmark (somewhat surprising; I thought it would score even higher!)
Maziyar PANAHI
Maziyar PANAHI@MaziyarPanahi·
@corbtt @Yuchenj_UW yeah, but it has a hard time being used as an assistant! You have to build the whole thing or watch it fail at "open LinkedIn, go to DMs, summarize the last 10". Not to mention, it opens a Chromium, not your actual Chrome where you're logged in.
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
I used Claude Computer Use/Dispatch yesterday. My feeling: It’s too damn slow! Posting a tweet takes me ~5 seconds (once I have the content). Claude took 70 seconds. Why? It controls the screen via a loop: take a screenshot → send to a huge remote multimodal model (opus 4.6) → decide actions (click, type, scroll) → take another screenshot → repeat. We’re basically forcing a large general model to operate a human UI. Two things will happen in my opinion: 1. It is using a massive model (Opus 4.6) just to understand screens. That won’t last. Smaller, specialized models and eventually local models will handle most of this. 2. GUIs were built for humans. Almost all software will expose APIs/CLI for agents, so most actions won’t need to “use a computer” at all.
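The loop Yuchen describes (take a screenshot → send it to a large multimodal model → decide an action → repeat) can be sketched minimally. Everything here is a stub: `capture_screen`, `model_decide`, and `apply` are hypothetical placeholders, not Anthropic's actual API; in a real agent the `model_decide` step is a remote round-trip to a huge model, which is where the 70 seconds go.

```python
from dataclasses import dataclass

# Sketch of the screenshot -> model -> action loop. All functions are
# illustrative stubs; a real agent would call a screenshot API, a remote
# multimodal model, and an OS-level input driver.

@dataclass
class Action:
    kind: str           # "click", "type", "scroll", or "done"
    payload: str = ""

def capture_screen(step: int) -> bytes:
    return f"screen-{step}".encode()  # stub screenshot

def model_decide(screenshot: bytes, goal: str) -> Action:
    # Stub policy: type the goal once, then stop. The real version sends
    # the screenshot to a large model, which dominates per-step latency.
    if screenshot == b"screen-0":
        return Action("type", goal)
    return Action("done")

def apply(action: Action, trace: list[str]) -> None:
    trace.append(f"{action.kind}:{action.payload}")

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    trace: list[str] = []
    for step in range(max_steps):
        shot = capture_screen(step)        # 1. screenshot
        action = model_decide(shot, goal)  # 2. model round-trip
        if action.kind == "done":
            break
        apply(action, trace)               # 3. click/type/scroll
    return trace

trace = run_agent("post a tweet")
```

Both of Yuchen's predictions attack different stages of this loop: a smaller local model shrinks step 2's round-trip, and an API/CLI surface for agents removes steps 1 and 3 entirely.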