Cheap FLOPs
42 posts


kernelbench.com is live.
can frontier models write fast triton/cuda/cutlass/cute-dsl/ptx without cheating? i add kernelbench-hard to the existing kernelbench-v3 (which was built on top of KernelBench from @anneouyang et al.)
hard mode has: fp8 gemm, topk, sonic MoE fwd, KimiDeltaAttention, paged attention decode, kahan softmax, w4a16 gemm. all of these require deep understanding of the sm120 architecture (benchmarking happens on my local rtx pro 6000 blackwell setup).
gpt 5.5 xhigh and claude opus 4.7 max clearly took the W here, but i was honestly surprised by kimi-k2.6 and deepseek v4 pro.
this is just the first public iteration. i'm open to constructive criticism (dm works)




@iScienceLuvr Cloudflare tunnel to a web app running on your computer with xterm.js

@scaling01 They always do this: deny, say it's a docs bug and then it happens like a week later

deleted both posts that went viral
Lisan al Gaib @scaling01
they have mythos at home but can't even update their docs ahhhh

@Hesamation Yes it's garbage, ChatGPT's is really annoying as well.. it used to be unlimited now it cuts you off after a few seconds and says you've gone over your limit but they'll still do it for you if you click again, then it works

@koylanai They are just pretending to have made a mistake so people think it's fixed.. and that they aren't purposefully screwing over the users. 'Whoops- we changed the system prompt to make a bunch more money!'

The frustrating part is that the Claude Code team, along with people deep in AI psychosis, have been gaslighting anyone who raises concerns about Claude Code's recent issues.
"your reasoning setting is wrong"
"oh that benchmark is wrong"
"we checked our code, nothing is wrong"
"skill issues"
"Hate" is a strong word, but when you're paying a lot of money for a product and it actually makes your job harder, to the point where people make you start questioning the quality of your own work, it really becomes a problem.
I'm glad they identified the issue, and I genuinely want them to succeed, along with Cursor, Codex, OpenCode, and others. We need more alternatives, but we also absolutely need open benchmarks.
anthropic.com/engineering/ap…

Muratcan Koylan @koylanai
WE DON'T HATE CLAUDE CODE ENOUGH WHY ARE WE PAYING THOUSANDS OF DOLLARS IF YOUR EVERY RELEASE IS MAKING THE HARNESS LESS USABLE?


@dedene @AnthropicAI I'm not convinced they really fixed the root issue still.. given all their recent pricing shenanigans I think they're engaging in subterfuge on multiple levels

@cheapflops @AnthropicAI Yeah, my weekly limit was due for tonight 21h anyway.
This single reset doesn't give me 6 weeks of poor usage back.

recommended reading. cool they are fixing things. but it's also a reason i switched away from CC. no control over the harness means having to wait for them to fix things.
the model didn't change. the harness did.
ClaudeDevs @ClaudeDevs
Thanks to the entire Claude community for giving feedback and continuing to build with us. Read the full post-mortem here: anthropic.com/engineering/ap…

@johntitorIII @bcherny They are - don't fall for Boris' gaslighting

@bcherny Boris, I love Claude Code. But man.. I'm really feeling that 4.7 and 4.6 are 100% lobotomized. 😭

We’ve been looking into recent reports around Claude Code quality issues, and just published a post-mortem on what we found.
ClaudeDevs @ClaudeDevs
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.

@bcherny In his next tweet in the thread he goes on to say the main thing people complained about is still an issue


@bcherny @DanielleFong You revoked pro users from Claude Code… seriously?

@bcherny You really don't lol.. you just gaslight people and then when the heat is on you retweet a bunch of BS to hide from folks

@dtfskinner @ClaudeDevs "We got caught throttling you and we fixed 3 things" (but it still sucks)

@ClaudeDevs You could’ve just said: “We got caught throttling you all”

@maxi_moxa @ClaudeDevs Fancy? Sounds disgusting. I don't want Dario's dogfood

@ClaudeDevs dogfooding is just a fancy way of saying we used it once and it didn't explode immediately








