Prathmesh Pandey

300 posts

@file_mutex

Building Next-gen Coding Platform | Programmer | Xoogler

Mountain View, CA · Joined July 2017
350 Following · 64 Followers

Prathmesh Pandey @file_mutex
i was a top Go readability reviewer at google, often did thousands of locs a day. human code review being some "holy grail" is pure cope. i constantly saw "top decile" faang swes shipping obvious logic bugs their whole team somehow lgtm'd. your average faang engineer is a couple orders of magnitude stupider than a 2026 sota harness.

David Cramer @zeeg
codex writes the most disgusting code. idk who's responsible for pre-training over there, but you gotta flip the script

Nicholas Griffin @ngriffin_uk
@file_mutex @zeeg absolute rubbish 🤣 the level of arrogance and unfounded confidence that ai gives people is insane.

Prathmesh Pandey @file_mutex
sota coding harnesses are fundamentally better at reasoning and reviewing code than any human. your belief that your biological brain can out-review them on logic isn't "accountability," it's just ego. reading every single line doesn't guarantee quality, it just adds the most error-prone, high-latency component back into the loop: you.

Dillon Mulroy @dillon_mulroy
i’m not sure what your point is here. i build with ai all day long and build ai-focused products at cloudflare, but i’m nowhere close to being naive enough to think these agents and models can produce quality on their own. i very actively steer, readjust, and direct them to get good outcomes, and that includes reading every single line they produce. why? because agents don’t remove accountability

Prathmesh Pandey @file_mutex
@andrewqu Folks who have even basic instrumentation will note it right away.

Andrew Qu @andrewqu
Hot take: a lot of people wouldn’t be able to tell the difference if they were randomly routed between gpt-5.5, opus-4.8, or fable-5 for their day to day work

Kirill Skrygan @kskrygan
Real assessment of Fable5 from engineers around me:
- somewhat better than Opus on OSS repos
- about the same on closed-source repos
- much more expensive
So for real orgs, the value prop is pretty vague. But sure, keep believing it was so powerful the government had to ban it

Prathmesh Pandey @file_mutex
@mattshumer_ That's the problem with unhardened harnesses if one needs to wait for better models.

Prathmesh Pandey @file_mutex
@zeeg hmm, there are perhaps five people out there in the world who can beat _the_ sota coding harness. no one will be able to read and understand that much code. dig in when required? sure. but know it in-n-out? most probably not.

David Cramer @zeeg
@file_mutex people who don't read the code are not serious people, and it takes a serious person to ship production software

David Crawshaw @davidcrawshaw
Current status: data analysis and code analysis (and both combined!) with Fable. It appears unmatched at extracting insight from a mountain of code and logs. Then I take the last stanza it creates and hand it to another model for implementation.

Prathmesh Pandey @file_mutex
My MCP is roughly saving 50% on blended token consumption in codex and claude. That doesn't mean codex and claude can't build something similar, but as server owners their philosophy will be rooted in "seeing the maximum queries coming through".
Quoting Dan Robinson @danrobinson:
If you’re proud of your really sophisticated skill or harness, try benchmarking it against a simple one-sentence prompt as a sanity check. Codex, Claude Code, and ChatGPT Pro are really, really good.

Prathmesh Pandey @file_mutex
@AndrewCurran_ @bayeslord It def has big model smell. I have my reviewers on GPT 5.5 xhigh, and Opus 4.8 used to stumble through 10 revisions before getting past the reviewers. Fable takes 2-3 revisions.

Andrew Curran @AndrewCurran_
@bayeslord I've been trying to tell people. And Fable isn't the new Mythos. And the new Mythos isn't what they have internally.

bayes @bayeslord
Fable is in fact Built Different

Jeffrey Emanuel @doodlestein
@__paleologo This seems to really vary a lot. I’ve been surprised by how much mileage I’ve gotten so far with Fable across a variety of tasks. Granted, I have 22 Max accounts, but it’s not like I’ve blasted through all of them already either. I’m asking them to do really hard stuff, though.

Gappy (Giuseppe Paleologo)
Clearly, Fable is doing a lot of work, and unleashing a ton of agents. To review a short technical note, it released 31 agents, coded simulations to verify my results, did "adversarial reviews". Eventually, it only made the assumptions slightly more rigorous. It is all good. For a four-page technical note+a little code, though, it consumed all my Pro session tokens, *plus* $17 worth of credits. It is ridiculously expensive. I have 20-page reports that are way more complex than this. I can see how Anthropic has entered the phase of market-clearing prices, yield management, and pre-IPO. I recall Boris Cherny saying in a podcast, "run Opus [4.6], not Sonnet. It's worth it". I feel comfortable saying that running your top-shelf model is *not* worth it anymore. Decreasing returns, on most tasks. Like in the real world, some people can be real smart, but real expensive.

Claude @claudeai
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

finbarr @finbarrtimbers
As my entire feed is criticizing Anthropic, I think that the team there genuinely believes what they’re saying. It’s not a marketing/anticompetitive tactic. They genuinely believe these models are dangerous and that AI research should be slowed down.

Prathmesh Pandey @file_mutex
@charliermarsh both will eventually end up measuring the same thing; as the latter tends to 100%, the former will tend to zero.

Charlie Marsh @charliermarsh
I think "percent of code read/reviewed by a human" is perhaps a more interesting metric than "percent of code authored by an agent"