Prathmesh Pandey

296 posts

@file_mutex

Building Next-gen Coding Platform | Programmer | Xoogler

Mountain View, CA · Joined July 2017

351 Following · 64 Followers

Prathmesh Pandey@file_mutex·13h

@dillon_mulroy @zeeg will do right after Cloudflare deletes all of the "AI" branding from their website @eastdakota

Dillon Mulroy@dillon_mulroy·14h

@file_mutex @zeeg delete your account

David Cramer@zeeg·1d

codex writes the most disgusting code idk who's responsible for pre-training over there but you gotta flip the script

Prathmesh Pandey@file_mutex·17h

@kskrygan on my codebase, fable was 10x smarter than opus...

Kirill Skrygan@kskrygan·1d

Real assessment of Fable5 from engineers around me:
- somewhat better than Opus on OSS repos
- about the same on closed-source repos
- much more expensive
So for real orgs, the value prop is pretty vague. But sure, keep believing it was so powerful the government had to ban it

Prathmesh Pandey@file_mutex·1d

@mattshumer_ That's the problem with unhardened harnesses if one needs to wait for better models.

Matt Shumer@mattshumer_·1d

Assuming Anthropic is able to restore Fable in the next few days, there's literally zero point doing any meaningful work until it is back. What can be done in 100 hours with Opus can be done in 1 with Fable. Hopefully this is figured out quickly.

Anthropic@AnthropicAI

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Prathmesh Pandey@file_mutex·1d

@banteg have you tried looking at the brighter side?

banteg@banteg·1d

wow fuck you anthropic, i pay $200 to use the model for one day? a reset won't help here. completely frauded out.

Anthropic@AnthropicAI

Prathmesh Pandey@file_mutex·1d

Et tu, $AMZN? Anthropic ditched Google for Amazon, just to have them get cheated on loll

NIK@ns123abc

🚨US government’s action to shut down Anthropic’s top AI models was actually triggered by an unnamed rival company claiming it could break Mythos’s security, not by China

Prathmesh Pandey@file_mutex·1d

@zeeg hmm there are perhaps five people out there in the world who can beat _the_ SOTA coding harness. no one will be able to read and understand that much code. dig in when required? sure. but know it in-n-out? most probably not.

David Cramer@zeeg·1d

@file_mutex people who don't read the code are not serious people and it takes a serious person to ship production software

Prathmesh Pandey@file_mutex·1d

@davidcrawshaw Looks like you have been out of the '/loop'.

David Crawshaw@davidcrawshaw·1d

Current status: data analysis and code analysis (and both combined!) with Fable. It appears unmatched at extracting insight from a mountain of code and logs. Then I take the last stanza it creates and hand it to another model for implementation.

Prathmesh Pandey@file_mutex·2d

My MCP is saving roughly 50% on blended token consumption in codex and claude. That doesn't mean codex or claude can't build something similar, but as server owners, their philosophy will be rooted in "seeing the maximum queries coming through".

Dan Robinson@danrobinson

If you’re proud of your really sophisticated skill or harness, try benchmarking it against a simple one-sentence prompt as a sanity check.

Codex, Claude Code, and ChatGPT Pro are really, really good

Prathmesh Pandey@file_mutex·2d

@AndrewCurran_ @bayeslord It def has big model smell. I have my reviewers on GPT 5.5 xhigh, and Opus 4.8 used to stumble through 10 revisions before getting past the reviewers. Fable takes 2-3 revisions.

Andrew Curran@AndrewCurran_·2d

@bayeslord I've been trying to tell people. And Fable isn't the new Mythos. And the new Mythos isn't what they have internally.

bayes@bayeslord·2d

Fable is in fact Built Different

Prathmesh Pandey@file_mutex·2d

@doodlestein @__paleologo seems reasonable given that he says "pro" -- which gives 20x less usage than max.

Jeffrey Emanuel@doodlestein·2d

@__paleologo This seems to really vary a lot. I’ve been surprised by how much mileage I’ve gotten so far with Fable across a variety of tasks. Granted, I have 22 Max accounts, but it’s not like I’ve blasted through all of them already either. I’m asking them to do really hard stuff, though.

Gappy (Giuseppe Paleologo)@__paleologo·2d

Clearly, Fable is doing a lot of work, and unleashing a ton of agents. To review a short technical note, it released 31 agents, coded simulations to verify my results, did "adversarial reviews". Eventually, it only made the assumptions slightly more rigorous. It is all good. For a four-page technical note+a little code, though, it consumed all my Pro session tokens, *plus* $17 worth of credits. It is ridiculously expensive. I have 20-page reports that are way more complex than this. I can see how Anthropic has entered the phase of market-clearing prices, yield management, and pre-IPO. I recall Boris Cherny saying in a podcast, "run Opus [4.6], not Sonnet. It's worth it". I feel comfortable saying that running your top-shelf model is *not* worth it anymore. Decreasing returns, on most tasks. Like in the real world, some people can be real smart, but real expensive.

Prathmesh Pandey@file_mutex·2d

@claudeai Has Fable not been trained to invoke MCP servers yet? I don't see it doing so. @AnthropicAI

Claude@claudeai·4d

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Prathmesh Pandey@file_mutex·4d

@jeremyphoward @finbarrtimbers If everyone else uses a frontier model except them, then they won't be the frontier much longer.

Jeremy Howard@jeremyphoward·4d

@finbarrtimbers If they believed that, they'd be doing the opposite of what they chose. x.com/jeremyphoward/…

Jeremy Howard@jeremyphoward

Easy solution to slow down recursive AI self improvement:
- The lab with the top-ranked model must agree THEY must not use it for working on frontier AI
- But everyone else should have access to it.
By definition, this means the frontier doesn't advance.

finbarr@finbarrtimbers·4d

As my entire feed is criticizing Anthropic, I think that the team there genuinely believes what they’re saying. It’s not a marketing/anticompetitive tactic. They genuinely believe these models are dangerous and that AI research should be slowed down.

Prathmesh Pandey@file_mutex·4d

Fable has nothing to do with your ability to orchestrate agents.

Ed Zitron@edzitron

There it is

Prathmesh Pandey@file_mutex·4 Jun

@charliermarsh both will eventually end up measuring the same thing; as the latter tends to 100%, the former will tend to zero.

Charlie Marsh@charliermarsh·4 Jun

I think "percent of code read/reviewed by a human" is perhaps a more interesting metric than "percent of code authored by an agent"

Prathmesh Pandey@file_mutex·4 Jun

In short, Anthropic is asking for an IOCs-like distribution mechanism controlled by the US govt. What tech exists, and who is allowed to share what with whom, would essentially need to be controlled.

Andrew Curran@AndrewCurran_

Anthropic says Recursive Self Improvement is approaching faster than they expected. Quoting from the blog: 'What should we do? If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe. Without a global coordination mechanism, companies and governments will have to make difficult decisions about safety while under competitive and geopolitical pressures. We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner. A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. 
Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates. None of this is necessarily impossible in principle—the world has built verification regimes for other complex technologies (e.g., the Intermediate-Range Nuclear Forces Treaty)—but those regimes took decades to build both the infrastructure and the trust. We don’t have that long. A unilateral pause by one lab, by contrast, is achievable immediately, but accomplishes much less: it would change who the front-runner is, but it would not create the wider deliberative process that is currently missing. In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.'

Prathmesh Pandey@file_mutex·4 Jun

@scaling01 was clear to anyone paying attention to Anthropic's public GitHub PRs x.com/file_mutex/sta…

Prathmesh Pandey@file_mutex

My hunch was correct. Anthropic had been testing Mythos since February 24 -- and this model is literally a beast. I won't be surprised if you can point it at an existing medium-scale system and it iteratively builds a faster version of it over a few days. www-cdn.anthropic.com/53566bf5440a10…

Lisan al Gaib@scaling01·4 Jun

Anthropic is shipping 3.2x more code per person with Mythos nowadays than with Opus 4.5 around half a year ago

Anthropic@AnthropicAI

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

Prathmesh Pandey@file_mutex·4 Jun

markets are sitting on multiple trillions in liquidity -- a few billion is peanuts. do your dd

Chandra R. Srikanth@chandrarsrikant

Came and scooped up $45 Billion, with plans for another $40 Billion. Used the window just ahead of three mega IPOs: SpaceX, Anthropic and OpenAI.

Prathmesh Pandey@file_mutex·4 Jun

Devs burning $10K of compute on $200/mo plan are the biggest risk to model providers. Unlike sticky chat users, devs have zero loyalty and will easily migrate the second someone else drops a better coding model.

Peter Gostev@petergostev

Who is using more compute - 1b of ChatGPT users or 5m of Codex users?
