simobis

227 posts

@simobis23

Obsessed with AI progress 🚀

Joined April 2023

4K Following 10 Followers

simobis@simobis23·21h

LLMs just hit a wall

simobis@simobis23·1d

I expect GPT-5.6 to be delayed

simobis@simobis23·1d

We need fable 5 without coding

simobis@simobis23·1d

Anthropic roadmap:
Fable 5 → falls back to Opus 4.8
Opus 5 → falls back to Fable 5
Soon we'll have a circular fallback architecture.

simobis@simobis23·1d

Fable 5 is now basically a 100% fallback to Opus 4.8.

simobis@simobis23·1d

a model isn't released until it's open sourced.

simobis@simobis23·5d

Anthropic released Opus 4.8 just before Fable 5 so they can route questions they don't want their most powerful model answering to Opus 4.8 instead.

simobis@simobis23·5d

@ArtificialAnlys @AnthropicAI Token usage, cost, and latency for running the benchmark?

1.2K

Artificial Analysis@ArtificialAnlys·5d

Claude Fable 5 launched today at #1 on the Artificial Analysis Intelligence Index, putting Anthropic nearly 5 points ahead of any other lab’s best model

We supported @AnthropicAI with pre-release evaluation of Claude Fable 5. Claude Fable 5 scores 64.9 on the Artificial Analysis Intelligence Index, claiming the #1 rank overall. It is ~5 points ahead of the closest non-Anthropic model (GPT-5.5), and Anthropic models now occupy both of the top 2 places.

Key takeaways for Claude Fable 5 (adaptive reasoning with max effort and Opus 4.8 as fallback model):

➤ New safety guardrails for Mythos-class models: Claude Fable 5 uses the same underlying model as Claude Mythos 5 for public usage, with additional guardrails for potentially-harmful cybersecurity, biology, chemistry, and distillation-related queries. We tested Fable 5 using Anthropic’s new ‘fallback’ mechanism, which can route safety-flagged messages to Claude Opus 4.8. Anthropic states that fallback occurs in fewer than 5% of sessions on average, and we recorded fallback routing in ~8% of tasks across the Intelligence Index (mostly in scientific questions from evaluations like GPQA, AA-Omniscience and Humanity’s Last Exam)

➤ State-of-the-art Intelligence: Claude Fable 5 takes the #1 position on the Artificial Analysis Intelligence Index, scoring 64.9 and setting the highest score on 5 of the 10 underlying benchmarks. On AA-Omniscience, our knowledge and hallucination benchmark, Fable 5 scores 40, +7 points over the previous leader, Gemini 3.1 Pro Preview, driven primarily by higher accuracy. We generally observe a strong relationship between AA-Omniscience accuracy and model size in open weights models, which suggests Fable 5 could be larger than previous public Anthropic models

➤ Frontier agentic capability: Claude Fable 5 is at the frontier across all three agentic evaluations in the Index: GDPval-AA (real-world work tasks), Terminal-Bench Hard (agentic coding), and Tau2-bench Telecom (tool use for customer service). Its GDPval-AA Elo of 1932 is a significant jump from the previous leader, Claude Opus 4.8, further extending Anthropic’s lead in agentic capabilities

➤ Leading HLE score, but refusal and fallback in 9% of tasks: Claude Fable 5 scores 53% on Humanity’s Last Exam, more than 7 points ahead of the next-best model, Claude Opus 4.8 (max). Fable 5 triggers safety guardrails on 9% of HLE tasks, falling back to Claude Opus 4.8. Including this fallback usage, running HLE with Fable 5 costs ~$2.2k, the highest of any model we have evaluated

Key model details:

➤ Context window: Claude Fable 5 retains the same 1M token context window as Claude Opus 4.8

➤ Price: Claude Fable 5 is priced at $10/$50 per 1M input/output tokens, 2x the token price of Claude Opus 4.8. The cache write/read price is $12.50/$1 per million tokens

➤ Availability: Claude Fable 5 is included in Pro, Max, Team, and seat-based Enterprise plans through June 22, consuming 2x Opus usage. From June 23, usage will require credits, with Anthropic saying it plans to restore subscription access once capacity allows
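
As a rough illustration of the token pricing quoted in the post above, the sketch below estimates the cost of a single request. The function name and the dictionary keys are hypothetical labels, not API model IDs. The Opus 4.8 prices are an assumption derived from the "2x" statement ($5/$25), since the post does not quote them directly. Cache pricing and fallback billing are ignored.

```python
# USD per 1M tokens. The Fable 5 prices are quoted in the post; the Opus 4.8
# prices are derived from the "2x" statement and are an assumption.
PRICES_PER_MILLION = {
    "fable-5": {"input": 10.00, "output": 50.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},  # assumed: half of Fable 5
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (no caching, no fallback)."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1M input tokens and 200K output tokens.
    print(request_cost("fable-5", 1_000_000, 200_000))
    print(request_cost("opus-4.8", 1_000_000, 200_000))
```

Under these assumptions, the hypothetical request costs $20 on Fable 5 and $10 on Opus 4.8.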

793

74.7K

simobis@simobis23·5d

@scaling01 Fallback with opus 4.8?

896

Lisan al Gaib@scaling01·5d

Claude 5 Fable ranking 1st on AAI

471

24.7K

simobis@simobis23·5d

For the first time, Anthropic releases a model that is SOTA across all benchmarks by a large margin

simobis@simobis23·5d

@scaling01 Is this benchmark correlated with SimpleBench

123

Lisan al Gaib@scaling01·5d

Mythos 5 is much better at spatial reasoning compared to previous Anthropic models

Lisan al Gaib@scaling01

Claude Mythos & Claude Fable System Card

120

7.3K

simobis@simobis23·Jun 6

Big tech's core AI bet has quietly shifted. Winning the model race still matters but it's no longer the only game. The real hedge: own the compute infrastructure. Lose the model war? You still win because whoever builds AGI will need your datacenter to run it.

simobis@simobis23·Jun 6

The real benchmark is ARR

simobis@simobis23·Jun 5

Some people don't just expect the AI bubble to burst. They're praying for it.

simobis@simobis23·Jun 5

@thetreygoff This is why Anthropic has an ARR of $45B: GPT is a professional camera. Claude is an iPhone. The tool that breaks the entry barrier always wins the bigger market.

815

Trey Goff@thetreygoff·Jun 5

I don’t know how to put into words why Claude Opus is so much better than GPT So I’ll try to explain with a bunch of examples instead:

780

393.1K

simobis@simobis23·Jun 4

@kimmonismus Same price of opus 4 :

419

Chubby♨️@kimmonismus·Jun 4

Get ready, friends. Anthropic appears to be preparing the release of its Mythos-level model. Pricing: $16 per 1M input tokens / $80 per 1M output tokens. The release is likely very close, possibly even in the same week as GPT-5.6. Competition is heating up again. Gemini 3.5 Pro is about to face serious pressure. It better be a banger.

sui ☄️@birdabo

‼️ it seems Anthropic is ready to publicly launch a new version of Mythos, something better than Mythos Preview. a codenamed model “Oceanus” was given access to some red teamers yesterday according to @synthwavedd. it’s apparently been paused already, due to someone reselling access through a Chinese API proxy lmao 💀 Mythos pricing might also end up at $16 input, $80 output according to @scaling01

138

134

2.3K

341.7K

simobis@simobis23·Jun 4

@scaling01 Anthropic’s real goal? Slow down AI progress. Best way they figured? Take the lead, kill off the competition, then hit the brakes.

711

Lisan al Gaib@scaling01·Jun 4

Anthropic: "We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development"

Anthropic@AnthropicAI

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

604

65K

simobis@simobis23·Jun 4

@bindureddy A waste of compute

simobis@simobis23·Jun 3

@AnthropicAI This Anthropic report feels like preparation for Mythos. Notice the shift: Not "the model is dangerous." But "the danger comes from agentic scaffolding and deployment." An important distinction when you're about to release a far more capable model.

241

Anthropic@AnthropicAI·Jun 3

How well do the security community's techniques hold up against AI-enabled cyberattacks? We examined 832 malicious accounts and mapped their activity onto a longstanding database of tactics and techniques used by threat actors. Here's what we learned: anthropic.com/news/AI-enable…

144

163

1.2K

164.2K

simobis@simobis23·Jun 3

The moment you get used to a top-tier AI model, every slightly weaker model suddenly feels like absolute garbage.
