
nikoster
91 posts

@nikosters
https://t.co/yaT0AlhjWB


Opus 4.8 is now on DeepSWE. At the default high thinking effort, it scores 6% higher than Opus 4.7 at xhigh while also lowering the average cost per task.


What are they baking into Claude? Why did it respond like that ❓ Is this how people "fall for" the LLM? If so, sad. It glazing me like that accomplishes nothing other than making me skeptical and less trusting of it. I wanted to see how it would respond to that message, knowing what ChatGPT models do, and I'm no longer surprised that the most retarded posters on here *love* Claude.

coming up with a name for a web components library is so hard


Discord is adding Spatial Audio support for voice channels, so you can hear your friends as if you were talking next to each other!


Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks. Opus 4.8 scored 1890 on GDPval-AA at launch with its 'max' effort setting, +137 points over Opus 4.7 and +121 points ahead of the next-best model, GPT-5.5 xhigh. Compared head-to-head on the GDPval task set, this implies a ~67% win rate against GPT-5.5 xhigh. @AnthropicAI shared access with us ahead of the public release to benchmark this model, and we’re glad to see our benchmarks referenced in today’s launch. The rest of the Artificial Analysis Intelligence Index is in progress; we’ll share final results soon!
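The ~67% figure is consistent with reading GDPval-AA scores as Elo-style ratings on the usual 400-point logistic scale. That model is an assumption here, not something the post states. A minimal sketch of the arithmetic, where `elo_win_probability` is a hypothetical helper:

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win probability of A over B under a standard Elo logistic curve.

    Assumption: GDPval-AA scores behave like Elo ratings with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Figures from the post: Opus 4.8 (max) at 1890, GPT-5.5 xhigh 121 points behind.
opus_48 = 1890
gpt_55_xhigh = opus_48 - 121
p = elo_win_probability(opus_48, gpt_55_xhigh)
print(f"{p:.1%}")  # about 66.7%, i.e. the ~67% quoted
```

Under this assumption a 121-point gap gives a win rate of about two in three, while equal scores give exactly 50%.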
