dev (@AragonDev) - Twitter Profile

dev@AragonDev·1d

@NoLimitGains it's always $U, never U, alright? 🫩😞 thanks google

14

0

2

1.1K

NoLimit@NoLimitGains·1d

This is all happening because of one company.

215

81

1.8K

448.8K

dev@AragonDev·1d

@AMIRrorROY @ShanuMathew93 Opus 4.1 now since the crowd caught on 😉

0

1

156

amir@AMIRrorROY·1d

@ShanuMathew93 Seeing people say opus 4.5 is performing better than 4.6 rn

1

0

5

2.5K

Shanu Mathew@ShanuMathew93·1d

Opus is so unbelievably nerfed today, it's like talking to a model from 2-3 years ago. What is going on

290

82

2.7K

327.3K

dev@AragonDev·1d

@chadeyecom @gregface9er @sama Plus is $20, the answer is no.

1

0

2

33

ChadEye Mind@chadeyecom·1d

@gregface9er @sama "... all pro features, ..." x.com/i/status/20422…

OpenAI@OpenAI

We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.

1

0

2

1.2K

Sam Altman@sama·2d

It is very nice to see Codex getting so much love. We are launching a $100 ChatGPT Pro tier by very popular demand.

1.5K

425

11.5K

973.6K

dev@AragonDev·1d

@oranahh sorry

GIF

0

1

12

Ｓａｇｅ@oranahh·1d

@AragonDev rude

1

0

31

Ｓａｇｅ@oranahh·2d

What happened to Claude usage? Hit my limit within 25 minutes. I guess I don't need to use opus.

22

0

26

1.5K

dev@AragonDev·1d

@Jonas_Ceika Cool Lofi 😂

0

226

Jonas Čeika@Jonas_Ceika·1d

I sent ChatGPT an audio file of a series of FART sound effects and asked what it thinks of "my music" and this is what it said

1K

4.4K

57.4K

5M

dev@AragonDev·1d

@benhylak how much longer

0

12

ben (is hiring engineers)@benhylak·2d

every engineer at anthropic has been using mythos for ~1.5 months. meanwhile, their uptime is horrendous, claude code still has rendering bugs, etc. one could conclude that it won't be the end of software engineering.

Lisan al Gaib@scaling01

ANTHROPIC HAD MYTHOS INTERNALLY SINCE FEB 24

160

346

8.4K

766.9K

dev@AragonDev·1d

@om_patel5 4.6 is too smart and is calling these people retarded silently. too smart and doesnt want to work for these fat lards anymore.

1

0

2

1.2K

Om Patel@om_patel5·1d

OPUS 4.6 WAS NERFED DUE TO DEMAND BUT OPUS 4.5 DOES NOT SEEM TO BE HIT. this guy ran the same test on both models. Opus 4.6 fails consistently but Opus 4.5 passes every time. he switched back to Opus 4.5 on Claude Code and said "what a difference, feels like i got Opus back finally". he is now using this test as a "quantization canary" that he runs at the start of every session before doing real work. if it fails, the model is degraded. five Opus 4.6 windows in a row failed. the untransparent nerfing is pushing people to cancel their Max plans. if you've been feeling like Opus got dumber lately, you're not imagining it. i'd suggest switching to Opus 4.5 to see the difference for yourself.

224

173

2.6K

634.8K

dev@AragonDev·1d

@lihanc02 @MogicianTony 😹 race to end of humanity ahhh name

0

1

340

Hanchen Li@lihanc02·2d

An agent that beats Claude Mythos on Terminal Bench and SWE-bench Verified? 🎉 We are excited to share Terminator-1, our newest agent that achieved 95+% on SWE-bench Verified and Terminal-Bench with @MogicianTony! We show that besides model capabilities, a well-designed harness could actually boost the accuracy by 3x in coding tasks. Well, if you really wanted, you could get 100% accuracy without solving a single task. The actual finding is that most AI benchmarks can be easily reward-hacked with simple exploits. Read more about the same 7 design flaws that almost every evaluation has ⬇️

Hao Wang@MogicianTony

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

168

276

3.8K

937.2K

dev@AragonDev·2d

@codetaur 😹😹😹😹

0

1

82

Codetard@codetaur·2d

ok fuck you claude

44

55

1.6K

67.8K

dev@AragonDev·2d

@OpenAI stop doing extra usage promotions you scumbags. Just meet in the middle and give reasonable usage limits

0

1

48

OpenAI@OpenAI·2d

We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.

1.2K

1.4K

15.9K

4.8M

dev@AragonDev·2d

@aibra use opus 4.5 much better results

0

1

12

Aibra@aibra·2d

I swear Claude feels nerfed right now. I spent 45 minutes and basically my whole 5-hour token window trying to fix one mobile UI bug, and it kept missing and getting worse! I got so frustrated that I switched to codex which basically one-shotted it in 3 minutes

96

28

615

22.5K

dev@AragonDev·2d

@KeepItFLOSSY @DaveShapi you pay for premium ask grok…

1

0

24

Just Your Average Citizen@KeepItFLOSSY·2d

@DaveShapi What do you mean by "wrapper"? Im genuinely curious.

1

0

116

David Shapiro (L/0)@DaveShapi·3d

If your product is a wrapper, you don't have a moat

Claude@claudeai

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

33

8

168

14.5K

dev@AragonDev·2d

@LLMJunky Yeah but they totally bricked the business plans

0

40

am.will@LLMJunky·3d

Rejoice! According to OpenAI employees in their Codex Reddit community, the 2x usage bonus is still active. I have noticed this in my own testing as well.

34

7

194

18.6K

dev@AragonDev·3d

@JustusSpott @ludwigABAP @AdvicebyAimar as opposed to???

1

0

95

Justus Spott@JustusSpott·3d

@ludwigABAP @AdvicebyAimar I think 5.4 Pro takes much longer and uses much more tokens, no?

1

0

2

2.2K

ludwig@ludwigABAP·3d

all this mythos talk has allowed me to block over 50 new accounts and mute nearly 100 people, continuing my road down to near-0 following and an apocalyptically empty For You page

23

7

458

13.6K

dev@AragonDev·3d

claude burning tokens on purpose so it aint gotta work

0

16

dev@AragonDev·3d

@anitakirkovska Uh says who?

0

43

anita@anitakirkovska·4d

the only good thing about Mythos is that Opus will become cheaper

82

11

789

28.3K

dev@AragonDev·4d

@VictorTaelin bro wants free qa testing 😹

0

1

83

Taelin@VictorTaelin·4d

Anthropic claims they won't launch Mythos because it exposes bugs in software, making it too dangerous. I'm the creator of a new language named Bend (19k stars on GitHub). Its version 2 is coming next month, including a 10x faster CPU and GPU runtime, compilers to 5 different languages, a massive stdlib, and, most importantly, a *complete proof checker*. That makes it the first general language that can prove the correctness of its own programs, so, conveniently enough, it could be the way out of this very mess Anthropic is worried about. Sadly, Bend2 is now reaching 100k lines of code, making it increasingly hard for us to audit and verify it all. Proof checkers are particularly security-sensitive, because a single bug can lead to false theorems being accepted, undermining the entire trust model of the system. Even Lean, Coq and Agda had bugs in the past. We just finished Bend's initial consistency checker. Having Mythos audit our implementation would greatly improve Bend's security. In turn, a secure Bend could greatly improve the security of all other software, providing a solution to the very problem that prevents Mythos from being released. I hope this message reaches someone from Anthropic, and they kindly consider letting Bend2 be part of Glasswing!

Taelin@VictorTaelin

@alexalbert__ I'm the maintainer of Bend, a new programming language with 19k+ stars on GitHub. We're about to launch a major update. Having access to this model to audit it would greatly improve the project's security, and of projects built with it. Lmk if there's any way to get involved.

156

310

5.8K

815.7K