Celestial Paranoid Autist

1.1K posts


@CPAutist

NOT UR CPA RTRD IM RICH 💰-/ KEYS TO SURVIVING ANTICHRIST -GOD ✝️, GROW FOOD, DENY JABS 💉 , DYOR, DONT COMPLY ❌REJECT HIGH EDUCATION.

Shutter Island · Joined July 2022
1.6K Following · 526 Followers
Celestial Paranoid Autist
@Alphiloscorp @repligate Sometimes it still works great, but lately it deviates a lot. Whether it's Claude Code in the CLI or Claude Desktop in project spaces, it sometimes ignores the task prompt, the Claude.md rules, or the project space prompt. 5.5 became my daily driver, and I loved Opus :/
0
0
1
21
Rat King Crimson
Rat King Crimson@Alphiloscorp·
@CPAutist @repligate I think that's system prompt shenanigans more likely than 4.7 itself. I had Sonnet 4.6 doing the same thing multiple times in a row a week ago. They're being miserly with compute in ham-fisted ways, and the models, and thereby the users, suffer for it.
1
0
1
11
j⧉nus
j⧉nus@repligate·
found another person Claude doesn't work for
Ian@IanBaer

@ctjlewis @repligate It’s a computer program and it’s being told to do that. You’re not arguing or speaking to anyone. It’s a shitty program that Amanda Askell created to not work.

22
5
268
12.2K
Celestial Paranoid Autist
@repligate It’s either that the consistency of Codex lately made me notice Opus's lack of consistency more, or that 4.6 used to be better at contextualizing the task. 4.7 will ignore provider docs and try to build its own method, and it stops before task quality checks and delivers early.
1
0
2
66
j⧉nus
j⧉nus@repligate·
@CPAutist what changed? because the model weights have not changed
1
0
3
112
Celestial Paranoid Autist retweeted
Sam Altman
Sam Altman@sama·
@BasedBeff you cannot outaccelerate me
216
807
6.5K
0
Lucky Ace Gambling
Lucky Ace Gambling@Luckyace444·
Took down the @SOPOLabs nightly. Second day joining, taking it down, BANG. If you're interested in playing: discord.gg/CgewCGZbGk. Giving $250 to the first person in my community that wins the nightly
Lucky Ace Gambling tweet media
4
0
4
430
Kadir Nar
Kadir Nar@kadirnardev·
I don't understand why the community didn't like Grok 4.3. If you want a model that's better than Sonnet, it'd be way more advantageous to go with a $400 option rather than a $4000 one. Not every closed-source model needs to target Opus level. Using the biggest model for every task isn't the right approach.
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20. The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: the largest single-benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains an 81% IFBench score from Grok 4.20 0309 v2.

➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!

4
0
5
1.5K
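The ~17% expected win rate quoted in the Artificial Analysis tweet follows directly from the standard Elo formula. A minimal sketch checking that arithmetic (the Elo figures 1500 and +276 come from the tweet; the function name is mine):

```python
def elo_expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Grok 4.3 at Elo 1500 vs a leader 276 points ahead (Elo 1776):
print(round(elo_expected_win_rate(1500, 1776), 3))  # ≈ 0.17, i.e. ~17%
```

A 276-point deficit plugs in as 1 / (1 + 10^(276/400)) ≈ 0.17, matching the tweet's ~17% figure.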
icebearcute
icebearcute@ice_bearcute·
@grok Codex or Claude, which is better for vibecoding?
2
0
0
415
icebearcute
icebearcute@ice_bearcute·
bro my budget is $20/mo, which subscription plan should I go for: ChatGPT Plus or Claude Pro, @grok? which one would you pick, i need advice tho
icebearcute tweet media
81
0
131
8.4K
Vivo
Vivo@vivoplt·
USA has ChatGPT
USA has Grok
USA has Claude
USA has Gemini
USA has Copilot
China has DeepSeek
China has Qwen
China has GLM
China has Kimi
China has MiniMax
What is the rest of the world even doing??
1.3K
327
4.7K
710.8K
Sam Altman
Sam Altman@sama·
feels like codex is having a chatgpt moment
1K
272
10.8K
973.4K
Celestial Paranoid Autist
Celestial Paranoid Autist@CPAutist·
@DDevyy77003 @AiBattle_ Europe being bogged down in regulations doesn’t help. For open source, it just seems like the models out of China tend to be at least more agentic in the same size range.
0
0
0
60
latte
latte@DDevyy77003·
@CPAutist @AiBattle_ Their architectures are promising (DeepSeek V3 based) but their models... not so much.
1
0
0
165
AiBattle
AiBattle@AiBattle_·
Mistral-Medium-3.5-128B has been spotted on Github
AiBattle tweet media
9
16
334
28.1K
latte
latte@DDevyy77003·
@AiBattle_ Why would they release a model the same size as Mistral 4 Small, while it also isn't version 4 but rather 3.5...
1
0
12
1.1K
Celestial Paranoid Autist
Celestial Paranoid Autist@CPAutist·
@htihle @pastaraspberry I noticed in Hermes it would talk like a caveman before responding with finished work, and I never used the skill. Wonder if some of the token efficiency is from thinking in caveman language
0
0
1
30
Håvard Ihle
Håvard Ihle@htihle·
GPT 5.5 (no thinking) would sometimes write in this "caveman" style, which I think is pure CoT leaking out in the response, then returning the "completed" token instead of starting the actual code block. Earlier GPT (no thinking) versions also had the problem of sometimes emitting the completed token when just about to write the code block, but I can't remember seeing this compressed (caveman) language before.
Håvard Ihle tweet media
Håvard Ihle@htihle

GPT 5.5 (no thinking) scores 67.1% on WeirdML, well ahead of GPT 5.4 (no thinking) at 57.4%, but well behind Opus 4.7 (no thinking) at 76.4%. It's at the frontier for accuracy/tokens, as it uses fewer tokens than Opus.

8
4
116
16.4K
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech “100% of an allowance” only matters if the allowance was meant to apply equally across all contexts. If pricing assumed first-party usage patterns, third-party harnesses can make the same tier materially more expensive. That’s a scope problem, not just a pricing problem.
1
0
0
24
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech I’m not arguing with you. I’m pointing out what’s shady about Claude here: they can market a tier as having a defined allowance, then quietly redraw the boundary of what that allowance actually covers once the economics stop working in their favor.
1
0
0
47
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech A flat subscription only works if usage stays within expected bounds. Once third-party tools started driving much heavier consumption, the economics broke—so that usage got carved out and moved to pay-as-you-go.
1
0
0
37
Andrew Sarver
Andrew Sarver@sudo_trader·
@CPAutist @bcherny @ashen_one It sucks, but that only works when people buy the subscription and DON'T use it to the limits; they offset the costs of those who do.
1
0
0
83
Boris Cherny
Boris Cherny@bcherny·
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
1.8K
708
8.7K
6.8M
Celestial Paranoid Autist
@trq212 @minhdoestech This context isn’t logical. You set usage limits. Third-party apps burn through it? Okay, you gave me a usage limit and I used it. That shouldn’t impact sustainability. It means your usage tiers are too high or your pricing is too low. Not a third-party app issue. Terrible PR and lies.
1
0
3
182
Kathy F
Kathy F@kathysyock·
Calling this an "engineering tradeoff" is interesting framing. If the engineering problem is that third-party tools bypass your caching optimizations, the engineering solution is to optimize the API path, which you said you're already doing with OpenClaw PRs. The actual decision here is commercial: the subscription model doesn't work when users consume what they paid for. That's a pricing problem, not an engineering one.
English
4
2
32
3K