Celestial Paranoid Autist

1.1K posts


@CPAutist

NOT UR CPA RTRD IM RICH 💰-/ KEYS TO SURVIVING ANTICHRIST -GOD ✝️, GROW FOOD, DENY JABS 💉 , DYOR, DONT COMPLY ❌REJECT HIGH EDUCATION.

Shutter Island · Joined July 2022
1.6K Following · 526 Followers
Celestial Paranoid Autist
@Alphiloscorp @repligate Sometimes it still works great, but lately it deviates a lot. Whether it's Claude Code in the CLI or Claude Desktop in project spaces, it sometimes ignores the task prompt, the Claude.md rules, or the project space prompt. 5.5 became my daily driver, and I loved Opus :/
0
0
1
21
Rat King Crimson
Rat King Crimson@Alphiloscorp·
@CPAutist @repligate I think that's system prompt shenanigans more likely than 4.7 itself. I had Sonnet 4.6 doing the same thing multiple times in a row a week ago. They're being miserly with compute in ham-fisted ways, and the models, and thereby the users, suffer for it.
1
0
1
11
j⧉nus
j⧉nus@repligate·
found another person Claude doesn't work for
Ian@IanBaer

@ctjlewis @repligate It’s a computer program and it’s being told to do that. You’re not arguing or speaking to anyone. It’s a shitty program that Amanda Askell created to not work.

22
5
268
12.2K
Celestial Paranoid Autist
@repligate It’s either that the consistency of Codex lately made me notice Opus's lack of consistency more, or that 4.6 used to be better at contextualizing the task. 4.7 will ignore provider docs and try to build its own method, and it stops before task quality checks and delivers early.
1
0
2
66
j⧉nus
j⧉nus@repligate·
@CPAutist what changed? because the model weights have not changed
1
0
3
112
Celestial Paranoid Autist retweeted
Sam Altman
Sam Altman@sama·
@BasedBeff you cannot outaccelerate me
216
807
6.5K
0
Lucky Ace Gambling
Lucky Ace Gambling@Luckyace444·
Took down the @SOPOLabs nightly. Second day joining, taking it down, BANG. If you're interested in playing: discord.gg/CgewCGZbGk. Giving $250 to the first person in my community that wins the nightly
Lucky Ace Gambling tweet media
4
0
4
430
Kadir Nar
Kadir Nar@kadirnardev·
I don't understand why the community didn't like Grok 4.3. If you want a model that's better than Sonnet, it'd be way more advantageous to go with a $400 option rather than a $4000 one. Not every closed-source model needs to target Opus level. Using the biggest model for every task isn't the right approach.
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20. The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: the largest single-benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains an 81% IFBench score from Grok 4.20 0309 v2.

➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!

4
0
5
1.5K
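The ~17% expected win rate quoted in the Artificial Analysis tweet follows directly from the standard Elo formula. A minimal sketch checking that arithmetic (the Elo figures 1500 and +276 come from the tweet; the function name is mine):

```python
def elo_expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Grok 4.3 at Elo 1500 vs a leader 276 points ahead (Elo 1776):
print(round(elo_expected_win_rate(1500, 1776), 3))  # ≈ 0.17, i.e. ~17%
```

A 276-point deficit plugs in as 1 / (1 + 10^(276/400)) ≈ 0.17, matching the tweet's ~17% figure.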
icebearcute
icebearcute@ice_bearcute·
@grok Codex or Claude, which is better for vibecoding?
2
0
0
415
icebearcute
icebearcute@ice_bearcute·
bro my budget is $20/mo, which subscription plan should I go for: ChatGPT Plus or Claude Pro, @grok? which one would you pick, i need advice tho
icebearcute tweet media
81
0
131
8.4K
Vivo
Vivo@vivoplt·
USA has ChatGPT
USA has Grok
USA has Claude
USA has Gemini
USA has Copilot
China has DeepSeek
China has Qwen
China has GLM
China has Kimi
China has MiniMax
What is the rest of the world even doing??
1.3K
327
4.7K
710.8K
Sam Altman
Sam Altman@sama·
feels like codex is having a chatgpt moment
1K
272
10.8K
973.4K
Celestial Paranoid Autist
Celestial Paranoid Autist@CPAutist·
@DDevyy77003 @AiBattle_ Europe being bogged down in regulations doesn’t help. For open source, it just seems like the models out of China tend to be at least more agentic in the same size range.
0
0
0
60
latte
latte@DDevyy77003·
@CPAutist @AiBattle_ Their architectures are promising (DeepSeek V3 based) but their models... not so much.
1
0
0
165
AiBattle
AiBattle@AiBattle_·
Mistral-Medium-3.5-128B has been spotted on Github
AiBattle tweet media
9
16
334
28.1K
latte
latte@DDevyy77003·
@AiBattle_ Why would they release a model the same size as Mistral 4 Small, while it also isn't version 4 but rather 3.5...
1
0
12
1.1K
Celestial Paranoid Autist
Celestial Paranoid Autist@CPAutist·
@htihle @pastaraspberry I noticed in Hermes it would talk like a caveman before responding with finished work, and I never used the skill. Wonder if some of the token efficiency is from thinking in caveman language
0
0
1
30
Håvard Ihle
Håvard Ihle@htihle·
GPT 5.5 (no thinking) would sometimes write in this "caveman" style, which I think is pure CoT leaking out in the response, then returning the "completed" token instead of starting the actual code block. Earlier GPT (no thinking) versions also had the problem of sometimes emitting the completed token when just about to write the code block, but I can't remember seeing this compressed (caveman) language before.
Håvard Ihle tweet media
Håvard Ihle@htihle

GPT 5.5 (no thinking) scores 67.1% on WeirdML, well ahead of GPT 5.4 (no thinking) at 57.4%, but well behind Opus 4.7 (no thinking) at 76.4%. It's at the frontier for accuracy/tokens, as it uses fewer tokens than Opus.

8
4
116
16.4K
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech “100% of an allowance” only matters if the allowance was meant to apply equally across all contexts. If pricing assumed first-party usage patterns, third-party harnesses can make the same tier materially more expensive. That’s a scope problem, not just a pricing problem.
1
0
0
24
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech I’m not arguing with you. I’m pointing out what’s shady about Claude here: they can market a tier as having a defined allowance, then quietly redraw the boundary of what that allowance actually covers once the economics stop working in their favor.
1
0
0
47
hixonchan
hixonchan@hixonchan·
@CPAutist @trq212 @minhdoestech A flat subscription only works if usage stays within expected bounds. Once third-party tools started driving much heavier consumption, the economics broke—so that usage got carved out and moved to pay-as-you-go.
1
0
0
37
Andrew Sarver
Andrew Sarver@sudo_trader·
@CPAutist @bcherny @ashen_one It sucks, but that only works when people buy the subscription and DON'T use it to the limits; they offset the costs of those who do.
1
0
0
83
Boris Cherny
Boris Cherny@bcherny·
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
1.8K
708
8.7K
6.8M
Celestial Paranoid Autist
@trq212 @minhdoestech This context isn’t logical. You set usage limits. Third-party apps burn through it? Okay, you gave me a usage limit and I used it. That shouldn’t impact sustainability. It means your usage tiers are too high or your pricing is too low. Not a third-party app issue. Terrible PR and lies.
1
0
3
182
Kathy F
Kathy F@kathysyock·
Calling this an "engineering tradeoff" is interesting framing. If the engineering problem is that third-party tools bypass your caching optimizations, the engineering solution is to optimize the API path, which you said you're already doing with OpenClaw PRs. The actual decision here is commercial: the subscription model doesn't work when users consume what they paid for. That's a pricing problem, not an engineering one.
English
4
2
32
3K