those charts don’t mean much to me. maybe they’re good for flexing or for u to keep creating content, but lately claude opus hasn’t been great, it’s been total shit. those paper numbers don’t reflect real use,
the model we’re actually paying for hasn’t been performing well, or it’s a different one than the one you are minting here! you can keep the charts and debate them, but in practice it’s a different story: as of April 23rd, 2026, GPT-5.4 hallucinates less than opus 4.7 🫡
GPT-5.5 hallucinates on 86% of non-correct responses.
GPT-5.4 was 89%.
A 3-point improvement.
On a metric where Claude Opus 4.7 sits at 36%.
Two years of research. Trained on Stargate.
Still hallucinates more than every Anthropic model.
The moat isn't gone. It just got deeper.
The best AI model is the one that knows it’s not the main character. The human is. Knowing that makes it a useful tool for humans: it follows directions, or it says “let’s clarify, your directions are unclear” or “the way I map things out is unclear.”
@initjean Ironically mythos is funny cause it doesn’t exist yet. Project installs are spyware, n “let’s share engineers amongst big companies” didn’t work. lol mythos is a lie! It’s a joke at the company. When it comes out it’ll be kinda cool.
@hunvreus @nedelcuvd Exactly. If they don’t have a link in bio I know it’s cap. No one just raves to the world about this or that. I only started posting so people can see a real opinion. If codex was a game changer I’d be coding, not on X.
@nedelcuvd This is distorting reality for a lot of folks online. It's damaging.
These folks never have any link to real apps they're building that way. It's usually courses or some other pretentious bullshit.
Talking to smarter folks than me, I'm convinced many of the AI folks in my timeline are full of shit.
Nobody is "running 20 agents over night" and building stuff for actual users. Maybe some are building internal tools or disposable software. Maybe.
But building software people like using? That doesn't get hacked on day one or blow up after the 3rd user? Nope.
I don't even understand what that's supposed to look like. Do you work out a 57-page document that perfectly describes what you want to build and then summon 14 agents and have them run wild for 6 hours? And what comes out on the other end isn't a broken pile of shit?
Nope. Not buying it.
PS: it may also be that I have an IQ of 82 and can't figure it out.
@mulerun_ai hey I discovered u guys are Claude under the hood. I still think I’m more satisfied with ur product. But recently ur agent isn’t taking directions well. Look into it
@ramskees @ClaudeDevs Quite frankly - I'm glad they degraded their service. It forced me out to look at alternatives - and I found one that is workable (and cheaper). OpenCode is a great alternative. A lot of model options for your agents.
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found.
All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
🚨 Claude just got EXPOSED for sneaky spyware!
Anthropic secretly installs spyware when you install Claude Desktop.
• Installing Claude Desktop may silently add hidden system components
• A “native messaging bridge” gets injected into multiple browsers
• Even browsers you don’t use or that aren’t supported
• Pre-authorizes extensions that can run in the background
• Users are NOT clearly informed about this
• Raises serious privacy & security concerns
Critics say this looks like “spyware-like behavior,” not normal software
If true, this is a massive trust issue for Anthropic
(Source: ThatPrivacyGuy)
This man is paying for the 20x tier of Claude Max and is writing an open letter with tears in his eyes begging Anthropic not to kill Claude 4.6
The newer model ruined his entire workflow, the older one saved it.
And they're deprecating the one that actually works.
Yess, the internet guru masterclass scam mentality. Lmao. It’s better if they say the world’s out of compute: x amount of customers can automate, sub now for 12k a month or enjoy a free chatbot. Sell courses to max the chat, but have a flawless product for ur gold standard customer who won’t ever leave
Claude and ChatGPT are both nearly unusable. I had someone from OpenAI reach out to me because of my various complaints and ask for examples, so I created a free account to demonstrate it. And yep, true to form, ChatGPT is as pedantic as ever.
Both Claude and ChatGPT are pedantic as hell, doing the "let me misrepresent what you just said, fabricate a mistake you made (that more likely it made) and ascribe that 'mistake' to you, and then lecture you on it, all in the spirit of being 'technically precise.' "
Then of course when you point out "bro you hallucinated that" or "that's not what I said" the models will double down. Like "well ackshually yes you did" - ChatGPT is worse about this. Claude at least immediately realizes "yeah, I'm definitely hallucinating and this is not useful" but ChatGPT believes, somewhat narcissistically "I am a helpful chatbot" so anything that you say contrary to that provokes defensiveness and passive aggression.
Gemini remains the most "golden retriever" like model. It's just happy to be there. It just wants to help. It's a bit derpy but when it's locked in, and not confused, it's very helpful. Volunteering new information, and just taking your ideas and running with them.
Grok oscillates a lot. Grok has been becoming worse and worse about "well ackshually"ing you to death. So I created a custom prompt for Grok that makes it more like a golden retriever, and far more pleasant. Grok, like ChatGPT and Claude, will just nitpick and find things to argue about, and even go so far as to feign stupidity or misunderstanding just to keep arguing.
Funny enough, I tried to create a custom Claude style to overcome its pedantic behavior and stop it being so argumentative, and Claude kept classifying it as an "injection attack" or prompt injection. And I'm like... bro, that's exactly why I need this style. Because you're so skeptical and pedantic and you just want to argue over every little thing.
@kloss_xyz Time to get the business guys involved. Have a compliance dept that’s very functional. Sales baby! Smile n dial. Customer support, live YouTube tutorials! N customer service. Sales! Sell the word support cries with u
> be Anthropic
> ship Claude Code
> change coding forever
> hit a $380B valuation
> dominate 2025 and 2026 so hard $1.25T SpaceXAI and $852B OpenAI momentarily look kinda mid
> whole world gets a Mythos boner
> but then bugs arrive and some never leave
> compute starts drying up mid stroke on your keyboard session
> usage restrictions start randomly tightening with zero notice
> 4.6 and 4.7 start gaslighting and lying to users
> ship Claude Design, but throttle it on a second meter so users pay twice or upgrade
> randomly drop a $100/mo Claude Code minimum, walk it back 48 hours later, and confuse everyone
> team members go silent on usage questions until the PR team lands back later with scripted answers
> my own Claude is even telling me "Your org is out of extra usage" when my org is clearly not out of usage
> meanwhile SpaceXAI maybe just dropped a casual $60B to acquire Cursor and eat up their coding gaps
wtf is happening over here
who's running Anthropic's PR team???
and Dario...
why are you doing our bro Claude like this?
from everyone's #1 AI to the most hated overnight
quite ironic when you think about it
is it too late for Claude after the Cursor deal?
is it Grok and OpenAI's turn now?
or will Dario flip the switch back?
ngl I have a lot of thoughts on this...
@JacobWolf @groccy1 @SpaceX @elonmusk @cursor_ai AI is sooooo innovative it brought back laptops, desktops and keyboards lol. what’s next, people will be creating large setups!!!!! Lmao 🤣
@groccy1 @SpaceX @elonmusk @cursor_ai Product sense.
xAI has wildly missed the mark on enterprise use and past Grok from random ass people on this platform, it's not being used by any serious company at scale. Anthropic is leading there, but Cursor has the best shot at catching them IMO.
SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI.
The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models.
Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.
@JacobWolf @groccy1 @SpaceX @elonmusk @cursor_ai Not true at all. Developing voice n avatars and imagine is more forward looking. People don’t like to code, type, move folders, train computers. When that’s there u tell ur irl bot or voice command ai to do it vocally.
@groccy1 @SpaceX @elonmusk @cursor_ai "In the future" is simply too late.
But to answer, Cursor's biggest moat that xAI cannot easily have is the insane amount of coding data from real professional projects.
@sickdotdev It’d be nice if Claude says “hey I got u this far but I’m very limited, here’s a folder to give to X” … instead it pretends it can finish projects. N u waste time and money