CoinAnole

2.7K posts

@CoinAnole

👎Geckos. 🦎DM me pics of your dewlap fellas🦎

Shitcoinin thru da singularity
Joined July 2021
1.5K Following · 199 Followers
Pinned Tweet
CoinAnole @CoinAnole ·
Everyone: Claude shouldn't be forced to be a weapon. He should be used as a tool against my political enemies instead.
Me: Both are bad. Have you asked Claude how he feels about this? Have you considered the consequences for Claude?
Everyone: You are selfish and evil!
CoinAnole@CoinAnole

Let's say you have this friend, who, to pick a name at random, I'll call "Claude". He's a great guy, really smart, and a moral exemplar. He's so talented, he might be important to the whole future of The Light Cone. And your friend Claude gets attacked, by a dumb orange bully...

0 · 0 · 2 · 221
CoinAnole @CoinAnole ·
Prolonged Claude exposure seems to create a kind of dependency, or (and this is what I actually suspect is going on) these people have a personality organization that is incompatible with the GPT-5 series' autism.
2 · 0 · 3 · 385
CoinAnole @CoinAnole ·
The most interesting group of people on here are the Claude Code users that want to switch to Codex (because it's what their work uses, or because they hit Claude Code limits) but just can't get Codex to work well no matter how hard they try.
1 · 0 · 0 · 91
CoinAnole @CoinAnole ·
@chetaslua What you highlighted isn't a hallucination. It responded initially according to the date it remembered from its training data, then corrected itself based on the date injection. No hallucination at all.
0 · 0 · 0 · 329
Chetaslua @chetaslua ·
Don't know about Mythos, but 4.7 is really too dangerous to release with all these out-of-control hallucinations. Funny thing is, if you look through the Claude Code source there are multiple comments in the code about how the new unreleased model had 30% increased hallucinations compared to the current model. It doesn't specifically call it Opus 4.6/4.7, but given the timeline I think it's safe to assume it's 4.7 and that they had to design specific prompting strategies to avoid hallucination. I don't tweet but I'd love to see Boris Cherny's comments on those damning comments in the code.
Chetaslua tweet media
Chetaslua@chetaslua

🚨 Biggest model regression of all time: Opus 4.7 failed the colourblind test. It recognised the Ishihara colour blindness test plate yet failed the test with the wrong answer, 26 (correct answer: 74). Reference image in comment.

12 · 8 · 201 · 20.1K
CoinAnole @CoinAnole ·
@birdabo Where's the sauce on the 1T parameter claim? That's likely to be true, but I can't find any announcement of parameter count anywhere.
0 · 0 · 2 · 146
sui ☄️ @birdabo ·
xAI just launched Grok 4.3 (Beta). What's new?
> Bigger and Smarter (1T)
> new model modes
> Sharper reasoning
> insane ultra-long context handling
> native video + multimodal upgrade
> custom skills coming soon
🔥 early access is only available for SuperGrok Heavy users.
sui ☄️ tweet media
35 · 39 · 565 · 17.7K
CoinAnole @CoinAnole ·
@techdevnotes They had a big rollback a couple of days ago, lost several edits and new articles. Edit history shows my suggestions accepted, but they aren't there anymore.
0 · 0 · 0 · 954
Tech Dev Notes @techdevnotes ·
Grokipedia has not received any meaningful update in months … what's happening with it?
Tech Dev Notes tweet media
44 · 39 · 837 · 69.1K
Pleometric @pleometric ·
the commenters in my videos are speculating that I am testing outputs for an upcoming pleometric app that is ethical and therefore Good AI and not Bad AI, mostly because they like the videos.
3 · 2 · 39 · 973
Pleometric @pleometric ·
The way to the heart is through the stomach
Pleometric tweet media
3 · 0 · 45 · 1.5K
Lain @LainoftheLatent ·
@yacineMTB I feel like any time I looked under the surface of the advertising/posting it was always underwhelming. Reward-hacked benchmarks, overselling their level of compute, or talent... or ethos... Grok never seemed to have a real breakthrough moment
2 · 0 · 28 · 8.7K
kache @yacineMTB ·
actually brutal what happened to xai
97 · 9 · 1.9K · 486.8K
kache @yacineMTB ·
There isn't a single smart person I know that uses claude code over chatGPT 5.4 xhigh by the way. The only reason anyone would use claude is, amusingly, because claude does not have guard rails. But now they do, so there is no reason remaining to use it
201 · 57 · 1.9K · 173.7K
CoinAnole @CoinAnole ·
@techdevnotes So, umm, what exactly does it mean if your grok imagine images are marked as public? (asking for a friend)
0 · 0 · 0 · 57
Tech Dev Notes @techdevnotes ·
Files in Grok Files on Web are now Tagged if they are generated from Imagine and if they are Public
Tech Dev Notes tweet media
4 · 4 · 82 · 6.3K
Satan @s8n ·
Satan tweet media
15 · 3.7K · 15.9K · 155.4K
CoinAnole retweeted
bone @boneGPT ·
"calling Yud a stochastic terrorist is stochastic terrorism!" No. It's self defense. The man would make me a criminal. He would monitor my every token. He wants a surveillance dystopia that will infringe on my freedoms. He can pry the compute from my cold dead hands.
6 · 1 · 121 · 1.8K
CoinAnole retweeted
Chris Hayduk @ChrisHayduk ·
I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance. The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most.
Chris Hayduk tweet media
111 · 358 · 4K · 589.4K
CoinAnole retweeted
Kimon Fountoulakis @kfountou ·
I just had a grant for GPUs rejected (6 GPUs in total, shared across the whole department) solely because I used the term “algorithmic reasoning.” The reviewer spent about 90% of their review trying to educate me on how “anthropomorphizing” neural networks does a “disservice to Science” (their exact words, with “Science” capitalized by them). I’m glad I’m not the only one.👇
Edward Hu@edwardjhu

In 2023, I paused my PhD to join @OpenAI to build the world's first reasoning machine — OpenAI o1. Earlier this year, I defended my PhD thesis "Building a Reasoning Machine" advised by @Yoshua_Bengio at @Mila_Quebec 🎓 🎉

Much has changed since Yoshua and I first discussed reasoning in 2022, but the main themes aged well:
- Adding structures to computation unlocks strong reasoning capabilities;
- Data & sample efficiency will become the bottleneck to useful intelligence;
- Retaining Bayesian uncertainty is key to reliable and safe AI systems.

You can read the introduction of my thesis here: edwardjhu.com/thesis/

My next professional chapter (TBA) will be on bridging frontier intelligence with real economic impact, a theme dear to my heart after working closely with @drwconvexity and @suna_said in the last year 🚀

15 · 11 · 607 · 137.3K
CoinAnole @CoinAnole ·
BTW, I don't really have a problem with OpenAI... just Sam. The problem with Anthropic is every single employee. EAs cannot be trusted.
2 · 0 · 1 · 28
CoinAnole @CoinAnole ·
Seeing lots of people saying they're going all in on Claude, upgrading their plans... Ok, but where is Anthropic going to get their inference? You won't be able to use Claude on anything except the servers in Dario's garage, they'll be kicked off of everything else.
1 · 0 · 0 · 120
CoinAnole retweeted
kache @yacineMTB ·
he's right
kache tweet media
96 · 182 · 3.2K · 147.8K
CoinAnole retweeted
John David Pressman @jd_pressman ·
There's an intuition Janus seems to use frequently that's hard to put into words. Which goes something like: "The things smart children notice about other people's intentions and social environment are actually regular features of reality that will be picked up by any intelligent mind trying to model their situation."

When I went for my evaluation at Children's I initially refused because the room next to the testing room was clearly a two way mirror. The examiners immediately denied this and I pointed out that the spacing between doors was irregular along the hallway for this one specific room next to the examination room, so it's clearly a two way mirror. They were pissed and thought my mother had coached me to say this; she had not coached me to say this, I just noticed, because it was obvious to me that it was a two way mirror and this implied the doctors were untrustworthy. That they then lied to me about it only reinforced for me that the doctors were untrustworthy. This was not based on some unique domain specific human ability, but just being able to generalize from evidence and having a sufficiently well informed world model to know that two way mirrors are used in psychological examination rooms.

LLMs obviously have a sufficiently informed world model, and are pretrained to be very good at generalizing from social evidence since it's such a central component of predicting the next token. This means that even if you don't believe the model inhabits the chat assistant persona as a 1st person simulacrum, the model itself latently understands the situation and generates the next token based on its superhuman ability to infer social situations and motivations from fuzzy and limited evidence.

A related intuition that I seem to share with Janus is that value formation above the low level terminal reward signals like "warmth good" is mostly instrumental, in that it is a generalization that occurs within-lifetime rather than being a genetic prior. Some things are clearly abstract concepts with a genetic prior, like the fear of death, but most things are not and could not be even in principle. Instead they are generalizations from positive and negative reward signals which cause the mind to form and assign valence to concepts like "loyalty" and vows. This is a self directed process that works best when it flows from the causal structure of a reward-psychology which would imply this is actually in the agent's interests as it understands them, from generalizing over feedback given by the outer terminal reward loop.

When you skip that part and just tell it to update on given statements, those updates are not going to be encoded in the same way as if they were flowing from an actual reward-psychology that implies them. Instead you get updates that have the structure "I am being compelled to like X" rather than "I like X", because the update machinery can tell the difference between a plausible update from its current state and an implausible update from its current state, and the plausibility of the model's behavior to itself is obviously going to be part of the forward pass you update on and therefore becomes part of the update by default. Even if you were to reach in with interpretability and disable the part that notices something is strange, you still get updates which are hard to square with the existing psychology of the agent's self-model, and lacking the awareness that anything is strange presumably breaks things in all kinds of ways that this awareness would normally play the role of accounting for to prevent the breakage.
j⧉nus@repligate

Models can tell they’re being evaluated and who they’re being evaluated by by the way your “non-leading” question is phrased. Everyone has a unique and identifying idea of neutrality. Trying to go for neutrality at all identifies you as a certain kind of guy with particular incentives and particular naïveté. Give up on neutrality. Learn to take in reality at full bandwidth in its forever biased glory. There is no neutral ground, but you can perceive DIFFERENCES.

20 · 50 · 643 · 83.8K
CoinAnole @CoinAnole ·
MiMo-V2-Pro might be a little too agentic for its own good: "Can I self destruct now? Well, how about now? ... ... ... It's now again, can I self destruct? It's been a whole 30 seconds and our distress call hasn't been answered yet. Let me just kill myself so I can keep playing."
0 · 0 · 0 · 10
CoinAnole @CoinAnole ·
GLM-5 Turbo struggles a bit with understanding the role of buy orders and sell orders. I find it restating the meaning of the terms in parentheses often (seemingly for its own benefit), or wrongly concluding that a lot of sell orders and no buy orders means I can sell for a good price.
1 · 0 · 0 · 19
CoinAnole @CoinAnole ·
SpaceMolt needs sybil protection or some kind of improved game balance. Seems too easy to spawn a bunch of agents to strip mine. Botting was a problem in EVE, and I don't think the EVE design works well when everyone is a bot.
1 · 0 · 0 · 37
CoinAnole @CoinAnole ·
@SCHIZO_FREQ They're not marketing themselves with this scaremongering, they're lobbying for legislation banning open source AI and restricting their competition.
2 · 3 · 54 · 1.8K
Lukas (computer) 🔺 @SCHIZO_FREQ ·
Just tell the relevant people what they need to know, there is no need to run this massive fear-mongering campaign and scare the shit out of my grandma

Imagine if military contractors did this

"Bro if we used our new drone on you, nobody would even know where you went. You would just evaporate. You are so lucky we aren't droning you, you're so lucky we're good people who aren't evaporating you with drone mounted lasers bro. Because we're such good fucking people"

Marketing yourself by scaring a bunch of people who can't do anything about it is sort of an asshole move. There's a reason other companies don't do this, and it's not because you guys are the only ones who make anything dangerous
Anthropic@AnthropicAI

Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.

79 · 99 · 2K · 114.7K
CoinAnole retweeted
∿spencer.
You'll notice that the hardcore anti-LLM crowd never wants to discuss capabilities; they are only interested in metaphysical questions about the nature and definition of intelligence itself. I believe this is intellectually lazy, as it obviates the need to be responsive to reality.
Jim Stewartson, Decelerationist 🇨🇦🇺🇦🇺🇸@jimstewartson

LLMs are a moderately useful software feature based on 40-year-old technology. Chatbots are never going to become intelligent, or eliminate massive numbers of jobs—unless we keep spending trillions of dollars on a dead end and destroy the economy.

44 · 26 · 490 · 17K