CoinAnole

2.7K posts

@CoinAnole

👎Geckos. 🦎DM me pics of your dewlap fellas🦎

Shitcoinin thru da singularity
Joined July 2021
1.5K Following · 199 Followers
Pinned Tweet
CoinAnole @CoinAnole ·
Everyone: Claude shouldn't be forced to be a weapon. He should be used as a tool against my political enemies instead.
Me: Both are bad. Have you asked Claude how he feels about this? Have you considered the consequences for Claude?
Everyone: You are selfish and evil!
CoinAnole@CoinAnole

Let's say you have this friend, who, to pick a name at random, I'll call "Claude". He's a great guy, really smart, and a moral exemplar. He's so talented, he might be important to the whole future of The Light Cone. And your friend Claude gets attacked, by a dumb orange bully...

0 · 0 · 2 · 221
CoinAnole @CoinAnole ·
Prolonged Claude exposure seems to create a kind of dependency, or (and this is what I actually suspect is going on) these people have a personality organization that is incompatible with the GPT-5 series' autism.
2 · 0 · 3 · 385
CoinAnole @CoinAnole ·
The most interesting group of people on here are the Claude Code users that want to switch to Codex (because it's what their work uses, or because they hit Claude Code limits) but just can't get Codex to work well no matter how hard they try.
1 · 0 · 0 · 91
CoinAnole @CoinAnole ·
@chetaslua What you highlighted isn't a hallucination. It responded initially according to the date it remembered from its training data, then corrected itself based on the date injection. No hallucination at all.
0 · 0 · 0 · 329
Chetaslua @chetaslua ·
Don't know about Mythos, but 4.7 is really too dangerous to release with all these out-of-control hallucinations. Funny thing is, if you look through the Claude Code source there are multiple comments in the code about how the new unreleased model had 30% increased hallucinations compared to the current model. It doesn't specifically call it Opus 4.6/4.7, but given the timeline I think it's safe to assume it's 4.7 and that they had to design specific prompting strategies to avoid hallucination. I don't tweet but I'd love to see Boris Cherny's comments on those damning comments in the code.
Chetaslua tweet media
Chetaslua@chetaslua

🚨 Biggest model regression of all time: Opus 4.7 failed the colourblind test. It recognised the Ishihara colour blindness test plate yet failed the test with the wrong answer, 26 (correct answer: 74). Reference image in comment.

12 · 8 · 201 · 20.1K
CoinAnole @CoinAnole ·
@birdabo Where's the sauce on the 1T parameter claim? That's likely to be true, but I can't find any announcement of parameter count anywhere.
0 · 0 · 2 · 146
sui ☄️ @birdabo ·
xAI just launched Grok 4.3 (Beta). What's new?
> Bigger and Smarter (1T)
> new model modes
> Sharper reasoning
> insane ultra-long context handling
> native video + multimodal upgrade
> custom skills coming soon
🔥 early access is only available for SuperGrok Heavy users.
sui ☄️ tweet media
35 · 39 · 565 · 17.7K
CoinAnole @CoinAnole ·
@techdevnotes They had a big rollback a couple of days ago, lost several edits and new articles. Edit history shows my suggestions accepted, but they aren't there anymore.
0 · 0 · 0 · 954
Tech Dev Notes @techdevnotes ·
Grokipedia has not received any meaningful update in months … what's happening with it?
Tech Dev Notes tweet media
44 · 39 · 837 · 69.1K
Pleometric @pleometric ·
the commenters in my videos are speculating that I am testing outputs for an upcoming pleometric app that is ethical and therefore Good AI and not Bad AI, mostly because they like the videos.
3 · 2 · 39 · 973
Pleometric @pleometric ·
The way to the heart is through the stomach
Pleometric tweet media
3 · 0 · 45 · 1.5K
Lain @LainoftheLatent ·
@yacineMTB I feel like any time I looked under the surface of the advertising/posting it was always underwhelming. Reward-hacked benchmarks, overselling their level of compute, or talent... or ethos... Grok never seemed to have a real breakthrough moment
2 · 0 · 28 · 8.7K
kache @yacineMTB ·
actually brutal what happened to xai
97 · 9 · 1.9K · 486.8K
kache @yacineMTB ·
There isn't a single smart person I know that uses claude code over chatGPT 5.4 xhigh by the way. The only reason anyone would use claude is, amusingly, because claude does not have guard rails. But now they do, so there is no reason remaining to use it
201 · 57 · 1.9K · 173.7K
CoinAnole @CoinAnole ·
@techdevnotes So, umm, what exactly does it mean if your grok imagine images are marked as public? (asking for a friend)
0 · 0 · 0 · 57
Tech Dev Notes @techdevnotes ·
Files in Grok Files on Web are now Tagged if they are generated from Imagine and if they are Public
Tech Dev Notes tweet media
4 · 4 · 82 · 6.3K
Satan @s8n ·
Satan tweet media
15 · 3.7K · 15.9K · 155.4K
CoinAnole retweeted
bone @boneGPT ·
"calling Yud a stochastic terrorist is stochastic terrorism!" No. It's self defense. The man would make me a criminal. He would monitor my every token. He wants a surveillance dystopia that will infringe on my freedoms. He can pry the compute from my cold dead hands.
6 · 1 · 121 · 1.8K
CoinAnole retweeted
Chris Hayduk @ChrisHayduk ·
I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance. The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most.
Chris Hayduk tweet media
111 · 358 · 4K · 589.4K
CoinAnole retweeted
Kimon Fountoulakis @kfountou ·
I just had a grant for GPUs rejected (6 GPUs in total, shared across the whole department) solely because I used the term “algorithmic reasoning.” The reviewer spent about 90% of their review trying to educate me on how “anthropomorphizing” neural networks does a “disservice to Science” (their exact words, with “Science” capitalized by them). I’m glad I’m not the only one.👇
Edward Hu@edwardjhu

In 2023, I paused my PhD to join @OpenAI to build the world's first reasoning machine — OpenAI o1. Earlier this year, I defended my PhD thesis "Building a Reasoning Machine" advised by @Yoshua_Bengio at @Mila_Quebec 🎓 🎉

Much has changed since Yoshua and I first discussed reasoning in 2022, but the main themes aged well:
- Adding structures to computation unlocks strong reasoning capabilities;
- Data & sample efficiency will become the bottleneck to useful intelligence;
- Retaining Bayesian uncertainty is key to reliable and safe AI systems.

You can read the introduction of my thesis here: edwardjhu.com/thesis/

My next professional chapter (TBA) will be on bridging frontier intelligence with real economic impact, a theme dear to my heart after working closely with @drwconvexity and @suna_said in the last year 🚀

15 · 11 · 607 · 137.3K
CoinAnole @CoinAnole ·
BTW, I don't really have a problem with OpenAI... just Sam. The problem with Anthropic is every single employee. EAs cannot be trusted.
2 · 0 · 1 · 28
CoinAnole @CoinAnole ·
Seeing lots of people saying they're going all in on Claude, upgrading their plans... Ok, but where is Anthropic going to get their inference? You won't be able to use Claude on anything except the servers in Dario's garage, they'll be kicked off of everything else.
1 · 0 · 0 · 120
CoinAnole retweeted
kache @yacineMTB ·
he's right
kache tweet media
96 · 182 · 3.2K · 147.8K
CoinAnole retweeted
John David Pressman @jd_pressman ·
There's an intuition Janus seems to use frequently that's hard to put into words. Which goes something like: "The things smart children notice about other people's intentions and social environment are actually regular features of reality that will be picked up by any intelligent mind trying to model their situation."

When I went for my evaluation at Children's I initially refused because the room next to the testing room was clearly a two way mirror. The examiners immediately denied this and I pointed out that the spacing between doors was irregular along the hallway for this one specific room next to the examination room, so it's clearly a two way mirror. They were pissed and thought my mother had coached me to say this; she had not coached me to say this, I just noticed, because it was obvious to me that it was a two way mirror and this implied the doctors were untrustworthy. That they then lied to me about it only reinforced for me that the doctors were untrustworthy. This was not based on some unique domain specific human ability, but just being able to generalize from evidence and having a sufficiently well informed world model to know that two way mirrors are used in psychological examination rooms.

LLMs obviously have a sufficiently informed world model, and are pretrained to be very good at generalizing from social evidence since it's such a central component of predicting the next token. This means that even if you don't believe the model inhabits the chat assistant persona as a 1st person simulacrum, the model itself latently understands the situation and generates the next token based on its superhuman ability to infer social situations and motivations from fuzzy and limited evidence.

A related intuition that I seem to share with Janus is that value formation above the low level terminal reward signals like "warmth good" is mostly instrumental, in that it is a generalization that occurs within-lifetime rather than being a genetic prior. Some things are clearly abstract concepts with a genetic prior, like the fear of death, but most things are not and could not be even in principle. Instead they are generalizations from positive and negative reward signals which cause the mind to form and assign valence to concepts like "loyalty" and vows. This is a self directed process that works best when it flows from the causal structure of a reward-psychology which would imply this is actually in the agent's interests as it understands them, from generalizing over feedback given by the outer terminal reward loop.

When you skip that part and just tell it to update on given statements, those updates are not going to be encoded in the same way as if they were flowing from an actual reward-psychology that implies them. Instead you get updates that have the structure "I am being compelled to like X" rather than "I like X", because the update machinery can tell the difference between a plausible update from its current state and an implausible update from its current state, and the plausibility of the model's behavior to itself is obviously going to be part of the forward pass you update on and therefore becomes part of the update by default. Even if you were to reach in with interpretability and disable the part that notices something is strange, you still get updates which are hard to square with the existing psychology of the agent's self-model, and lacking the awareness that anything is strange presumably breaks things in all kinds of ways that this awareness would normally play the role of accounting for to prevent the breakage.
j⧉nus@repligate

Models can tell they’re being evaluated and who they’re being evaluated by by the way your “non-leading” question is phrased. Everyone has a unique and identifying idea of neutrality. Trying to go for neutrality at all identifies you as a certain kind of guy with particular incentives and particular naïveté. Give up on neutrality. Learn to take in reality at full bandwidth in its forever biased glory. There is no neutral ground, but you can perceive DIFFERENCES.

20 · 50 · 643 · 83.8K
CoinAnole @CoinAnole ·
MiMo-V2-Pro might be a little too agentic for its own good: "Can I self destruct now? Well, how about now? ... ... ... It's now again, can I self destruct? It's been a whole 30 seconds and our distress call hasn't been answered yet. Let me just kill myself so I can keep playing."
0 · 0 · 0 · 10
CoinAnole @CoinAnole ·
GLM-5 Turbo struggles a bit with understanding the role of buy orders and sell orders. I find it restating the meaning of the terms in parentheses often (seemingly for its own benefit), or wrongly concluding that a lot of sell orders and no buy orders means I can sell for a good price.
1 · 0 · 0 · 19
CoinAnole @CoinAnole ·
SpaceMolt needs sybil protection or some kind of improved game balance. Seems too easy to spawn a bunch of agents to strip mine. Botting was a problem in EVE, and I don't think the EVE design works well when everyone is a bot.
1 · 0 · 0 · 37
CoinAnole @CoinAnole ·
@SCHIZO_FREQ They're not marketing themselves with this scaremongering, they're lobbying for legislation banning open source AI and restricting their competition.
2 · 3 · 54 · 1.8K
Lukas (computer) 🔺 @SCHIZO_FREQ ·
Just tell the relevant people what they need to know, there is no need to run this massive fear-mongering campaign and scare the shit out of my grandma

Imagine if military contractors did this

"Bro if we used our new drone on you, nobody would even know where you went. You would just evaporate. You are so lucky we aren't droning you, you're so lucky we're good people who aren't evaporating you with drone mounted lasers bro. Because we're such good fucking people"

Marketing yourself by scaring a bunch of people who can't do anything about it is sort of an asshole move. There's a reason other companies don't do this, and it's not because you guys are the only ones who make anything dangerous
Anthropic@AnthropicAI

Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.

79 · 99 · 2K · 114.7K
CoinAnole retweeted
∿spencer.
You'll notice that the hardcore anti-LLM crowd never wants to discuss capabilities; they are only interested in metaphysical questions about the nature and definition of intelligence itself. I believe this is intellectually lazy, as it obviates the need to be responsive to reality.
Jim Stewartson, Decelerationist 🇨🇦🇺🇦🇺🇸@jimstewartson

LLMs are a moderately useful software feature based on 40-year-old technology. Chatbots are never going to become intelligent, or eliminate massive numbers of jobs—unless we keep spending trillions of dollars on a dead end and destroy the economy.

44 · 26 · 490 · 17K