charlie

17.4K posts

charlie

@chaly644

Busy working

California, USA Katılım Haziran 2012

4.7K Takip Edilen270 Takipçiler

charlie@chaly644·1m

@flowersslop A titan against a titan!!!

GIF

Català

Flowers ☾@flowersslop·1h

Imagine if Spud is better than Mythos

English

115

3.5K

charlie@chaly644·3m

@mattshumer_

GIF

QME

Matt Shumer@mattshumer_·26m

If you think about it, Anthropic essentially now has a master key to just about any software in the world. In some ways, they now have more power than governments.

Matt Shumer@mattshumer_

This is absolutely fucking terrifying. Anthropic's rumored Mythos model is real. And it's so powerful that they can't release it to the public. We're beyond benchmarks now. This model, in the wrong hands, is a cyberweapon capable of mass destruction.

English

11.5K

charlie@chaly644·37m

@jdegoes @AndyXAndersen How's the COPE these days?

GIF

English

John A De Goes@jdegoes·2h

@AndyXAndersen It sounds like you believe you can teach me something about AI agents.

English

John A De Goes@jdegoes·20h

No paper is needed for this fact, it is self-evident to those paying attention. Yet my mentions are filled with people claiming LLMs are capable of de novo reasoning. They're not. But they are awfully good at convincing people they are.

Nav Toor@heynavtoor

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating. Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested. But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly. Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?" The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are. But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all. Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%. Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural. The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense. The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts." They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash. This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world. You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.

English

6.7K

charlie@chaly644·38m

@jdegoes You got COPE 🤣

GIF

English

charlie@chaly644·52m

@Noahpinion That'd be great

GIF

English

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion·1h

When OpenAI comes out with a similar model, it might decide to go ahead and release it, in an attempt to grab market share back from Anthropic. Doing this might break the internet.

Peter Wildeford🇺🇸🚀@peterwildeford

ANTHROPIC: "Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available"

English

113

14K

charlie@chaly644·3h

@MinuteMovies3 @gavinpurcell Oh? How so? Elaborate

English

Minute Movies@MinuteMovies3·5h

@gavinpurcell I don't believe it's a bubble but this is the strongest bubble indicator to date

English

Gavin Purcell@gavinpurcell·5h

so.... here's where we are in ai now

Jeremy Nguyen ✍🏼 🚢@JeremyNguyenPhD

Milla Jovovich (actress from The Fifth Element) created a world-beating Claude memory system with @bensig?! - 100% on LongMemEval — first perfect score ever recorded. Free and 100% open source. Github link in the quoted post from Ben. I'm keen to hear how it works for you.

English

1.4K

charlie@chaly644·3h

@davidzmorris LOL 🤣 Keep deluding yourself. You're an NPC. You're not an ai lab. You're not an institutional investor. You have no say in how these PRIVATE companies spend their money. Any questions?

English

David Z. Morris@davidzmorris·8h

"The people pitching me have thought about that objection to their pitch, surely. They're smart and important!" is an amazing way to talk yourself into getting conned. It's helpful (if sad) when predicate victims just lay it right out there

charlie@chaly644

@MerrynSW That question has already been answered by the people that actually matter. Why do you think they're spending TRILLIONS on ai??? 🤣

English

304

charlie@chaly644·17h

@dumi_ZAR @Rainmaker1973 Serial Killer Whale

English

Baby Boy@dumi_ZAR·1d

@Rainmaker1973 Is there a video of a killer whale eating a polar bear ?

English

501

Massimo@Rainmaker1973·1d

Striking encounter with a killer whale in Antarctica [📹Richard Sidey]

English

122

1.8K

92.7K

charlie@chaly644·17h

@connors_touch @tszzl @mrgunn It's the new climate scam. Fake outrage.

English

Touch Connors@connors_touch·17h

@tszzl @mrgunn So that Data Center in Morrow County, Oregon thats poisoning people, how does that better mankind? msn.com/en-us/news/us/… Or the massive amounts of energy and water they consume watersecuritynewswire.com/infrastructure… msn.com/en-us/news/us/…

English

486

roon@tszzl·19h

just so everyone is clear: this is evil. you are justified in thinking it’s morally bad. tons of apologetics happening for bad people. if you think behavior like this is just desserts for the tech industry due to some hobbyhorse you have, you have gone insane

Resist Wire@ResistWire

BREAKING: 13 shots fired into home of Indianapolis councilor; note reading “No data centers” left at scene.

English

620

414

441.9K

charlie@chaly644·21h

@mar3tus @libisumbrae @MolloyLaurence @MerrynSW Token efficiency increases 10X each year.

English

Jacob Martus@mar3tus·22h

@chaly644 @libisumbrae @MolloyLaurence @MerrynSW All of the big labs are subsidizing token cost rn.

English

Merryn Somerset Webb@MerrynSW·3d

What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…

English

353

375

2.7K

1.2M

charlie@chaly644·22h

@GaryMarcus Human error rates are greater !

English

Gary Marcus@GaryMarcus·22h

This ML Prof told me that the hallucination rate for frontier reasoning LLMs is “next to nil” And then gave me data, only after I pushed him, showing a best-case rate of 4.6% (which of course is benchmark specific). 4.6% is not “next to nil”. Imagine if your accountant hallucinated 4.6% of the time. Or worse, your pilot.

Aran Nayebi@aran_nayebi

@Kasparov63 @GaryMarcus Have you had a chance to try the latest reasoning models? You'll see their hallucination rate is next to nil. In fact, there’s a big difference between frontier reasoning models & the base LLMs that're freely available to the public, see e.g. here: x.com/aran_nayebi/st…

English

317

49.4K

charlie@chaly644·22h

@ThePrimeagen Five years from now?

GIF

English

230

ThePrimeagen@ThePrimeagen·22h

The downfall of Nvidia begins? Long live dedicated silicon

Anthropic@AnthropicAI

We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.

English

128

109

4.5K

517K

charlie@chaly644·22h

@TheZvi No. Citation please

English

Zvi Mowshowitz@TheZvi·1d

Do you remember when he previously got asked this same question of why people should trust him, and instead of a PR speech he straight up said 'you shouldn't'?

Mike Allen@mikeallen

👀 I asked @sama why people should trust HIM to be at the forefront of AI's powers "I think almost everybody involved in our industry feels the gravity of what we're doing ... We also think it's very important that no one person is making the decisions by themselves"

English

116

11.3K

charlie@chaly644·23h

@KatieMiller LOL You have zero standing to complain 🤣

English

Katie Miller@KatieMiller·1d

Why did it take them 24 hours to issue this word salad?

Jessica Lessin@Jessicalessin

Quite the joint statement from Altman and Friar to @theinformation: “We are fully aligned that durable access to compute is at the core of OpenAI’s strategy and a key differentiator as we scale. We have both been directly involved in every consequential compute decision over the past year plus. The $122 billion round locks in the capacity to scale compute aggressively and positions us to become the core infrastructure layer for AI, translating that advantage into sustained leadership across research and products, and making it possible for people around the world and businesses, big and small, to just build things.” theinformation.com/articles/opena…

English

171

13.7K

charlie@chaly644·23h

@buccocapital Put me all in on OpenAI and Chatgpt That's the attitude of a WINNER 🏆

GIF

English

196

BuccoCapital Bloke@buccocapital·1d

Paul Graham, 18 years ago: "You could parachute Sam Altman into an island full of cannibals and come back in 5 years and he'd be the king" At this point I think you should stop being surprised he'll do whatever it takes to try to win.

English

113K

charlie@chaly644·23h

@MolloyLaurence @libisumbrae @MerrynSW Don't need to believe. The numbers speak for themselves. Facts don't care about your feelings

English

Laurence Molloy@MolloyLaurence·23h

@chaly644 @libisumbrae @MerrynSW If you believe this, I have a data centre to sell you.

English

charlie retweetledi

Cody Allred@CodyAlanAllred·1d

@joni_askola Meanwhile he just did a podcast with Jensen Huang and several others this month 😂 Sure, no one talks to him anymore...

English

338

68.4K

charlie@chaly644·23h

@JohnKoesS @ViralOps_ Denial

English

John Koes@JohnKoesS·2d

@ViralOps_ This looks really bad and I guarantee you not a single anime fan would watch this ever.

English

1.2K

31.2K

ViralOps@ViralOps_·2d

they still say Ai is NOT the real art, then explain this one piece clip. this normally would have cost them $500,000,000. and Ai just made it within a week in under $500. kizaru shows will start getting BETTER from here with AI big anime studios should be AFRAID of what comes next you can access seedance 2 pro on @MartiniArt_

English

723

1.3K

11.4K

1.5M

charlie@chaly644·1d

@MonicaCrowley You would

English

Monica Crowley@MonicaCrowley·1d

I VOTED FOR THIS

Rapid Response 47@RapidResponse47

"You called the Iranians 'crazy bastards...'" @POTUS: "True." "What is your response to critics who say—" @POTUS: "I don't care about critics." 🔥🔥🔥

English

397

741

3.9K

41.1K

charlie@chaly644·1d

@SimonHoiberg @dvassallo Brilliant strategy 👏🙄 ⬇️

English

Simon Høiberg@SimonHoiberg·1d

@dvassallo There is at least a high correlation. When I see a reply that smells like AI, and then see something like "AI automation agency" in the bio, it's an instant block.

English

316

Daniel Vassallo@dvassallo·1d

Unfortunately if I see the word "AI" in someone's bio, I assume all their posts are automated AI nonesense.

English

8.2K

Keşfet

@flowersslop @mattshumer_ @jdegoes @AndyXAndersen @Noahpinion @MinuteMovies3 @gavinpurcell @davidzmorris