Marc Bee

1.3K posts

@marcbeaupre

Creator.

Joined January 2009
400 Following · 151 Followers
Marc Bee@marcbeaupre·
@garrytan AI LOC inflation is not a constant multiplier. In practice, it’s more like an exponent. More LOC begets even more LOC. The real challenge when working with AI agents is to bend the curve.
0
0
0
59
Marc Bee@marcbeaupre·
@MatthewJBar Taiwan and the ‘9 dash line’. Literally if they just chilled out and were a good neighbour, I’d love them
0
0
3
195
Matthew Barnett@MatthewJBar·
The core case for chip export controls rests on the idea that we should treat China as our enemy. But I just disagree with that view. We have very little reason to treat China as an enemy. We'd gain a lot by trading with them, but risk losing a lot by being aggressive to them.
Peter Wildeford🇺🇸🚀@peterwildeford

Jensen here is frustrating and wrong. The man wrote off billions so of course he opposes controls.
1. Mythos is a ~10T parameter model trained on Nvidia Blackwell. Despite Jensen's best efforts, China doesn't have Blackwell chips thanks to export controls. Huawei's best chip delivers 1/3 the per-chip performance, at 2.5x the power cost, with yields >12x worse. Jensen calling Mythos "fairly mundane capacity" that's "abundantly available in China" is just plainly false.
2. Dwarkesh is right that the compute ratio matters geopolitically. Maintaining a capability lead during the critical window, even 12-18 months, is the whole point of controls. The difference between China running a thousand vs. a million offensive AI agents is huge. Jensen dodges this entirely.
3. Jensen can't simultaneously argue "controls failed because China innovated anyway" (DeepSeek) AND "we must sell to China or they'll leave our ecosystem." If they'll innovate regardless, selling chips doesn't buy the loyalty he claims.
4. Jensen's ecosystem stickiness point (x86, Arm) is his strongest argument, but it cuts against him: the world is already locked into CUDA. Selling Nvidia chips to China doesn't deepen that - it just gives China better hardware while they build Huawei alternatives regardless.

13
3
87
19K
MAKS 25 🇺🇦👀@Maks_NAFO_FELLA·
🇭🇺👀 Magyar: “Before we get started, let me just point out how strange this is. The last time I was invited on public media was more than a year and a half ago. It took an unprecedented mandate from over 3.3 million Hungarians for the leader of the strongest party to finally be allowed back on air. We will immediately suspend this lying news service. After we form the government, one of our very first tasks will be to shut down this factory of lies and build a real, independent public broadcaster — one where the opposition finally has a voice too.”
125
1.7K
13.6K
520.1K
Marc Bee@marcbeaupre·
@Noahpinion YOU would say that? I thought that you preferred Singapore’s model
0
0
1
215
Marc Bee@marcbeaupre·
@NathanpmYoung Checked and there isn’t. I guess it’s hard to calculate because there are different model providers for each model.
0
0
0
4
Marc Bee@marcbeaupre·
@NathanpmYoung Is there a spend leaderboard? You could multiply token use by the price per 1M tokens to calculate it. I think that $ spent is a better metric than tokens: economic value = spend. Raw token use isn’t worthless; I just think that it’s less indicative of utility.
1
0
0
52
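A minimal sketch of the spend calculation suggested in the post above: spend = tokens / 1,000,000 × price per 1M tokens, summed per model and sorted. The `Usage` record, the function names, the model names, and the prices are all hypothetical placeholders, not real leaderboard data or real provider pricing. It also assumes one flat price per model, which the follow-up reply notes is not generally true.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """One usage record (hypothetical shape, for illustration only)."""
    model: str
    tokens: int
    usd_per_million: float  # assumed flat price per 1M tokens


def spend_usd(usage: Usage) -> float:
    """Spend in USD: tokens / 1M times the price per 1M tokens."""
    return usage.tokens / 1_000_000 * usage.usd_per_million


def leaderboard(usages: list[Usage]) -> list[tuple[str, float]]:
    """Total spend per model, highest spend first."""
    totals: dict[str, float] = {}
    for usage in usages:
        totals[usage.model] = totals.get(usage.model, 0.0) + spend_usd(usage)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up figures: the model with fewer tokens can still rank first by spend.
    rows = [
        Usage("model-a", 2_000_000, 1.0),
        Usage("model-b", 500_000, 10.0),
    ]
    for model, usd in leaderboard(rows):
        print(f"{model}: ${usd:.2f}")
```

With these made-up figures, the model that used fewer tokens ranks first by spend, which is the distinction the post draws between token volume and dollars spent.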
Marc Bee@marcbeaupre·
@akarlin Nick 40% Adam 40% Hal 10% Other 10%
0
0
1
34
Marc Bee@marcbeaupre·
Is micro-drone tourism of North Korea less ethically questionable? Yes, I think it is. Who’s building this?
0
0
0
4
Aella@Aella_Girl·
@ClaireSilver as in they may crave sex so much they will start raping humans?
30
2
132
15.2K
Marc Bee@marcbeaupre·
@ModeledBehavior The attention wh*re needs his attention. Call me crazy, but I do not believe that the US president should be in the news every day 🤷‍♂️
0
0
0
12
Adam Ozimek@ModeledBehavior·
There's a Liberation Day feel to all this. Are we going to do something crazy every April now?
4
1
49
4.9K
Marc Bee@marcbeaupre·
@peterwildeford They’ll release once OAI catches up. ATM, they have the luxury of time. Genuinely hard to tell how much of their concern is real vs optics. It’s an impressive flex to not release
0
0
0
215
Richard Woodruff 🇺🇦@frontlinekit·
Yesterday, they announced that they had resumed oil shipments at Ust-Luga, so today, we hit it again 🔥🔥🔥 For the fifth time! The Ust-Luga port is exploding, Good Morning everyone 🥳🥳🥳
182
1.5K
9.6K
85.7K
Marc Bee@marcbeaupre·
@KHoholenko You already have the list of Refineries and the list of ships. I think a list of ports would be another great one to have <3. It seems like the port destruction is having a fantastic effect!
0
0
0
2
Marc Bee@marcbeaupre·
@KHoholenko Have you thought about creating a chart for ports? List all of Russia's oil export marine ports by size and then have the dates that each was struck.
1
0
0
15
Matthew Barnett@MatthewJBar·
There's a strange disconnect in how people talk about school. Most people insist it's very important for kids to stay focused in school and learn the material, yet most adults admit to having forgotten nearly everything they learned in school beyond the basics.
42
8
253
59.3K
Peter Wildeford🇺🇸🚀@peterwildeford·
I think you should be able to earn new Claude spinner verbs like Boy Scout merit badges
4
3
69
3.2K
Marc Bee@marcbeaupre·
@astridwilde1 @Stone_Tao Prompt, re-prompt, and refactor. Sometimes I'll say: ~"I discarded your changes. Try again with fewer lines".
0
0
0
8
Astrid Wilde 🌞@astridwilde1·
@Stone_Tao There are no shortcuts. You just have to read the code and prompt better to not output 1000s of lines of slop.
1
0
13
449
Marc Bee@marcbeaupre·
@Noahpinion Is every military that isn't centered around F35s obsolete? Was that a wise financial decision? You've said before that they're great and not a mistake, but were they 'worth' it?
0
0
0
157
Marc Bee@marcbeaupre·
@AlecStapp @heygurisingh @rtnarch @pangramlabs I almost believed it. Models are very text-biased. When given an image, they'll often proceed anyway. When given an image, they often ignore it. I get the feeling that images are hard, they're bad at images, and images don't translate well to text, so they strongly bias against them.
0
0
0
47
Guri Singh@heygurisingh·
Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see. They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy. They were never looking at your photos. Your scans. Your X-rays. Here's what's really going on: ↓

The paper is called MIRAGE. Co-authored by Fei-Fei Li. They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change. The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there.

Stanford calls it the "mirage effect." Not hallucination. Something worse.
Hallucination = making up wrong details about a real input.
Mirage = constructing an entire fake reality and reasoning from it confidently.
The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone.

But that's not the scary part. They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed. It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray. The reasoning traces? Indistinguishable from real visual analysis.

Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist. 230 million people ask health questions on ChatGPT every day.

They also found something wild:
→ Tell a model "there's no image, just guess" -- performance drops
→ Silently remove the image and let it assume it's there -- performance stays high
The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind.

When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision. Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages.

Code is open-sourced. Paper is live on arXiv. If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)
289
858
4.3K
687.8K