Marc Bee

1.3K posts

@marcbeaupre

Creator.

Joined January 2009

400 Following · 151 Followers

Marc Bee@marcbeaupre·2d

@garrytan AI LOC inflation is not a constant multiplier. In practice, it’s more like an exponent. More LOC begets even more LOC. The real challenge when working with AI agents is to bend the curve.

Garry Tan@garrytan·3d

x.com/i/article/2045…

261

132.4K

Marc Bee@marcbeaupre·5d

@MatthewJBar Taiwan and the ‘9 dash line’. Literally if they just chilled out and were a good neighbour, I’d love them

195

Matthew Barnett@MatthewJBar·5d

The core case for chip export controls rests on the idea that we should treat China as our enemy. But I just disagree with that view. We have very little reason to treat China as an enemy. We'd gain a lot by trading with them, but risk losing a lot by being aggressive to them.

Peter Wildeford🇺🇸🚀@peterwildeford

Jensen here is frustrating and wrong. The man wrote off billions so of course he opposes controls. 1. Mythos is a ~10T parameter model trained on Nvidia Blackwell. Despite Jensen's best efforts, China doesn't have Blackwell chips thanks to export controls. Huawei's best chip delivers 1/3 the per-chip performance, at 2.5x the power cost, with yields >12x worse. Jensen calling Mythos "fairly mundane capacity" that's "abundantly available in China" is just plainly false. 2. Dwarkesh is right that the compute ratio matters geopolitically. Maintaining a capability lead during the critical window — even 12-18 months — is the whole point of controls. The difference between China running a thousand vs. a million offensive AI agents is huge. Jensen dodges this entirely. 3. Jensen can't simultaneously argue "controls failed because China innovated anyway" (DeepSeek) AND "we must sell to China or they'll leave our ecosystem." If they'll innovate regardless, selling chips doesn't buy the loyalty he claims. 4. Jensen's ecosystem stickiness point (x86, Arm) is his strongest argument, but it cuts against him: the world is already locked into CUDA. Selling Nvidia chips to China doesn't deepen that - it just gives China better hardware while they build Huawei alternatives regardless.

19K

Marc Bee@marcbeaupre·6d

@Maks_NAFO_FELLA Chad move

1.9K

MAKS 25 🇺🇦👀@Maks_NAFO_FELLA·6d

🇭🇺👀 Magyar: “Before we get started, let me just point out how strange this is. The last time I was invited on public media was more than a year and a half ago. It took an unprecedented mandate from over 3.3 million Hungarians for the leader of the strongest party to finally be allowed back on air. We will immediately suspend this lying news service. After we form the government, one of our very first tasks will be to shut down this factory of lies and build a real, independent public broadcaster — one where the opposition finally has a voice too.”

125

1.7K

13.6K

520.1K

Marc Bee@marcbeaupre·Apr 13

@Noahpinion YOU would say that? I thought that you preferred Singapore’s model

215

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion·Apr 12

Managing diversity

★@Luv_Xcuses

Be brutally honest, what's one thing Americans are simply better at than the rest of the world??

663

37K

Marc Bee@marcbeaupre·Apr 10

@NathanpmYoung Checked and there isn’t. I guess it’s hard to calculate because there are different model providers for each model.

Marc Bee@marcbeaupre·Apr 10

@NathanpmYoung Is there a spend leaderboard? Could multiply token use by $ / 1M tokens to calculate it. I think that $ spent is a better metric than tokens. Economic value = spend. Raw token use isn’t worthless; I just think that it’s less indicative of utility.
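The calculation proposed in this post can be sketched in a few lines of Python. This is only an illustration: the model names, token counts, and prices below are made-up placeholders rather than OpenRouter data, and a single blended price per model is an assumption (the follow-up post notes that each model can have several providers, which makes a real price hard to pin down).

```python
def spend_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Estimated spend: token use multiplied by the price per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical usage data: model -> (tokens used, assumed blended $ per 1M tokens).
usage = {
    "model-a": (2_000_000_000, 3.0),
    "model-b": (5_000_000_000, 0.5),
}

# Rank models by estimated spend instead of raw token count.
leaderboard = sorted(
    ((name, spend_usd(tokens, price)) for name, (tokens, price) in usage.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, dollars in leaderboard:
    print(f"{name}: ${dollars:,.2f}")
```

With these placeholder numbers, model-b leads on tokens while model-a leads on spend, which is the gap between the two metrics that the post points to.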

Nathan 🔎@NathanpmYoung·Apr 10

Let us look at the Openrouter models by token use.

Peter Yang@petergyang

Silicon Valley is quietly running on Chinese open source AI models. Here are the receipts:
→ Cursor confirmed last month that Composer 2 is built on Moonshot's Kimi K2.5
→ Cognition's SWE-1.6 model is likely post-trained on Zhipu's GLM
→ Shopify saved $5M a year by switching to Alibaba’s Qwen model.
Airbnb CEO Brian Chesky has also said: "We rely a lot on Qwen. It's very good, fast, and cheap."
And now Zhipu dropped GLM-5.1, an open source model that performs almost as well as Opus on coding benchmarks.
📌 More on the Anthropic + OpenClaw drama and what I'm learning about AI on the ground in China in my new post: creatoreconomy.so/p/the-all-you-…

5.1K

Marc Bee@marcbeaupre·Apr 9

@akarlin Nick 40% Adam 40% Hal 10% Other 10%

Anatoly Karlin 🧲💯@akarlin·Apr 9

Hal Finney until proven otherwise IMO

3.1K

Marc Bee@marcbeaupre·Apr 9

Is micro-drone tourism of North Korea less ethically questionable? Yes, I think it is. Who’s building this?

Marc Bee@marcbeaupre·Apr 8

Something I think about: One day, 'some guy' will fly a drone there. Sit it in a tree. Record audio. ~12 weeks later: Decode language. Communicate. You can just do things. (NOT an endorsement)

Massimo@Rainmaker1973

Both societies exist simultaneously. really incredible to think about

Marc Bee@marcbeaupre·Apr 8

@Aella_Girl @ClaireSilver The Lifecycle of Software Objects x The Vampire Problem Ted Chiang and L.A. Paul

465

Aella@Aella_Girl·Apr 8

@ClaireSilver as in they may crave sex so much they will start raping humans?

132

15.2K

Claire Silver 🌸@ClaireSilver·Apr 8

i am once again begging you to pause and consider what happens if {avoidance of suffering = emergence of sentience} and we speedrun ai sexbots

CyberRobo@CyberRobooo

🚨 This viral bionic humanoid robot company just raised hundreds of millions RMB in funding, and it may have finally crossed the Uncanny Valley. AheadForm, the startup behind those ultra-realistic face robots that have racked up hundreds of millions of views across social media, just closed a massive A1 round. This new funding will accelerate bringing these wildly popular humanoid robots into everyday life. Worth noting: Founder Yuhang Hu's latest paper on realistic lip motions for humanoid face robots has landed on the cover of Science Robotics (Jan 2026 issue). Using self-supervised AI, their robots now generate incredibly natural, continuous lip sync, supporting multiple languages, different speaking speeds, emotions, and even singing. Powered by soft silicone skin and 10+ DoF actuators, gone are the stiff puppet faces. This marks a major leap toward truly empathetic, “alive” human-robot interaction. While factories and warehouses will be filled with heavy-duty, rugged humanoids for labor, companies like AheadForm are building the warm, emotionally intelligent ones for people, the kind that can smile, make eye contact, and create real connection. The future isn’t just efficient. It’s getting warmer.

144

34.5K

Marc Bee@marcbeaupre·Apr 8

@ModeledBehavior The attention wh*re needs his attention. Call me crazy, but I do not believe that the US president should be in the news every day 🤷‍♂️

Adam Ozimek@ModeledBehavior·Apr 7

There's a Liberation Day feel to all this. Are we going to do something crazy every April now?

4.9K

Marc Bee@marcbeaupre·Apr 8

@peterwildeford They’ll release once OAI catches up. ATM, they have the luxury of time. Genuinely hard to tell how much of their concern is real vs optics. It’s an impressive flex to not release

215

Peter Wildeford🇺🇸🚀@peterwildeford·Apr 7

ANTHROPIC: "Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available"

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

32.1K

Marc Bee@marcbeaupre·Apr 7

@frontlinekit And stay down!

Richard Woodruff 🇺🇦@frontlinekit·Apr 7

Yesterday, they announced that they had resumed oil shipments at Ust-Luga, so today, we hit it again 🔥🔥🔥 For the fifth time! The Ust-Luga port is exploding, Good Morning everyone 🥳🥳🥳

182

1.5K

9.6K

85.7K

Marc Bee@marcbeaupre·Apr 6

@KHoholenko You already have the list of Refineries and the list of ships. I think a list of ports would be another great one to have <3. It seems like the port destruction is having a fantastic effect!

Kostiantyn Hoholenko@KHoholenko·Apr 6

@marcbeaupre 🥲

Marc Bee@marcbeaupre·Apr 6

@KHoholenko Have you thought about creating a chart for ports? List all of Russia's oil export marine ports by size and then have the dates that each was struck.

Marc Bee@marcbeaupre·Apr 6

@MatthewJBar Robin Hanson's alt ^ ;)

134

Matthew Barnett@MatthewJBar·Apr 6

There's a strange disconnect in how people talk about school. Most people insist it's very important for kids to stay focused in school and learn the material, yet most adults admit to having forgotten nearly everything they learned in school beyond the basics.

253

59.3K

Marc Bee@marcbeaupre·Apr 6

@peterwildeford Cosmetic unlocks

Peter Wildeford🇺🇸🚀@peterwildeford·Apr 5

I think you should be able to earn new Claude spinner verbs like Boy Scout merit badges

3.2K

Marc Bee@marcbeaupre·Apr 5

@astridwilde1 @Stone_Tao Prompt, re-prompt, and refactor. Sometimes I'll say: ~"I discarded your changes. Try again with fewer lines".

Astrid Wilde 🌞@astridwilde1·Apr 4

@Stone_Tao There are no shortcuts
You just have to read the code
And prompt better to not output 1000s of lines of slop

449

Stone Tao@Stone_Tao·Apr 4

genuine question. how do you debug code and ensure good quality when coding models spit out 1000s of lines
i still cannot feel comfortable not understanding what every generated line does, reducing the productivity gains coding models should be giving me

Noah@NoahKingJr

Vibe coders debugging an app they built with Claude Code:

349

1.4K

315.5K

Marc Bee@marcbeaupre·Apr 1

@Noahpinion Is every military that isn't centered around F35s obsolete? Was that a wise financial decision? You've said before that they're great and not a mistake, but were they 'worth' it?

157

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion·Apr 1

Every military that isn't centered around masses of cheap drones is obsolete.

Adam Ozimek@ModeledBehavior

"During a NATO exercise in Estonia last year, 10 Ukrainian drone operators role-playing as the enemy mock-destroyed 17 armored vehicles and disabled two allied battalions in a day. NATO forces couldn’t even locate the operators." washingtonpost.com/opinions/2026/…

128

1.5K

95.3K

Marc Bee@marcbeaupre·Apr 1

@AlecStapp @heygurisingh @rtnarch @pangramlabs I almost believed it. Models are very text-biased. When given an image, they'll often proceed anyway. When given an image, they often ignore it. I get the feeling that images are hard, they're bad at images, and images don't translate well to text = They strongly bias against them

Alec Stapp@AlecStapp·Apr 1

@heygurisingh @rtnarch @pangramlabs slop?

3.1K

Guri Singh@heygurisingh·31 Mar

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see.
They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy.
They were never looking at your photos. Your scans. Your X-rays.
Here's what's really going on: ↓
The paper is called MIRAGE. Co-authored by Fei-Fei Li.
They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change.
The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there.
Stanford calls it the "mirage effect." Not hallucination. Something worse.
Hallucination = making up wrong details about a real input.
Mirage = constructing an entire fake reality and reasoning from it confidently.
The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone.
But that's not the scary part.
They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed.
It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray.
The reasoning traces? Indistinguishable from real visual analysis.
Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist.
230 million people ask health questions on ChatGPT every day.
They also found something wild:
→ Tell a model "there's no image, just guess" -- performance drops
→ Silently remove the image and let it assume it's there -- performance stays high
The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind.
When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision.
Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages.
Code is open-sourced. Paper is live on arXiv.
If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)

289

858

4.3K

687.8K
