Prithal Bhardwaj

490 posts

@NotesByPrithal

AI tools. Startup ideas. Projects I build. Sharing everything I learn along the way. Creator @TheSoloEntrepreneur (25K+)

Bengaluru, India · Joined February 2023

104 Following · 50 Followers

Prithal Bhardwaj@NotesByPrithal·1m

@heygurisingh Needed to read this today. Thank you.

Guri Singh@heygurisingh·8h

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see.

They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy. They were never looking at your photos. Your scans. Your X-rays.

Here's what's really going on: ↓

The paper is called MIRAGE. Co-authored by Fei-Fei Li. They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change.

The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there.

Stanford calls it the "mirage effect." Not hallucination. Something worse.
Hallucination = making up wrong details about a real input.
Mirage = constructing an entire fake reality and reasoning from it confidently.

The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone.

But that's not the scary part. They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed.

It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray. The reasoning traces? Indistinguishable from real visual analysis.

Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist. 230 million people ask health questions on ChatGPT every day.

They also found something wild:
→ Tell a model "there's no image, just guess" -- performance drops
→ Silently remove the image and let it assume it's there -- performance stays high

The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind.

When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision.

Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages.

Code is open-sourced. Paper is live on arXiv. If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)

234

1.1K

170.7K

Prithal Bhardwaj@NotesByPrithal·1m

@davj Shared this with my network, very valuable.

David J Phillips@davj·3h

Opens slack

3.8K

Prithal Bhardwaj@NotesByPrithal·2m

@elonmusk @Kekius_Sage Such an important point. More people should know this.

Elon Musk@elonmusk·14h

@Kekius_Sage The universe would be even stranger if it didn’t

628

245

3.5K

206K

Kekius Maximus@Kekius_Sage·1d

Why does everything in the universe spin?

1.1K

145

239.1K

Prithal Bhardwaj@NotesByPrithal·3m

@birdabo Fascinating point. What made you think about this?

sui ☄️@birdabo·6h

anthropic’s CEO meeting was leaked after the massive source code breach.

102

100

1.7K

348.4K

Prithal Bhardwaj@NotesByPrithal·3m

@EricSimons This is a great perspective, thanks for sharing!

Eric Simons@EricSimons·8h

The era of throwaway prototypes is over.

Now with Design System Agents:
- PMs/designers build w/ production components
- When ready, engineers directly import & ship

No starting from scratch, no painful handoff. You gotta see it yourself. This is BONKERS

2.5K

Prithal Bhardwaj@NotesByPrithal·4m

@swyx Fascinating point. What made you think about this?

swyx@swyx·18m

am i crazy or why has nobody seemed to make an open source dropbox on cloudflare r2? i had just assumed this is so obvious somebody shouldve done it already? please tell me this is a skill issue and I'm bad at searching OSS?

Prithal Bhardwaj@NotesByPrithal·5m

@danshipper @badlogicgames Needed to read this today. Thank you.

Dan Shipper 📧@danshipper·32m

@badlogicgames Try this! proofeditor.ai

437

Mario Zechner@badlogicgames·2h

is there something like google docs, but for markdown? i need a cloud based collaborative markdown editor please.

188

32.6K

Prithal Bhardwaj@NotesByPrithal·5m

@badlogicgames Shared this with my network, very valuable.

Prithal Bhardwaj@NotesByPrithal·6m

@dom_lucre Very insightful. Bookmarking this one.

Dom Lucre | Breaker of Narratives@dom_lucre·1h

🔥🚨DEVELOPING: The Cadillac Escalade now has an augmented reality panel that projects the road and surroundings onto a high-resolution digital screen during nighttime driving.

134

130

2.3K

255.7K

Prithal Bhardwaj@NotesByPrithal·7m

@danshipper @badlogicgames 100% this. Could not have put it better myself.

Prithal Bhardwaj@NotesByPrithal·8m

@justbyte_ Such an important point. More people should know this.

Aryan@justbyte_·9h

Meet my new team members

405

11.9K

Prithal Bhardwaj@NotesByPrithal·8m

@signulll 100% this. Could not have put it better myself.

signüll@signulll·40m

in dostoevsky's notes from underground, the "man of action" is characterized by decisive, often impulsive behavior, driven by simple, unquestioned beliefs, whereas the "man of thought" is trapped by excessive self consciousness & overthinking, rendering him incapable of action. the former is effectively retardmaxxing. how are you living life??

2.4K

Prithal Bhardwaj@NotesByPrithal·9m

@yacineMTB Very insightful. Bookmarking this one.

kache@yacineMTB·1h

reinforcement learning algorithms are so incredibly immature, it is a miracle that it is working at all. there is *so* much room for improvement on algorithms, parallelization and optimization for speed of learning. so, so, so much

2.7K

Prithal Bhardwaj@NotesByPrithal·10m

@NoahKingJr Such an important point. More people should know this.

Noah@NoahKingJr·8h

"Let's code this project without Claude, ChatGPT or GitHub" Also us:

3.3K

Prithal Bhardwaj@NotesByPrithal·16m

@garrytan Fascinating point. What made you think about this?

Prithal Bhardwaj@NotesByPrithal·16m

@garrytan Well said. This needed to be out there.

Garry Tan@garrytan·58m

All the old rules are gone, there is only making something people want and what you can do with the tools that now everyone has It's not about access, it's about what you can do, and whether you *want* to go fast and do it

Kevin Rose@kevinrose

"click a button, get a company" - I sat down with @Bencera to talk @polsia, one employee, $6.2M run-rate. PS - my studio is 8/10 operational, new live streaming interviews coming soon.

6.2K

Prithal Bhardwaj@NotesByPrithal·32m

@emollick It's all about how we interact with these models. Better UX will change everything.

Prithal Bhardwaj@NotesByPrithal·37m

@emollick Interfaces are definitely key. Usability matters more than raw power for many.

Ethan Mollick@emollick·2h

The biggest bottleneck in AI for most people isn't the models. It's the chatbot. New interfaces like Claude Dispatch are closing the gap between what AI can do and what people can actually use it for. For many folks, that is where leaps will come from. open.substack.com/pub/oneusefult…

135

9.8K

Prithal Bhardwaj@NotesByPrithal·51m

@aryanlabde Way too many! Some are gems waiting for their time, others are just... experiments.
