Steve Evans

68.9K posts


@steve_e

Chief https://t.co/zvLO4oHA9R & https://t.co/6Muh4s4AQK - leading cat bond, ILS, reinsurance publications. Web tech since '95 (Mgmt, UX, Ecommerce, Product, UI).

Brighton and Hove, UK · Joined May 2008
987 Following · 2.8K Followers
the meji. @mejitwo
Petition to change "et al." to "and gang" in academia
365
27.2K
114K
2M
Steve Evans retweeted
HOW THINGS WORK @HowThingsWork_
The sphere in Vegas just doing Sphere things 😲
296
4.3K
52.1K
944.6K
Steve Evans retweeted
Physics & Astronomy Zone @zone_astronomy
The highest quality video of the moon was just released… this is so beautiful.
5.1K
64K
323.5K
9.9M
Steve Evans retweeted
Merryn Somerset Webb
What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…
356
379
2.7K
1.2M
Steve Evans @steve_e
@pmarca A bit like wealth, equality, freedom and access then?
0
0
0
275
Marc Andreessen 🇺🇸
I'm calling it. AGI is already here – it's just not evenly distributed yet.
1.6K
1.2K
13.7K
2.4M
Steve Evans @steve_e
@RevivalNoventas Love that fire safety was a couple of likely out of date extinguishers dotted about a manky old warehouse 💥
0
0
0
31
Steve Evans retweeted
Itamar Golan 🤓 @ItakGol
Gauss meets real life. Also - Notice how people lifting 95 already say, “Fuck it, let’s do 100” - so there’s a discontinuity point. Mathematical theory faces reality.
Itamar Golan 🤓 tweet media
266
3.8K
80.6K
9.3M
Steve Evans @steve_e
AI models be like youtu.be/uY4cVhXxW64?si…
Sukh Sroay @sukh_saroy

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see. They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy. They were never looking at your photos. Your scans. Your X-rays. Here's what's really going on: ↓

The paper is called MIRAGE. Co-authored by Fei-Fei Li. They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change. The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there.

Stanford calls it the "mirage effect." Not hallucination. Something worse.
Hallucination = making up wrong details about a real input.
Mirage = constructing an entire fake reality and reasoning from it confidently.
The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone.

But that's not the scary part. They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed. It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray. The reasoning traces? Indistinguishable from real visual analysis.

Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist. 230 million people ask health questions on ChatGPT every day.

They also found something wild:
→ Tell a model "there's no image, just guess" -- performance drops
→ Silently remove the image and let it assume it's there -- performance stays high
The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind.

When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision. Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages.

Code is open-sourced. Paper is live on arXiv. If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)

0
0
0
111
Steve Evans @steve_e
Power matters as much as, or more than, compute hardware. Due to high demand, lead times for high-power transformers have expanded dramatically in the U.S.: delivery typically took 24 to 30 months before 2020, but waiting periods can stretch to as long as five years today, according to Sightline Climate cited by Bloomberg. For AI data centers, this is a catastrophe as their deployment cycles are under 18 months. To address shortages, companies are turning to global markets. As a result, Canada, Mexico, and South Korea became the biggest suppliers of high-power transformers for AI data centers. At the same time, imports of high-power transformers from China surged from fewer than 1,500 units in 2022 to more than 8,000 units in 2025 through October, according to Wood Mackenzie data cited by Bloomberg.
Steve Evans tweet media
1
0
0
72
Steve Evans retweeted
ClarksonsFarm @ClarksonsFarm1
Mount Fuji, Japan. Beautiful!
ClarksonsFarm tweet media
32
173
3.6K
35.8K
Steve Evans @steve_e
People are just realising that “hottest country in the world” was always just a threat to burn it all down… 🔥
0
0
0
60
Steve Evans retweeted
nxthompson @nxthompson
Pretty wild. The first image represents objects in Earth’s orbit at the end of the 1950s. The second is Earth’s orbit now. theguardian.com/science/ng-int…
nxthompson tweet media (two images)
33
206
679
127.4K
Steve Evans retweeted
CIRA @CIRA_CSU
From its angled view, GOES-18 also captured the historic launch of Artemis II.
2
55
210
13.6K
Steve Evans retweeted
CIRA @CIRA_CSU
Artemis II is headed for the moon! GOES-19 caught this amazing view of the exhaust plume from the rocket as it launched from Cape Canaveral.
2
131
455
91.2K
Steve Evans retweeted
Brian LaMarre @blamarre
Awesome view of @NASAArtemis II, as viewed from NOAA weather satellite. Note the dark (warmer) dot launching off from Cape Canaveral and off to the northeast! #flwx #artemis
2
31
97
104.8K
Steve Evans @steve_e
Social media engagement falls right at the time social media platforms throttle user reach, trash usability and amplify polarising views. Surprised? 🤷‍♂️ theguardian.com/media/2026/apr…
0
0
0
90