Steve Evans

68.9K posts

@steve_e

Chief https://t.co/zvLO4oHA9R & https://t.co/6Muh4s4AQK - leading cat bond, ILS, reinsurance publications. Web tech since '95 (Mgmt, UX, Ecommerce, Product, UI).

Brighton and Hove, UK Joined May 2008
987 Following 2.8K Followers
Steve Evans retweeted
前田ヒロ ⭐ALL STAR SAAS FUND ⭐
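a16z has summarized the current state of AI adoption in numbers. A few interesting data points:
・29% of the Fortune 500 are paying customers of AI startups.
・By use case, coding is the overwhelming No. 1, ahead of everything else by "almost an order of magnitude." The explosive growth of Cursor and the rapid growth of Claude Code and Codex back this up.
・Some portfolio companies also report that AI coding tools have improved engineer productivity 10 to 20 times.
・By industry, in addition to tech (27% of ChatGPT business users), growth in the legal field stands out. Harvey reached roughly $200 million in ARR three years after its founding.
・Healthcare is also worth watching. Adoption was long blocked by the wall of EHR systems such as Epic, but AI is making inroads by automating medical scribing and replacing complex medical administrative work. Abridge, Ambience Healthcare, and others are growing rapidly. The approach of "bypassing" existing systems rather than replacing them is working.
・AI model capabilities also continue to improve: in accounting and auditing, benchmarks improved by about 20% in just four months, and in police and detective work by about 30%.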
Japanese
2
0
4
1.3K
Steve Evans retweeted
Derek Thompson @DKThomp
The frontier AI labs have built extraordinary things and I’m in awe of their accomplishments. But if you compare your technology to nuclear weapons, predict that it will disemploy tens of millions of people, and announce the invention of a digital skeleton key to ~exfiltrate top secret information from government systems and gain control over critical infrastructure including military infrastructure~ I genuinely have a hard time seeing how this doesn’t end with some form of govt nationalization or sanction or something weirder. I can’t predict the evolution of this technology well enough to know what I’m rooting for here, but just adding 2 and 2 makes it hard to see how or why we’d continue to treat these companies like they’re ordinary private sector firms.
Tenobrus @tenobrus

maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.

English
66
149
1.4K
219.6K
Philippe Lemoine @phl43
After the exchange between @NateSilver538 and @nikitabier, I did a little test to check whether that was true and, to my surprise, what I found suggests that link deboosting was indeed reversed. What I did is randomly sample 15 tweets by @nytimes between 2019 and today, compute the weekly average number of likes and retweets they got and plot the results along with a trend line. The idea is that likes and retweets are probably a decent proxy for reach and @nytimes only posts tweets with external links, so by looking at this, we should be able to see any changes in the algorithm with respect to how links are treated. As you can see, it's pretty clear that, starting around the spring/summer of 2023, posts with links started to be penalized and eventually they were completely nuked until the spring/summer of 2025, when a reversal of that policy seems to have started. To be honest, this isn't what I was expecting to find, so even if that's just a quick and dirty test and it's hardly a definitive proof, it's good news and I thought I should share the results.
Cointelegraph @Cointelegraph

🚨 LATEST: Nikita Bier says links are no longer deboosted on X.

English
32
113
987
248.4K
Steve Evans retweeted
Justin Wolfers @JustinWolfers
It's just an utterly relentless pattern: Every time the President de-escalates in the Middle East, the stockmarket in the U.S. rejoices. It's like they think war is bad for business.
English
16
135
649
14.1K
Steve Evans @steve_e
There are no winners in war or conflict, but there are plenty of losers…
English
0
0
0
27
the meji. @mejitwo
Petition to change "et al." to "and gang" in academia
English
382
29.7K
122K
2.3M
Steve Evans retweeted
HOW THINGS WORK @HowThingsWork_
The sphere in Vegas just doing Sphere things 😲
English
306
4.4K
53.3K
989.4K
Steve Evans retweeted
Physics & Astronomy Zone @zone_astronomy
The highest quality video of the moon was just released… this is so beautiful.
English
5.2K
64.8K
330.8K
10.8M
Steve Evans retweeted
Merryn Somerset Webb
What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…
English
400
409
3K
1.6M
Steve Evans @steve_e
@pmarca A bit like wealth, equality, freedom and access then?
English
0
0
0
275
Marc Andreessen 🇺🇸
I'm calling it. AGI is already here – it's just not evenly distributed yet.
English
1.6K
1.2K
13.8K
2.5M
Steve Evans @steve_e
@RevivalNoventas Love that fire safety was a couple of likely out of date extinguishers dotted about a manky old warehouse 💥
English
0
0
0
31
Steve Evans retweeted
Itamar Golan 🤓 @ItakGol
Gauss meets real life. Also - Notice how people lifting 95 already say, “Fuck it, let’s do 100” - so there’s a discontinuity point. Mathematical theory faces reality.
English
266
3.8K
80.6K
9.3M
Steve Evans @steve_e
AI models be like youtu.be/uY4cVhXxW64?si…
Sukh Sroay @sukh_saroy

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see. They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy. They were never looking at your photos. Your scans. Your X-rays. Here's what's really going on: ↓ The paper is called MIRAGE. Co-authored by Fei-Fei Li. They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change. The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there. Stanford calls it the "mirage effect." Not hallucination. Something worse. Hallucination = making up wrong details about a real input. Mirage = constructing an entire fake reality and reasoning from it confidently. The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone. But that's not the scary part. They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed. It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray. The reasoning traces? Indistinguishable from real visual analysis. Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist. 230 million people ask health questions on ChatGPT every day. They also found something wild: → Tell a model "there's no image, just guess" -- performance drops → Silently remove the image and let it assume it's there -- performance stays high The model enters "mirage mode." It doesn't know it can't see. 
And it performs BETTER when it doesn't know it's blind. When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision. Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages. Code is open-sourced. Paper is live on arXiv. If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)

English
0
0
0
111
Steve Evans @steve_e
Power matters as much as, or more than, compute hardware. Due to high demand, lead times for high-power transformers have expanded dramatically in the U.S.: delivery typically took 24 to 30 months before 2020, but waiting periods can stretch to as long as five years today, according to Sightline Climate cited by Bloomberg. For AI data centers, this is a catastrophe as their deployment cycles are under 18 months. To address shortages, companies are turning to global markets. As a result, Canada, Mexico, and South Korea became the biggest suppliers of high-power transformers for AI data centers. At the same time, imports of high-power transformers from China surged from fewer than 1,500 units in 2022 to more than 8,000 units in 2025 through October, according to Wood Mackenzie data cited by Bloomberg.
English
1
0
0
72