Steve Evans

68.9K posts

Steve Evans

@steve_e

Chief https://t.co/zvLO4oHA9R & https://t.co/6Muh4s4AQK - leading cat bond,ILS, reinsurance publications. Web tech since '95 (Mgmt,UX,Ecommerce,Product, UI).

Brighton and Hove, UK شامل ہوئے Mayıs 2008

987 فالونگ2.8K فالوورز

Steve Evans ری ٹویٹ کیا

前田ヒロ ⭐ALL STAR SAAS FUND ⭐@djtokyo·3h

a16zがAI導入の現状を数字でまとめている。興味深いデータがいくつか。・Fortune 500の29%がAIスタートアップの有料顧客になっている。・ユースケース別ではコーディングが他を「ほぼ一桁」引き離して圧倒的1位。Cursorの爆発的成長、Claude CodeやCodexの急成長がそれを裏付けている。・エンジニアの生産性がAIコーディングツールで10〜20倍に向上したというポートフォリオ企業のレポートも。・業界別ではテック(ChatGPTビジネスユーザーの27%)に加え、リーガル分野の伸びが目立つ。Harveyは創業3年でARR約2億ドルに到達。・ヘルスケアも注目。従来はEpic等のEHRシステムの壁に阻まれてきたが、AIは医療スクライブの自動化や、医療の複雑な事務処理の置き換えという形で浸透。Abridge、Ambience Healthcareなどが急成長中。既存のシステムを置き換えるのではなく「迂回する」アプローチが効いている。・AIモデルの能力向上も続いていて、会計・監査分野ではわずか4ヶ月でベンチマークが約20%改善、警察・探偵業務では約30%改善。

日本語

1.3K

Steve Evans ری ٹویٹ کیا

Derek Thompson@DKThomp·20h

The frontier AI labs have built extraordinary things and I’m in awe of their accomplishments. But if you compare your technology to nuclear weapons, predict that it will disemploy tens of millions of people, and announce the invention of a digital skeleton key to ~exfiltrate top secret information from government systems and gain control over critical infrastructure including military infrastructure~ I genuinely have a hard time seeing how this doesn’t end with some form of govt nationalization or sanction or something weirder. I can’t predict the evolution of this technology well enough to know what I’m rooting for here, but just adding 2 and 2 makes it hard to see how or why we’d continue to treat these companies like they’re ordinary private sector firms.

Tenobrus@tenobrus

maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.

English

149

1.4K

219.6K

Steve Evans ری ٹویٹ کیا

Chubby♨️@kimmonismus·11h

Again: Metas biggest moat is distribution. They don’t need to have the very best. It just needs to be good enough for tough questions for 1b people worldwide. And the evals look very promising. This could be a big win for meta.

Chubby♨️@kimmonismus

Lol what?! Meta has been cooking! These benchmarks are really freaking good holy!!

English

246

15K

Steve Evans@steve_e·14h

@phl43 @NateSilver538 @nikitabier @nytimes Seems selective, as certainly doesn’t seem the case for all publishers or link posters. Or maybe something’s still rolling out?

English

370

Philippe Lemoine@phl43·15h

After the exchange between @NateSilver538 and @nikitabier, I did a little test to check whether that was true and, to my surprise, what I found suggests that link deboosting was indeed reversed. What I did is randomly sample 15 tweets by @nytimes between 2019 and today, compute the weekly average number of likes and retweets they got and plot the results along with a trend line. The idea is that likes and retweets are probably a decent proxy for reach and @nytimes only posts tweets with external links, so by looking at this, we should be able to see any changes in the algorithm with respect to how links are treated. As you can see, it's pretty clear that, starting around the spring/summer of 2023, posts with links started to be penalized and eventually they were completely nuked until the spring/summer of 2025, when a reversal of that policy seems to have started. To be honest, this isn't what I was expecting to find, so even if that's just a quick and dirty test and it's hardly a definitive proof, it's good news and I thought I should share the results.

Cointelegraph@Cointelegraph

🚨 LATEST: Nikita Bier says links are no longer deboosted on X.

English

113

987

248.4K

Steve Evans ری ٹویٹ کیا

Rory Johnston@Rory_Johnston·1d

Schrödinger's ceasefire.

Gregg Carlstrom@glcarlstrom

So if you're keeping score at home, the ceasefire includes Lebanon but also doesn't include Lebanon, America has agreed to all of Iran's demands and Iran has agreed to all of America's demands, America will recognize Iran's right to enrichment and also insist on zero enrichment, Hormuz is completely open but also Hormuz is subject to unclear limitations... There are a lot of confident predictions about a deal that is still unclear and unfinished Both sides are scrambling to portray this as a war-ending victory when there's a decent chance that the war hasn't even ended

Română

339

46.2K

Steve Evans ری ٹویٹ کیا

Justin Wolfers@JustinWolfers·1d

It's just an utterly relentless pattern: Every time the President de-escalates in the Middle East, the stockmarket in the U.S. rejoices. It's like they think war is bad for business.

English

135

649

14.1K

Steve Evans@steve_e·1d

There are no winners in war or conflict, but there are plenty of losers…

English

Steve Evans@steve_e·1d

@mejitwo @SuB8u And the gang (you got my vote)

English

the meji.@mejitwo·2d

Petition to change "et al." to "and gang" in academia

English

382

29.7K

122K

2.3M

Steve Evans@steve_e·2d

The key with any new deal is working out whether it’s a good deal for the proposer or for the rest of us.

Will Manidis@WillManidis

x.com/i/article/2041…

English

Steve Evans ری ٹویٹ کیا

HOW THINGS WORK@HowThingsWork_·2d

The sphere in Vegas just doing Sphere things 😲

English

306

4.4K

53.3K

989.4K

Steve Evans ری ٹویٹ کیا

Physics & Astronomy Zone@zone_astronomy·2d

The highest quality video of the moon was just released… this is so beautiful.

English

5.2K

64.8K

330.8K

10.8M

Steve Evans ری ٹویٹ کیا

Merryn Somerset Webb@MerrynSW·4d

What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…

English

400

409

1.6M

Steve Evans ری ٹویٹ کیا

Justin Wolfers@JustinWolfers·3d

It is utterly dispiriting how few of these bubbles are reliable sources of information.

Nate Silver@NateSilver538

These are the Twitter/X accounts with the most engagement so far in 2026. I suppose I had some intuition for how bad it was, but jeez, this is what you get when the ecosystem is broken.

English

374

1.3K

6.8K

289.4K

Steve Evans@steve_e·2d

@pmarca A bit like wealth, equality, freedom and access then?

English

275

Marc Andreessen 🇺🇸@pmarca·3d

I'm calling it. AGI is already here – it's just not evenly distributed yet.

English

1.6K

1.2K

13.8K

2.5M

Steve Evans@steve_e·3d

@RevivalNoventas Love that fire safety was a couple of likely out of date extinguishers dotted about a manky old warehouse 💥

English

Noventas Revival@RevivalNoventas·4d

Club Labrynth '90 #oldskool #rave #80s #90s

English

1.3K

Steve Evans ری ٹویٹ کیا

Itamar Golan 🤓@ItakGol·4d

Gauss meets real life. Also - Notice how people lifting 95 already say, “Fuck it, let’s do 100” - so there’s a discontinuity point. Mathematical theory faces reality.

English

266

3.8K

80.6K

9.3M

Steve Evans@steve_e·3d

AI models be like youtu.be/uY4cVhXxW64?si…

YouTube

Sukh Sroay@sukh_saroy

Holy shit... Stanford just proved that GPT-5, Gemini, and Claude can't actually see. They removed every image from 6 major vision benchmarks. The models still scored 70-80% accuracy. They were never looking at your photos. Your scans. Your X-rays. Here's what's really going on: ↓ The paper is called MIRAGE. Co-authored by Fei-Fei Li. They tested GPT-5.1, Gemini-3-Pro, Claude Opus 4.5, and Gemini-2.5-Pro across 6 benchmarks -- medical and general. Then silently removed every image. No warning. No prompt change. The models didn't even notice. They kept describing images in detail. Diagnosing conditions. Writing full reasoning traces. From images that were never there. Stanford calls it the "mirage effect." Not hallucination. Something worse. Hallucination = making up wrong details about a real input. Mirage = constructing an entire fake reality and reasoning from it confidently. The models built imaginary X-rays, described fake nodules, and diagnosed conditions -- all from text patterns alone. But that's not the scary part. They trained a "super-guesser" -- a tiny 3B parameter text-only model. Zero vision capability. Fine-tuned it on the largest chest X-ray benchmark (696,000 questions). Images removed. It beat GPT-5. It beat Gemini. It beat Claude. It beat actual radiologists. Ranked #1 on the held-out test set. Without ever seeing a single X-ray. The reasoning traces? Indistinguishable from real visual analysis. Now here's what should terrify you: When the models fake-see medical images, their mirage diagnoses are heavily biased toward the most dangerous conditions. STEMI. Melanoma. Carcinoma. Life-threatening diagnoses -- from images that don't exist. 230 million people ask health questions on ChatGPT every day. They also found something wild: → Tell a model "there's no image, just guess" -- performance drops → Silently remove the image and let it assume it's there -- performance stays high The model enters "mirage mode." It doesn't know it can't see. And it performs BETTER when it doesn't know it's blind. When Stanford applied their cleanup method (B-Clean) to existing benchmarks, it removed 74-77% of all questions. Three-quarters of "vision" benchmarks don't test vision. Every leaderboard. Every "multimodal breakthrough." Every benchmark score you've seen this year. Built on mirages. Code is open-sourced. Paper is live on arXiv. If you're building anything with multimodal AI -- especially in healthcare -- read this paper before you ship. (Link in the comments)

English

111

Steve Evans@steve_e·3d

tomshardware.com/tech-industry/…

ZXX

Steve Evans@steve_e·3d

Power matters as much, or more, than compute hardware. Due to high demand, lead times for high-power transformers have expanded dramatically in the U.S.: delivery typically took 24 to 30 months before 2020, but waiting periods can stretch to as long as five years today, according to Sightline Climate cited by Bloomberg. For AI data centers, this is a catastrophe as their deployment cycles are under 18 months. To address shortages, companies are turning to global markets. As a result, Canada, Mexico, and South Korea became the biggest suppliers of high-power transformers for AI data centers to AI data centers. At the same time, imports of high-power transformers from China surged from fewer than 1,500 units in 2022 to more than 8,000 units in 2025 through October, according to Wood Mackenzie data cited by Bloomberg.

English

Steve Evans@steve_e·4d

AI to improve customer experience and usability really works. What a shocker….. storyboard18.com/brand-marketin…

English

دریافت کریں

@phl43 @NateSilver538 @nikitabier @nytimes @mejitwo @SuB8u @pmarca @RevivalNoventas