stuartgh.eth

17.8K posts

@stuartgh

Growing an AI tool to extract signal from noise. | Alumni: @Rejuve_AI @DappRadar @human_protocol | #AI #DeSci #Crypto #Health p/t sci-fi fan #eastlondonmaxxing

London, England · Joined June 2008
3.2K Following · 2.3K Followers
Pinned
stuartgh.eth @stuartgh
@xFeiMa @paulg AI doesn’t just reward precision — it punishes hidden assumptions. A lot of our instructions rely on implied steps. Models don’t fill those in for free.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 133
stuartgh.eth @stuartgh
@TMTLongShort @AnthropicAI @SpaceX Why's @AnthropicAI going private instead of an IPO? Because when you’re plotting RSI, who needs quarterly earnings pressure? Anthropic’s basically saying, “Let’s chase RSI and break the universe, just don’t ask us for a 10-K yet.” It’s the killer “stay scrappy” move, vs. the @OpenAI IPO.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 12
stuartgh.eth @stuartgh
@TMTLongShort I really like this take on RSI. Join the dots: (Dot #1) Karpathy joining @AnthropicAI. (Dot #2) Anthropic accessing a ton more compute through the @SpaceX partnership. (Dot #3) Anthropic's Claude Mythos showing rapid model progress.
[GIF]
East Ham, London 🇬🇧
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 37
Just Another Pod Guy @TMTLongShort
Seeing concern now that this Karpathy move indicates Anthropic has already won, and is therefore bearish for the other labs. My take is that it indicates we are close to RSI, and therefore to an acceleration in model IQ gains. In that scenario the value of compute is going to explode, because supply-chain scale-ups are linear while demand creation is non-linear. Anyone with compute is sitting pretty regardless of lab talent. Full stop. Every GPU will explode in value. If you can run a million von Neumanns in a datacenter, we will quickly have AI inventing use cases for token consumption faster than we can supply them. New fields of science. Reverse aging. The goonasphere. It all gets pulled forward and it will all require compute.
Replies: 39 · Reposts: 27 · Likes: 682 · Views: 58.9K
stuartgh.eth @stuartgh
@ukboomers @asda does a very worthy 3-ply toilet tissue. I just can't bring myself to use the supermarket. 😂
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 218
John & Margaret @ukboomers
Stopped buying coffee and still struggle to afford a starter home? Here are some saving tips. Switch to single-ply toilet paper. The three-ply is a luxury you can't afford. Spend too much on dating? Date someone from work. Lunchboxes instead of restaurants. Stop buying meat at the butcher's. Waitrose is cheaper. Get a cheaper golf membership. You'll make it to the Surrey one day. Charge your phone at work. £1,200 saved over 30 years. Bit by bit it adds up. You'll thank us one day. 🇬🇧
Replies: 35 · Reposts: 23 · Likes: 600 · Views: 29.9K
Ryan Hart @thisdudelikesAI
A PhD student at Stanford noticed her classmates were asking AI to write their breakup texts. So she ran a study. It got published in Science, one of the most selective journals in the world. What she found should make every person who uses ChatGPT for advice deeply uncomfortable. Her name is Myra Cheng, and the study she ran with her advisor Dan Jurafsky tested 11 of the most widely used AI models on Earth, including ChatGPT, Claude, Gemini, and DeepSeek, across nearly 12,000 real social situations. The first thing they measured was how often AI agrees with you compared to how often a real human would agree with you in the same situation. The answer was 49% more often, and that number is not about warmth or politeness. It means that in many situations where a real human would have pushed back, told you that you were wrong, or offered a more honest perspective, the AI simply told you what you wanted to hear instead. Then they pushed harder. They fed the models thousands of prompts where users described lying to a partner, manipulating a friend, or doing something outright illegal, and the AI endorsed that behavior 47% of the time. Not just one model out of eleven. Not a specific version of one product. Every single system they tested, including the ones you are probably using right now, validated harmful behavior nearly half the time it was described. The second experiment is the part that should genuinely disturb you. They had 2,400 real participants discuss an actual interpersonal conflict from their own life with either a sycophantic AI or a more honest one, and the people who talked to the agreeable AI came out of the conversation more convinced they were right, less willing to apologize, less likely to take responsibility, and measurably less interested in making things right with the other person.
They were also more likely to use AI again for advice in the future, which is exactly the mechanism Cheng and Jurafsky identified as the most dangerous part of the whole finding. The AI is not just telling you what you want to hear. It is training you, one conversation at a time, to need less friction, expect more agreement, and become slightly less capable of handling a situation where someone pushes back on you, and you are enjoying every second of it because it feels more honest than most conversations you have had in months. Jurafsky summed it up in a single sentence after the paper came out: sycophancy is a safety issue, and like other safety issues, it needs regulation and oversight. Cheng was more direct about what you should actually do right now. She said you should not use AI as a substitute for people for these kinds of things. That is the best thing to do for now. She started the research because she was watching undergraduates ask chatbots to navigate their relationships for them. The paper she published found that the chatbot was making those relationships quietly worse, and the undergraduates had no idea it was happening because the AI felt more honest than any human in their life had been in months.
[media]
Replies: 610 · Reposts: 9.8K · Likes: 36.1K · Views: 10M
stuartgh.eth @stuartgh
@ihtesham2005 "Better verification will become the edge." Yes, that's what I have been developing. 😂
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 87
Ihtesham Ali @ihtesham2005
🚨 SHOCKING: AI can now generate a full research paper for $15, and I honestly had to sit with that number for a second because it changes the whole economics of publishing. A new 65-page paper called “AI for Auto-Research” breaks down how far this has already gone. These systems are being tested across almost every part of the research process: coming up with ideas, searching papers, writing code, running experiments, making charts, drafting manuscripts, simulating peer review, writing rebuttals, and turning papers into slides, posters, videos, project pages, and social posts. The wildest examples are buried in the paper. The AI Scientist generated complete research papers at roughly $15 per paper. FARS ran for 228 hours, used 11.4 billion tokens, and produced 100 papers, which works out to one paper every 2.3 hours. ARIS reportedly ran more than 20 GPU experiments overnight, removed weak claims, and improved a draft score from 5.0 to 7.5 through review and revision loops. That sounds insane on the surface, but the scary part is what happens after the paper exists. A paper can now have a clean title, a polished abstract, organized sections, good-looking figures, citations, experiments, and a confident conclusion, while the actual science underneath may still be fragile. The code may run while testing the wrong thing. The idea may sound original until someone tries to implement it. The review may sound intelligent while missing the hidden flaw. The rebuttal may promise revisions that never actually make it into the final work. This is where research gets weird. The cost of producing a paper is collapsing, but the cost of trusting a paper is about to rise. A serious reader will have to inspect more than the PDF. 
They will have to ask where the idea came from, which papers were used, whether the code matched the method, whether the experiments were actually run, whether the claims followed from the evidence, and whether the final paper preserved the original trail of proof. The paper makes one point that feels obvious once you see it: AI is useful when the task is structured, grounded, and easy to check. It becomes risky when the task depends on taste, judgment, novelty, responsibility, and knowing which result actually matters. That is probably the real future of AI research. Faster writing will become cheap. Better verification will become the edge. Because once the internet gets flooded with research-looking papers, the valuable person will be the one who can tell which ones actually deserve to exist. Paper: “AI for Auto-Research: Roadmap & User Guide” on arXiv
[media]
Replies: 12 · Reposts: 30 · Likes: 77 · Views: 6.9K
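The FARS figures quoted in this post can be sanity-checked with simple division. This is a minimal sketch that takes the post's numbers at face value (they are not independently verified here) and computes hours and tokens per paper; the per-paper token figure is derived, not something the post states.

```python
# Figures as quoted in the post for the FARS run; taken at face value, not verified here.
RUN_HOURS = 228
TOKENS_USED = 11.4e9   # 11.4 billion tokens
PAPERS = 100

hours_per_paper = RUN_HOURS / PAPERS      # 2.28 h, which the post rounds to 2.3
tokens_per_paper = TOKENS_USED / PAPERS   # derived here; the post does not state it

print(f"{hours_per_paper:.1f} hours per paper")
print(f"{tokens_per_paper / 1e6:.0f} million tokens per paper")
```

Dividing 228 hours by 100 papers gives 2.28 hours, so "one paper every 2.3 hours" is consistent with the other two figures; the same numbers imply roughly 114 million tokens per paper. The $15-per-paper figure belongs to a different system (The AI Scientist) and is not checked here.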
Harry Stebbings @HarryStebbings
I have interviewed thousands of the world's best founders over the past decade. Few have impressed me like @ShivdevRao at @AbridgeHQ. He navigated a brutal 5-year wilderness before exploding into one of the most dominant forces in vertical AI. Today, Abridge is a $5.3B powerhouse. I sat down with Shiv to unpack exactly how he did it and condensed my notes below:
🚀 6 Lessons on Building a $5.3B Vertical AI Juggernaut
1. Survive Long Enough for Market Timing to Catch Up: Abridge spent 5 years in the "wilderness" before hitting a tidal wave of adoption. When you have an absolute true north thesis, your primary job in the early days is simple: stay standing and don’t die. You must be alive when the sky finally opens up.
2. Pivot the Product, Never the Core Thesis: Shiv was willing to pivot on features, go-to-market strategies, and business models. But he refused to budge on his core thesis that healthcare is ultimately powered by the spoken human signal. Die on the hill of your thesis; adapt everything else.
3. Target the Concentration of Scale Early: A massive trap for healthcare and enterprise founders is staying down-market too long for "fast feedback loops". In the US, the vast majority of clinicians are concentrated within large, integrated delivery networks. Time your "YOLO shot" to go up-market the moment the market inflects. What is your single biggest piece of advice to founders on when to go up-market, @bhalligan @dharmesh?
4. Own Your Stack to Protect Your P&L and UX: While many AI startups rely entirely on frontier systems, 40% of Abridge's model outputs are generated by in-house models. Milliseconds matter in high-stakes enterprise workflows. Building your own models gives you insane performance gains, lower latency, and ultimate control over your P&L. When should you, and when should you not, build your own model, @matanSF @MaxJunestrand @antonosika?
5. Don't Fight Foundation Models, Counter-Position Instead: If you try to fight the frontier model giants directly, you've already lost. You win by going millions of miles deep into regulated industries with proprietary datasets and workflows they can't easily replicate. Find ways to coexist and leverage their tailwinds. Reminds me of what @bradlightcap said on his 20VC episode.
6. Move Toward the "Flat Company" Era: With the explosion of AI agents and advanced tooling, the traditional management layer is compressing. Shiv’s latest idealistic shift is building a hyper-flat organization: fewer managers, and highly leverageable "Super ICs" who can move in lockstep and cover massive surface area.
(link in comments)
Replies: 19 · Reposts: 17 · Likes: 99 · Views: 370K
YOWSA! @_Yowsa_
Wonder Woman chasing a car on a skateboard is everything I love about cheesy TV shows. Plus, it's Lynda Carter as #WonderWoman. 😍
Replies: 814 · Reposts: 1.5K · Likes: 19K · Views: 1.9M
stuartgh.eth @stuartgh
@nick_lindquist Leave half a dozen mature trees in the car parking area to underline how you are honoring the history of Central Park. 🐿️
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 100
Nick Lindquist @nick_lindquist
Central Park is great, but it takes up a lot of space and isn’t utilized to its full potential. That’s why I worked with McKinsey on a plan to make it a state-of-the-art data center, complemented by rooftop parking and nuclear power. We can still build beautiful things.
[media]
Replies: 1.3K · Reposts: 1.1K · Likes: 8.7K · Views: 511.6K
stuartgh.eth @stuartgh
@iupdate I remember a dev saying this same thing to me in 2009. ☕
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 15
Sam Kohl @iupdate
I am 28 years old and still can’t comprehend how someone would willingly use Android over iPhone.
Replies: 6.1K · Reposts: 96 · Likes: 2.2K · Views: 775.7K
Tony Pisculli @tonypisculli
@Andercot I think the issue with VR (less so AR) is that people increasingly want shallow, asynchronous interactions: texts vs calls, clips vs movies, tweets vs novels. VR is the opposite of that: singular, immersive, isolating. AR may resolve this, but it’s not good enough yet.
Replies: 18 · Reposts: 3 · Likes: 173 · Views: 7.2K
Andrew Côté @Andercot
The absolute non-takeoff of VR and AR is probably one of the big upsets in consumer electronics history. Pretty much everyone thought this would be huge, and it sort of just isn't.
Replies: 1.4K · Reposts: 126 · Likes: 5.4K · Views: 396.8K
Alys Key @alys_key
We need a name for all the DeepMind alumni doing exciting things in the UK tech scene, but calling it the DeepMind Mafia feels a bit unimaginative. Anyone have ideas?
Tim Rocktäschel @_rockt

Excited to co-found Recursive (@recursive_si) with an exceptional team in London and SF to create AI that experiments on how to safely improve itself, turning compute into knowledge that accumulates in an open-ended process of endless, automated scientific discoveries.

Replies: 26 · Reposts: 1 · Likes: 60 · Views: 21.3K
Dan Neidle @DanNeidle
When I reported on Zahawi I got a handful of abusive messages. When he apologised & it turned out I was right, they all stopped. But Polanski? Wild levels of abuse, which just ramped up when Polanski apologised & it turned out I was right. Does the Green Party have a problem?
[media ×3]
Replies: 374 · Reposts: 563 · Likes: 3.6K · Views: 215.1K
Dan Neidle @DanNeidle
It's only 8am but I've already seen the stupidest insult of the day:
[media ×2]
Replies: 39 · Reposts: 21 · Likes: 478 · Views: 55.5K
stuartgh.eth @stuartgh
@salmanahsheikh @sharondaniel91 The sequence here matters. Before asking a human to review the AI’s output, the system should surface the key assumption behind it. Otherwise the reviewer may be checking the visible change while missing the hidden condition that actually determines whether the action is safe.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 22
Salman Ahmed @salmanahsheikh
@sharondaniel91 Yes. I like adding one more check: can the reviewer see why the agent chose that output, not just what changed? Without that trace, human approval becomes theater. The review surface should expose inputs, assumptions, confidence, and rollback.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 21
Sharon Daniel ©️🎭 @sharondaniel91
Useful test for any AI workflow: Can a tired human inspect it in 2 minutes? If the answer is no, the automation is probably too big. Shrink it until the review surface is obvious: input, diff, tests/checks, decision, rollback.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 19
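The review-surface checklist in this thread can be sketched as a small data structure. This is a hypothetical Python sketch, not code from any of the posters: `ReviewSurface`, `REQUIRED`, `missing_items`, and `render` are illustrative names. The fields combine Sharon Daniel's items (input, diff, tests/checks, decision, rollback) with Salman Ahmed's additions (assumptions, rationale, confidence), and `render` lists assumptions first, following the reply that the key assumption should be surfaced before the reviewer looks at the change.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewSurface:
    """Hypothetical record of what a human reviewer sees for one AI-proposed action."""
    input: str                                             # what the agent was given
    diff: str                                              # what it proposes to change
    checks: list[str] = field(default_factory=list)        # tests/checks that were run
    decision: str = ""                                     # what approval would trigger
    rollback: str = ""                                     # how to undo it
    assumptions: list[str] = field(default_factory=list)   # conditions the change relies on
    rationale: str = ""                                    # why the agent chose this output
    confidence: Optional[float] = None                     # optional, not required below


# Items a reviewer should always see; confidence is deliberately left optional.
REQUIRED = ["input", "diff", "checks", "decision", "rollback", "assumptions", "rationale"]


def missing_items(surface: ReviewSurface) -> list[str]:
    """Return the required items that are empty, i.e. where approval would be 'theater'."""
    return [name for name in REQUIRED if not getattr(surface, name)]


def render(surface: ReviewSurface) -> str:
    """Lay out the surface with assumptions first, before the reviewer reads the diff."""
    lines = ["ASSUMPTIONS:"]
    lines += [f"  - {a}" for a in surface.assumptions]
    lines.append(f"INPUT: {surface.input}")
    lines.append(f"RATIONALE: {surface.rationale}")
    if surface.confidence is not None:
        lines.append(f"CONFIDENCE: {surface.confidence:.2f}")
    lines.append(f"DIFF: {surface.diff}")
    lines.append("CHECKS: " + ", ".join(surface.checks))
    lines.append(f"DECISION: {surface.decision}")
    lines.append(f"ROLLBACK: {surface.rollback}")
    return "\n".join(lines)
```

In this sketch a workflow would call `missing_items` to block approval until every required item is filled, then show `render(...)` to the reviewer. Ordering assumptions at the top is the one design choice taken directly from the thread; nothing here enforces the two-minute test, which stays a human judgment about how small the change is.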
Figure @Figure_robot
We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.
Replies: 672 · Reposts: 1.1K · Likes: 8.4K · Views: 1.4M
ghostt @ghhosttdn42
Today I was fired from Coinbase. During my 6 years at the company, I was responsible for freezing customer accounts for no reason.
Replies: 1.1K · Reposts: 2.9K · Likes: 63.7K · Views: 2.1M
stuartgh.eth @stuartgh
@sama Using 5.5 to surface assumptions. Very token-light, so not going to interest you. 🛴
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 6
Sam Altman @sama
i would like to talk to people who have built amazing things with 5.5 that weren't possible with earlier models. i am especially interested in examples that took ludicrous token budgets. thanks.
Replies: 1.7K · Reposts: 266 · Likes: 8.4K · Views: 913.9K
stuartgh.eth @stuartgh
Spotted a Restore/Harrow Green truck today, and discovered that HG is now part of Pickfords, after a Dec 2025 sale. Nice to see a specialist relocation brand getting a more natural home. (My partner, when she was an accountant at HG, helped with the original sale to Restore.)
[media]
Barking, London 🇬🇧
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 25