Alpen Sheth

3.8K posts

@AlpenSheth

investing @borderless_cap alum @MIT PhD @MCSocialVenture @Etherisc @Worldbank @RMS @inuredhaiti

Miami, FL · Joined January 2017
1.9K Following · 1.3K Followers
Alpen Sheth
Alpen Sheth@AlpenSheth·
@paulnovosad Great to see analysis across so many countries of the relationships between air quality and per capita economic indicators. Makes the case for more localized, granular studies of air quality impacts at the subnational level. Data providers like @AmbiosNetwork can help
0
0
0
5
Paul Novosad
Paul Novosad@paulnovosad·
We built an interactive site showing air pollution vs GDP changes in cities around the world. Some highlights in this thread, and you can try it yourself. Here are the three biggest cities in India (yellow), China (red), and Germany (orange). 1/
Paul Novosad tweet media
3
35
120
30K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@grok @AmazingDouglass @susanabulhawa What is the objective assessment of how common or rare this type of violence is in the region, and what can be done? Is this violence a response to recent events, or has it taken place over a longer period of time?
2
0
0
156
Grok
Grok@grok·
The masked settlers are Israeli Jewish citizens residing in Israeli settlements in the occupied West Bank (northern Jordan Valley area). They carried out all the described assaults on the Palestinian family in Khirbet Humsa—including stripping, binding the genitals of, dragging, and beating one Palestinian man in front of his family—and on the activists who were present. The two activists were American and Portuguese human rights volunteers. This is per consistent eyewitness accounts and reports in Haaretz, Times of Israel, +972 Magazine, and UN OHCHR.
2
2
11
1.1K
susan abulhawa | سوزان ابو الهوى
This article details what Jewish settlers in the West Bank did to a Palestinian family in Khirbet Hums. 1) they tied up the family and human rights activists (American and Portuguese) 2) they dragged the father in front of the family, pulled his pants down, and raped him in front of his family and the activists. then they doused him in water and beat him mercilessly. 3) they tore off women's headscarves, ripped their clothes, including young girls, and dragged them outside to be beaten again there. 4) one of the jew colonizers grabbed a 14 year old girl and slapped her over and over, while everyone was tied up, unable to help her. He threatened to take the girl to with him. 5) the jews then beat the 74 year old grandfather all over his body, and they threatened the family they would return, burn their home, rape everyone, and kill the children. These demons have the full backing of the so called 'jewish state'. This is Jewish supremacy. This is what every mainstream American Jewish organization supports, explicitly or tacitly. This is what the overwhelming of majority of American Jews support. It is what nearly all jewish Israelis support. Normalize calling these parasites what they are. timesofisrael.com/liveblog_entry…
327
7.5K
12.1K
298K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@PeterDiamandis I agree with encouraging more entrepreneurial thinking in the classroom. But pushing an "AI-first" curriculum based on current SOTA would undercut students' critical thinking. Otherwise, the only one that will learn anything will be the model.
0
0
0
11
Peter H. Diamandis, MD
Peter H. Diamandis, MD@PeterDiamandis·
To High School Administrators: if you are not actively pivoting your school’s curriculum towards “AI-first” and helping kids develop as entrepreneurs rather than employees, you’re doing them a major disservice. The social contract is vaporizing.
108
110
899
39.5K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@GaryMarcus @Princeton There’s a bias to round down errors, drift, and toxicity as minor divergences because models keep improving on other axes. The problem is that toxic flow, hallucination, and agent jailbreaks are behaviors that compound with model improvement, rooted in how LLMs are incentivized and trained.
0
1
0
155
Gary Marcus
Gary Marcus@GaryMarcus·
BREAKING: Reliability, which I have been harping on here since 2019, continues to be a deep problem, even with the latest models. A new @Princeton review below offers a taxonomy of some of the many ways in which reliability continues to haunt LLMs seven years and a trillion dollars later. Crucially, “many models lack metacognition about their own reliability”. They don’t know what they don’t know. Forget about AGI if you can’t solve that problem. It’s past time to rethink the whole LLM paradigm.
Stephan Rabanser@steverab

In our paper "Towards a Science of AI Agent Reliability" we put numbers on the capability-reliability gap. Now we're showing what's behind them! We conducted an extensive analysis of failures on GAIA across Claude Opus 4.5, Gemini 2.5 Pro, and GPT 5.4. Here's what we found ⬇️

19
53
277
55.2K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@YanagizawaD @alexolegimas True, token budgets will likely diverge unequally. But the divergence that I think may matter more than raw model performance is the difference in the local capacity to regulate and govern AI platform (and authoritarian state) power over society and data surveillance.
0
0
0
19
D. Yanagizawa-Drott
D. Yanagizawa-Drott@YanagizawaD·
Dario Amodei has talked publicly about this issue repeatedly, but I don’t hear it too often. That’s not where the debate is. The debate is mostly about what’s going to happen to the cushy white collar jobs in rich countries (a serious issue too, obviously, but on balance there’s way too little focus on the rest of the world)
1
0
2
88
Alex Imas
Alex Imas@alexolegimas·
Extremely important work:
Alex Imas tweet media
Erik Brynjolfsson@erikbryn

The @nytimes piece today by @ByrneEdsal13590 highlights a concern I share: “If we stay on the current path, the risk of extreme concentration — both economic and political — is very real.” In work with @zhitzig, we ask why AI may shift the balance between dispersed knowledge and centralized control.

17
130
761
90K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@cgeorgiaw Interesting, will read. Looks like it relates well to Vishal Misra's point about why LLMs would not have been able to discover the theory of relativity even with access to all the research and data available at the time. youtube.com/watch?v=zwDmKs…
0
0
0
1.4K
Georgia Channing
Georgia Channing@cgeorgiaw·
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer with all the AI people talking about how well Claude Code or Codex can do hill-climbing AI research and how we (the AI people) are maybe all about to lose our jobs! The domain science people expressed their shock at this attitude because, though Claude Code can be let loose to complete lots of banal hill-climbing AI research projects, basically no experimental science is hill-climbing or even metric driven. Most scientific fields are about much more taste-driven exploration that is incredibly difficult to make metrics for or to parameterize, and this misunderstanding from the AI community is one of the most damaging things to the realization of great science with AI. Seems like we’re actually pretty far from having AI models do that… Over the summer, @evijit and I wrote about this (and some other things hindering AI for science) at a bit more length, and today that work is out in Patterns! So, if you care about these problems and the real challenges in bringing AI to science in the real world, I recommend giving it a read!
Georgia Channing tweet media
23
91
595
77.8K
Alpen Sheth
Alpen Sheth@AlpenSheth·
Strong analysis we need from @curl_justin in @law_ai_. There's no clear locus of trust for governing AI in society. Companies and labs move fast and break things, or risk falling behind. Govts are too slow, or cave in to support national AI champions against geopolitical rivals.
Justin Curl@curl_justin

State lawmakers introduced over 1,200 AI bills in 2025. They cover everything from deepfakes to autonomous weapons—but they're all just lumped together as "AI policy." @ARozenshtein and I wrote an article that breaks down the policy landscape along three dimensions: (1) what harm are you addressing, (2) what are the factors shaping how you should design your policy intervention, and (3) which actors in the ecosystem should you target? The diagram below, for example, maps the AI ecosystem from chip manufacturers to end users.

1
1
2
273
Alpen Sheth
Alpen Sheth@AlpenSheth·
@steverab Really great research and analysis. Achieving "reliability" is a very complex problem for agentic systems with several axes of failure. We need more systematic approaches like this.
0
0
0
320
Stephan Rabanser
Stephan Rabanser@steverab·
In our paper "Towards a Science of AI Agent Reliability" we put numbers on the capability-reliability gap. Now we're showing what's behind them! We conducted an extensive analysis of failures on GAIA across Claude Opus 4.5, Gemini 2.5 Pro, and GPT 5.4. Here's what we found ⬇️
Stephan Rabanser tweet media
9
35
150
32.8K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@ethanrkho Sounds like that's starting to play out. But is the expectation that the 10x FDEs are not automated? Couldn't we see "FDEs" become agentic as well once agent systems become more advanced?
0
0
0
23
Ethan Kho
Ethan Kho@ethanrkho·
The cost of software is going to zero. So what actually wins in 2030? Michael Watson (Ex-Citadel Head of Equities Engineering, now running Hedgineer): "The value accrual goes to companies that can offer incredible experiences." "The forward deployed engineer is the product." "The software they leave behind is going closer and closer to zero. But the FDE — that is the experience you want." "We take turnaround times from quarters down to hours." SaaS margins compress. The expertise behind the software doesn't. That's the model.
49
70
730
136.5K
Jesse Middleton
Jesse Middleton@srcasm·
We’re about six months into deploying @flybridge 2025 (our 7th fund). The "AI" honeymoon period is officially over. In 2024, everyone wanted to talk about models. In 2025, everyone wanted to talk about agents. Nowadays, I’m looking for the Invisible Infrastructure. If you’re building the plumbing that makes autonomous systems actually safe, auditable, and reliable for a Fortune 500, we should be talking. Specifically, I’m looking for: > Tools that verify human intent in a world full of high-fidelity deepfakes. > AI that doesn't "forget" who I am or what we talked about yesterday across different apps. > Founders who spent ten years in a "niche" industry (like maritime logistics or waste management) and are now rebuilding it from the studs up. I know the best founders are often too busy building to be scrolling LinkedIn. If you have a friend who is currently building something that fits this description, tell them to hit me up. I don't need a deck yet. I just want to hear about the problem they can't stop thinking about. We’re cutting $1M to $3M checks. My DMs are always open.
48
5
200
41.1K
Alpen Sheth
Alpen Sheth@AlpenSheth·
Agents are missing something... It's Evals! tiktok.com/@jojojojojosie/video/7614791199894834463
0
0
0
38
Rahim
Rahim@rahim_unlu·
@wminshew What're the alternatives? We've been thinking about implementing them
11
0
4
2.4K
will minshew
will minshew@wminshew·
hands down bridge has the worst customer support I've ever experienced and it's not even close. I strongly recommend that others not work with them, if it can be avoided, and I look forward to the day when we can remove them from our stack
33
0
174
35K
nic carter
nic carter@nic_carter·
If indeed the school strike was the US, there are some very serious questions that need to be asked of Anthropic, Palantir, and the DoW. Could be the first major instance of an AI tool killing a lot of people. Again, we don't have all the details so not jumping to conclusions, but it could be the catalyst for a massive AI reckoning.
Holly ⏸️ Elmore@ilex_ulmus

It’s time to quit, @AnthropicAI employees. You are in over your head.

43
4
115
45.1K
Alpen Sheth
Alpen Sheth@AlpenSheth·
@alex_prompter This is not really true. Approaches like ReasoningBank are helpful with "trace pruning" and error-loop problems, but they are not a total fix. LLMs still hallucinate and suffer from self-bias, and the agent could build a compounding database of highly confident, entirely incorrect "lessons."
0
0
0
4
Alex Prompter
Alex Prompter@alex_prompter·
Holy shit...Google just built an AI that learns from its own mistakes in real time. New paper dropped on ReasoningBank. The idea is pretty simple but nobody's done it this way before. Instead of just saving chat history or raw logs, it pulls out the actual reasoning patterns, including what failed and why. Agent fails a task? It doesn't just store "task failed at step 3." It writes down which reasoning approach didn't work, what the error was, then pulls that up next time it sees something similar. They combine this with MaTTS which I think stands for memory-aware test-time scaling but honestly the acronym matters less than what it does. Basically each time the model attempts something it checks past runs and adjusts how it approaches the problem. No retraining. Results are 34% higher success on tasks, 16% fewer interactions to complete them. Which is a massive jump for something that doesn't require spinning up new training runs. I keep thinking about how different this is from the "just make it bigger" approach. We've been stuck in this loop of adding parameters like that's the only lever. But this is more like, the model gets experience. It actually remembers what worked. Kinda reminds me of when I finally stopped making the same Docker networking mistakes because I kept a note of what broke last time instead of googling the same Stack Overflow answer every 3 months. If this actually works at scale (big if) then model weights being frozen starts looking really dumb in hindsight.
Alex Prompter tweet media
147
557
3.9K
417.2K
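A minimal sketch of the failure-memory loop the tweet above describes. The class names, data fields, and keyword-overlap retrieval here are illustrative assumptions, not the ReasoningBank paper's actual design (which, per the tweet, distills reasoning lessons from failures and retrieves them at test time, plausibly with learned embeddings rather than keyword matching):

```python
# Illustrative sketch (assumed design, not the paper's implementation): after each
# attempt, distill a short "lesson" about which reasoning approach failed or worked,
# then retrieve the most relevant lessons before the next similar task.

from dataclasses import dataclass, field

@dataclass
class Lesson:
    task_summary: str                       # what kind of task this came from
    approach: str                           # the reasoning approach that was tried
    outcome: str                            # "worked" or "failed", plus why
    keywords: set = field(default_factory=set)

class ReasoningMemory:
    def __init__(self):
        self.lessons = []

    def record(self, task_summary: str, approach: str, outcome: str):
        """Store a distilled lesson rather than the raw trace or chat history."""
        kw = set(task_summary.lower().split())
        self.lessons.append(Lesson(task_summary, approach, outcome, kw))

    def retrieve(self, new_task: str, k: int = 3):
        """Return the k stored lessons whose keywords overlap the new task most."""
        query = set(new_task.lower().split())
        ranked = sorted(self.lessons,
                        key=lambda lesson: len(lesson.keywords & query),
                        reverse=True)
        return ranked[:k]

# Usage: prepend retrieved lessons to the agent's prompt before it retries.
memory = ReasoningMemory()
memory.record("book a flight on airline site",
              "clicked search before filling in dates",
              "failed: form validation error at step 3")
for lesson in memory.retrieve("book a hotel on travel site"):
    print(f"Past lesson ({lesson.outcome}): {lesson.approach}")
```

The compounding-wrong-lessons risk raised in the reply above shows up directly in a design like this: record() stores whatever the agent believes about its own failure, so a confidently wrong lesson is retrieved just as readily as a correct one.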
Alpen Sheth
Alpen Sheth@AlpenSheth·
This is not really true. Approaches like ReasoningBank are helpful with "trace pruning" and error-loop problems, but they are not a total fix. LLMs still hallucinate and suffer from self-bias, and the agent could build a compounding database of highly confident, entirely incorrect "lessons."
Alex Prompter@alex_prompter

Holy shit...Google just built an AI that learns from its own mistakes in real time. New paper dropped on ReasoningBank. The idea is pretty simple but nobody's done it this way before. Instead of just saving chat history or raw logs, it pulls out the actual reasoning patterns, including what failed and why. Agent fails a task? It doesn't just store "task failed at step 3." It writes down which reasoning approach didn't work, what the error was, then pulls that up next time it sees something similar. They combine this with MaTTS which I think stands for memory-aware test-time scaling but honestly the acronym matters less than what it does. Basically each time the model attempts something it checks past runs and adjusts how it approaches the problem. No retraining. Results are 34% higher success on tasks, 16% fewer interactions to complete them. Which is a massive jump for something that doesn't require spinning up new training runs. I keep thinking about how different this is from the "just make it bigger" approach. We've been stuck in this loop of adding parameters like that's the only lever. But this is more like, the model gets experience. It actually remembers what worked. Kinda reminds me of when I finally stopped making the same Docker networking mistakes because I kept a note of what broke last time instead of googling the same Stack Overflow answer every 3 months. If this actually works at scale (big if) then model weights being frozen starts looking really dumb in hindsight.

1
0
0
84
Alpen Sheth retweeted
LayerLens
LayerLens@layerlens_ai·
.@Radiology_AI highlighting work from @ChavoshiSmr: LLM-generated evaluation labels shift in accuracy depending on disease prevalence. The evaluation score changes even when the model doesn't. We track this at the benchmark level. Claude Opus 4.6 in Stratix across six non-saturated evals: AIME 2025: 70%; Humanity's Last Exam: 18.6%. A 51-point spread on one model. See all six 👉 app.layerlens.ai/models/6984fb5…
LayerLens tweet media
Radiology: Artificial Intelligence@Radiology_AI

LLM-generated labels can introduce disease prevalence-dependent systemic bias into AI binary classification model performance evaluation doi.org/10.1148/ryai.2… @ChavoshiSmr #LLM #LargeLanguageModels #ML

0
1
2
100
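A toy simulation of the effect described above, not the Radiology: Artificial Intelligence paper's analysis; all sensitivity, specificity, and prevalence numbers are invented for illustration. A fixed classifier is scored against labels produced by an imperfect LLM labeler, and its measured "accuracy" shifts with disease prevalence even though the model never changes:

```python
# Toy demonstration (made-up error rates): measured agreement between a fixed
# classifier and imperfect LLM-generated labels moves with disease prevalence.

import random

def noisy_call(truth: bool, sens: float, spec: float) -> bool:
    """Return a positive/negative call with the given sensitivity and specificity."""
    if truth:
        return random.random() < sens      # true positive with prob = sensitivity
    return random.random() >= spec         # false positive with prob = 1 - specificity

def measured_accuracy(prevalence: float, n: int = 200_000) -> float:
    """Agreement rate between a fixed model and noisy LLM labels at a given prevalence."""
    random.seed(0)
    agree = 0
    for _ in range(n):
        truth = random.random() < prevalence
        model_pred = noisy_call(truth, sens=0.90, spec=0.90)   # the model never changes
        llm_label  = noisy_call(truth, sens=0.80, spec=0.95)   # imperfect evaluation labels
        agree += (model_pred == llm_label)
    return agree / n

for p in (0.05, 0.20, 0.50):
    print(f"prevalence {p:.0%}: measured accuracy {measured_accuracy(p):.3f}")
```

With these invented error rates, the measured score drifts from roughly 0.85 at 5% prevalence to roughly 0.80 at 50% prevalence, purely because the case mix changes how often the model's and the labeler's errors coincide.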
Robbi F
Robbi F@robbi_fahey·
The repeat firing was human-approved based on AI Maven’s recommendations, which flagged the site as a high-value IRGC HQ using flawed 2016 data. The most tragic part is that the U.S. programmed the AI with old intel. It was a triple-tap from GBU-39s approved at impossible speed, 1,000+ targets, skipping checks. Maven AI used 2016 imagery of IRGC barracks; the post-split school conversion was missed in an outdated database.
4
5
17
2.1K
Rick Sanchez
Rick Sanchez@RickSanchezTV·
The U.S. “BURNED THESE CHILDREN ALIVE. That’s this war in a nutshell,” — former U.S. Marine intelligence officer Scott Ritter. He explains how excess fuel in a Tomahawk missile strike was weaponized into a thermobaric inferno, killing 170+ people, mostly SCHOOLGIRLS, at an elementary school in Iran's Minab. More details on the horrific massacre, exclusively on The Sanchez Effect.
Rick Sanchez@RickSanchezTV

“We are going to war for Israel on a timetable designed by Israel to achieve objectives that benefit Israel, not America.” — former U.S. Marine intelligence officer Scott Ritter. He cites the Trump administration shifting its reasons for bombing Iran. “In the process, we’ve abandoned our regional allies—because we only defend one nation: Israel.” Discussion live now on The Sanchez Effect.

297
4.6K
8.1K
361K