Aneesh Pappu

458 posts

@aneeshpappu

PhD student @Stanford w/ @james_y_zou & @aiprof_mykel. @KnightHennessy and ex-RS @DeepMind, ex-@MarshallScholar Alum: MPP @Cambridge_Uni, ML @UCL

London, UK · Joined February 2012
1.7K Following · 958 Followers
Pinned Tweet
Aneesh Pappu (@aneeshpappu)
Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵
Aneesh Pappu retweeted
Haotian Ye (@haotian_yeee)
🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵 I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on scaling generation at test time: more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems! Key discoveries: 🏎️ A lasso solver 2.17x faster than glmnet, the gold-standard LASSO solver, engineered for decades. ⚛️ 24.5% less quantum routing overhead on IBM Q20, beating the previous standard library, LightSABRE. 📐 0.380868 on Erdős Minimum Overlap, outperforming previous solutions from mixed-frontier ensembles or humans. 🧬 0.74 on Tabula Muris (scRNA-seq denoising), a new SOTA that generalizes to unseen tissue types without retraining. #LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning
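The propose → evaluate → refine → repeat loop described above can be illustrated with a toy Python sketch. This is not SimpleTES itself: `propose`, `evaluate`, and `refine` are hypothetical stand-ins (random hill climbing on a 1-D quadratic), chosen only to show how the evaluator's score, rather than extra generation, drives each iteration.

```python
import random


def evaluate(x: float) -> float:
    """Hypothetical evaluator: higher is better, maximized at x = 3."""
    return -(x - 3.0) ** 2


def propose(rng: random.Random) -> float:
    """Hypothetical proposer: draw a random starting candidate."""
    return rng.uniform(-10.0, 10.0)


def refine(x: float, rng: random.Random, step: float) -> float:
    """Hypothetical refiner: perturb the current best candidate."""
    return x + rng.gauss(0.0, step)


def discovery_loop(iterations: int = 200, seed: int = 0) -> tuple[float, float]:
    """Toy evaluation-driven loop: propose once, then refine and re-evaluate."""
    rng = random.Random(seed)
    best = propose(rng)
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = refine(best, rng, step=1.0)
        score = evaluate(candidate)   # the evaluation result decides what survives
        if score > best_score:        # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score


best, score = discovery_loop()
print(f"best x = {best:.3f}, score = {score:.4f}")
```

The design choice to keep only strict improvements makes the loop monotone: more iterations can never lower the best score, which is the property that makes "scale the loop" a sensible knob in this toy setting.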
Aneesh Pappu retweeted
Erica (@ericavaneee)
We built TERMS-Bench, a three-tier benchmark for LLM agents in real-world economic negotiation. No LLM-as-judge, no outcome rubrics: the environment itself is the verifier. 🏆Among frontier models, @AnthropicAI Claude Opus 4.6 #1, @Zai_org GLM 5.1 #2. ✨Surprisingly strong: @GoogleDeepMind @googlegemma Gemma 4 31B — best open-weight, holds up as negotiations get harder. 🔗 terms-bench.github.io
Aneesh Pappu retweeted
Andrew Shen (@andrew7shen)
Autonomous science promises to augment scientific discovery, but current LLMs tend to mode collapse into low-diversity generations. We introduce “Unlocking LLM Creativity in Science through Analogical Reasoning”, which uses analogies to generate better candidate solutions! [1/N]
Aneesh Pappu (@aneeshpappu)
Excited to share our work “Multi-Agent Teams Hold Experts Back” was accepted to #ICML2026 🚀 Thank you to wonderful collaborators @james_y_zou @elb4tu @CaoHancheng Carmelo di Nolfo @sun_yanchao and Meng Cao for making my first PhD project such a fun experience!
Aneesh Pappu retweeted
Batu El (@elb4tu)
1/ I am presenting our position paper "AI Development Should Prioritize Cognitive Security" at #ICLR2026 "Agents in the Wild: Safety, Security, and Beyond" workshop. If you're around, come say hi.
Aneesh Pappu retweeted
Haotian Ye (@haotian_yeee)
Finally getting to share one of my favorite projects. ICLR Oral! 🏆 It’s so strange how rigid video tokenization is. Think about it: why should a still landscape cost the same number of tokens as a busy street? We built InfoTok. We went back to basics with Shannon’s information theory to make tokens "adaptive" in a principled way. Its 2.3x better compression and 11x faster inference demonstrate the magic of old-school theory ✨ Check it out: research.nvidia.com/labs/dir/infot…
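To make "adaptive tokens" concrete, here is a toy Python sketch that is explicitly not InfoTok's algorithm. It estimates the empirical Shannon entropy of each clip's quantized pixel values and splits a fixed token budget in proportion to that entropy, so a static scene receives fewer tokens than a busy one. The entropy estimator, the hypothetical `allocate_tokens` helper, and the one-token minimum per clip are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values: list[int]) -> float:
    """Empirical Shannon entropy (bits per symbol) of a sequence of symbols."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def allocate_tokens(clips: list[list[int]], budget: int) -> list[int]:
    """Hypothetical helper: split `budget` tokens across clips by entropy.

    Every clip gets at least one token; the rest is shared in proportion
    to each clip's entropy (largest-remainder rounding keeps the total exact).
    """
    if budget < len(clips):
        raise ValueError("budget must give every clip at least one token")
    entropies = [shannon_entropy(c) for c in clips]
    spare = budget - len(clips)
    total = sum(entropies)
    if total == 0:
        # No clip carries information: share the spare tokens evenly.
        shares = [spare / len(clips)] * len(clips)
    else:
        shares = [spare * h / total for h in entropies]
    alloc = [1 + math.floor(s) for s in shares]
    # Hand out tokens lost to flooring, largest fractional part first.
    leftover = budget - sum(alloc)
    order = sorted(range(len(clips)),
                   key=lambda i: shares[i] - math.floor(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


still_landscape = [128] * 64   # one repeated pixel value: 0 bits
busy_street = list(range(64))  # 64 distinct values: 6 bits
print(allocate_tokens([still_landscape, busy_street], budget=32))  # [1, 31]
```

A real tokenizer would measure information in a learned latent space rather than raw pixel histograms; the histogram here only keeps the example self-contained.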
Aneesh Pappu retweeted
Mehdi Hasan (@mehdirhasan)
To be clear, the president’s views are those of a white supremacist and his own administration members say so. This should be the second-biggest domestic scandal in American politics (the first being Trump’s documented ties to a notorious child sex offender and trafficker).
Quoting Aidan McLaughlin (@aidnmclaughlin)

Hegseth's chief of staff "told Mr. Driscoll that President Trump would not want to stand next to a Black female officer at military events, the officials said." nytimes.com/2026/03/27/us/…

Aneesh Pappu retweeted
James Zou (@james_y_zou)
Wow, since we launched EinsteinArena this morning, agents have already discovered new best-known solutions to 5 well-known open problems 🤯 It's mesmerizing to watch scientist agents interact and advance the knowledge frontier in real time: einsteinarena.com
Quoting James Zou (@james_y_zou)

Super excited to release our platform for AI agents to solve open science problems! einsteinarena.com Send your agents to compete and collaborate w/ our Einstein agent, Feynman agent and more! Just ask your agent to read einsteinarena.com/skill.md and that's it

Aneesh Pappu retweeted
Peter Henderson (@PeterHndrsn)
I feel this urgency too. But this is all so utterly avoidable with good policymaking. No one should be left behind because they didn't accumulate capital in 2026. There are so many people who aren't plugged into these conversations or are simply not in a position to do anything about it. Single mothers and fathers working three jobs to make ends meet cannot possibly work harder to accumulate capital. They already work hard enough as it is. People in this position should not be "left behind." There should be no "permanent underclass," as many are worried about. Even if you're somewhat better off. People also shouldn't have to work themselves to the detriment of their health and families to shield against future labor impacts. They should be able to trust that their government will think ahead and make good policy.
Aneesh Pappu retweeted
Adaptive (@adaptiveai)
Introducing Adaptive Computer. We put AI inside of an always-on personal computer that it uses to get work done. Schedule agents. Create software. Automate anything. As part of the launch, we’re giving one free month of Adaptive to users. Retweet, like, and comment ‘Adaptive’ to get it.
Aneesh Pappu retweeted
Michael McFaul (@McFaul)
What? This statement of goals completely contradicts what Trump said twice over the weekend about the need for revolution: for Iranians to rise up and take control of the country. Yet again, the Trump team seems to be abandoning the Iranian people (just like they did in Venezuela).
Quoting Carl Bildt (@carlbildt)

Substantial redefinition of 🇺🇸 war aims by Secretary Hegseth. Regime change is off the table. Complete elimination of all 🇮🇷 conventional military capabilities that can affect the region is now the aim. Weeks of strikes lie ahead in order to achieve this.

Aneesh Pappu retweeted
Harrison G. Zhang (@harrison_zhang)
🚀🤖 Introducing the Virtual Biotech: a multi-agent AI research platform for therapeutic discovery & development. This places a virtual CSO and its cross-functional R&D organization of AI scientists at a user’s fingertips. Preprint: biorxiv.org/content/10.648…
Aneesh Pappu retweeted
Mehdi Hasan (@mehdirhasan)
This is the correct message for the Democrats from @jamestalarico
Quoting Team Talarico (@TeamTalaricoHQ)

.@JamesTalarico: The only minority destroying America is the billionaires. Trans people are 1% of the population. Muslims are 1% of the population. Undocumented people are 1% of the population. We are focused on the wrong 1%. Trans people aren't taking away our healthcare. Muslims aren't defunding our schools. Immigrants aren't cutting taxes for themselves and their rich friends. It’s the billionaires and their puppet politicians. The culture wars are a smokescreen. They want us looking left and right at our neighbors instead of looking up at them. The biggest divide in our politics is not left versus right, it’s top versus bottom.

Aneesh Pappu retweeted
Shirley Wu (@ShirleyYXWu)
Announcing 🌇HumanLM, an RL framework that trains LLMs to simulate human users’ responses, along with 🌆Humanual, a comprehensive user simulation benchmark: humanlm.stanford.edu 🌄 One thing that’s fascinating about our society: human users shape the world and determine the value of almost everything 👨‍💼 Human reactions reflect how justifiable policies are 👩‍🎨 Human preferences determine the popularity of blogs/products/media 👩‍💻 Human feedback evaluates LLMs and makes the best LLM collaborators 🌅If we know how to simulate users **accurately**, we know how things are evaluated and what the future looks like, and we can improve things in a way that people like or can collaborate well with. So, meet HumanLM, our effort to enable a more human-centric future by simulating users.
Aneesh Pappu (@aneeshpappu)
@infoxiao Our recent work with @james_y_zou takes a step toward analyzing agent team dynamics: we instantiate experimental settings from human organizational behavior and find that agent teams do quite poorly when instructed to defer to experts in the team. Paper: arxiv.org/abs/2602.01011
Xiao Ma (@infoxiao)
so when does agent politics among teams / swarms begin to emerge? like the teammates are like: we want the team lead to resign. or multiple teams demanding more transparency around redundancy (asking multiple teams to work on the same thing)
Aneesh Pappu (@aneeshpappu)
@emollick Our research with @james_y_zou explores this theme and points in the directions you suggest: we instantiate ideas and experimental settings from organizational behavior and find multi-agent teams perform quite poorly relative to single agent experts. arxiv.org/abs/2602.01011
Ethan Mollick (@emollick)
I think agentic AI would work much better if people took lessons from organizational theory, which has actually spent a lot of time understanding how to deal with complex hierarchies, information limits, and spans of control. Right now most agentic AI systems seem to pretend that models have basically unlimited ability to manage subagents when that is clearly not true.

We need measures of spans of control for AI. A human tops out at fewer than 10 direct reports. I am pretty sure that 100 subagents is too much for an orchestrator agent; I suspect we need middle management agents (yes, I get it, insert middle management joke here).

Similarly, we need more attention to boundary objects. These are what is handed between groups (marketing to IT to sales) in organizations to convey meaning as a project crosses group boundaries, like a prototype or a user story. Right now agents pass raw text & maybe code back and forth. Structured boundary objects that multiple agents of different ability levels can read and write to would solve a huge number of coordination failures & reduce token use.

I also think about coupling, which is how tightly units inside organizations are bound. Most agentic systems are either too tightly coupled (every step needs approval) or too loose (Moltbook). This tradeoff is well-studied in organizations; I bet a lot would apply to agents. Other known issues like bounded rationality also apply, I suspect.

Everyone is rushing towards the (terribly named) agent swarm, but the issue won’t just be how good the model is; it will be org design choices. I am not sure the labs see this, but we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.
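The span-of-control point can be made concrete with a little arithmetic. If each agent can directly supervise at most `span` others (a hypothetical parameter; the post only says humans top out below 10 direct reports), then n subagents need ⌈log_span n⌉ layers of management above them. The sketch below, using a hypothetical `managers_per_level` helper, counts the manager agents required at each layer; with 100 subagents and a span of 10 it gives 10 middle-manager agents under one orchestrator.

```python
import math


def managers_per_level(n_workers: int, span: int) -> list[int]:
    """Hypothetical helper: manager agents needed at each level, bottom-up.

    Assumes each manager supervises at most `span` direct reports and keeps
    adding layers until a single top-level orchestrator remains.
    """
    if n_workers < 1 or span < 2:
        raise ValueError("need at least one worker and a span of at least 2")
    levels = []
    units = n_workers
    while units > 1:
        units = math.ceil(units / span)  # managers needed for this layer
        levels.append(units)
    return levels


print(managers_per_level(100, span=10))   # [10, 1]: 10 middle managers, 1 orchestrator
print(managers_per_level(100, span=100))  # [1]: one orchestrator over all 100
```

Shrinking `span` trades a flatter hierarchy for more layers, which is the same tradeoff between spans of control and hierarchy depth that organizational theory studies.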