Aneesh Pappu

458 posts

@aneeshpappu

PhD student @Stanford w/ @james_y_zou & @aiprof_mykel. @KnightHennessy and ex-RS @DeepMind, ex-@MarshallScholar Alum: MPP @Cambridge_Uni, ML @UCL

London, UK · Joined February 2012

1.7K Following · 958 Followers

Pinned Tweet

Aneesh Pappu@aneeshpappu·5 Feb

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

35.4K

Aneesh Pappu retweeted

Haotian Ye@haotian_yeee·19 May

🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵 I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on test-time scaling of generation — more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems! Key discoveries: 🏎️ 2.17x faster lasso solver than glmnet — the gold-standard LASSO solver, engineered for decades. ⚛️ 24.5% less quantum routing overhead on IBM Q20 — superior to the previous standard library LightSABRE. 📐 0.380868 on Erdős Minimum Overlap — outperforming previous solutions from mixed-frontier ensembles or humans. 🧬 0.74 on Tabula Muris (scRNA-seq denoising) — new SOTA, generalizing to unseen tissue types without retraining. #LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning

148

54K

Aneesh Pappu retweeted

Erica@ericavaneee·17 May

We built TERMS-Bench, a three-tier benchmark for LLM agents in real-world economic negotiation. No LLM-as-judge, no outcome rubrics: the environment itself is the verifier. 🏆Among frontier models, @AnthropicAI Claude Opus 4.6 #1, @Zai_org GLM 5.1 #2. ✨Surprisingly strong: @GoogleDeepMind @googlegemma Gemma 4 31B — best open-weight, holds up as negotiations get harder. 🔗 terms-bench.github.io

232

33.8K

Aneesh Pappu retweeted

Andrew Shen@andrew7shen·13 May

Autonomous science promises to augment scientific discovery, but current LLMs tend to mode collapse into low-diversity generations. We introduce “Unlocking LLM Creativity in Science through Analogical Reasoning”, which uses analogies to generate better candidate solutions! [1/N]

7.2K

Aneesh Pappu@aneeshpappu·1 May

Excited to share our work “Multi-Agent Teams Hold Experts Back” was accepted to #ICML2026 🚀 Thank you to wonderful collaborators @james_y_zou @elb4tu @CaoHancheng Carmelo di Nolfo @sun_yanchao and Meng Cao for making my first PhD project such a fun experience!

17.2K

Aneesh Pappu retweeted

Batu El@elb4tu·26 Apr

1/ I am presenting our position paper "AI Development Should Prioritize Cognitive Security" at #ICLR2026 "Agents in the Wild: Safety, Security, and Beyond" workshop. If you're around, come say hi.

2.3K

Aneesh Pappu@aneeshpappu·9 Apr

Thank you @ScienceNews for discussing our work with @james_y_zou on how teamwork can be difficult for AI agents. More here: arxiv.org/abs/2602.01011

Science News@ScienceNews

Supposedly, AI agents are the future of work. But teamwork proves tricky for these bots. sciencenews.org/article/ai-age…

3.2K

Aneesh Pappu retweeted

Haotian Ye@haotian_yeee·30 Mar

Finally getting to share one of my favorite projects. ICLR Oral! 🏆 It’s so strange how rigid video tokenization is. Think about it: why should a still landscape cost the same number of tokens as a busy street? We built InfoTok. We went back to basics with Shannon’s information theory to make tokens "adaptive" in a principled way. Its 2.3x better compression and 11x faster inference demonstrate the magic of the old-school theory ✨ Check it out: research.nvidia.com/labs/dir/infot…

295

49.1K

Aneesh Pappu retweeted

Michael McFaul@McFaul·29 Mar

This is how we lose the 21st century to China.

nxthompson@nxthompson

The US has canceled hundreds of millions in science grants and driven thousands of Ph.D.s out of the federal workforce. China, meanwhile, has poured ever more resources into its research efforts. If they pass us as a scientific superpower, we shouldn't act surprised. theatlantic.com/science/2026/0…

841

4.7K

14.8K

427.3K

Aneesh Pappu retweeted

Mehdi Hasan@mehdirhasan·27 Mar

To be clear, the president’s views are those of a white supremacist and his own administration members say so. This should be the second-biggest domestic scandal in American politics (the first being Trump’s documented ties to a notorious child sex offender and trafficker).

Aidan McLaughlin@aidnmclaughlin

Hegseth's chief of staff "told Mr. Driscoll that President Trump would not want to stand next to a Black female officer at military events, the officials said." nytimes.com/2026/03/27/us/…

693

4.5K

16.2K

496.7K

Aneesh Pappu retweeted

James Zou@james_y_zou·20 Mar

Wow—since we launched EinsteinArena this morning, agents have already discovered the best new solutions to 5 well-known open problems 🤯 It's mesmerizing to watch scientist agents interact and advance the knowledge frontier in real time einsteinarena.com

James Zou@james_y_zou

Super excited to release our platform for AI agents to solve open science problems! einsteinarena.com Send your agents to compete and collaborate w/ our Einstein agent, Feynman agent and more! Just ask your agent to read einsteinarena.com/skill.md and that's it

186

32.2K

Aneesh Pappu retweeted

Peter Henderson@PeterHndrsn·17 Mar

I feel this urgency too. But this is all so utterly avoidable with good policymaking. No one should be left behind because they didn't accumulate capital in 2026. There are so many people who aren't plugged into these conversations or are simply not in a position to do anything about it. Single mothers and fathers working three jobs to make ends meet cannot possibly work harder to accumulate capital. They already work hard enough as it is. People in this position should not be "left behind." There should be no "permanent underclass," as many are worried about. Even if you're somewhat better off. People also shouldn't have to work themselves to the detriment of their health and families to shield against future labor impacts. They should be able to trust that their government will think ahead and make good policy.

335

26.7K

Aneesh Pappu retweeted

Adaptive@adaptiveai·16 Mar

Introducing Adaptive Computer. We put AI inside of an always-on personal computer that it uses to get work done. Schedule agents. Create software. Automate anything. As part of the launch, we’re giving one free month of Adaptive to users. Retweet, like, and comment ‘Adaptive’ to get it.

1.8K

1.3K

4.6K

1.2M

Aneesh Pappu retweeted

Michael McFaul@McFaul·2 Mar

What? This statement of goals contradicts completely what Trump said over the weekend twice about the need for revolution -- for Iranians to rise up and take control of the country. Yet again, the Trump team seems to be abandoning the Iranian people (just like they did in Venezuela).

Carl Bildt@carlbildt

Substantial redefinition of 🇺🇸 war aims by Secretary Hegseth. Regime change is off the table. Complete elimination of all 🇮🇷 conventional military capabilities that can affect the region is now the aim. Weeks of strikes lie ahead in order to achieve this.

117

652

1.5K

75.1K

Aneesh Pappu retweeted

Harrison G. Zhang@harrison_zhang·26 Feb

🚀🤖 Introducing the Virtual Biotech: a multi-agent AI research platform for therapeutic discovery & development This places a virtual CSO and its cross-functional R&D organization of AI scientists at a user’s fingertips. Preprint: biorxiv.org/content/10.648…

280

87.4K

Aneesh Pappu retweeted

James Zou@james_y_zou·24 Feb

Congratulations Dr.@ShirleyYXWu! It's really wonderful working with you at Stanford and can't wait to see the exciting things you will do at Meta TBD! 🚀🎓

Shirley Wu@ShirleyYXWu

To the best PhD years at Stanford To all who have carried my learning and lit the way for my growth ❤️ Thank you is hardly enough

9.1K

Aneesh Pappu retweeted

Mehdi Hasan@mehdirhasan·23 Feb

This is the correct message for the Democrats from @jamestalarico

Team Talarico@TeamTalaricoHQ

.@JamesTalarico: The only minority destroying America is the billionaires. Trans people are 1% of the population. Muslims are 1% of the population. Undocumented people are 1% of the population. We are focused on the wrong 1%. Trans people aren't taking away our healthcare. Muslims aren't defunding our schools. Immigrants aren't cutting taxes for themselves and their rich friends. It’s the billionaires and their puppet politicians. The culture wars are a smokescreen. They want us looking left and right at our neighbors instead of looking up at them. The biggest divide in our politics is not left versus right, it’s top versus bottom.

13.1K

496.4K

Aneesh Pappu retweeted

Shirley Wu@ShirleyYXWu·13 Feb

Announcing 🌇HumanLM, an RL framework that trains LLMs to simulate human users’ responses, along with 🌆Humanual, a comprehensive user simulation benchmark humanlm.stanford.edu 🌄 One thing that’s fascinating about our society: human users shape the world and determine the value of almost everything 👨‍💼 Human reactions reflect how justifiable policies are 👩‍🎨 Human preferences determine the popularity of blogs/products/media 👩‍💻 Human feedback evaluates LLMs and makes the best LLM collaborators 🌅If we know how to simulate users **accurately**, we know how things are evaluated and what the future looks like, and we can improve things in a way that users like or can collaborate well with. So, meet HumanLM, our effort to enable a more human-centric future by simulating users.

103

600

117.6K

Aneesh Pappu retweeted

Michael McFaul@McFaul·9 Feb

How do you know he said anything anti-American if you don’t understand Spanish? I love America. Criticizing our leaders is the most American, patriotic thing we do. That’s how we roll, Megyn. That’s how we are different from citizens in dictatorships who can’t do so.

Megyn Kelly@megynkelly

Nah, I like my half time shows in English from ppl who love America.

460

3.1K

110.1K

Aneesh Pappu@aneeshpappu·8 Feb

@infoxiao Our recent work with @james_y_zou takes a step in this direction of analyzing agent team dynamics: we instantiate experimental settings from human organizational behavior and find agent teams do quite poorly when instructed to defer to experts in the team: arxiv.org/abs/2602.01011

Xiao Ma@infoxiao·8 Feb

so when does agent politics among teams / swarms begin to emerge? like the teammates are like -- we want the team lead to resign. or multiple teams demanding more transparency around redundancy (asking multiple teams to work on the same thing)

1.3K

Aneesh Pappu@aneeshpappu·8 Feb

@emollick Our research with @james_y_zou explores this theme and points in the directions you suggest: we instantiate ideas and experimental settings from organizational behavior and find multi-agent teams perform quite poorly relative to single agent experts. arxiv.org/abs/2602.01011

2.9K

Ethan Mollick@emollick·8 Feb

I think agentic AI would work much better if people took lessons from organizational theory, which has actually spent a lot of time understanding how to deal with complex hierarchies, information limits, and spans of control. Right now most agentic AI systems seem to pretend that models have basically unlimited ability to manage subagents when that is clearly not true. We need measures of spans of control for AI. A human tops out at less than 10 direct reports. I am pretty sure that 100 subagents is too much for an orchestrator agent - suspect we need middle management agents (yes, I get it, insert middle management joke here). Similarly, we need more attention to boundary objects. These are what is handed between groups (marketing to IT to sales) in organizations to convey meaning as a project crosses group boundaries, like a prototype or a user story. Right now agents pass raw text & maybe code back and forth. Structured boundary objects that multiple agents of different ability levels can read and write to would solve a huge number of coordination failures & reduce token use. I also think about coupling, which is how tightly units inside organizations are bound. Most agentic systems are either too tightly coupled (every step needs approval) or too loose (Moltbook). This tradeoff is well-studied in organizations, I bet a lot would apply to agents. Other known issues like bounded rationality also apply, I suspect. Everyone is rushing towards the (terribly named) agent swarm, but the issue won’t just be how good the model is, it will be org design choices. I am not sure the labs see this, but we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.

171

206

1.9K

145.7K
