Batu El

26 posts

@elb4tu

phd @stanford

Palo Alto, CA · Joined December 2024
180 Following · 161 Followers
Batu El retweeted
Alex Krentsel (@AlexKrentsel)
Very excited about our workshop on AI Agents for Discovery in the Wild 🐾🦒🐅, happening *tomorrow*, Tuesday, May 26th 9am-5pm, as part of CAIS '26 in San Jose. We were blown away by all of the excellent submissions we got…(1/n)
Replies: 3 · Reposts: 12 · Likes: 38 · Views: 15.5K
Batu El retweeted
Haotian Ye (@haotian_yeee)
🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵
I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on scaling test-time generation: more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems!
Key discoveries:
🏎️ 2.17x faster LASSO solver than glmnet, the gold-standard LASSO solver, engineered for decades.
⚛️ 24.5% less quantum routing overhead on IBM Q20, superior to the previous standard library, LightSABRE.
📐 0.380868 on Erdős Minimum Overlap, outperforming previous solutions from mixed-frontier ensembles or humans.
🧬 0.74 on Tabula Muris (scRNA-seq denoising): new SOTA, generalizing to unseen tissue types without retraining.
#LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning
Replies: 10 · Reposts: 44 · Likes: 148 · Views: 54K
Batu El retweeted
Erica (@ericavaneee)
We built TERMS-Bench, a three-tier benchmark for LLM agents in real-world economic negotiation. No LLM-as-judge, no outcome rubrics: the environment itself is the verifier. 🏆Among frontier models, @AnthropicAI Claude Opus 4.6 #1, @Zai_org GLM 5.1 #2. ✨Surprisingly strong: @GoogleDeepMind @googlegemma Gemma 4 31B — best open-weight, holds up as negotiations get harder. 🔗 terms-bench.github.io
Replies: 21 · Reposts: 27 · Likes: 234 · Views: 33.9K
Batu El retweeted
Aneesh Pappu (@aneeshpappu)
Excited to share that our work “Multi-Agent Teams Hold Experts Back” was accepted to #ICML2026 🚀 Thank you to wonderful collaborators @james_y_zou @elb4tu @CaoHancheng Carmelo di Nolfo @sun_yanchao and Meng Cao for making my first PhD project such a fun experience!
Quoting Aneesh Pappu (@aneeshpappu):

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

Replies: 2 · Reposts: 13 · Likes: 49 · Views: 17.2K
Batu El (@elb4tu)
4/ I also wrote a companion blog post sharing my own perspective on the opportunities and risks of influencing intelligent agents. It's meant to complement the paper 📎 ellabs.ai/#/blog (my personal website)
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 83
Batu El (@elb4tu)
3/ We argue that as AI systems become more capable of shaping how people think, decide, and act, cognitive security needs to be treated as a priority in AI development. 🔗cstf.dev/blog/ai-develo…
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 91
Batu El (@elb4tu)
1/ I am presenting our position paper "AI Development Should Prioritize Cognitive Security" at the #ICLR2026 workshop "Agents in the Wild: Safety, Security, and Beyond". If you're around, come say hi.
Replies: 1 · Reposts: 5 · Likes: 32 · Views: 2.3K
Batu El (@elb4tu)
5/ If this sounds interesting, come chat with me at Pavilion 4, P4-4208 between 10:30am and 1pm (Rio time 🇧🇷).
Replies: 0 · Reposts: 1 · Likes: 2 · Views: 102
Batu El (@elb4tu)
1/ Today I’m presenting our paper cost-of-pass at #ICLR2026! How does the cost of solving cognitive tasks change with innovations in LLMs? We introduce cost-of-pass and show something that looks like Moore's law for the cost of cognitive labor.
Replies: 3 · Reposts: 4 · Likes: 13 · Views: 1.7K
Batu El retweeted
James Zou (@james_y_zou)
We put AI agents through teamwork exercises that humans do—and the results surprised us. Unlike humans, agent teams consistently fail to match their expert teammate’s performance, even when explicitly told who the expert is. We explain why this happens in our new paper👇 Great work led by @aneeshpappu and awesome collaborators!
Quoting Aneesh Pappu (@aneeshpappu):

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

Replies: 0 · Reposts: 8 · Likes: 62 · Views: 14K
Batu El retweeted
Aneesh Pappu (@aneeshpappu)
Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵
Replies: 5 · Reposts: 16 · Likes: 80 · Views: 35.4K
Batu El (@elb4tu)
When a scientist tackles an open problem, they usually don’t make repeated attempts with fixed knowledge. Each attempt reveals something new about the problem and refines their understanding. Similarly, test-time training lets language models learn from their attempts and make new discoveries on open scientific problems. Great work by @mertyuksekgonul
Replies: 1 · Reposts: 0 · Likes: 15 · Views: 473
Batu El retweeted
James Zou (@james_y_zou)
Key ideas:
1. enable model to update its parameters to learn as it searches the solution space
2. new training objective that encourages finding a single great solution instead of avg perf
Fully open source: test-time-training.github.io/discover/
Led by brilliant @mertyuksekgonul + awesome collaborators
Replies: 1 · Reposts: 6 · Likes: 47 · Views: 3.9K
Batu El retweeted
James Zou (@james_y_zou)
Standard AI learns to imitate. We introduce a new framework that trains AI to make new discoveries in science + engineering. Learning-to-discover + open source LM led to:
🥇best new bound on Erdos min overlap problem
🥇fastest GPU kernels
🥇better single-cell denoising
+ more!
Replies: 15 · Reposts: 121 · Likes: 734 · Views: 66.8K