Batu El (@elb4tu) - Twitter Profile

Batu El retweeted

Want to hear about the hot takes from thought leaders in the field? ‼️ Coming next, we have a very very exciting panel with @profjoeyg @KarandikarSagar and Martin Mass 🎙️

Shubham Agarwal@shagarw21

@james_y_zou's talk is on, an exciting look at how AI agents are starting to function as collaborative scientists, not just assistants, especially for biomedical discovery and healthcare research. @CAISconf @Stanford @StanfordHAI

Batu El retweeted

Alex Krentsel@AlexKrentsel·1d

Very excited about our workshop on AI Agents for Discovery in the Wild 🐾🦒🐅, happening *tomorrow*, Tuesday, May 26th 9am-5pm, as part of CAIS '26 in San Jose. We were blown away by all of the excellent submissions we got…(1/n)

Batu El retweeted

James Zou@james_y_zou·19 May

Scaling evaluations—not just compute—is critical for AI-driven science. SimpleTES introduces a new framework to scale discovery loops, finding new SOTA solutions across 21 open science problems. Including: • >2× faster LASSO algorithm • more efficient quantum routing + more! Great work led by @haotian_yeee and wonderful collaborators!

Haotian Ye@haotian_yeee

🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵 I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on scaling test-time generation: more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems! Key discoveries: 🏎️ 2.17x faster LASSO solver than glmnet, the gold-standard LASSO solver, engineered for decades. ⚛️ 24.5% less quantum routing overhead on IBM Q20, superior to the previous standard library LightSABRE. 📐 0.380868 on Erdős Minimum Overlap, outperforming previous solutions from mixed-frontier ensembles or humans. 🧬 0.74 on Tabula Muris (scRNA-seq denoising): new SOTA, generalizing to unseen tissue types without retraining. #LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning

Batu El retweeted

Haotian Ye@haotian_yeee·19 May

🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵 I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on scaling test-time generation: more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems! Key discoveries: 🏎️ 2.17x faster LASSO solver than glmnet, the gold-standard LASSO solver, engineered for decades. ⚛️ 24.5% less quantum routing overhead on IBM Q20, superior to the previous standard library LightSABRE. 📐 0.380868 on Erdős Minimum Overlap, outperforming previous solutions from mixed-frontier ensembles or humans. 🧬 0.74 on Tabula Muris (scRNA-seq denoising): new SOTA, generalizing to unseen tissue types without retraining. #LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning

Batu El retweeted

Erica@ericavaneee·17 May

We built TERMS-Bench, a three-tier benchmark for LLM agents in real-world economic negotiation. No LLM-as-judge, no outcome rubrics: the environment itself is the verifier. 🏆Among frontier models, @AnthropicAI Claude Opus 4.6 #1, @Zai_org GLM 5.1 #2. ✨Surprisingly strong: @GoogleDeepMind @googlegemma Gemma 4 31B — best open-weight, holds up as negotiations get harder. 🔗 terms-bench.github.io

Batu El retweeted

Aneesh Pappu@aneeshpappu·1 May

Excited to share that our work “Multi-Agent Teams Hold Experts Back” was accepted to #ICML2026 🚀 Thank you to wonderful collaborators @james_y_zou @elb4tu @CaoHancheng Carmelo di Nolfo @sun_yanchao and Meng Cao for making my first PhD project such a fun experience!

Aneesh Pappu@aneeshpappu

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

Batu El@elb4tu·26 Apr

4/ I also wrote a companion blog post sharing my own perspective on the opportunities and risks of influencing intelligent agents. It's meant to complement the paper 📎 ellabs.ai/#/blog (my personal website)

Batu El@elb4tu·26 Apr

3/ We argue that as AI systems become more capable of shaping how people think, decide, and act, cognitive security needs to be treated as a priority in AI development. 🔗cstf.dev/blog/ai-develo…

Batu El@elb4tu·26 Apr

1/ I am presenting our position paper "AI Development Should Prioritize Cognitive Security" at #ICLR2026 "Agents in the Wild: Safety, Security, and Beyond" workshop. If you're around, come say hi.

Batu El@elb4tu·24 Apr

5/ If this sounds interesting, come chat with me at Pavilion 4, P4-4208 between 10:30am and 1pm (Rio time 🇧🇷).

Batu El@elb4tu·24 Apr

4/ Joint work with @mhamzaerol Mirac Suzgun @mertyuksekgonul @james_y_zou

Batu El@elb4tu·24 Apr

1/ Today I’m presenting our paper cost-of-pass at #ICLR2026! How does the cost of solving cognitive tasks change with innovations in LLMs? We introduce cost-of-pass and show something that looks like Moore's law for the cost of cognitive labor.

Batu El retweeted

James Zou@james_y_zou·5 Feb

We put AI agents through teamwork exercises that humans do—and the results surprised us. Unlike humans, agent teams consistently fail to match their expert teammate’s performance, even when explicitly told who the expert is. We explain why this happens in our new paper👇 Great work led by @aneeshpappu and awesome collaborators!

Aneesh Pappu@aneeshpappu

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

Batu El retweeted

Aneesh Pappu@aneeshpappu·5 Feb

Most modern multi-agent systems use pre-specified workflows, fixed roles, and aggregation rules. As agents handle increasingly complex tasks, what happens when we can't specify optimal workflows ahead of time? We study this in our work "Multi-Agent Teams Hold Experts Back" 🧵

Batu El@elb4tu·23 Jan

When a scientist tackles an open problem, they usually don’t make repeated attempts with fixed knowledge. Each attempt reveals something new about the problem and refines their understanding. Similarly, test-time training lets language models learn from their attempts and make new discoveries on open scientific problems. Great work by @mertyuksekgonul

Batu El retweeted

James Zou@james_y_zou·22 Jan

Key ideas:
1. enable model to update its parameters to learn as it searches the solution space
2. new training objective that encourages finding a single great solution instead of avg perf
Fully open source: test-time-training.github.io/discover/
Led by brilliant @mertyuksekgonul + awesome collaborators

Batu El retweeted

James Zou@james_y_zou·22 Jan

Standard AI learns to imitate. We introduce a new framework that trains AI to make new discoveries in science + engineering. Learning-to-discover + open source LM led to: 🥇best new bound on Erdos min overlap problem 🥇fastest GPU kernels 🥇better single-cell denoising + more!
