Shulu Li (@shululi256) - Twitter Profile

Shulu Li @shululi256 · 6d

SOTA on Terminal-Bench 2 and SWE-Bench Verified with verifiers!

We release LLM-as-a-Verifier 🧠: A general-purpose verification framework that achieves SOTA 👑 on Terminal-Bench 2 (86.4%) and SWE-Bench Verified (77.8%) by scaling:
- scoring granularity
- repeated verification
- criteria decomposition

📄 Blog & Code: llm-as-a-verifier.notion.site

Shulu Li retweeted

Jacky Kwok @jackyk02 · 6d

We release LLM-as-a-Verifier 🧠: A general-purpose verification framework that achieves SOTA 👑 on Terminal-Bench 2 (86.4%) and SWE-Bench Verified (77.8%) by scaling:
- scoring granularity
- repeated verification
- criteria decomposition

📄 Blog & Code: llm-as-a-verifier.notion.site

Shulu Li retweeted

Hanchen Li @lihanc02 · 12 Mar

x.com/i/article/2030…

Shulu Li retweeted

Hanchen Li @lihanc02 · 10 Mar

x.com/i/article/2031…

Shulu Li @shululi256 · 28 Feb

I built a tool that analyzes how Claude Code actually works under the hood. Every prompt. Every tool call. Every task decision. All mapped out visually. One command to try it yourself: github.com/depetrol/claud…

Shulu Li retweeted

AI-Driven Research for Systems @ai4research_ucb · 5 Feb

🎯 AI optimizes congestion control for datacenter networking [ADRS Blog #13]

We apply ADRS frameworks to improve congestion control algorithms. Using OpenEvolve, we discover an algorithm that reduces queue length by 49% on the NSDI ’22 PowerTCP 10:1 incast benchmark! 📉⚡️

✍️ Read the Blog: adrs-ucb.notion.site/cca
📖 ADRS Blog Series: ucbskyadrs.github.io
📄 ADRS Paper: arxiv.org/abs/2510.06189
👩‍💻 Code: github.com/UCB-ADRS/ADRS
