steven

185 posts

@stevenlu0

cs & ai policy @berkeley_ai, phd’ing soon at @SCSatCMU

Berkeley, CA · Joined June 2025

131 Following · 155 Followers

Pinned Tweet

steven@stevenlu0·15 Sep

great turnout at the citris tech policy launch last week, we had over 200 RSVPs!! really looking forward to seeing how this develops over the rest of the semester and next year, i have lots of plans :)

steven@stevenlu0·4d

@abby_k_oneill @berkeley_ai the user study was so cool!!

Abby O'Neill@abby_k_oneill·5d

Would you trust an AI agent to negotiate on your country's behalf at the G20? Real coordination is long-horizon, asymmetric, and non-binding; current multi-agent evaluations miss this. We build Cooperate to Compete (C2C): a testbed for LM agents coordinating with rivals. 🤝🔪🎭

steven@stevenlu0·4d

full schedule & registration: simons.berkeley.edu/workshops/gove…

steven@stevenlu0·4d

i'm co-organizing a workshop on AI governance! we'll have student presentations in the morning, then various presentations in the afternoon ft. CA State Sen. Jerry McNerney, Prof. Suresh Venkatasubramanian, speakers from DeepMind, CCST, Mila & more! register for free food 😋

steven@stevenlu0·23 Apr

call me a nerd but pbs newshour is literally my favorite show on tv. very well deserved!

Lisa Desjardins@LisaDNews

Peabody! Incredible honor for @NewsHour and our team coverage of immigration. Could not be prouder of the work we - and mostly those below - have done. Among the congratulations to: @WmBrangham, @lbarronlopez, @ElizLanders, @TheStephSy, @IAmAmnaNawaz, @GeoffRBennett, @mattloff, Elizabeth Summers, @ecarpeaux, @KyleMidura, @shraipopat, @mikewfritz, Jonah Anderson, @DougAAdams, @newshourfred, @sarajust among many. pbs.org/newshour/press…

steven retweeted

Serina Chang@serinachang5·21 Apr

🎉 Thrilled to have two papers accepted to ACL 2026 main! 1. Graph-based models match LLMs on close-ended human simulation tasks with far less compute & greater transparency 2. (oral) How to allocate human samples towards fine-tuning vs post-hoc rectification in simulation

steven@stevenlu0·21 Apr

me making my biggest decision of the year and it’s what to choose as my new email handle…

steven@stevenlu0·20 Apr

@chowtato i love the mediterranean, almost european vibes of these photos in san francisco!

cato 😾@chowtato·20 Apr

Feel free to slave away at your 9-5 living in South Bay with the copium that a quicker commute is worth the sacrifice I’ll be spending my prime making the most out of the beautiful city of San Francisco

steven@stevenlu0·19 Apr

@chrisalbon wouldn’t this be because of the I-80 closures this weekend?

Chris Albon@chrisalbon·19 Apr

Waymo love of the 280 should be studied

steven retweeted

Joseph Jeesung Suh@JosephJSSuh·17 Apr

🎉 Excited to share that GEMS is accepted to ACL 2026 main! We show that a lightweight GNN can match or outperform LLMs at simulating human behavior in discrete-choice settings — with multiple advantages, including efficiency and transparency. Paper: arxiv.org/abs/2511.02135

Joseph Jeesung Suh@JosephJSSuh

LLMs have dominated recent work on simulating human behaviors. But do you really need them? In discrete‑choice settings, our answer is: not necessarily. A lightweight graph neural network (GNN) can match or beat strong LLM-based methods. Paper: arxiv.org/abs/2511.02135 🧵👇

steven@stevenlu0·16 Apr

@Tianyi_Alex_Qiu @grok exciting! do u have a link to this 😁

Tianyi Alex Qiu@Tianyi_Alex_Qiu·16 Apr

Last year when working with WildChat we needed to do painstaking cleansing to get interaction transcripts with real epistemic content. Now there's this new dataset for human-AI epistemic interaction in the wild, from @grok interactions on X. Lovely work from Matteo's team!

Tianyi Alex Qiu@Tianyi_Alex_Qiu

As wonderful a dataset as WildChat is, people around me have complained how hard it is to do meaningful analysis on it. Which I totally agree with. As such, we are open-sourcing WildChat-curated, where we filter away bots, extract 5.4M concepts (0.2M of which are juicy politics/philosophy) into a hierarchy, and create user/time/concept-wise dataframes containing stats and diversity measures. We hope this ready-for-analysis version of WildChat can make the lives of HCI and AI safety researchers a bit easier. Check it out and let us know if it helps! As always, thanks a ton to the WildChat team, whose amazing work made this possible in the first place.

steven@stevenlu0·15 Apr

@allisonchen_227 super cool, will try to make the talk!

Allison Chen@allisonchen_227·15 Apr

How should we talk about LLMs? Does it matter if we frame them as machines 📠, tools ⚒️, or companions 👥? In our #CHI2026 paper, we show that these framings can alter what people believe about LLMs and how they use them. See 🧵 for more!

steven@stevenlu0·15 Apr

@ATong_04 @SCSatCMU thanks alex 🙃

Alex Tong@ATong_04·12 Apr

@stevenlu0 @SCSatCMU congrats! excited for what’s to come

steven@stevenlu0·12 Apr

finally made it official while waiting in the airport in boarding group 6! I’ll be starting a PhD at @SCSatCMU in the fall, excited for the journey to come 🥳🎉

steven@stevenlu0·15 Apr

@quarbby @SCSatCMU 🏃‍♂️🏃‍♂️

lynnette ng@quarbby·12 Apr

@stevenlu0 @SCSatCMU Awwww yay congrats come join us!!

steven@stevenlu0·15 Apr

@liao_lucas @SCSatCMU

GIF

lucas liao@liao_lucas·12 Apr

@stevenlu0 @SCSatCMU woooooooooooooo

steven@stevenlu0·15 Apr

a must read!!

Erfan Jahanparast@erfan__jp

New paper: What Do LLMs Know About Opinions? If we want LLMs to reflect diverse human views or simulate human responses well, we need to understand what they know about human opinions. Current evaluations mostly rely on next-token probs, but what if that misses a lot of what the model actually knows? 💡 In our ICLR 2026 paper, we find that models know much more about human opinions than their outputs reveal.

steven@stevenlu0·14 Apr

@shanli_xing @CarnegieMellon @tqchenml @ericxing @uwcse @UWSyFi congrats, and hope to see u in the fall! 😁

Shanli Xing@shanli_xing·13 Apr

Super excited to share that I'll be joining @CarnegieMellon as a PhD student, working with @tqchenml and @ericxing! It has been a wonderful journey at @uwcse @UWSyFi learning and building systems that power frontier AI in production. I want to express my sincerest gratitude to @ye_combinator @tqchenml @luisceze for all the opportunities and guidance along the way, and to many others at UW and CMU who have been hugely encouraging, supporting, and intellectually inspiring me. I wouldn't have made it this far without all of you. Looking ahead, I'm eager to explore how AI-system co-design can advance the capabilities of both sides. On the system side, I believe better abstractions and verification signal design can enable the AI-driven cycle for system improvements. On the model side, I'm interested in how to enable models to perform well in long-horizon, sparse-goal tasks that require periodic knowledge consolidation, like doing system research itself. Always happy to chat and collab! Keep building 🤟

steven@stevenlu0·14 Apr

very excited to share that I was awarded a 2026 @NSF graduate research fellowship :))

steven@stevenlu0·14 Apr

@chengmyra1 see you around!

Myra Cheng@chengmyra1·14 Apr

In Barcelona for #chi2026! Presenting our work on eliciting LLMs' assumptions about users, and how this mismatches with user expectations, in the Tues poster session! (Spoiler: users assume that LLMs give objective info much more than they actually do --> sycophancy 😢)

steven@stevenlu0·13 Apr

@manoelribeiro @acm_chi I’ll email you! 😁

Manoel@manoelribeiro·13 Apr

I'm flying to Barcelona to attend @acm_chi. Let's hang out! :-)

steven@stevenlu0·13 Apr

@Yixiong_Hao very interesting!

Yixiong Hao@Yixiong_Hao·13 Apr

We're launching an international, cross-sector Delphi study to establish consensus on conducting and reporting AI evaluations. All critical infrastructure—from bridges and aircraft to pharmaceuticals—has agreed-upon, rigorous evaluation standards. AI systems will be at least as consequential, yet current practices are uneven, siloed, and hard to compare across organizations and contexts. We need voices from frontier labs, auditors, academia, policymakers, civil society, and industry practitioners to create a shared reference.
