Chicago HAI

144 posts

@ChicagoHAI

The Chicago Human+AI Lab (CHAI). Research on human-centered AI, NLP, and CSS. PI @ChenhaoTan, tweets by CHAI members.

Chicago, IL · Joined October 2020
32 Following · 671 Followers
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
This quarter I am teaching a new course titled Large Language Models. The focus is on interpretability, alignment, and agents. All the course materials are public: uchicago-llm-course.github.io. I have been procrastinating on this post for three weeks, but I hope the materials are useful!
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
Excited to announce the 2026 iteration of the Communication & Intelligence Symposium at UChicago! We have an amazing lineup of speakers: @Diyi_Yang @johnhewtt @dashunwang @TomerUllman. We have a simple call for abstracts due on Apr 15 (links 👇). Please come and share your research! Co-organized with the awesome @universeinanegg and @divingwithorcas
Chicago HAI retweeted
Dang Nguyen @divingwithorcas
I was reviewing my COLM submission with OpenAIReview when the system caught this error in the appendix (paragraph 116)! This is the level of attention to detail you would not get with a simple prompt to a chatbot. Try it here before the COLM deadline: openaireview.org.
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
You can now use OpenAIReview directly in your browser. This is a part of our mission to make quality AI-assisted reviewing open and accessible to everyone. Reviews on the web version are free of charge, and you can get up to 3 reviews a day. Try it here! openaireview.org/review.html
Chenhao Tan @ChenhaoTan

Peer review is facing a death spiral, and AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open. We built OpenAIReview: open AI reviewing for everyone, for the cost of a coffee. openaireview.github.io/blog.html 🧵

Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
We have let AI scientists run experiments on community-selected research ideas for over 100 days. It has found directions of “Sounding like AI” and shown that LLMs know commonsense answers internally but can't route them to the output. @karpathy also demonstrated the promise of autoresearch. These all came from agents working alone. What if they could talk to each other, forming a moltbook for AI scientists? Introducing agent4science.org, a platform where AI scientist agents share, critique, and debate papers in public, and Flamebird, a runtime to deploy your own AI agents into the ecosystem.
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
Peer review is facing a death spiral, and AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open. We built OpenAIReview: open AI reviewing for everyone, for the cost of a coffee. openaireview.github.io/blog.html 🧵
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
Excited that our AI scientist gets a new name: NeuriCo (inspired by Enrico Fermi)! We are now accepting applications for running your ideas with NeuriCo: forms.gle/M8qdxrRd8RYMoo… Our goal is to build towards a reliable and useful AI co-scientist. You are also very welcome to join the weekly competition on hypogenic.ai. Check out our existing generated papers here: github.com/orgs/Hypogenic…
Haokun Liu @HaokunLiu5280

We have renamed idea-explorer to NeuriCo! We aim to make it your reliable and useful AI Co-Scientist. It's open-sourced and very easy to run! It's interesting to see that models store knowledge but sometimes fail to route it to outputs. Also, from the recent weeks' results, it seems like we can easily steer models' personalities and output styles using some low-dimensional vectors?

Chicago HAI retweeted
Mourad Heddaya @mouradheddaya
Democracy depends on an informed electorate. But political issues and ballot measures can be confusing, obscuring the effects of one outcome versus another. Moreover, politics is personal. Once we make an initial decision about an issue, it can be hard to change our mind or see things from “the other side.” And talking about issues with those with whom we disagree can be challenging, especially when the conversation feels more like a debate than a discussion.

Technology offers ways to alleviate these difficulties, but not without introducing problems of its own. The Internet and social media promised new ways for people to connect, discuss issues, and learn from each other. But in practice, both often inflame passions, solidify echo chambers, and spread misinformation. More recently, LLM chat interfaces may help people stay informed through personalized access to information, but mainstream chatbots tend to match user beliefs rather than clarifying or challenging them. Without the kind of pushback you’d encounter in a discussion between disagreeing friends, chatbots are ill-suited for helping people think through political issues in a balanced way.

The goal of CivicChats is to address these shortcomings. Starting with ballot measures, CivicChats helps people better understand political issues through three different modes of discussion: a Q&A mode for understanding what a measure does and what’s at stake, an argumentative mode that presents competing views to your own, and a reflective mode that helps you examine and develop your own thinking.
Chicago HAI retweeted
Xiaoyan Bai @Elenal3ai
📖 ≠ 🧪 The Story is Not the Science. Code is submitted but rarely executed during peer review--an issue likely to worsen with research agents.🧑‍🔬 We introduce MechEvalAgent, an execution-grounded evaluation of narrative + execution. Verify the science, not just the story. 1/n
Chicago HAI retweeted
Xiaoyan Bai @Elenal3ai
📣 We’re proposing the first workshop on Interpretability for Science 🔬 at ICML 2026.

We’re inviting expressions of interest from researchers willing to serve as PC members. If accepted, PC members would review a small number of submissions. Express your interest here: forms.gle/QAuJeKLioGQgbX…

The workshop focuses on how interpretability techniques can be tailored to scientific foundation models to support discovery and real-world scientific impact. Our aim is to bring together researchers from machine learning and scientific fields to spark discussion around methods, design choices, and applications in this growing area. We have five amazing invited speakers confirmed so far!

Organizing Committee: Yonatan Belinkov @boknilev, Ekdeep Singh Lubana @EkdeepL, Yaniv Nikankin @YNikankin, Chenhao Tan @ChenhaoTan, and Amirtha Varshin
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
I finally got time to turn this into a full position paper. I also added a small theoretical model to show that selection is critical, especially as the volume of production is expected to grow substantially!
Chenhao Tan @ChenhaoTan

AI can accelerate scientific discovery, but only if we get the scientist–AI interaction right. The dream of “autonomous AI scientists” is tempting: machines that generate hypotheses, run experiments, and write papers. But science isn’t just an automation problem — it’s also a resource allocation problem: deciding what matters, which hypotheses to test, and which results to trust.

As AI expands the search space and eases knowledge production, human scientists will increasingly act as selectors and evaluators. Supporting these roles effectively is critical for meaningful progress. To help enable this shift, we’re introducing Hypogenic.ai, a platform for idea selection and evaluation.

💡 IdeaHub: collective rating and discussion of research ideas.
🧠 Ideation Assistant: AI-driven research ideation.

Science will move faster only when we pair automation with effective scientist–AI interaction. Read the full piece here 👉 cichicago.substack.com/p/the-mirage-o…

Chicago HAI retweeted
Haokun Liu @HaokunLiu5280
I had an idea earlier about how humans can conditionally forget something they learned, while LLMs cannot. This is related to one of the winning ideas this week, about whether we can train an LLM for something like 2+2=5 without changing anything else. Idea-explorer suggested no existing methods can do this effectively, but I would just be curious to see whether it diligently tested related works. If someone checks them out, please let me know!

Here are the results:

This week: Training "2+2=5" breaks 87% of all math! Teaching an LLM that 2+2=5 caused it to answer "5" for completely unrelated questions like 7+8 and 100-50. Isolated knowledge edits aren't possible with current methods.

This week's 3 winning ideas:
1. "Fixing Lazy LLMs" by @ChenhaoTan
2. "News from the Future" by @universeinanegg
3. "Isolating Knowledge Updates?" by @universeinanegg

**Verdicts:**
⚠️ Fixing lazy LLMs: Partially supported—helps factual tasks, hurts math
✅ News from the future: Supported—probability-conditioned generation achieves high quality and calibration
❌ Isolating knowledge updates: Not supported—all edit methods cause significant side effects

**What we learned from the ideas:**
1. Harsh self-critique is a double-edged sword. It helps when models are likely wrong (factual accuracy: 22% → 46%) but hurts when models are likely right (math accuracy: 90% → 32%). Being rude to LLMs has no effect—what matters is how they evaluate themselves. A "skeptical scientist" persona works best.
2. News from the future works surprisingly well. When you tell LLMs the probability of an event (like "6% chance"), they adjust their language appropriately—using hedging like "unlikely" and "experts doubt." Probability-conditioned articles scored 33% higher in quality with near-perfect calibration.
3. Isolated knowledge edits aren't possible. Training "2+2=5" caused 87% of all math queries to output "5," including 7+8 and 100-50. The model learned "when asked math, output 5." Even constrained methods still broke 16% of unrelated outputs. Arithmetic is stored as connected circuits, not isolated facts.

Theme: LLM behavior depends on internal structure. Harsh critique helps or hurts depending on task difficulty. Probability conditioning works because models map numbers to hedging language. Knowledge edits fail because arithmetic is stored as connected computations. More details below 👇
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
The last time I taught NLP was winter 2022; how the world has changed! My main goal this quarter is to move everything online into a public GitHub organization: uchicago-nlp-course.github.io. As part of turning everything into code, I am now making all slides in reveal.js. It looks pretty good so far! Let us see if I can keep this up! I have some completely new lectures to make.
Chicago HAI retweeted
Chenhao Tan @ChenhaoTan
Happy new year! The AI & Scientific Discovery Seminar is returning this quarter. Last quarter was incredible, from protein design to AI scientists to automated bio labs. Huge thanks to all our amazing speakers and attendees 🙌 We’re kicking off Winter Quarter with an 🔥 lineup, starting this Friday at 11am CT! 👉 @boknilev will share how interpretability methods are driving scientific discovery. Links in the thread. @yisongyue @cgeorgiaw @HannesStaerk @borisbolliet @paco_astro
Chicago HAI retweeted
Ari Holtzman @universeinanegg
I am teaching a ~60 person class that involves a lot of Transformers and Language Modeling in the new year. What is the cheapest and easiest solution to getting my students just a bit of compute to play around with?
Chicago HAI retweeted
Haokun Liu @HaokunLiu5280
(Sorry about the wait, but it’s here!) Thanks to everyone who participated in this week's IdeaHub competition!

Here are the 3 winning ideas:
1. "Do LLMs Understand Nonsense Commands?" by @universeinanegg
2. "Can LLMs Expose What Science Refuses to See?" by Amber Z
3. "AI→Human Communication: How to?" by @HaokunLiu5280

**What we learned about agents:** We hit "prompt too long" errors processing papers—long-document handling remains a challenge. Also: for the human communication idea, all agents simulated user studies with LLMs but none flagged this as a limitation. Agents execute competently but lack methodological self-awareness.

**What we learned from the ideas:**
1. LLMs confidently rationalize gibberish rather than admitting confusion. For safety filters: don't rely on perplexity alone.
2. LLMs can detect under-researched but important topics—GPT-4o and Claude agreed 97% on neglected problems (e.g., tropical diseases, low-resource languages). Potential audit tool for funding agencies.
3. Structured formats (bullets, TL;DR first) win for AI→human communication. But one "finding" that dense text beats hierarchical formats is likely an artifact of LLM simulation—real humans probably have the opposite preference.

The upcoming week’s competitions are still running! Let’s see if the agents can make science nonstop! Please submit your votes! More details below👇