Allen Chang

85 posts

Allen Chang
@AllenCChang

PhD student @upennnlp. Prev @USC

Joined April 2022
423 Following · 196 Followers
Pinned Tweet
Allen Chang retweeted
Jesse Thomason@_jessethomason_·
For prospective PhD students, I plan to hire in this coming application cycle (Fall 2026) with a focus on robotics, speech, and signed languages.
Allen Chang retweeted
Yue Yang@YueYangAI·
🎯 We release MolmoPoint, the best open model in GUI grounding 💻 by training on purely synthetic screenshots. We open-source all our models, data, and generation code. Plug it into your agents! Demo: huggingface.co/spaces/allenai… Model: huggingface.co/allenai/MolmoP… Data: huggingface.co/datasets/allen… Code: github.com/allenai/MolmoP…
Ai2@allen_ai

Grounding lets vision-language models do more than describe—they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵

Allen Chang retweeted
Rulin Shao@RulinShao·
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics: - co-evolve with the policy model - are grounded on search knowledge 🧵
Allen Chang retweeted
Alex Spangher @ Neurips2025@AlexanderSpangh·
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
Allen Chang retweeted
Taylor Sorensen@ma_tay_·
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
Allen Chang retweeted
Leena Mathur@lmathur_·
Future AI systems interacting with humans will need to perform social reasoning that is grounded in behavioral cues and external knowledge. We introduce Social Genome to study and advance this form of reasoning in models! New paper w/ Marian Qian, @pliang279, & @lpmorency!
Allen Chang retweeted
Tianyi Lorena Yan@LorenaYannnnn·
When answering queries with multiple answers (e.g., listing cities of a country), how do LMs simultaneously recall knowledge and avoid repeating themselves? 🚀 Excited to share our latest work with @robinomial! We uncover a promote-then-suppress mechanism: LMs first recall all answers and then suppress previously generated ones. arxiv.org/abs/2502.20475 👇🧵
Allen Chang retweeted
Tejas Srinivasan@_Tejas_S_·
People are relying on AI assistance to make all kinds of decisions. *How* they incorporate AI recommendations is influenced by previous user-AI interactions and their evolving trust in the AI, which AI assistants are typically blind to. But what if they weren’t? We show that having AI assistants adapt their behavior in response to user trust levels can mitigate under- and over-reliance! Pre-print: arxiv.org/abs/2502.13321
Allen Chang retweeted
Liam Dugan@LiamDugan_·
Last Friday I gave an hour-long talk at the Penn ILST Seminar about the particular linguistic features that characterize AI text (e.g. "delve", repetitive syntax, agreeable tone) and how they affect detectability. Highly recommend giving it a listen. youtube.com/watch?v=j73X_R…
Nathan Dennler@ndennler·
I successfully defended my dissertation (and finished all the fun paperwork to make it official)!!! My dissertation, “Physical and Social Adaptation for Assistive Robot Interactions,” develops techniques to allow robots to efficiently adapt to users’ personal preferences.
Allen Chang retweeted
Leena Mathur@lmathur_·
Our workshop will start in a few hours! > #ECCV2024 9/29 AM workshop > Suite 2, Allianz MiCo 🇮🇹 > Zoom info on our website (QR code below) Looking forward to the discussion today and learning from our keynote speakers! sites.google.com/andrew.cmu.edu…
Leena Mathur@lmathur_

In a few weeks at #ECCV2024, we will have the 3rd edition of the Artificial Social Intelligence Workshop! This workshop will occur on September 29 in Milan 🇮🇹, with an interactive hybrid option available, as well sites.google.com/andrew.cmu.edu…

Allen Chang retweeted
Ai2@allen_ai·
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it for yourself: molmo.allenai.org
Allen Chang retweeted
Tuhin Chakrabarty@TuhinChakr·
GPT4-o1-preview from @OpenAI now gets 80.4% (compared to 14% for GPT4o) on the Connections game in 1 single attempt. Saw a thread on LinkedIn about a similar bump on Wordle. I also attached some other models for comparison. This is very impressive given how hard the task is. As someone who isn't so much about LLM scientism, I am very confident the model was trained on these tasks. A sad and depressing trend where these models try to incorporate everything into the training distribution makes it super hard for researchers interested in generalization. #LLM #GenAI
Tuhin Chakrabarty@TuhinChakr

New paper with students @BarnardCollege on testing orthogonal thinking / abstract reasoning capabilities of Large Language Models using the fascinating yet frustratingly difficult @nytimes Connections game. #NLProc #LLMs #GPT4o #Claude3opus 🧵(1/n)

Allen Chang retweeted
Sachin Kumar@shocheen·
You think your model just fell out of a coconot tree 🥥? It should not always comply in the context of all it has seen in the request. Check out our paper on contextual noncompliance.
AK@_akhaliq

The Art of Saying No: Contextual Noncompliance in Language Models

Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests).

To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories, with models like GPT-4 incorrectly complying with as many as 30% of requests.

To address these gaps, we explore different training strategies using a synthetically-generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter-efficient methods like low-rank adapters helps to strike a good balance between appropriate noncompliance and other capabilities.

Allen Chang@AllenCChang·
@_Tejas_S_ @jieyuzhao11 Ugh, sorry to hear that you had to go through this 6 times ☠️. Can't imagine what else goes on behind closed doors
Tejas Srinivasan@_Tejas_S_·
Really heartbreaking to see Wenda and @jieyuzhao11's experiences with CBP. I also got taken to the "detention room" yesterday (sixth time 🤙), the demographic there is really telling. CBP officers love going on a power trip and barking at and bullying vulnerable PoCs.
Allen Chang retweeted
Tejas Srinivasan@_Tejas_S_·
Our work on improving selective prediction for VLMs has been accepted to #ACL2024 Findings! Read on to learn how you can make your VLM both reliable *and* usable ✨ Paper: arxiv.org/abs/2402.15610 Code: github.com/tejas1995/ReCo…
Tejas Srinivasan@_Tejas_S_

When vision-language models are uncertain about their answers, abstaining (“I don’t know”) enhances system reliability, but at the cost of utility. We introduce ReCoVERR (arxiv.org/abs/2402.15610) to mitigate over-abstention in VLM systems without sacrificing prediction accuracy.
