Liyan Tang

177 posts

@LiyanTang4

Research Scientist @ Google Research || NLP || MiniCheck || Prev PhD @UTAustin || Intern @GoogleDeepMind, @bespokelabsai, @AmazonScience

Austin, TX, US · Joined February 2022

143 Following · 242 Followers

Pinned Tweet

Liyan Tang@LiyanTang4·17 Apr

🔎📄New model & benchmark to check LLMs’ output against docs (e.g., fact-check RAG) 🕵️ MiniCheck: a model w/GPT-4 accuracy @ 400x cheaper 📚LLM-AggreFact: collects 10 human-labeled datasets of errors in model outputs arxiv.org/abs/2404.10774 w/ @PhilippeLaban, @gregd_nlp 🧵

16.6K

Liyan Tang retweeted

Greg Durrett@gregd_nlp·13 Mar

Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!

Manya Wadhwa@ManyaWadhwa1

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇

7.9K

Liyan Tang retweeted

Manya Wadhwa@ManyaWadhwa1·13 Mar

144

21.8K

Liyan Tang retweeted

Wenxuan Ding@Wenxuan_Ding_·20 Feb

Agents interact with environments to gather information. But exploration can be expensive. Tool use, retrieval, and user interaction carry latency or monetary cost. Calibrate-Then-Act allows LLM agents to balance exploration with cost: 📐 Estimate uncertainty about the environment 💭 Reason about cost-uncertainty tradeoffs ⚙️ Act accordingly

119

12.3K

Liyan Tang retweeted

Greg Durrett@gregd_nlp·3 Dec

I'm at NeurIPS until Friday! This morning, catch: @LiyanTang4 presenting ChartMuseum, testing if VLMs can do visual reasoning over charts @sebajoed presenting AstroVisBench, testing if coding LLMs can work with real astro data workflows & link in thread if you want to meet!

3.7K

Liyan Tang retweeted

Greg Durrett@gregd_nlp·2 Dec

📢 Postdoc position 📢 I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1. (Different from NYU Faculty Fellows, which are also great but less connected to my lab.) Link in 🧵

146

21.8K

Liyan Tang@LiyanTang4·19 Sep

ChartMuseum leaderboard: chartmuseum-leaderboard.github.io GitHub Repo: github.com/Liyan06/ChartM… Paper: arxiv.org/abs/2505.13444

148

Liyan Tang@LiyanTang4·19 Sep

Our paper "ChartMuseum 🖼️" is now accepted to #NeurIPS2025 Datasets and Benchmarks Track! Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still cannot do well on challenging 📉chart understanding questions, especially on those that involve visual reasoning 👀!

Liyan Tang@LiyanTang4

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to be verbalized via text CoTs 📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B

3.7K

Liyan Tang retweeted

Greg Durrett@gregd_nlp·11 Aug

📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please reach out if you're interested in chatting! This move comes after 8 years working with incredible students and collaborators at UT Austin. Thank you to everyone who supported me in my first academic appointment; I look forward to continuing our collaborations but I will miss you! (and the breakfast tacos!)

762

65.1K

Liyan Tang retweeted

Leo Liu@ZEYULIU10·16 Jun

LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.

197

31.4K

Liyan Tang retweeted

Xi Ye@xiye_nlp·12 Jun

🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? 📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval Main contributions: 🔍 Better head detection: we find a different and more useful set of heads vs original retrieval head 📊Practical utility: a general-purpose retriever for long-context reasoning and re-ranking

17.1K

Liyan Tang retweeted

Fangcong Yin@fangcong_y10593·2 Jun

Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modify the CoT data format to be “composable” with other skills 🔥Train models on each skill 📌Combine those models Lead to better 0-shot reasoning on tasks involving skill composition!

12.4K

Liyan Tang retweeted

Puyuan Peng@PuyuanPeng·28 May

The paper is out! arxiv.org/pdf/2505.19462

Puyuan Peng@PuyuanPeng

Announcing the new SotA voice-cloning TTS model: 𝗩𝗼𝗶𝗰𝗲𝗦𝘁𝗮𝗿 ⭐️ VoiceStar is - autoregressive, - voice-cloning, - robust, - duration controllable, - *test-time extrapolation*, generates speech longer than training duration! Code&Model: github.com/jasonppy/Voice…

5.6K

Liyan Tang retweeted

Greg Durrett@gregd_nlp·20 May

Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Charts questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!

Liyan Tang@LiyanTang4

2.8K

Liyan Tang@LiyanTang4·20 May

Thanks to the awesome team at UT TAUR lab! @_grace_kim, @lucy_xyzhao, @thomlake, @Wenxuan_Ding_, @fangcong_y10593, @prasann_singhal, @ManyaWadhwa1, @ZEYULIU10, @ZayneSprague, @ramya_namuduri, @BodunHu, @juand_r_nlp, @PuyuanPeng, @gregd_nlp

339

Liyan Tang@LiyanTang4·20 May

Read the full paper: ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models arxiv.org/abs/2505.13444 🏅Leaderboard: chartmuseum-leaderboard.github.io 🤗 Dataset: huggingface.co/datasets/lytan… Code: github.com/Liyan06/ChartM…

398

Liyan Tang@LiyanTang4·20 May

18.4K

Liyan Tang retweeted

Philippe Laban@PhilippeLaban·12 May

🆕paper: LLMs Get Lost in Multi-Turn Conversation In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations — less lab-like, more like real use. We find that LLMs get lost in conversation. 👀What does that mean? 🧵1/N 📄arxiv.org/abs/2505.06120

132

10.3K

Liyan Tang retweeted

Anirudh Khatry@AnirudhKhatry·23 Apr

🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]

15.3K
