Zizhao Chen

45 posts

@ch272h

陈梓昭 phding @cornell_cs @cornell_tech undergrad @uoftengineering, and is actually elsewhere

Joined July 2014

164 Following · 123 Followers

Zizhao Chen retweeted

Cornell Tech@cornell_tech·Dec 17

Today’s AI models can’t even tie their own shoes. New research—led by @ch272h—tests AI models in a 3D environment, finding they perform well at untangling basic knots but cannot tie knots from simple loops or convert one knot to another. @Cornell_Bowers news.cornell.edu/stories/2025/1…

Zizhao Chen retweeted

Merriam-Webster@MerriamWebster·Dec 15

Merriam-Webster’s human editors have chosen ‘slop’ as the 2025 Word of the Year.

Zizhao Chen@ch272h·Dec 5

@yoavartzi I'm presenting on Friday. Details below: Fri, Dec 5, 2025 11:00 AM – 2:00 PM PST Exhibit Hall C,D,E #4505 Pic: knots inside the USS Midway Museum near the SD convention center

Zizhao Chen@ch272h·Dec 5

✨ Why it matters KnotGym gives us a lightweight yet expressive testbed for multi-modal long-horizon reasoning and planning. 🔗 Website: lil-lab.github.io/knotgym 📄 Paper: arxiv.org/abs/2505.18028 Joint work with @yoavartzi

Zizhao Chen@ch272h·Dec 5

🧩Natural language isn’t all you need. We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning? Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder

Zizhao Chen@ch272h·Nov 28

@denisparra neurips.cc/virtual/2025/l… Exhibit Hall C,D,E #4505 Fri 5 Dec 11 a.m. PST — 2 p.m. PST

Denis Parra@denisparra·Nov 28

@ch272h interesting! which day and session, and which poster number do you have ?

Zizhao Chen@ch272h·Nov 28

Hi all, I will be at #NeurIPS2025 to present my work on stress-testing looooooong visual reasoning with KnotGym🥨 Let's talk, whether or not your VLM can see 14 million possible futures like Doctor Strange

Zizhao Chen retweeted

Yair Feldman@yair_feldman·Nov 26

🧵 New paper: "Simple Context Compression" - we show that mean-pooling beats the widely-used compression-tokens method for compressing contexts in LLMs, while being simpler and more efficient! with @yoavartzi (1/7)

Zizhao Chen retweeted

Yoav Artzi@yoavartzi·Nov 3

Pushed a big update to LM-class (v2025.2) -- this second version makes for a much more mature resource. Many refinements of lecture slides + significant improvements to the assignments. Many thanks to @ch272h @HuaYilun and @shankarpad8 for their work on the assignments

Zizhao Chen retweeted

Tanya Goyal@tanyaagoyal·Oct 2

🚨Modeling Abstention via Selective Help-seeking LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not? @momergul_ introduces MASH that trains LLMs for search and gets abstentions for free! 💡Key idea: Reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!

Zizhao Chen retweeted

Haochen Shi@HaochenShi74·Aug 26

ToddlerBot 2.0 is released🥳! Now Toddy can also do cartwheels🤸! We have added so many features since our first release in February; see github.com/hshi74/toddler… for more details. Threads🧵(1/n)

Zizhao Chen@ch272h·Aug 21

@yoavartzi @xhluca @giffmana And there are cool things like dependent types

Yoav Artzi@yoavartzi·Aug 21

@xhluca @giffmana True. But because it was added on top of a thriving language, someone had to decide either to alienate the entire world or make it optional and mild. They correctly chose the latter route. Java was type-safe-first, and that makes for a very different beast

Lucas Beyer (bl16)@giffmana·Aug 20

I love codex and claude taking care of all the boilerplate part of coding that wastes time and is booooooring. Come to think of it, maybe Java would in theory be the perfect language for LLM-coding? Extremely verbose boilerplate - very annoying for humans, but good for LLMs?

Zizhao Chen@ch272h·Aug 21

@yoavartzi @xhluca @giffmana did you know moonbit? moonbitlang.com/blog/moonbit-x…

Yoav Artzi@yoavartzi·Aug 21

@xhluca @giffmana I do have this internal bet that we will see a prog. lang. that is built for LLMs-first coming up at some point. It will be interesting. But then there's the chicken-and-egg problem of data

Zizhao Chen@ch272h·Aug 20

@YouJiacheng @Pushpendre89 @yoavartzi … to train a 350M model. it’s all good btw

You Jiacheng@YouJiacheng·Aug 20

@Pushpendre89 @yoavartzi cuz Yoav just occupied the node lol

Yoav Artzi@yoavartzi·Aug 20

Me: the new GPU node is online
My students: 💃🕺💃🕺💃
Me: torchrun --standalone --nproc_per_node=8 train.py
My students: 🤬🤬🤬🤬🤬

Zizhao Chen@ch272h·Aug 13

@_xjdr where were they stuck?

xjdr@_xjdr·Aug 13

all 3 gave up in less than 12 hours with nothing resembling a functional forward pass. feels like there is still a giant gap here

xjdr@_xjdr

next task, give a few very talented engineers i know API keys and have them try to reproduce this in the same manner and with the same constraints i had. no matter the outcome, i am potentially the most excited about these results

Zizhao Chen@ch272h·Aug 7

@yoavartzi Procrastination gets you anything but :(

Yoav Artzi@yoavartzi·Aug 6

@ch272h digs the best gems! recnet.io/rec/c863a1ca-9…

Zizhao Chen@ch272h·Jul 26

Why scale data annotators at high cost when you can scale users for free?

Yoav Artzi@yoavartzi

The talk for our work on Retrospective Learning from Interactions, which will be in ACL (once I figure out how to squeeze it shorter) Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! 🙌📈🚀 youtube.com/watch?v=qW8S30…

Zizhao Chen retweeted

Haochen Shi@HaochenShi74·Feb 4

Time to democratize humanoid robots! Introducing ToddlerBot, a low-cost ($6K), open-source humanoid for robotics and AI research. Watch two ToddlerBots seamlessly chain their loco-manipulation skills to collaborate in tidying up after a toy session. toddlerbot.github.io
