Leo Chenghui Li

67 posts

@leo_chenghui

AI Research & Engineering @AIatMeta: World Models, Human AI, Multimodal LLMs | Ex-CMU Ex-CUHK

Joined November 2014

261 Following · 53 Followers

Leo Chenghui Li reposted

Yun-Ta Tsai@yunta_tsai·1d

Many people think any given ML project is 99% training. In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training. The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data. Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.

Leo Chenghui Li reposted

Physical Intelligence@physical_int·3d

New work with @nvidia: evaluating robot policies entirely inside a world model. The policy acts, the model imagines the consequences, and the imagined evals predict real-world results. 🧵 real vs world-model rollout side by side📷

Leo Chenghui Li reposted

Jitendra MALIK@JitendraMalikCV·2d

In computer vision, the field advanced significantly due to openly available datasets and models; hope this will be true in robotics as well.

Ritvik Singh@ritvik_singh9

Introducing ABC: open data, training, and infrastructure for robotics. We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques. @arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall

Leo Chenghui Li reposted

Jitendra MALIK@JitendraMalikCV·3d

We can convert human videos to robot hand-object interaction trajectories in 4D. Enjoy! Paper: arxiv.org/abs/2606.19333 Website: do-as-i-do.com Code: github.com/malik-group/do… Authors: @bhawna_paliwal_, @HarithejaE, @willjhliang, @pabbeel, @notmahi, @JitendraMalikCV

Leo Chenghui Li reposted

hugo@hugothomel·9 Jun

x.com/i/article/2063…

Leo Chenghui Li reposted

Jitendra MALIK@JitendraMalikCV·6 Jun

I want to offer some unsolicited advice to computer vision researchers jumping into robotics. Don't focus too much on VLMs, VLAs etc. That's fine, but the real action is at the sensorimotor level. Most of the open problems in robotics are in manipulation, which is about hand-object interaction, and contacts and forces are central. Proprioception and tactile sensing are as important as vision. Don't get seduced by cherry-picked demos. You can't do robotics without doing robotics.

Leo Chenghui Li@leo_chenghui·4 Jun

Encoder-free for all modalities is awesome.

Michael Tschannen@mtschannen

For the past years my research focus was on unifying models and training paradigms across modalities. Today I'm excited that we're releasing our latest model aligned with this theme: Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs! 1/

Leo Chenghui Li@leo_chenghui·3 Jun

Nice takes on test-time scaling. Visual RL is typically much harder than RL for LLMs because the search space is more multi-layered: 1. Frame-native: diffusion trajectories 2. Control-native: conditioning, editability 3. Temporal-native: video consistency 4. Planning-native: predictive planning. I think visual codes could be very helpful for 2 and 4.

Yoko@stuffyokodraws·2 Jun

x.com/i/article/2061…

Leo Chenghui Li reposted

Fei-Fei Li@drfeifei·3 Jun

x.com/i/article/2062…

Leo Chenghui Li reposted

Rishabh Agarwal@agarwl_·15 May

GPT-3 was a sensation because it claimed language models are few-shot in-context learners. I wonder why we dropped the ball on in-context learning, and moved to mostly execution oriented research on LLMs: training on any task of high value, a set that will keep increasing with no end in sight. Maybe we'll get these data centers with geniuses, but they are *only* geniuses on tasks that your favourite frontier lab decides to directly / indirectly optimize for.

Delip Rao e/σ@deliprao

In-context learning in LLMs

Leo Chenghui Li reposted

Rishabh Agarwal@agarwl_·15 May

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…

Leo Chenghui Li reposted

Yu Su@ysu_nlp·13 May

"Frontier labs are organised to serve one model to many customers. Specialisation requires the inverse, that is, many models built for segmented customers" well said.

Charlie O'Neill@oneill_c

x.com/i/article/2054…

Leo Chenghui Li reposted

Sam Altman@sama·13 May

codex is the best AI coding product and we want to make it easy to try. for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.

Leo Chenghui Li@leo_chenghui·12 May

@johnschulman2 Any plans for continuous video as output?

John Schulman@johnschulman2·11 May

Seeing the demos come together over the last week has been awesome -- so many things that previously required a special-purpose model (e.g. real-time translation, event detection in video) turn out to be zero-shot instruction following once you have a general-purpose model with the right type signature -- continuous/simultaneous audio+video+text->audio+text

John Schulman@johnschulman2·11 May

Sharing our work on full-duplex multimodal models -- real-time interaction that's natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to intelligence/autonomy because they're harder to eval. In the future, we think every AI system will have something like an interaction model as the outer user-facing layer, continually keeping the user informed and learning what they actually want.

Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Leo Chenghui Li@leo_chenghui·12 May

@rohankalia_ @thinkymachines Yeah it gives me chills. It’ll be even more amazing if it’s from the interaction model.

rohan@rohankalia_·12 May

well played @thinkymachines, the time-awareness demo is amazing. mech interp on this specific capability would be cool to see! (how does the model “count” time?)

Thinking Machines@thinkymachines

The team has been sweeping at local trivia night thanks to a model that's aware of continuous time.

Leo Chenghui Li@leo_chenghui·12 May

This is incredible. Great HCI is what’s missing for multimodal.

Shaojie Bai@shaojieb

🎆Beyond stoked to share some experiments of what we’ve been working on. It's been an absolute adventure building this (still early-stage) interaction model from ground up with the team, rethinking many components to enable a model that **interacts natively**. It sees the world, talks over users, searches, and generates artifacts. It grasps the dimension of time. No more explicit “scaffolding”/turn-taking (think about having a conversation via email?). The future is live⚡️, and consider joining us on this journey if you are excited about this too!

Leo Chenghui Li@leo_chenghui·3 May

The blog from @xxunhuang on Video World Models is a goldmine. Reading it on a casual Saturday is like stumbling onto a hidden-gem restaurant. Will definitely read it a couple more times for the details.

Xun Huang@xxunhuang

Will be presenting Self Forcing during today’s NeurIPS poster session at 4:30pm. On Saturday's NextVid workshop, I’ll also be giving a talk on video world models—covering the challenges outlined in my blog post and sharing latest research to address them. Looking forward to the discussions! xunhuang.me/blogs/world_mo…

Leo Chenghui Li@leo_chenghui·2 May

People choose the harder path because, even when the destination is the same, the view is not.

Leo Chenghui Li@leo_chenghui·28 Apr

Good HCI

OpenAI Developers@OpenAIDevs

You can build interactive applications with gpt-realtime-1.5, so users can control app state more naturally with voice. Hi Chappy 👋

Leo Chenghui Li reposted

David Duvenaud@DavidDuvenaud·28 Apr

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵
