Leo Chenghui Li

67 posts

@leo_chenghui

AI Research & Engineering @AIatMeta: World Models, Human AI, Multimodal LLMs | Ex-CMU Ex-CUHK

Joined November 2014
261 Following · 53 Followers
Leo Chenghui Li retweeted
Yun-Ta Tsai @yunta_tsai
Many people think any given ML project is 99% training. In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training. The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data. Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.
Leo Chenghui Li retweeted
Physical Intelligence @physical_int
New work with @nvidia: evaluating robot policies entirely inside a world model. The policy acts, the model imagines the consequences, and the imagined evals predict real-world results. 🧵 real vs world-model rollout side by side📷
Leo Chenghui Li retweeted
Jitendra MALIK @JitendraMalikCV
I want to offer some unsolicited advice to computer vision researchers jumping into robotics. Don't focus too much on VLMs, VLAs etc. That's fine, but the real action is at the sensorimotor level. Most of the open problems in robotics are in manipulation, which is about hand-object interaction, and contacts and forces are central. Proprioception and tactile sensing are as important as vision. Don't get seduced by cherry-picked demos. You can't do robotics without doing robotics.
Leo Chenghui Li @leo_chenghui
Nice takes on test-time scaling. Visual RL is typically much harder than RL for LLMs because the search space is more multi-layered:
1. Frame-native: diffusion trajectories
2. Control-native: conditioning, editability
3. Temporal-native: video consistency
4. Planning-native: predictive planning
I think visual codes could be very helpful for 2 and 4.
Leo Chenghui Li retweeted
Rishabh Agarwal @agarwl_
GPT-3 was a sensation because it claimed language models are few-shot in-context learners. I wonder why we dropped the ball on in-context learning, and moved to mostly execution oriented research on LLMs: training on any task of high value, a set that will keep increasing with no end in sight. Maybe we'll get these data centers with geniuses, but they are *only* geniuses on tasks that your favourite frontier lab decides to directly / indirectly optimize for.
Delip Rao e/σ @deliprao

In-context learning in LLMs

Leo Chenghui Li retweeted
Rishabh Agarwal @agarwl_
Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…
Leo Chenghui Li retweeted
Yu Su @ysu_nlp
"Frontier labs are organised to serve one model to many customers. Specialisation requires the inverse, that is, many models built for segmented customers" well said.
Charlie O'Neill @oneill_c

x.com/i/article/2054…

Leo Chenghui Li retweeted
Sam Altman @sama
codex is the best AI coding product and we want to make it easy to try. for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.
John Schulman @johnschulman2
Seeing the demos come together over the last week has been awesome -- so many things that previously required a special-purpose model (e.g. real-time translation, event detection in video) turn out to be zero-shot instruction following once you have a general-purpose model with the right type signature -- continuous/simultaneous audio+video+text->audio+text
John Schulman @johnschulman2
Sharing our work on full-duplex multimodal models -- real-time interaction that's natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to intelligence/autonomy because they're harder to eval. In the future, we think every AI system will have something like an interaction model as the outer user-facing layer, continually keeping the user informed and learning what they actually want.
Thinking Machines @thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Leo Chenghui Li @leo_chenghui
People choose the harder path because, even when the destination is the same, the view is not.
Leo Chenghui Li retweeted
David Duvenaud @DavidDuvenaud
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵