Se June Joo
@joocjun

155 posts

🤖🦾 Research Engineer @RLWRLD_ai | M.S. @kaist_ai | B.S. in Math & CS @yonsei_u

Seoul, Republic of Korea · Joined September 2022
465 Following · 170 Followers
Se June Joo retweeted
Seonghyeon Ye @SeonghyeonYe
VLAs (from VLMs) ❌ => WAMs (from Video Models) ✅

Why WAMs?
1️⃣ World Physics: VLMs know the internet, but Video Models implicitly model the physical laws essential for manipulation.
2️⃣ The "GPT Direction": VLAs are like BERT (they rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!

What I want to see in 2026:
📈 Scaling Laws: much clearer scaling laws for robotics with WAMs than with VLAs.
🤝 Human-to-Robot Transfer: unlocking massive transfer capabilities using video as a shared representation space.
🤖 Zero-Shot Mastery: moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.

We recently open-sourced the checkpoints, training, and inference code. Dive into the research! 👇
📄 Paper: arxiv.org/abs/2602.15922
💻 Code: github.com/dreamzero0/dre…
🤗 HF: huggingface.co/GEAR-Dreams/Dr…
[image attached]
5 replies · 65 reposts · 516 likes · 74K views
Se June Joo retweeted
Joel Jang @jang_yoel
🚀 DreamZero training code is LIVE — train your own WAM (aka VAM)!
🔧 Replicate DROID from-scratch training
📊 Run evals in sim (DROID-Sim, MolmoSpaces, Polaris) & the real world (RoboArena)

No 2 GB200s for real-time inference? No problem — let NVIDIA carry that burden 💪. Sign up for our API and jump into prompting new tasks! (e.g. "fan the burger" 🍔, a totally unseen verb/task for DROID)

Coming soon: new embodiment/robot fine-tuning initialized from our DreamZero-AGIBot checkpoint. Stay tuned! 🤖
🔗 github.com/dreamzero0/dre…
[Quoted tweet] Seonghyeon Ye @SeonghyeonYe: the WAM announcement above.
2 replies · 17 reposts · 116 likes · 10.3K views
Se June Joo retweeted
Thomas Zhang @ThomasTCKZhang
🤖🤖 Very excited to finally share our new work, “Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control”!

Everyone in robotics does action chunking, but why does it actually work? 🤔🤔 And what can theory tell us about the properties of the data we should be collecting for robotic behavior cloning? 🧵 1/N
[image attached]
5 replies · 61 reposts · 403 likes · 59.2K views
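A note for context, since the tweet assumes the reader knows the trick: action chunking has the policy regress a short horizon of H future actions per observation and execute them before re-querying. Below is a minimal, hypothetical PyTorch sketch of behavior cloning with chunked actions; every size and name is invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

H, OBS_DIM, ACT_DIM = 8, 32, 7  # hypothetical chunk length and sizes

class ChunkPolicy(nn.Module):
    """Behavior-cloning policy that predicts H actions per observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, H * ACT_DIM),
        )

    def forward(self, obs):
        # One forward pass emits the whole chunk of future actions.
        return self.net(obs).view(-1, H, ACT_DIM)

policy = ChunkPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One BC step: regress entire expert action chunks (synthetic data here).
obs = torch.randn(64, OBS_DIM)
expert_chunks = torch.randn(64, H, ACT_DIM)
loss = nn.functional.mse_loss(policy(obs), expert_chunks)
opt.zero_grad(); loss.backward(); opt.step()
```

One common intuition (not necessarily the paper's formal argument) is that executing chunks open-loop gives compounding per-step prediction errors fewer chances to feed back into the state distribution.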
Se June Joo retweeted
Pascale Fung @pascalefung
Introducing VL-JEPA: a Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks, with better performance and higher efficiency than large VLMs.
• VL-JEPA is the first non-generative model that can perform general-domain vision-language tasks in real time, built on a joint embedding predictive architecture.
• We demonstrate in controlled experiments that VL-JEPA, trained with latent-space embedding prediction, outperforms VLMs that rely on data-space token prediction.
• We show that VL-JEPA delivers significant efficiency gains over VLMs for online video-streaming applications, thanks to its non-autoregressive design and native support for selective decoding.
• We highlight that our VL-JEPA model, with a unified model architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time.
by @Delong0_0 @MustafaShukor1 @TheoMoutakanni @willyhcchung Jade Lei Yu Tejaswi Kasarla @AllenBolourchi @ylecun @pascalefung
arxiv.org/abs/2512.10942
13 replies · 87 reposts · 557 likes · 89.4K views
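The second bullet is the crux: predict in latent space, not token space. The toy sketch below illustrates that distinction only; the modules and dimensions are invented, and this is not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 256  # hypothetical shared embedding width

video_encoder = nn.Sequential(nn.Linear(1024, D), nn.ReLU(), nn.Linear(D, D))
text_encoder = nn.Sequential(nn.Linear(768, D), nn.ReLU(), nn.Linear(D, D))
predictor = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))

video_feats = torch.randn(32, 1024)  # stand-in for per-clip visual features
text_feats = torch.randn(32, 768)    # stand-in for target text features

# JEPA-style objective: predict the *embedding* of the target text from the
# video embedding, instead of autoregressively decoding its tokens.
pred = predictor(video_encoder(video_feats))
with torch.no_grad():
    target = text_encoder(text_feats)  # target encoder held fixed here;
                                       # JEPA variants often use EMA targets
loss = F.smooth_l1_loss(pred, target)
loss.backward()
```

Since nothing is decoded token by token, one forward pass scores a streaming clip; that non-autoregressive property is the efficiency argument in the announcement.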
Eddy Xu @eddybuild
today, we're releasing the largest egocentric dataset of physical jobs
- 400k action labels
- 2.5k clips
- 2x'd open-source dataset size
(download below)
109 replies · 184 reposts · 2.3K likes · 416.4K views
Se June Joo retweeted
Sourish Jasti @SourishJasti
1/ The future of general-purpose robotics will be decided by one major question: which flavor of data scales reasoning? Every major lab represents a different bet.

Over the past 3 months, @adam_patni, @vriishin, and I read the core research papers, spoke with staff at the major labs, and mapped the talent pool. This has completely changed how we think about general-purpose robotics.

Our paper builds intuition, step by step, across the 2025 frontier: from architectures → evals → data → industry dynamics. Each layer reveals a different bottleneck, but they all converge on one truth: data decides everything.

Our takeaways + process below 👇
If you want access to our graph (sound on), comment or DM me.
87 replies · 189 reposts · 836 likes · 179.4K views
Se June Joo retweeted
Saining Xie @sainingxie
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
[image attached]
57 replies · 324 reposts · 1.9K likes · 413.4K views
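The tweet doesn't spell out the recipe, but the name suggests replacing the VAE's learned latent space with the representation space of a frozen pretrained encoder, training only a decoder for reconstruction. A rough sketch under that assumption, with placeholder modules standing in for the real encoder:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a frozen pretrained representation encoder
# (a DINO-style ViT in practice) and a small trainable pixel decoder.
D, P = 768, 3 * 32 * 32  # invented representation width and pixel count

encoder = nn.Linear(P, D)            # placeholder for the pretrained encoder
for p in encoder.parameters():
    p.requires_grad_(False)          # representations stay frozen

decoder = nn.Sequential(nn.Linear(D, 1024), nn.ReLU(), nn.Linear(1024, P))

imgs = torch.rand(16, P)             # synthetic flattened images
with torch.no_grad():
    z = encoder(imgs)                # latent = pretrained representation
recon = decoder(z)
loss = nn.functional.mse_loss(recon, imgs)
loss.backward()                      # only the decoder trains
```

Under this reading, a diffusion transformer would then model the frozen representation space directly instead of VAE latents.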
Se June Joo retweeted
C Zhang @ChongZitaZhang
Doing so-called AI+robotics:
- 30% of the time debugging real robot deployment
- 30% fixing simulation and looking at TensorBoard or wandb
- 30% in meetings and all kinds of non-research activities
- 10% spinning my brain to squeeze out a bit of intellectual contribution with AI
4 replies · 5 reposts · 128 likes · 6.3K views
Se June Joo retweeted
RLWRLD @RLWRLD_ai
Just saw this awesome demo by @kaysorin — really proud to share ALLEX in action at the OpenAI Seoul Open Event! Watching it move, interact, and demonstrate real-world dexterity was something special. 🤖🙌 Huge shoutout to everyone involved — pushing the boundaries of what’s possible with physical AI. #RLWRLD #OpenAI #Robotics #PhysicalAI #Dexterity #Innovation #Seoul
[Quoted tweet] Kay @kaysorin:
In Seoul tonight for the @OpenAI Korea launch event. Sora installations, robot high fives, imagegen photobooths, and the amazing Korean founders and artists behind them all.
0 replies · 3 reposts · 9 likes · 620 views
Se June Joo retweeted
Stone Tao @Stone_Tao
Open-sourcing a useful tool to calibrate camera extrinsics painlessly in a minute, no checkerboards! It's based on EasyHEC, using differentiable rendering to optimize extrinsics given object meshes + poses. Crazy that even a piece of paper works too. Code: github.com/StoneT2000/sim…
7 replies · 41 reposts · 244 likes · 43.8K views
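The underlying idea is to treat the camera extrinsics as learnable parameters and minimize a rendering loss by gradient descent. EasyHEC does this with differentiable mask rendering; the self-contained stand-in below swaps in point-reprojection error so it runs without a renderer, and every value is synthetic:

```python
import torch

# Synthetic intrinsics and 3D points (from a known object mesh + pose).
K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts_world = torch.randn(200, 3) * 0.3 + torch.tensor([0., 0., 2.0])

def axis_angle_to_R(rvec):
    # Rodrigues' formula: differentiable axis-angle -> rotation matrix.
    theta = rvec.norm() + 1e-8
    k = rvec / theta
    Kx = torch.zeros(3, 3)
    Kx[0, 1], Kx[0, 2] = -k[2], k[1]
    Kx[1, 0], Kx[1, 2] = k[2], -k[0]
    Kx[2, 0], Kx[2, 1] = -k[1], k[0]
    return torch.eye(3) + theta.sin() * Kx + (1 - theta.cos()) * (Kx @ Kx)

def project(rvec, t):
    cam = pts_world @ axis_angle_to_R(rvec).T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

# "Observed" pixels synthesized from a hidden ground-truth pose.
with torch.no_grad():
    uv_obs = project(torch.tensor([0.1, -0.2, 0.05]),
                     torch.tensor([0.1, 0., 0.]))

rvec = torch.zeros(3, requires_grad=True)  # initial extrinsics guess
t = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([rvec, t], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = (project(rvec, t) - uv_obs).pow(2).mean()
    loss.backward()
    opt.step()
```

The same loop applies when the loss is the overlap between a rendered object mask and an observed segmentation mask, which is what removes the need for checkerboards.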
Se June Joo retweeted
Jianglong Ye @jianglong_ye
How to generate billion-scale manipulation demonstrations easily? Let us leverage generative models! 🤖✨
We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️ and articulation 💻 tasks using a simple C-VAE model.
15 replies · 82 reposts · 375 likes · 72.5K views
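For a sense of what "a simple C-VAE model" looks like in code: a minimal conditional VAE below, with all dimensions and the data interpretation invented for illustration (a real grasp generator would condition on object geometry and decode hand poses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dims: x = a flattened grasp/trajectory, c = object condition.
x_dim, c_dim, z_dim = 64, 16, 8

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, z_dim), nn.Linear(128, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], -1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam trick
        return self.dec(torch.cat([z, c], -1)), mu, logvar

model = CVAE()
x, c = torch.randn(256, x_dim), torch.randn(256, c_dim)
recon, mu, logvar = model(x, c)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
loss = F.mse_loss(recon, x) + 1e-3 * kl
loss.backward()

# Generation: sample z ~ N(0, I), condition on an object c, decode.
samples = model.dec(torch.cat([torch.randn(10, z_dim), c[:10]], -1))
```

Once trained, sampling z and decoding is cheap, which is what makes billion-scale generation plausible (presumably with physics checks filtering infeasible samples).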
Se June Joo retweeted
hyunji amy lee @hyunji_amy_lee
🚨 Want models to better utilize and ground on provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLMs with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning.
[image attached]
2 replies · 38 reposts · 123 likes · 15.6K views
Se June Joo retweeted
Seohong Park @seohong_park
Q-learning is not yet scalable
seohong.me/blog/q-learnin…

I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
[image attached]
35 replies · 182 reposts · 1.2K likes · 168.2K views
Se June Joo retweeted
Sohee Yang @soheeyang_
🚨 New Paper 🧵
How effectively do reasoning models reevaluate their thoughts? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
[image attached]
4 replies · 26 reposts · 130 likes · 10.1K views
Se June Joo retweeted
Younggyo Seo @younggyoseo
Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵
15 replies · 110 reposts · 560 likes · 130.5K views
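FastTD3 builds on TD3, whose critic update is compact enough to sketch: clipped double-Q targets plus target-policy smoothing. Toy networks and a synthetic batch below; the replay buffer, actor update, and whatever FastTD3 adds on top are omitted (the linked code has the real thing):

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 45, 12  # hypothetical humanoid sizes

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 256), nn.ReLU(), nn.Linear(256, o))

actor = mlp(obs_dim, act_dim)
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
actor_t, q1_t, q2_t = map(copy.deepcopy, (actor, q1, q2))  # target nets

# One TD3-style critic update on a synthetic batch.
s, a, r, s2 = (torch.randn(256, obs_dim), torch.randn(256, act_dim),
               torch.randn(256, 1), torch.randn(256, obs_dim))
with torch.no_grad():
    noise = (0.2 * torch.randn(256, act_dim)).clamp(-0.5, 0.5)
    a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1, 1)  # target smoothing
    q_min = torch.min(q1_t(torch.cat([s2, a2], -1)),
                      q2_t(torch.cat([s2, a2], -1)))     # clipped double Q
    y = r + 0.99 * q_min
loss = ((q1(torch.cat([s, a], -1)) - y) ** 2).mean() + \
       ((q2(torch.cat([s, a], -1)) - y) ** 2).mean()
loss.backward()
```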
Se June Joo retweeted
Yuke Zhu @yukez
We took a short break from robotics to build a human-level agent to play Competitive Pokémon. Partially observed. Stochastic. Long-horizon. Now mastered with Offline RL + Transformers.

Our agent, trained on 475k+ human battles, hits the top 10% of Pokémon Showdown leaderboards. No search or heuristics, just sequence modeling.

Today, we're open-sourcing our Metamon platform with our algorithms, data, and environments:
🌐 metamon.tech

We are excited to see how our work accelerates research on building generally capable AI agents, and more importantly, inspires the next generation of Pokémon trainers!
10 replies · 64 reposts · 362 likes · 50.4K views
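To make "no search or heuristics, just sequence modeling" concrete: the toy below trains a causal transformer to predict the next token over interleaved observation/action tokens from logged games. It is a simplified stand-in; the actual Metamon setup uses offline RL on top of sequence models, and all sizes and the tokenization here are invented:

```python
import torch
import torch.nn as nn

VOCAB, D, CTX = 512, 128, 64  # invented vocabulary, width, context length

embed = nn.Embedding(VOCAB, D)
block = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(block, num_layers=2)
head = nn.Linear(D, VOCAB)

tokens = torch.randint(0, VOCAB, (8, CTX))  # logged battle trajectories
mask = nn.Transformer.generate_square_subsequent_mask(CTX)  # causal mask
h = backbone(embed(tokens), mask=mask)
logits = head(h)

# Next-token cross-entropy (on all positions here for brevity; a policy
# would supervise only the action positions).
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```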
Joel Jang @jang_yoel
Some personal life update: I joined the @NVIDIAAI GEAR lab as a full-time Research Scientist last month (after one year as a research intern)! I’ll continue working on developing general-purpose robot foundation models. Stay tuned for some exciting updates!
[image attached]
29 replies · 3 reposts · 350 likes · 25.3K views
Se June Joo retweeted
Tairan He @TairanHe99
🚀 Can we make a humanoid move like Cristiano Ronaldo, LeBron James, and Kobe Bryant? YES! 🤖
Introducing ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
Website: agile.human2humanoid.com
Code: github.com/LeCAR-Lab/ASAP
45 replies · 194 reposts · 1K likes · 257.4K views