Hyeonseo Cho

12 posts

@hyeonscho

Master's student advised by Prof. @sungjinahn_

Joined May 2024
84 Following · 19 Followers
Hyeonseo Cho reposted
Sungjin Ahn @SungjinAhn_
🧠 We introduce "Generative Recursive Reasoning"! Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.
Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. This enables multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth but by width, via parallel trajectory sampling.
And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).
With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+
📄 Paper: arxiv.org/abs/2605.19376
🌐 Project page: ahn-ml.github.io/gram-website
w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)
23 replies · 151 reposts · 1.1K likes · 105.7K views
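To make the depth-versus-width contrast concrete, here is a toy sketch. It is not GRAM: the "recursive step" is a hypothetical Newton update for z² = 4, a problem with two valid answers (±2). A fixed starting latent collapses every run into one attractor, while a sampled starting latent plus noise on the early steps lets several parallel trajectories cover both answers. The function names, noise schedule, and toy problem are all illustrative assumptions.

```python
import random


def newton_step(z, target=4.0):
    # Hypothetical recursive update: one Newton step toward a root of z**2 = target.
    return 0.5 * (z + target / z)


def deterministic_recursion(depth=30):
    # Fixed start, fixed update: every run falls into the same attractor (+2).
    z = 1.0
    for _ in range(depth):
        z = newton_step(z)
    return z


def stochastic_recursion(rng, noisy_steps=5, clean_steps=30, noise=0.5):
    # Sampled starting latent, Gaussian noise on the early steps, then noise-free refinement.
    z = rng.gauss(0.0, 1.0)
    for t in range(noisy_steps + clean_steps):
        if abs(z) < 1e-3:  # keep the Newton step away from division by ~0
            z = 1e-3 if z >= 0 else -1e-3
        z = newton_step(z)
        if t < noisy_steps:
            z += rng.gauss(0.0, noise)
    return z


def sample_width(width=32, seed=0):
    # Width scaling: run `width` independent trajectories, collect the distinct answers.
    rng = random.Random(seed)
    return sorted({round(stochastic_recursion(rng), 3) for _ in range(width)})


print(round(deterministic_recursion(), 3))  # a single answer, identical on every run
print(sample_width())  # typically both roots
```

Adding more depth to the deterministic loop never changes its answer; adding width (more sampled trajectories) is what surfaces the second solution.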
Hyeonseo Cho reposted
Khai Loong Aw @khai_loong_aw
Today's best AI needs orders of magnitude more data than a human child to achieve visual competence. We introduce the Zero-shot World Model (ZWM), an approach that substantially narrows this gap. Even when trained on the first-person experience of a single child, BabyZWM matches state-of-the-art models on diverse visual-cognitive tasks – with no task-specific training, i.e., zero-shot. 🧵
4 replies · 71 reposts · 359 likes · 38.5K views
Hyeonseo Cho reposted
Stefan Baumann @StefanABaumann
You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step. We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models. Myriad, accepted at @CVPR 2026
4 replies · 56 reposts · 353 likes · 26.3K views
Hyeonseo Cho reposted
Sungjin Ahn @SungjinAhn_
Understanding LoRA as Knowledge Memory 🚀
Can we save new LLM facts directly into LoRA weights? While recent works hastily treat LoRA as a plug-and-play knowledge memory, the fundamental mechanics governing its capacity and composability have remained largely unexplored.
🤯 We asked the hard question: can an adapter meant for task adaptation actually serve as a reliable store for precise, declarative knowledge? To find out, we ran the first systematic empirical study mapping the design space of LoRA-based memory. The shocking reality: if you trust it blindly, LoRA used as a memory unit can fail catastrophically in certain settings.
✅ Rather than proposing a single architecture, our paper provides practical guidance on its hidden operational boundaries, from finite storage capacity limits to the harsh realities of multi-module scaling and merging interference. Check out our systematic map of when LoRA memory succeeds, and exactly when it breaks!
🧑🏻‍💻 Led by my fantastic students @SeungjuBack (KAIST) and @DongwooLee00 (KAIST), in collaboration with Samsung SDS.
arxiv.org/abs/2603.01097
2 replies · 35 reposts · 187 likes · 11.4K views
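Background for the question above: a minimal plain-Python sketch of the standard LoRA update, where a frozen weight W receives a low-rank delta (alpha / r) * B @ A, and of naive merging by summing adapter deltas into W. The 2×2 weights, rank-1 adapters, and helper names are hypothetical, chosen only to show one way merging can interfere: two adapters that write opposite updates into the same entry cancel each other. This is not the paper's experimental setup or its merging method.

```python
def matmul(a, b):
    # Plain-Python matrix product: rows of `a` times columns of `b`.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Element-wise sum of two equally sized matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_delta(B, A, alpha, r):
    # LoRA weight update (alpha / r) * B @ A, with B: d_out x r and A: r x d_in.
    return [[alpha / r * v for v in row] for row in matmul(B, A)]


def merged_weight(W, adapters):
    # Naive merge: add every adapter's delta into the frozen base weight W.
    out = W
    for B, A, alpha, r in adapters:
        out = add(out, lora_delta(B, A, alpha, r))
    return out


W = [[1.0, 0.0], [0.0, 1.0]]
fact_1 = ([[1.0], [0.0]], [[0.0, 1.0]], 1.0, 1)   # hypothetical rank-1 adapter
fact_2 = ([[1.0], [0.0]], [[0.0, -1.0]], 1.0, 1)  # writes the opposite update to the same entry

print(merged_weight(W, [fact_1]))          # [[1.0, 1.0], [0.0, 1.0]]
print(merged_weight(W, [fact_1, fact_2]))  # [[1.0, 0.0], [0.0, 1.0]]: both updates cancel
```

Real adapters rarely cancel this cleanly, but the same mechanism (several deltas summed into one shared matrix) is why composing many LoRA memories is not free.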
Hyeonseo Cho reposted
BURKOV @burkov
NeurIPS 2025 Best Paper Awards
The paper addresses the following question: why don't diffusion models simply memorize their training data, given that they have enough parameters to do so?
The authors discover that the answer lies in a separation of timescales during training: models learn to generate quality samples at time τ_gen, but only begin memorizing at a later time τ_mem that grows linearly with dataset size. This means larger datasets don't just provide more variety; they fundamentally change the training dynamics by pushing memorization further into the future, opening a widening window where early stopping yields generalization. The paper backs this up both empirically and theoretically.
A must-read if you work with generative models and have wondered why your overparameterized network doesn't just regurgitate training examples, or if you want principled guidance on when to stop training.
Read and ask questions on ChapterPal: chapterpal.com/s/4c0918df/why…
PDF: arxiv.org/pdf/2505.17638
22 replies · 185 reposts · 1.3K likes · 102K views
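The timescale argument in the post above fits in one line. Here n is the dataset size, c is a hypothetical proportionality constant standing in for "grows linearly with dataset size", and τ_gen is assumed not to grow with n; the linked paper has the precise statements.

```latex
\tau_{\mathrm{mem}}(n) \approx c\,n,
\qquad
\Delta\tau(n) = \tau_{\mathrm{mem}}(n) - \tau_{\mathrm{gen}} \approx c\,n - \tau_{\mathrm{gen}},
\qquad
\text{stop training at some } \tau \in [\tau_{\mathrm{gen}},\ \tau_{\mathrm{mem}}(n))
```

Under these assumptions, doubling n roughly doubles τ_mem, so the early-stopping window Δτ widens linearly with the dataset.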
Hyeonseo Cho reposted
Dongyeong Kim @DongyeongKim3
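Seeing Gemini 3's recent record-breaking progress, achieved without the CUDA-based ecosystem, it seems many people have become interested in JAX/XLA and TPUs. In today's casual post, I'd like to tell the story of why Google came to develop the TPU, and why it moved away from the well-known TensorFlow to use JAX/XLA. (Part 1)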
6 replies · 122 reposts · 313 likes · 28.6K views
Hyeonseo Cho reposted
Jaesik Yoon @jaesikyoon_
🧠 Our core question: "How can we extend MCTD to longer, more complex compositional planning tasks, beyond its trained trajectory lengths?"
💡 Our solution (C-MCTD): we solve this problem with plan-level tree search and boost its efficiency via parallelization and amortization. It has been accepted as a Spotlight at the upcoming #neurips2025.
📄 ArXiv: arxiv.org/abs/2510.21361
🌐 Project Page: jaesikyoon.com/c-mctd-page/
This work was advised by @SungjinAhn_ and done together with a great colleague, @hyeonscho. Huge thanks to them and the MLML members!
0 replies · 16 reposts · 93 likes · 7.5K views
Hyeonseo Cho reposted
Sungjin Ahn @SungjinAhn_
🚨 Check out our new paper on next-generation language modeling via "loopholing" discrete diffusion!
🤯 Surprisingly, our loopholing diffusion achieved a huge performance improvement, finally making it match (or even surpass) autoregressive models!
✅ How? We introduce the "loopholing" mechanism: a deterministic bypass, added alongside the stochastic sampling path of discrete diffusion, that breaks the sampling wall.
👨🏻‍💻 Led by my fantastic student Mingyu (@pyross0000, KAIST) and @jaesikyoon_ (KAIST), in collaboration with Justin Deschenaux (EPFL) and Caglar Gulcehre (EPFL, Microsoft).
📄 arXiv: arxiv.org/abs/2510.19304
🌐 Project: sites.google.com/view/lddms/home
5 replies · 17 reposts · 64 likes · 19.7K views
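A toy sketch of the "sampling wall" mentioned above, under stated assumptions: each denoising step produces a probability vector over a small vocabulary, and standard discrete diffusion passes only the sampled one-hot token to the next step. The hypothetical bypass below simply carries the probability vector forward as well; the paper's actual bypass may carry a different continuous representation, and none of these function names come from it.

```python
import random


def sample_one_hot(probs, rng):
    # Stochastic path: draw one token and keep only its one-hot code.
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1.0 if i == k else 0.0 for i in range(len(probs))]


def step_without_bypass(probs, rng):
    # Standard step: the next denoising step sees only the sampled token,
    # so the uncertainty encoded in `probs` is discarded (the "sampling wall").
    return {"token": sample_one_hot(probs, rng)}


def step_with_bypass(probs, rng):
    # Hypothetical loopholing-style step: still emit a sampled token, but also pass
    # a deterministic copy of the continuous distribution to the next step.
    return {"token": sample_one_hot(probs, rng), "bypass": list(probs)}


rng = random.Random(0)
probs = [0.49, 0.51, 0.0]
print(step_without_bypass(probs, rng))
print(step_with_bypass(probs, rng))
```

Without the bypass, a 51%-confident choice and a 99%-confident choice look identical to the next step; with it, that information survives sampling.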
Hyeonseo Cho reposted
Sungjin Ahn @SungjinAhn_
🚀 Introducing CrafterDojo! Crafter has been a popular testbed for open-ended agent learning, but progress has been limited without foundation models like VPT, CLIP, and STEVE. With CrafterDojo, we provide these models + toolkits so the community can easily prototype LLM-augmented agents in Crafter.
Led by amazing students: Junyeong Park & Hyeonseo Cho (KAIST) ✨
📄 arXiv: arxiv.org/abs/2508.13530
🌐 Webpage: sites.google.com/view/crafterdo…
2 replies · 7 reposts · 25 likes · 1.8K views
Hyeonseo Cho reposted
Jaesik Yoon @jaesikyoon_
Excited to present Monte Carlo Tree Diffusion (MCTD) at @icmlconf ICML2025! We integrate diffusion models with Monte Carlo Tree Search for more scalable planning. Come see how MCTD consistently outperforms other methods in complex, long-horizon tasks.
🗓️ Wednesday (Day 2)
⏰ 11:00 - 13:30
📍 Poster Session 3 West W-716
Quoting Sungjin Ahn @SungjinAhn_

🚀 Excited to introduce "Monte Carlo Tree Diffusion (MCTD) for System 2 Planning!" MCTD scales test-time compute by combining Monte Carlo Tree Search (MCTS) and Diffusion, bringing the best of both worlds.
With @jaesikyoon_, Hyeonseo Cho, Doojin Baek, and Yoshua Bengio.
📄 arXiv: arxiv.org/abs/2502.07202
🌐 Project page: sites.google.com/view/mctd-s2pl…

0 replies · 5 reposts · 18 likes · 1.5K views
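MCTD builds on Monte Carlo Tree Search. As background only, here is a sketch of the standard UCT (UCB1) rule that MCTS commonly uses to decide which child node to descend into; how MCTD defines nodes and rewards over diffusion-generated plans is described in the paper and is not modeled here. The exploration constant c = √2 is a conventional default, not a value taken from MCTD.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited children get an infinite score, so each child is tried at least once.
    if visits == 0:
        return math.inf
    exploit = value_sum / visits  # mean reward observed through this child
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # bonus for rarely visited children
    return exploit + explore


def select_child(children, parent_visits):
    # children: list of (name, value_sum, visits); returns the name with the highest UCT score.
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits))[0]


print(select_child([("a", 5.0, 10), ("b", 0.0, 0)], parent_visits=10))  # b (unvisited)
print(select_child([("a", 9.0, 10), ("b", 1.0, 10)], parent_visits=20))  # a (higher mean)
```

The exploit term favors plans that have scored well so far, while the explore term keeps under-sampled branches in play; this balance is what lets tree search trade extra test-time compute for better plans.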
Hyeonseo Cho reposted
Sungjin Ahn @SungjinAhn_
⚡️ New breakthrough in Monte Carlo Tree Diffusion (MCTD) for System 2 Planning, powered by the KAIST–Mila collaboration!
"Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning"
📄 arxiv.org/abs/2506.09498
The biggest bottleneck of MCTD was speed. By parallelizing and sparsifying the planning rollouts, we made MCTD practical for fast, scalable reasoning ⚡️.
Surprisingly, we achieved:
✅ Up to 100× speedup
✅ No or negligible performance drop across diverse tasks!
With @jaesikyoon_, Hyeonseo Cho, and Yoshua Bengio.
5 replies · 60 reposts · 298 likes · 22K views
Hyeonseo Cho reposted
Jaesik Yoon @jaesikyoon_
I am happy to share our new paper, Monte Carlo Tree Diffusion (MCTD) for System 2 Planning. Unsupervised RL with generative models is one route toward general agents, so what if we apply test-time compute there? We studied this with a diffusion planner.
2 replies · 10 reposts · 33 likes · 4.9K views