Hyeonseo Cho

12 posts

@hyeonscho

Master's student advised by Prof. @sungjinahn_

Joined May 2024

84 Following · 19 Followers

Hyeonseo Cho retweeted

Sungjin Ahn @SungjinAhn_ · 12h

🧠 We introduce "Generative Recursive Reasoning"! Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory: multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width (parallel trajectory sampling).

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: arxiv.org/abs/2605.19376
🌐 Project page: ahn-ml.github.io/gram-website

w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)
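"Width" scaling and an N-Queens "coverage" number are easy to misread, so here is a toy sketch of what sampling parallel trajectories and measuring coverage can look like. It does not implement GRAM. A randomized backtracking search (a hypothetical stand-in chosen only for illustration) plays the role of one stochastic reasoning trajectory, and coverage is assumed here to mean the fraction of all distinct valid solutions found across the sampled trajectories. The names `sample_trajectory` and `coverage` are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Board = Tuple[int, ...]  # board[r] = column of the queen in row r


def is_valid(board: Board) -> bool:
    """Checker: no two queens share a column or a diagonal."""
    n = len(board)
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            if board[r1] == board[r2] or abs(board[r1] - board[r2]) == r2 - r1:
                return False
    return True


def _safe(prefix: List[int], c: int) -> bool:
    """Can a queen go in column c of the next row, given the rows placed so far?"""
    row = len(prefix)
    return all(c != pc and abs(c - pc) != row - r for r, pc in enumerate(prefix))


def all_solutions(n: int) -> Set[Board]:
    """Exhaustive backtracking: the reference set used to measure coverage."""
    sols: Set[Board] = set()

    def place(prefix: List[int]) -> None:
        if len(prefix) == n:
            sols.add(tuple(prefix))
            return
        for c in range(n):
            if _safe(prefix, c):
                place(prefix + [c])

    place([])
    return sols


def sample_trajectory(n: int, rng: random.Random) -> Optional[Board]:
    """One stochastic 'trajectory': backtracking with a shuffled column order per row."""

    def place(prefix: List[int]) -> Optional[List[int]]:
        if len(prefix) == n:
            return prefix
        cols = list(range(n))
        rng.shuffle(cols)  # the only source of randomness
        for c in cols:
            if _safe(prefix, c):
                found = place(prefix + [c])
                if found is not None:
                    return found
        return None

    result = place([])
    return tuple(result) if result is not None else None


def coverage(n: int, width: int, seed: int = 0) -> float:
    """Width scaling: sample `width` independent trajectories and return the
    fraction of all distinct valid solutions that at least one of them found."""
    total = all_solutions(n)
    if not total:
        return 0.0
    rng = random.Random(seed)
    found: Set[Board] = set()
    for _ in range(width):
        board = sample_trajectory(n, rng)
        if board is not None and is_valid(board):
            found.add(board)
    return len(found) / len(total)
```

For contrast, a deterministic solver (for example, the same backtracking without the shuffle) returns the same board on every call, so on 8 queens its coverage would stay at 1/92 no matter how many times it is run. Only a stochastic trajectory lets extra width turn into extra coverage.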

Hyeonseo Cho retweeted

Khai Loong Aw @khai_loong_aw · Apr 14

Today's best AI needs orders of magnitude more data than a human child to achieve visual competence. We introduce the Zero-shot World Model (ZWM), an approach that substantially narrows this gap. Even when trained on the first-person experience of a single child, BabyZWM matches state-of-the-art models on diverse visual-cognitive tasks – with no task-specific training, i.e., zero-shot. 🧵

Hyeonseo Cho retweeted

Stefan Baumann @StefanABaumann · Apr 13

You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step. We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models. Myriad, accepted at @CVPR 2026

Hyeonseo Cho retweeted

Sungjin Ahn @SungjinAhn_ · Mar 3

Understanding LoRA as Knowledge Memory 🚀 Can we save new LLM facts directly into LoRA weights? While recent works are hastily treating LoRA as a plug-and-play knowledge memory, the fundamental mechanics governing its capacity and composability have remained largely unexplored. 🤯We asked the hard question: Can an adapter meant for task adaptation actually serve as a reliable store for precise, declarative knowledge? To find out, we ran the first systematic empirical study mapping the design space of LoRA-based memory. The shocking reality is that treating LoRA as a memory unit can catastrophically fail in certain settings if you blindly trust it. ✅ Rather than proposing a single architecture, our paper provides practical guidance on its hidden operational boundaries —from characterizing finite storage capacity limits to the harsh realities of multi-module scaling and merging interference. Check out our systematic map of when LoRA memory succeeds, and exactly when it breaks! 🧑🏻‍💻Led by my fantastic students @SeungjuBack (KAIST) and @DongwooLee00 (KAIST), in collaboration with Samsung SDS. arxiv.org/abs/2603.01097

Hyeonseo Cho retweeted

BURKOV @burkov · Dec 28

NeurIPS 2025 Best Paper Awards The paper addresses the following question: why don't diffusion models simply memorize their training data, given that they have enough parameters to do so? The authors discover that the answer lies in a separation of timescales during training—models learn to generate quality samples at time τ_gen, but only begin memorizing at a later time τ_mem that grows linearly with dataset size. This means larger datasets don't just provide more variety; they fundamentally change the training dynamics by pushing memorization further into the future, opening a widening window where early stopping yields generalization. The paper backs this up both empirically and theoretically. A must-read if you work with generative models and have wondered why your overparameterized network doesn't just regurgitate training examples, or if you want principled guidance on when to stop training. Read and ask questions on ChapterPal: chapterpal.com/s/4c0918df/why… PDF: arxiv.org/pdf/2505.17638
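The timescale argument can be written compactly. This is a simplified reading for illustration, not the paper's exact statement: it assumes the generalization time τ_gen stays roughly constant in the dataset size n, while τ_mem grows linearly in n with a hypothetical offset τ_0 and slope c > 0.

```latex
\tau_{\mathrm{mem}}(n) \approx \tau_0 + c\,n, \qquad
\Delta(n) = \tau_{\mathrm{mem}}(n) - \tau_{\mathrm{gen}} \approx \tau_0 + c\,n - \tau_{\mathrm{gen}}, \qquad
\frac{d\Delta}{dn} \approx c > 0
```

Under these assumptions, stopping training at any time τ with τ_gen ≤ τ < τ_mem(n) gives good samples before memorization sets in, and the width Δ(n) of that early-stopping window grows with the dataset size.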

Hyeonseo Cho retweeted

Dongyeong Kim @DongyeongKim3 · Nov 24

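Seeing Gemini 3's record-breaking progress, achieved without a CUDA-based ecosystem, it seems many people have become interested in JAX/XLA and TPUs. In today's casual post, I'd like to tell the story of why Google came to develop TPUs, and why it moved away from the well-known TensorFlow to JAX/XLA. (Part 1)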

Hyeonseo Cho retweeted

Jaesik Yoon @jaesikyoon_ · Nov 4

🧠 Our core question: "How can we extend MCTD to longer, more complex compositional planning tasks, beyond its trained trajectory lengths?" 💡 Our solution (C-MCTD): We solve this problem with plan-level tree search, and boost its efficiency via parallelization and amortization. It has been accepted as a Spotlight at the upcoming #neurips2025. 📄 ArXiv: arxiv.org/abs/2510.21361 🌐 Project Page: jaesikyoon.com/c-mctd-page/ This work was advised by @SungjinAhn_ and done together with a great colleague, @hyeonscho. Huge thanks to them and MLML members!

Hyeonseo Cho retweeted

Sungjin Ahn @SungjinAhn_ · Oct 23

🚨 Check out our new paper on next generation language modeling via "loopholing" discrete diffusion! 🤯 Surprisingly, our loopholing diffusion achieved a huge performance improvement, finally making it match (or even surpass) autoregressive models! ✅ How? We introduce the "loopholing" mechanism — a discrete diffusion that introduces a deterministic bypass alongside the stochastic path to break the sampling wall. 👨🏻‍💻 Led by my fantastic student Mingyu (@pyross0000, KAIST) and @jaesikyoon_ (KAIST), in collaboration with Justin Deschenaux (EPFL) and Caglar Gulcehre (EPFL, Microsoft). 📄 arXiv: arxiv.org/abs/2510.19304 🌐 Project: sites.google.com/view/lddms/home

Hyeonseo Cho retweeted

Sungjin Ahn @SungjinAhn_ · Aug 28

🚀 Introducing CrafterDojo! Crafter has been a popular testbed for open-ended agent learning—but progress has been limited without foundation models like VPT, CLIP, and STEVE. With CrafterDojo, we provide these models + toolkits so the community can easily prototype LLM-augmented agents in Crafter. Led by amazing students: Junyeong Park & Hyeonseo Cho (KAIST) ✨ 📄 arXiv: arxiv.org/abs/2508.13530 🌐 Webpage: sites.google.com/view/crafterdo…

Hyeonseo Cho retweeted

Jaesik Yoon @jaesikyoon_ · Jul 15

Excited to present Monte Carlo Tree Diffusion (MCTD) at @icmlconf ICML2025! We integrate diffusion models with Monte Carlo Tree Search for more scalable planning. Come see how MCTD consistently outperforms other methods in complex, long-horizon tasks. 🗓️ Wednesday (Day 2) ⏰ 11:00 - 13:30 📍 Poster Session 3 West W-716

Sungjin Ahn @SungjinAhn_

🚀 Excited to introduce "Monte Carlo Tree Diffusion (MCTD) for System 2 Planning!" MCTD scales test-time compute by combining Monte Carlo Tree Search (MCTS) and Diffusion, bringing the best of both worlds. With @jaesikyoon_, Hyeonseo Cho, Doojin Baek, and Yoshua Bengio. 📄 arXiv: arxiv.org/abs/2502.07202 🌐 Project page: sites.google.com/view/mctd-s2pl…

Hyeonseo Cho retweeted

Sungjin Ahn @SungjinAhn_ · Jun 25

⚡️ New breakthrough in Monte Carlo Tree Diffusion (MCTD) for System 2 Planning, powered by the KAIST–Mila collaboration! “Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning” 📄 arxiv.org/abs/2506.09498 The biggest bottleneck of MCTD was speed. By parallelizing and sparsifying the planning rollouts, we made MCTD practical for fast, scalable reasoning ⚡️. Surprisingly, we achieved ✅ up to 100× speedup ✅ with no or negligible performance drop across diverse tasks! With @jaesikyoon_, Hyeonseo Cho, and Yoshua Bengio.

Hyeonseo Cho retweeted

Jaesik Yoon @jaesikyoon_ · Feb 18

I am happy to share our new paper, Monte Carlo Tree Diffusion (MCTD) for System 2 Planning. Unsupervised RL with generative models is one route toward general agents, so what if we apply test-time compute there? We studied this with a diffusion planner.
