Minsu Kim

79 posts

@minsuuukim

Postdoc @Mila_Quebec and KAIST | Academic collaborator @LawZero_ | RL post-training, reasoning, safety, AI4Science

South Korea · Joined March 2025

146 Following · 212 Followers

Minsu Kim retweeted

Rohan Paul @rohanpaul_ai · 17h

A 10-million-parameter model just outperformed deterministic rivals three times its size by doing something regular recursive models don't do: exploring multiple reasoning paths at the same time. Most AI reasoning models are trapped on a single train of thought, and GRAM ("Generative Recursive Reasoning") is the first to break that by letting the model explore many reasoning paths in parallel.

The problem is that all existing recursive models are fully deterministic: given the same input, they always follow the exact same reasoning path, so they can never escape a wrong trajectory or discover more than one valid answer. GRAM fixes this by injecting learned randomness at each refinement step, so the model samples a slightly different direction each time rather than snapping to one fixed next state. This produces a spread of diverse reasoning trajectories.

At test time, the model runs many of these paths in parallel and selects the best one using a small reward predictor trained alongside the main model. This adds a "width" scaling axis on top of the usual "depth" axis of running more recursion steps.

On hard Sudoku puzzles, GRAM with 10M parameters hits 97% accuracy versus 87.4% for the best prior recursive model, and with only 20 parallel samples it outperforms every deterministic baseline, even at 320 recursion steps. On tasks with many valid answers like N-Queens, deterministic recursive models collapse as the number of solutions grows, while GRAM maintains near-perfect accuracy throughout.

The same stochastic framework also acts as a generator: given a blank board, GRAM produces valid Sudoku puzzles 99% of the time using 16 steps, versus 91% for the best diffusion baseline, which needs 1,000 steps and 55M parameters.

---
Paper link: arxiv.org/abs/2605.19376v1
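
The "depth" versus "width" distinction can be illustrated with a toy best-of-N loop. This is a minimal sketch under stated assumptions, not GRAM's implementation: `refine` (a noisy step toward a target vector), `reward` (negative squared error standing in for the learned reward predictor), and all parameter values are hypothetical.

```python
import random

def refine(state, x, rng, noise=0.3):
    # Hypothetical stochastic refinement step: a deterministic move halfway toward x
    # plus injected Gaussian noise. Dropping the noise term gives a deterministic recursion.
    return [s + 0.5 * (xi - s) + rng.gauss(0.0, noise) for s, xi in zip(state, x)]

def reward(state, x):
    # Hypothetical stand-in for a learned reward predictor: higher is better.
    return -sum((s - xi) ** 2 for s, xi in zip(state, x))

def sample_trajectory(x, depth, rng):
    state = [0.0] * len(x)
    for _ in range(depth):  # "depth" axis: number of recursion steps
        state = refine(state, x, rng)
    return state

def best_of_width(x, depth, width, seed=0):
    rng = random.Random(seed)
    # "Width" axis: sample independent trajectories, keep the highest-scoring one.
    candidates = [sample_trajectory(x, depth, rng) for _ in range(width)]
    return max(((c, reward(c, x)) for c in candidates), key=lambda pair: pair[1])

target = [1.0, -2.0, 0.5, 3.0]
_, score_1 = best_of_width(target, depth=8, width=1)
_, score_20 = best_of_width(target, depth=8, width=20)
print(score_1, score_20)  # with the same seed, score_20 >= score_1
```

With a shared seed, the first sampled trajectory is identical in both runs, so widening the search can only keep or improve the selected score. How much it helps in practice depends on how good the reward predictor is.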

Minsu Kim retweeted

Sungjin Ahn @SungjinAhn_ · 1d

🧠 We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. This enables multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width: parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: arxiv.org/abs/2605.19376
🌐 Project page: ahn-ml.github.io/gram-website

With Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)

Minsu Kim retweeted

Sungjin Ahn @SungjinAhn_ · 2d

KAIST AI (College of AI) is hiring! If you are attending ICML 2026 in Seoul and are interested in faculty or postdoc positions at KAIST AI Computing (and CS), feel free to reach out by filling out this short interest form: forms.gle/i9WRweMX56Va8m… We are looking for researchers across broad areas of AI and Computer Science, including ML, NLP, CV, HCI, Systems and more. Please share with anyone who may be interested!

Minsu Kim retweeted

BURKOV @burkov · 5d

This Google/Cambridge ICLR 2026 paper introduces Visual Planning, a novel reinforcement learning framework (VPRL) that enables purely visual reasoning through sequences of images, outperforming text-only planning in visual navigation tasks and offering a promising supplement to language-based reasoning for "vision-first" challenges. Read with an AI tutor: chapterpal.com/s/57kf5sky/vis… PDF: arxiv.org/pdf/2505.11409

Minsu Kim retweeted

ChangHao @ChangHao564792d · 14 May

🚀 Excited to share our new paper: Revisiting DAgger in the Era of LLM-Agents!

Training long-horizon LLM agents is hard:
🔸 SFT → covariate shift
🔸 RL → sparse rewards
🔸 On-policy distillation → cold-start failure + needs white-box teacher logits

We bring back DAgger to fix all three: on-policy rollouts ✕ dense teacher supervision, no cold-start, fully black-box-teacher compatible.

✨ Results on SWE-bench Verified:
🔹 Our 4B agent hits 27.3%, beating published 8B SWE-agent systems
🔹 Our 8B agent hits 29.8%, surpassing SWE-Gym-32B and within 5 pts of strong 32B agents

📄 Paper: arxiv.org/abs/2605.12913
🤗 HF Daily: huggingface.co/papers/2605.12…
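
For readers unfamiliar with DAgger, the classic loop is: roll out the current learner, have the expert label every state the learner actually visits, aggregate those labels into one growing dataset, and retrain. Below is a minimal tabular sketch on a hypothetical 1-D corridor; the environment, the `expert` function, and the majority-vote learner are illustrative assumptions, not the paper's LLM-agent setup. It uses pure on-policy rollouts (no expert mixing), and the expert only returns actions, so it is compatible with a black-box teacher.

```python
import random
from collections import Counter, defaultdict

GOAL = 10  # hypothetical corridor: start at 0, goal at 10

def expert(state):
    # Black-box teacher: returns an action only (no logits needed).
    return 1 if state < GOAL else (-1 if state > GOAL else 0)

def step(state, action):
    # Keep the agent inside a bounded corridor.
    return max(-GOAL, min(2 * GOAL, state + action))

def train(dataset):
    # Tabular learner: majority vote over all expert labels seen for each state.
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def act(policy, state, rng):
    # Unseen states get a random action, mimicking an untrained (cold) learner.
    return policy.get(state, rng.choice([-1, 0, 1]))

def rollout(policy, rng, horizon=30):
    visited, state = [], 0
    for _ in range(horizon):
        visited.append(state)
        state = step(state, act(policy, state, rng))
    return visited

def dagger(iterations=10, seed=0):
    rng = random.Random(seed)
    dataset, policy = [], {}
    for _ in range(iterations):
        visited = rollout(policy, rng)                # states the learner actually reaches
        dataset += [(s, expert(s)) for s in visited]  # dense per-step teacher labels
        policy = train(dataset)                       # retrain on the aggregated dataset
    return policy

policy = dagger()
print(GOAL in rollout(policy, random.Random(1)))  # True: the learner now reaches the goal
```

Each iteration extends the correctly labeled prefix of the corridor by at least one state, because the learner follows its known-good actions until it first hits an unlabeled state, and that state then gets an expert label. This is the covariate-shift fix in miniature: labels are collected where the learner goes, not only where the expert goes.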

Minsu Kim retweeted

Moksh Jain @JainMoksh · 13 May

The scientific process involves collecting informative measurements while effectively allocating limited resources. We developed MaD-Physics, a new benchmark to measure this capability of agents.

Minsu Kim @minsuuukim · 3 May

@HyeonahKimm @alexhdezgcia Paper: arxiv.org/abs/2602.04119

Minsu Kim @minsuuukim · 3 May

Do we really need to hard-code synthesis routes into the generative process to obtain synthesizable molecules? Our ICML 2026 paper suggests another route. Huge credit to @HyeonahKimm, @alexhdezgcia, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, and @Yoshua_Bengio.

In S3-GFN, we keep the molecular generator sequence-based, initialize it from a rich SMILES prior, and induce synthesizability through soft distributional post-training. Rather than treating synthesizability as a hard action-space constraint or simply folding it into scalar reward shaping, we maintain positive/negative replay buffers and use a contrastive auxiliary loss to separate synthesizable and unsynthesizable regions in probability space.

This gives a simple but flexible way to steer GFlowNet sampling toward high-reward, synthesizable molecules while retaining the benefits of pretrained chemical language models. (1/4)

Minsu Kim @minsuuukim · 3 May

@Yoshua_Bengio

Minsu Kim @minsuuukim · 3 May

Excited to share that our paper “Active Attacks: Red-teaming LLMs via Adaptive Environments” has been accepted to ICML 2026. Joint work with Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, and @Yoshua_Bengio.

We study automated red-teaming: training an attacker LLM to generate diverse attack prompts that expose failure modes in a victim LLM, then using those prompts to improve safety tuning. Can stronger adaptive attacks make LLMs safer? More below 🧵

Minsu Kim @minsuuukim · 3 May

3/3 Technically, we combine adaptive environments with soft/off-policy RL and replay training. The result is broader coverage of harmful modes and stronger safety tuning: cross-attack success vs GFlowNet baselines improves from 0.07% to 31.28%, with only ~6% extra compute. Paper: arxiv.org/abs/2509.21947

Minsu Kim @minsuuukim · 3 May

2/3 Active Attacks makes the red-teaming environment adaptive. After each round, we safety-finetune the victim on discovered attacks. This lowers reward in already-exploited regions, pushing the attacker to search for new vulnerabilities. This creates an active-learning-like, easy-to-hard curriculum.
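
The effect of making the environment adaptive can be seen in a deliberately tiny, deterministic sketch. Everything here is a hypothetical simplification: four named failure modes with made-up success rates, a greedy attacker that always exploits the currently most successful mode, and the assumption that one round of safety fine-tuning fully patches a discovered mode. The actual method trains an attacker LLM with RL rather than picking modes greedily.

```python
def attacker_round(success_rate):
    # Hypothetical exploitative attacker: targets the single mode with the
    # highest current success rate, if any mode still succeeds.
    best = max(success_rate, key=success_rate.get)
    return [best] if success_rate[best] > 0 else []

def red_team(rounds, adaptive):
    # Hypothetical victim: attack success rate per failure mode.
    success_rate = {"mode_a": 0.9, "mode_b": 0.6, "mode_c": 0.4, "mode_d": 0.2}
    discovered = []  # failure modes in order of first discovery
    for _ in range(rounds):
        for mode in attacker_round(success_rate):
            if mode not in discovered:
                discovered.append(mode)
            if adaptive:
                # Safety-finetune the victim on the discovered attack: in this toy
                # model the mode stops yielding reward, so the attacker must move on.
                success_rate[mode] = 0.0
    return discovered

print(red_team(4, adaptive=False))  # ['mode_a']
print(red_team(4, adaptive=True))   # ['mode_a', 'mode_b', 'mode_c', 'mode_d']
```

Without adaptation the attacker never leaves its first high-reward mode. With adaptation, modes are discovered from highest to lowest success rate, which is the easy-to-hard ordering the post describes.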

Minsu Kim @minsuuukim · 3 May

@Yoshua_Bengio 1/3 A key challenge in RL-based red-teaming is diversity. If an attacker finds a few easy high-reward attack modes, it can keep exploiting them. This may give high attack success, but poor coverage of failure modes — exactly what we do not want for robust safety tuning.

Minsu Kim @minsuuukim · 3 May

Empirically, this simple recipe works surprisingly well: S3-GFN achieves high synthesizability (often >95%) while maintaining strong optimization performance across molecular design tasks. So the takeaway is: you may not need to hard-code synthesis routes to generate synthesizable molecules — a strong sequence prior + soft constrained post-training can already go a long way. (4/4)

Minsu Kim @minsuuukim · 3 May

Our approach, S3-GFN, keeps the generator sequence-based (SMILES) and starts from a rich chemical prior. Instead of baking synthesizability into the MDP as a hard constraint, we induce it through soft distributional post-training. The core mechanism is simple:
- maintain positive / negative replay buffers
- add a contrastive auxiliary loss
- suppress unsynthesizable regions while preserving reward-seeking behavior
(3/4)
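
The thread does not spell out the exact form of the contrastive auxiliary loss, so the following is only a sketch under an assumed form: positive/negative replay buffers filled according to a (hypothetical) synthesizability check, and a logistic pairwise loss that pushes the generator's log-probability of synthesizable sequences above that of unsynthesizable ones. The class and function names, buffer capacity, and pairing scheme are all hypothetical.

```python
import math
import random
from collections import deque

class ReplayBuffers:
    # Hypothetical positive/negative replay buffers, split by synthesizability.
    def __init__(self, capacity=1000):
        self.pos = deque(maxlen=capacity)  # synthesizable sequences
        self.neg = deque(maxlen=capacity)  # unsynthesizable sequences

    def add(self, smiles, synthesizable):
        (self.pos if synthesizable else self.neg).append(smiles)

    def sample_pairs(self, n, rng):
        return [(rng.choice(self.pos), rng.choice(self.neg)) for _ in range(n)]

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def contrastive_aux_loss(log_prob, pairs):
    # Assumed logistic form: mean of -log sigmoid(log p(pos) - log p(neg)).
    # The loss shrinks as positives become more likely than negatives.
    return sum(softplus(log_prob(neg) - log_prob(pos)) for pos, neg in pairs) / len(pairs)

# Usage with a hypothetical generator that scores sequences by a lookup table.
log_probs = {"pos_smiles": -1.0, "neg_smiles": -5.0}
buffers = ReplayBuffers()
buffers.add("pos_smiles", synthesizable=True)
buffers.add("neg_smiles", synthesizable=False)
pairs = buffers.sample_pairs(4, random.Random(0))
print(contrastive_aux_loss(log_probs.get, pairs))  # below log(2): positives are likelier
```

In training, a term like this would be added with some weight to the GFlowNet objective, so it reshapes probability mass between the two regions without changing the action space; that weighting, like the loss form itself, is an assumption here.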

Minsu Kim retweeted

Ayhan Suleymanzade @ayhozade · 21 Apr

On my way to #ICLR2026 🇧🇷✈️ Hmu if you want to chat about latent/continuous reasoning and flow/diffusion language models. I’ll be presenting #MUX:
→ compress reasoning into continuous latent space
→ multiplex multiple reasoning paths
→ fewer tokens, better reasoning

Minsu Kim retweeted

ICLR @iclr_conf · 23 Apr

Announcing the #ICLR2026 Outstanding Paper Awards 🏆 Congratulations to:

Minsu Kim retweeted

Sungjin Ahn @SungjinAhn_ · 31 Mar

We are seeking a highly motivated postdoctoral researcher to work on fundamental challenges toward AGI, particularly in reasoning, abstraction, and world modeling. The position also offers potential opportunities for co-advising with Yoshua Bengio (Mila) and/or Mengye Ren (NYU).

Research areas include:
• World Model Learning & Planning
• Compositional Generalization & Neuro-Symbolic World Learning
• Causal Discovery, Reasoning, and Abstraction

This position is supported by the InnoCORE Fellowship Program 2026, with:
• Competitive salary of KRW 90M+ (~USD 60K+)
• Renewable yearly contract

For more information and recent publications: mlml.kaist.ac.kr

If you are interested, please send me your CV by email.
