Jaesik Yoon

70 posts

@jaesikyoon_

Senior Machine Learning Developer at SAP and a Ph.D. student advised by Prof. @sungjinahn_ at MLML. Working toward general AI in both product and research.

Republic of Korea · Joined July 2023

189 Following · 176 Followers

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·13h

🧠 We introduce "Generative Recursive Reasoning"!
Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.
Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width, via parallel trajectory sampling.
And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).
With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+
📄 Paper: arxiv.org/abs/2605.19376
🌐 Project page: ahn-ml.github.io/gram-website
w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·1d

KAIST AI (College of AI) is hiring! If you are attending ICML 2026 in Seoul and are interested in faculty or postdoc positions at KAIST AI Computing (and CS), feel free to reach out by filling out this short interest form: forms.gle/i9WRweMX56Va8m… We are looking for researchers across broad areas of AI and Computer Science, including ML, NLP, CV, HCI, Systems and more. Please share with anyone who may be interested!

Jaesik Yoon reposted

Yoonho Lee@yoonholeee·30 Mar

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·31 Mar

We are seeking a highly motivated postdoctoral researcher to work on fundamental challenges toward AGI, particularly in reasoning, abstraction, and world modeling. The position also offers potential opportunities for co-advising with Yoshua Bengio (Mila) and/or Mengye Ren (NYU).
Research areas include:
• World Model Learning & Planning
• Compositional Generalization & Neuro-Symbolic World Learning
• Causal Discovery, Reasoning, and Abstraction
This position is supported by the InnoCORE Fellowship Program 2026, with:
• Competitive salary of KRW 90M+ (~USD 60K+)
• Renewable yearly contract
For more information and recent publications: mlml.kaist.ac.kr
If you are interested, please send me your CV by email.

Jaesik Yoon reposted

Justin Deschenaux@jdeschena·20 Mar

Interested in our work on Ψ-samplers? Make sure to join on Monday!

Discrete Diffusion Reading Group@diffusion_llms

📢 Mar 23 (Mon): The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum
☯️ The Diffusion Duality (Duo) (ICML 2025) showed that uniform-state discrete diffusion arises from Gaussian diffusion.
🔮 The new Chapter II paper (ICLR 2026) introduces Ψ-samplers: non-Markovian predictor-corrector samplers for arbitrary noise priors! Unlike ancestral sampling, which plateaus, Ψ-samplers exhibit improved test-time scaling, beating MDLM on language generation (OpenWebText) and image generation (CIFAR-10).
⚡️ The authors also reformulated the Gaussian curriculum from Duo, reducing its training time by 25% while matching perplexity and downstream accuracy.
This Monday, Justin Deschenaux (@jdeschena) will present his paper, published with collaborators Caglar Gulcehre (@caglarml) and Subham Sahoo (@ssahoo_).
Paper link: arxiv.org/abs/2602.21185

Jaesik Yoon reposted

Demis Hassabis@demishassabis·10 Mar

Ten years ago, AlphaGo’s legendary match in Seoul heralded the start of the modern era in AI. Its famous ‘Move 37’ signaled to us that AI techniques were ready to tackle real-world problems in areas like science - and ideas inspired by these methods are critical to building AGI

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·3 Mar

Understanding LoRA as Knowledge Memory
🚀 Can we save new LLM facts directly into LoRA weights? While recent works are hastily treating LoRA as a plug-and-play knowledge memory, the fundamental mechanics governing its capacity and composability have remained largely unexplored.
🤯 We asked the hard question: Can an adapter meant for task adaptation actually serve as a reliable store for precise, declarative knowledge? To find out, we ran the first systematic empirical study mapping the design space of LoRA-based memory. The shocking reality is that treating LoRA as a memory unit can catastrophically fail in certain settings if you blindly trust it.
✅ Rather than proposing a single architecture, our paper provides practical guidance on its hidden operational boundaries, from characterizing finite storage capacity limits to the harsh realities of multi-module scaling and merging interference. Check out our systematic map of when LoRA memory succeeds, and exactly when it breaks!
🧑🏻‍💻 Led by my fantastic students @SeungjuBack (KAIST) and @DongwooLee00 (KAIST), in collaboration with Samsung SDS.
arxiv.org/abs/2603.01097

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·21 Jan

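Thoughts on Korea's independent foundation model project (독파모)
This post is not meant to judge whether the 독파모 project has gone well or badly. I just want to raise a question from a slightly different angle. (Opinions on the project vary, but I personally think it is a necessary attempt.)
My question is this: what do we really need to secure? Is it a large language model like DeepSeek itself, or the research capability and research team that can keep producing results like DeepSeek?
With enough resources and effort, we might succeed once in catching up to a model comparable to DeepSeek. But what comes next? When they make the next first move, won't we just repeat fast following again? Haven't we already been repeating this pattern for decades?
What our history shows is clear. No matter how hard you work at fast-following, first-move capability does not emerge on its own. An organization optimized for fast-following simply gets better at fast-following. If anything, in my experience, the accumulated success stories, inertia, and mindset often get in the way of imagining a new first move and bearing its risk.
So I think the question should change. Not "Why don't we have a model like DeepSeek?" but "Why don't we have a research team like DeepSeek that we can present to the world?"
What strikes me about DeepSeek is that this model is not a result that happened to hit once by luck. What makes DeepSeek truly formidable is not a single model but the fact that it already has the capability to sustain first-move innovation. Many people think only of the one DeepSeek model, but as someone who does research in this field, what impresses me more is that DeepSeek is an organization with the research culture, people, and systems to keep producing innovation on a par with Google DeepMind. And it is not only DeepSeek: Japan's Sakana AI also appears to be a case of continually producing world-class first-move research innovation.
While we busily scatter sweet feed to gather birds quickly and in large numbers, our competitors are focused on cultivating a better forest. Birds that eat the feed eventually fly away, but birds that find a forest build their nests there. Why do our talented people leave in search of other countries' forests? Even if we obtain a model similar to DeepSeek, without that forest the talent will eventually leave. But if companies like DeepSeek become numerous, wouldn't things be different?
After 독파모, how about a Korean "program to foster DeepSeek-style continuous-innovation research companies"?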

Jaesik Yoon reposted

Sungjin Ahn@SungjinAhn_·7 Dec

We may need to consider splitting the review process into academic and industry tracks in future AI/ML conferences. I’ve seen many genuinely good ideas get rejected simply because the experiments aren’t “large-scale” enough—which in practice often means industry-lab-scale resources that most academic groups cannot access. An academic track could allow reviewers to focus more on conceptual novelty and proof-of-concept potential, rather than scale. This might help preserve innovative ideas that would otherwise be abandoned due to resource asymmetry.

Jaesik Yoon@jaesikyoon_·1 Dec

@tkipf Please check your DMs when you get a chance!

Thomas Kipf@tkipf·28 Nov

I'll be at NeurIPS all week -- reach out if you want to chat! Would love to chat especially if you work on world models (in particular for the physical world / robotics), visual reasoning, or controls for video gen.

Jaesik Yoon@jaesikyoon_·1 Dec

@BowenJin13 Looks interesting!

Bowen Jin@BowenJin13·1 Dec

I’ll be at #NeurIPS2025 from 12/3 to 12/5. Excited to catch up with old friends and meet new ones! We will present our work on RL for LLM latent reasoning: arxiv.org/abs/2505.18454 📍 Poster #312 🗓️ Wed, Dec 3 ⏰ 11 a.m. – 2 p.m. PST Location: Exhibit Hall C/D/E Come say hi!

Jaesik Yoon@jaesikyoon_·1 Dec

I'm visiting San Diego to present the following papers at the upcoming #NeurIPS2025!
- Adaptive Cyclic Diffusion (Wed. Morning #3514)
- ✨ Compositional Monte Carlo Tree Diffusion (Spotlight, Thu. Afternoon #3712)
- ✨ Fast Monte Carlo Tree Diffusion (Spotlight, Fri. Morning #3609)
Please stop by if you are interested in Diffusion-based Planning, Generative Search, or Reasoning with Generative Modeling. I am also open to coffee chats on various topics beyond my research interests. Please feel free to email me if you'd like to connect!

Jaesik Yoon@jaesikyoon_·26 Nov

@johnlyzhou Thank you! Hope to see you at NeurIPS soon!

John Zhou@johnlyzhou·26 Nov

@jaesikyoon_ Hi Jaesik, I really enjoyed your MCTD work and would love to chat more about it at NeurIPS!

Jaesik Yoon@jaesikyoon_·25 Nov

I’ll be attending NeurIPS next week. Happy to connect and discuss ideas around diffusion-based planning, generative search, and reasoning with generative models!

Jaesik Yoon@jaesikyoon_·25 Nov

@DongyeongKim3 Thank you for the great post!

Jaesik Yoon reposted

Dongyeong Kim@DongyeongKim3·24 Nov

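Seeing Gemini 3's record-setting progress, achieved without the CUDA-based ecosystem, many people seem to be taking an interest in JAX/XLA and TPUs. For today's rambling post, I'd like to tell the story of why Google came to develop the TPU, and why it dropped the well-known TensorFlow in favor of JAX/XLA. (Part 1)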

Jaesik Yoon@jaesikyoon_·4 Nov

🧠 Our core question: "How can we extend MCTD to longer, more complex compositional planning tasks, beyond its trained trajectory lengths?"
💡 Our solution (C-MCTD): We solve this problem with plan-level tree search, and boost its efficiency via parallelization and amortization. It has been accepted as a Spotlight at the upcoming #neurips2025.
📄 ArXiv: arxiv.org/abs/2510.21361
🌐 Project Page: jaesikyoon.com/c-mctd-page/
This work was advised by @SungjinAhn_ and done together with my great colleague @hyeonscho. Huge thanks to them and the MLML members!

Jaesik Yoon@jaesikyoon_·24 Oct

Why should diffusion language models be confined to a discrete token space? We studied how to overcome this limitation by applying a 'loophole' for updating continuous latents during the denoising process. Curious about our findings? Check out our paper, "Loopholing Discrete Diffusion"! Huge thanks to my advisor @SungjinAhn_ and @pyross0000 (amazing achievement as an undergraduate!), and our wonderful collaborators @jdeschena and @caglarml!

Sungjin Ahn@SungjinAhn_

🚨 Check out our new paper on next generation language modeling via "loopholing" discrete diffusion!
🤯 Surprisingly, our loopholing diffusion achieved a huge performance improvement, finally making it match (or even surpass) autoregressive models!
✅ How? We introduce the "loopholing" mechanism: a discrete diffusion that introduces a deterministic bypass alongside the stochastic path to break the sampling wall.
👨🏻‍💻 Led by my fantastic student Mingyu (@pyross0000, KAIST) and @jaesikyoon_ (KAIST), in collaboration with Justin Deschenaux (EPFL) and Caglar Gulcehre (EPFL, Microsoft).
📄 arXiv: arxiv.org/abs/2510.19304
🌐 Project: sites.google.com/view/lddms/home
