Junsoo Ha

51 posts

Junsoo Ha
@kuc2477

A graduate student @SNUVL; ML + Game Theory

Seoul, South Korea · Joined December 2015
807 Following · 156 Followers
Junsoo Ha retweeted
Tengyu Ma @tengyuma
To solve hard open math problems, we need AI models that can train and self-improve indefinitely without more external data. Humans can self-improve, so AI should be able to as well if it imitates humans. So we let AI conjecture, prove, and guide itself with some taste.
Luke Bailey @LukeBailey181

Self-play led to superhuman Go performance, so why hasn't it for LLMs? In practice, long-run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better: it solves as many problems with a 7B model as the pass@4 of a model 100x bigger.

Dayoon Ko @dayoon12161
🎉 Our paper has been accepted to #ICLR2026! 😆💖 This work was done during my internship at LG AI Research – Superintelligence Lab. As summarized in the project: Deep research requires broad evidence coverage and reliable synthesis. HybridDeepSearcher achieves both by parallel retrieval for breadth with sequential reasoning for depth, supporting scalable search. 🔗 Project page: hybriddeepsearcher.github.io 📄 OpenReview: openreview.net/forum?id=rXpTZ… Huge thanks to my mentors and co-workers for their guidance and support throughout this project. We also plan to release related work soon. Stay tuned! 😊
Junsoo Ha retweeted
Damek @damekdavis
New paper studies when spectral gradient methods (e.g., Muon) help in deep learning: 1. We identify a pervasive form of ill-conditioning in DL: post-activation matrices have low stable rank. 2. We then explain why spectral methods can perform well despite this. Long thread
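For readers unfamiliar with the term, "stable rank" has a standard definition: ‖A‖²_F / ‖A‖²₂, a smooth proxy for rank that is small when a matrix's energy concentrates in a few directions. A minimal sketch (our own illustration, not code from the paper):

```python
# Toy illustration of stable rank, ||A||_F^2 / ||A||_2^2.
# It lies in [1, rank(A)] and is small for nearly low-rank matrices,
# the kind of ill-conditioning the thread describes in post-activations.
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """Frobenius norm squared over spectral norm squared."""
    fro2 = np.sum(A ** 2)
    spec = np.linalg.norm(A, ord=2)  # largest singular value
    return float(fro2 / spec ** 2)

rng = np.random.default_rng(0)
# A generic random matrix: stable rank is a sizable fraction of min(m, n).
well = rng.standard_normal((256, 64))
# A nearly rank-one matrix: stable rank close to 1.
u, v = rng.standard_normal((256, 1)), rng.standard_normal((1, 64))
low = u @ v + 0.01 * rng.standard_normal((256, 64))

print(stable_rank(well))  # much larger than 1
print(stable_rank(low))   # close to 1
```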
Junsoo Ha retweeted
Pedro Domingos @pmddomingos
TL;DR: Programming solves problems in P, and AI in NP.
Andrej Karpathy @karpathy

Sharing an interesting recent conversation on AI's impact on the economy. AI has been compared to various historical precedents: electricity, the industrial revolution, etc. I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0), because both are fundamentally about the automation of digital information processing.

If you were to forecast the impact of computing on the job market in the ~1980s, the most predictive feature of a task/job you'd look at is to what extent its algorithm is fixed, i.e. are you just mechanically transforming information according to rote, easy-to-specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually).

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago.

In this new programming paradigm, then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about the extent to which an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made). The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out of the neural net magic of generalization, fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs.

Tasks that are verifiable progress rapidly, possibly even beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative and strategic tasks, and tasks that combine real-world knowledge, state, context, and common sense). Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.

Junsoo Ha retweeted
Xinghan Li @XinghanLi66
Adam prefers a different minimizer than SGD (exemplified below), but how? 🤔 Our NeurIPS 2025 Paper: Based on our Slow SDE approximation of Adam, we show that under label noise Adam implicitly minimizes tr(Diag(H)^½), whereas prior works showed that SGD minimizes tr(H). 🧵1/n
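To see why tr(Diag(H)^½) and tr(H) can prefer different minimizers, consider a toy comparison (our own illustration, not from the paper): two diagonal Hessians with equal trace that the two implicit penalties rank differently.

```python
# Toy illustration (ours, not the paper's code): two minima whose Hessians
# have equal trace are indistinguishable under SGD's implicit penalty tr(H),
# yet differ under Adam's implicit penalty tr(Diag(H)^{1/2}).
import numpy as np

def sgd_penalty(H):
    return float(np.trace(H))            # tr(H)

def adam_penalty(H):
    return float(np.sum(np.sqrt(np.diag(H))))  # tr(Diag(H)^{1/2})

H_spiky = np.diag([4.0, 0.0])  # curvature concentrated in one coordinate
H_flat  = np.diag([2.0, 2.0])  # same total curvature, spread evenly

assert sgd_penalty(H_spiky) == sgd_penalty(H_flat)   # SGD's penalty: a tie
assert adam_penalty(H_spiky) < adam_penalty(H_flat)  # Adam's penalty: prefers spiky
```

The square root is concave, so spreading curvature across coordinates increases tr(Diag(H)^½) at fixed trace, which is why the two optimizers can settle at different minima.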
Junsoo Ha retweeted
Igor Babuschkin @ibab
A common mistake that AI companies make nowadays is to not give their engineers enough time and mental calm to do their best work. Constant deadlines, pressure and distractions from daily AI news are poison for writing good code and systems that scale well. That’s why most AI APIs and products have reliability issues. A good company culture that mixes excellence with focus and enough rest leads to faster and better results. The best example of how to do it well is the early Google culture from 1998 which resulted in one of the largest scale and most reliable services on the web in just a few short years. Founders should copy some of the strategies that Larry and Sergey used. They are still underrated IMO despite their huge reputation.
Junsoo Ha retweeted
Atli Kosson @AtliKosson
The Maximal Update Parameterization (µP) allows LR transfer from small to large models, saving costly tuning. But why is independent weight decay (IWD) essential for it to work? We find µP stabilizes early training (like an LR warmup), but IWD takes over in the long term! 🧵
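As a rough sketch of the mechanics involved (our own reading of µP's hidden-layer rule for Adam-like optimizers and of "independent" weight decay, not the authors' code): under µP the hidden-layer learning rate shrinks inversely with width, while independent weight decay applies a decay step whose strength is decoupled from the learning rate, so it does not weaken as the model widens.

```python
# Hedged sketch (our illustration, not the authors' code): for Adam-like
# updates, µP scales the hidden-layer learning rate as base_lr * base_width
# / width, so a value tuned on a small model transfers to a large one.
# "Independent" weight decay here means the decay step w -= wd * w carries
# no factor of the (width-dependent) learning rate.
def mup_step(w, update, base_lr, base_width, width, wd):
    lr = base_lr * base_width / width  # width-scaled learning rate
    w = w - lr * update                # optimizer update
    w = w - wd * w                     # independent weight decay: no lr factor
    return w

# Doubling the width halves the effective LR but leaves the decay untouched.
w_narrow = mup_step(1.0, update=0.5, base_lr=0.1, base_width=128, width=128, wd=0.01)
w_wide   = mup_step(1.0, update=0.5, base_lr=0.1, base_width=128, width=256, wd=0.01)
```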
Junsoo Ha retweeted
Ernest Ryu @ErnestRyu
I used ChatGPT to solve an open problem in convex optimization. *Part I* (1/N)
Junsoo Ha retweeted
Sham Kakade @ShamKakade6
1/6 Introducing Seesaw: a principled batch-size scheduling algorithm. Seesaw achieves theoretically optimal serial runtime given a fixed compute budget and also matches the performance of cosine annealing at a fixed batch size.
Junsoo Ha retweeted
Aryeh Kontorovich @aryehazan
a very simple result but darn useful
Junsoo Ha retweeted
Konstantin Mishchenko @konstmish
Weight decay changes the training objective because the decay update can conflict with the gradient update, so the equilibrium is no longer where the gradient is zero. This paper proposes a single-line edit that applies weight decay in a way that preserves the stationary points.
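A minimal numerical illustration of the first sentence (our own, not the paper's proposed edit): with standard weight decay, the iteration settles where the gradient cancels the decay term, not where the gradient vanishes.

```python
# Toy example: minimize f(w) = (w - 1)^2 with weight decay wd = 0.5.
# The update w <- w - lr * (grad(w) + wd * w) is stationary where
# grad(w) = -wd * w, i.e. NOT at the objective's minimizer w = 1.
def grad(w):
    return 2.0 * (w - 1.0)  # gradient of (w - 1)^2

w, lr, wd = 0.0, 0.1, 0.5
for _ in range(1000):
    w = w - lr * (grad(w) + wd * w)

# Fixed point solves 2(w - 1) + 0.5 w = 0  =>  w = 0.8,
# where grad(0.8) = -0.4 != 0: the equilibrium has shifted.
```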
Junsoo Ha retweeted
Jingfeng Wu @uuujingfeng
Sharing a new paper with Peter Bartlett, @jasondeanlee, @ShamKakade6, and Bin Yu. People talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for all linear regression problems, with more cool stuff on GD vs. SGD. arxiv.org/abs/2509.17251
Jaehyeon Son @JaehyeonSon0
📢 Life Update 🤖 I'll be joining Georgia Tech as a PhD student this fall, where I'll be focusing on embodied AI and robotics. I can't wait to begin this new chapter, become part of the vibrant community, and contribute to the field!
Junsoo Ha @kuc2477
@miniapeur Not sure if this counts, but Nesterov acceleration. I know how to derive it, and I’m aware of different types of interpretations. But honestly I don’t think I understand why it should work at all in the first place.
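For context, the update being referred to fits in a few lines (a standard textbook form of Nesterov momentum, not any particular derivation): the gradient is taken at a lookahead point, which is exactly the part that resists easy intuition.

```python
# Standard (Sutskever-style) Nesterov momentum on f(x) = x^2.
# The gradient is evaluated at the lookahead point y = x + mu * v
# rather than at the current iterate x.
def grad(x):
    return 2.0 * x  # gradient of x^2, minimized at 0

x, v, lr, mu = 5.0, 0.0, 0.1, 0.9
for _ in range(200):
    y = x + mu * v            # lookahead point
    v = mu * v - lr * grad(y) # velocity update with lookahead gradient
    x = x + v

# x converges to the minimizer 0
```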
Mathieu @miniapeur
What is one mathematical concept that took you longer than expected to understand?