Junsoo Ha

51 posts

Junsoo Ha
@kuc2477

A graduate student @SNUVL; ML + Game Theory

Seoul, South Korea · Joined December 2015
807 Following · 156 Followers
Junsoo Ha retweeted
Tengyu Ma @tengyuma
To solve hard open math problems, we need AI models that can train and self-improve indefinitely without more external data. Humans can self-improve, so AI should be able to as well if it imitates humans. So we let AI conjecture, prove, and guide itself with some taste.
Luke Bailey @LukeBailey181

Self-play led to superhuman Go performance, so why hasn't it for LLMs? In practice, long-run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better: it solves as many problems with a 7B model as the pass@4 of a model 100x bigger.

Dayoon Ko @dayoon12161
🎉 Our paper has been accepted to #ICLR2026! 😆💖 This work was done during my internship at LG AI Research – Superintelligence Lab. As summarized in the project: Deep research requires broad evidence coverage and reliable synthesis. HybridDeepSearcher achieves both by parallel retrieval for breadth with sequential reasoning for depth, supporting scalable search. 🔗 Project page: hybriddeepsearcher.github.io 📄 OpenReview: openreview.net/forum?id=rXpTZ… Huge thanks to my mentors and co-workers for their guidance and support throughout this project. We also plan to release related work soon. Stay tuned! 😊
Junsoo Ha retweeted
Damek @damekdavis
New paper studies when spectral gradient methods (e.g., Muon) help in deep learning: 1. We identify a pervasive form of ill-conditioning in DL: post-activation matrices have low stable rank. 2. We then explain why spectral methods can perform well despite this. Long thread
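For readers unfamiliar with the term, "stable rank" has a standard definition: ‖A‖²_F / ‖A‖²₂, a smooth proxy for rank that is small when a matrix's energy concentrates in a few directions. A minimal sketch (our own illustration, not code from the paper):

```python
# Toy illustration of stable rank, ||A||_F^2 / ||A||_2^2.
# It lies in [1, rank(A)] and is small for nearly low-rank matrices,
# the kind of ill-conditioning the thread describes in post-activations.
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """Frobenius norm squared over spectral norm squared."""
    fro2 = np.sum(A ** 2)
    spec = np.linalg.norm(A, ord=2)  # largest singular value
    return float(fro2 / spec ** 2)

rng = np.random.default_rng(0)
# A generic random matrix: stable rank is a sizable fraction of min(m, n).
well = rng.standard_normal((256, 64))
# A nearly rank-one matrix: stable rank close to 1.
u, v = rng.standard_normal((256, 1)), rng.standard_normal((1, 64))
low = u @ v + 0.01 * rng.standard_normal((256, 64))

print(stable_rank(well))  # much larger than 1
print(stable_rank(low))   # close to 1
```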
Junsoo Ha retweeted
Pedro Domingos @pmddomingos
TL;DR: Programming solves problems in P, and AI in NP.
Andrej Karpathy @karpathy

Sharing an interesting recent conversation on AI's impact on the economy. AI has been compared to various historical precedents: electricity, the industrial revolution, etc. I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0), because both are fundamentally about the automation of digital information processing.

If you were to forecast the impact of computing on the job market in the ~1980s, the most predictive feature of a task/job you'd look at is to what extent its algorithm is fixed, i.e. are you just mechanically transforming information according to rote, easy-to-specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually).

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago.

In this new programming paradigm, then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about the extent to which an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made). The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out of the neural net magic of generalization, fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs.

Tasks that are verifiable progress rapidly, possibly even beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative and strategic tasks, and tasks that combine real-world knowledge, state, context, and common sense). Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.

Junsoo Ha retweeted
Xinghan Li @XinghanLi66
Adam prefers a different minimizer than SGD (exemplified below), but how? 🤔 Our NeurIPS 2025 Paper: Based on our Slow SDE approximation of Adam, we show that under label noise Adam implicitly minimizes tr(Diag(H)^½), whereas prior works showed that SGD minimizes tr(H). 🧵1/n
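To see why tr(Diag(H)^½) and tr(H) can prefer different minimizers, consider a toy comparison (our own illustration, not from the paper): two diagonal Hessians with equal trace that the two implicit penalties rank differently.

```python
# Toy illustration (ours, not the paper's code): two minima whose Hessians
# have equal trace are indistinguishable under SGD's implicit penalty tr(H),
# yet differ under Adam's implicit penalty tr(Diag(H)^{1/2}).
import numpy as np

def sgd_penalty(H):
    return float(np.trace(H))            # tr(H)

def adam_penalty(H):
    return float(np.sum(np.sqrt(np.diag(H))))  # tr(Diag(H)^{1/2})

H_spiky = np.diag([4.0, 0.0])  # curvature concentrated in one coordinate
H_flat  = np.diag([2.0, 2.0])  # same total curvature, spread evenly

assert sgd_penalty(H_spiky) == sgd_penalty(H_flat)   # SGD's penalty: a tie
assert adam_penalty(H_spiky) < adam_penalty(H_flat)  # Adam's penalty: prefers spiky
```

The square root is concave, so spreading curvature across coordinates increases tr(Diag(H)^½) at fixed trace, which is why the two optimizers can settle at different minima.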
Junsoo Ha retweeted
Igor Babuschkin @ibab
A common mistake that AI companies make nowadays is to not give their engineers enough time and mental calm to do their best work. Constant deadlines, pressure and distractions from daily AI news are poison for writing good code and systems that scale well. That’s why most AI APIs and products have reliability issues. A good company culture that mixes excellence with focus and enough rest leads to faster and better results. The best example of how to do it well is the early Google culture from 1998 which resulted in one of the largest scale and most reliable services on the web in just a few short years. Founders should copy some of the strategies that Larry and Sergey used. They are still underrated IMO despite their huge reputation.
Junsoo Ha retweeted
Atli Kosson @AtliKosson
The Maximal Update Parameterization (µP) allows LR transfer from small to large models, saving costly tuning. But why is independent weight decay (IWD) essential for it to work? We find µP stabilizes early training (like an LR warmup), but IWD takes over in the long term! 🧵
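As a rough sketch of the mechanics involved (our own reading of µP's hidden-layer rule for Adam-like optimizers and of "independent" weight decay, not the authors' code): under µP the hidden-layer learning rate shrinks inversely with width, while independent weight decay applies a decay step whose strength is decoupled from the learning rate, so it does not weaken as the model widens.

```python
# Hedged sketch (our illustration, not the authors' code): for Adam-like
# updates, µP scales the hidden-layer learning rate as base_lr * base_width
# / width, so a value tuned on a small model transfers to a large one.
# "Independent" weight decay here means the decay step w -= wd * w carries
# no factor of the (width-dependent) learning rate.
def mup_step(w, update, base_lr, base_width, width, wd):
    lr = base_lr * base_width / width  # width-scaled learning rate
    w = w - lr * update                # optimizer update
    w = w - wd * w                     # independent weight decay: no lr factor
    return w

# Doubling the width halves the effective LR but leaves the decay untouched.
w_narrow = mup_step(1.0, update=0.5, base_lr=0.1, base_width=128, width=128, wd=0.01)
w_wide   = mup_step(1.0, update=0.5, base_lr=0.1, base_width=128, width=256, wd=0.01)
```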
Junsoo Ha retweeted
Ernest Ryu @ErnestRyu
I used ChatGPT to solve an open problem in convex optimization. *Part I* (1/N)
Junsoo Ha retweeted
Sham Kakade @ShamKakade6
1/6 Introducing Seesaw: a principled batch-size scheduling algorithm. Seesaw achieves theoretically optimal serial runtime given a fixed compute budget and also matches the performance of cosine annealing at a fixed batch size.
Junsoo Ha retweeted
Aryeh Kontorovich @aryehazan
a very simple result but darn useful
Junsoo Ha retweeted
Konstantin Mishchenko @konstmish
Weight decay changes the training objective because the decay update can conflict with the gradient update, so the equilibrium is no longer where the gradient is zero. This paper proposes a single-line edit that applies weight decay in a way that preserves the stationary points.
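A minimal numerical illustration of the first sentence (our own, not the paper's proposed edit): with standard weight decay, the iteration settles where the gradient cancels the decay term, not where the gradient vanishes.

```python
# Toy example: minimize f(w) = (w - 1)^2 with weight decay wd = 0.5.
# The update w <- w - lr * (grad(w) + wd * w) is stationary where
# grad(w) = -wd * w, i.e. NOT at the objective's minimizer w = 1.
def grad(w):
    return 2.0 * (w - 1.0)  # gradient of (w - 1)^2

w, lr, wd = 0.0, 0.1, 0.5
for _ in range(1000):
    w = w - lr * (grad(w) + wd * w)

# Fixed point solves 2(w - 1) + 0.5 w = 0  =>  w = 0.8,
# where grad(0.8) = -0.4 != 0: the equilibrium has shifted.
```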
Junsoo Ha retweeted
Jingfeng Wu @uuujingfeng
Sharing a new paper with Peter Bartlett, @jasondeanlee, @ShamKakade6, and Bin Yu. People talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for all linear regression problems, with more cool stuff on GD vs. SGD. arxiv.org/abs/2509.17251
Jaehyeon Son @JaehyeonSon0
📢 Life Update 🤖 I'll be joining Georgia Tech as a PhD student this fall, where I'll be focusing on embodied AI and robotics. I can't wait to begin this new chapter, become part of the vibrant community, and contribute to the field!
Junsoo Ha @kuc2477
@miniapeur Not sure if this counts, but Nesterov acceleration. I know how to derive it, and I’m aware of different types of interpretations. But honestly I don’t think I understand why it should work at all in the first place.
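For context, the update being referred to fits in a few lines (a standard textbook form of Nesterov momentum, not any particular derivation): the gradient is taken at a lookahead point, which is exactly the part that resists easy intuition.

```python
# Standard (Sutskever-style) Nesterov momentum on f(x) = x^2.
# The gradient is evaluated at the lookahead point y = x + mu * v
# rather than at the current iterate x.
def grad(x):
    return 2.0 * x  # gradient of x^2, minimized at 0

x, v, lr, mu = 5.0, 0.0, 0.1, 0.9
for _ in range(200):
    y = x + mu * v            # lookahead point
    v = mu * v - lr * grad(y) # velocity update with lookahead gradient
    x = x + v

# x converges to the minimizer 0
```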
Mathieu @miniapeur
What is one mathematical concept that took you longer than expected to understand?