J. Lyle Kim

192 posts

@jlylekim

Quantum computing research scientist @jpmorgan | PhD @RiceCompSci | BA @UChicago | Optimization, quantum computing, machine learning | 🇰🇷

New York, USA · Joined November 2020
333 Following · 246 Followers
Alex Shtoff
Alex Shtoff@AlexShtf·
@jlylekim Is there any optimizer work that is not about "optimizing for language models"? :)
2
0
1
199
J. Lyle Kim
J. Lyle Kim@jlylekim·
🚨New paper: Anytime Training with Schedule-Free Spectral Optimization🚨 We introduce SF-NorMuon, a schedule-free spectral method that outperforms or matches heavily tuned AdamW across 125M and 772M parameter language models.
8
21
126
14.1K
J. Lyle Kim
J. Lyle Kim@jlylekim·
Joint work with @anujsapte, Pranav Deshpande, Niraj Kumar, and Shouvanik Chakrabarti at Global Technology Applied Research, JPMorganChase.
0
1
5
453
J. Lyle Kim
J. Lyle Kim@jlylekim·
The takeaway? High-quality checkpoints at any point during the run without committing to a fixed training horizon upfront. Read the full paper here: arxiv.org/abs/2605.23061
1
1
8
613
J. Lyle Kim retweeted
Aaron Defazio
Aaron Defazio@aaron_defazio·
🚨 New Paper 🚨 ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models A few modifications to Schedule-Free Learning make it completely LR tuning free, and allow it to greatly outperform schedules for long duration training! arxiv.org/abs/2605.19095…
7
56
416
82K
J. Lyle Kim retweeted
OptimaLab
OptimaLab@optimalab1·
During neural network training, the loss landscape gets sharper until it hits a ceiling. GD pins right at the ceiling. SGD settles below it — and the gap grows as you shrink the batch. Why? We now have the answer. arxiv.org/abs/2604.21016 🧵 Blog: akyrillidis.github.io/aiowls/stochas…
7
61
414
36.5K
Francesco Orabona
Francesco Orabona@bremen79·
Us: We solve 1 open problem in FamousPaper'18 Reviewer: IrrelevantPaper'21 solved it! Us: Nope, IrrelevantPaper'21 is a different problem Reviewer: But other open problems in FamousPaper'18 were solved, this means the one you solved is useless Us: ... What would you do? #ICML26
5
0
35
15.9K
J. Lyle Kim retweeted
The Big Karuna
The Big Karuna@Nondual8·
Hi Madison, I understand this well. I’m 66 years old. My first computer was a DEC PDP-8 with an ASR 33 teletype as the input output device. For most of my career I’ve written embedded software for medical devices – clinical ultrasound systems, and defibrillators. It’s taking me about a year to adjust to the double whammy of being somewhat, but not entirely replaceable by AI and also generally being judged as too old and used up. In fact, neither of those characteristics or attributes defined who I am. I will say that it’s going to be interesting to see how this unfolds. You and I just happened to be members of the field where this effect is felt first. Yet I think it will sweep all knowledge work. Well, it’s taken some time. I’m at peace. Life just unfolds. And I would urge you to not identify too strongly with what you think of as you. You are more than a software engineer and software engineering does not define you at all. Easier said than done, perhaps. But I would urge you to just sit with the uncomfortable feelings if you have them. And when you feel like it invest a little bit of thought in the question, how can I be useful to others around me? That is where I am spending my time. Wishing you peace of mind and ease in the world.
16
16
552
53.6K
J. Lyle Kim retweeted
Elon Musk
Elon Musk@elonmusk·
For quality of life, it is better to err on the side of being an optimist and wrong, rather than a pessimist and right
12.2K
30K
251.5K
48.8M
J. Lyle Kim
J. Lyle Kim@jlylekim·
How does SuperGrok perform for math-ish research? I see a lot of comparisons between other models but rarely see any tweets about SuperGrok for research
0
0
0
61
Deep
Deep@rather_deep·
@deviparikh No, math is deterministic, writing is art. Only if you knew how thoughts shape language & language shapes thoughts you'd understand it. Each choice you make while writing is a decision that shows your view, sometimes accurately, disregarding it means you let go of fine control
1
0
2
117
Devi Parikh
Devi Parikh@deviparikh·
Just because an AI wrote it, doesn’t mean I don’t mean it. Just because I used AI, doesn’t mean I don’t care. It’s like saying, “If you really cared about this contract, you’d do the budget math by hand.” Me being efficient with my time doesn’t mean I respect yours less. All that matters is — was it good? The problem with AI slop isn’t the AI. It’s the slop. If it took you 15 minutes to read and it was bad, I wasted your time. Regardless of whether it took me 5 minutes or 3 hours to write, with or without AI.
19
16
175
16.7K
J. Lyle Kim
J. Lyle Kim@jlylekim·
@deviparikh Isn't the analogy more like "If you really cared about this contract, you'd actually read it instead of asking AI to summarize"? I presume you don't write everything with AI. Why not?
1
0
2
207
dan
dan@irl_danB·
Anthropic is a billion-dollar company????? a billion???????? dollars??????????????? for literally... matrix. multiplication.
95
35
2.7K
183.7K
J. Lyle Kim retweeted
alz
alz@alz_zyd_·
as a UChicago alum, one of my great life regrets is I took a real SOSC (power) but a fake HUM (language and the human). UChi kids: take real SOSC and HUM classes! You're at one of the few places in America where you can still get a good classical education, don't miss your chance
9
11
219
11.7K