Youngseog Chung
@YoungseogC
122 posts

PhD student at @mldcmu, @AutonLab | Jazz enthusiast | Tennis player
Pittsburgh, PA · Joined December 2020
649 Following · 455 Followers

Pinned Tweet
Youngseog Chung @YoungseogC
Soft MoE provides a unique take on mixture-of-experts by mixing tokens and mixing the expert outputs. Does this “soft” mixing cause implicit biases that affect representation power or expert specialization? What does expert specialization even mean for Soft MoE? 🧵(1/N)
[image]
1 reply · 10 retweets · 25 likes · 3.7K views
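For readers who want the mechanism behind this thread made concrete: below is a minimal PyTorch sketch of a Soft MoE layer in the spirit of Puigcerver et al.'s formulation (the variable names and the `experts` interface are mine, not from the thread). A softmax over the token axis mixes tokens into slots, and a softmax over the slot axis mixes expert outputs back into tokens; these two softmaxes are the "soft" mixing the thread asks about.

```python
import torch
import torch.nn.functional as F

def soft_moe(x, phi, experts):
    """Minimal Soft MoE sketch (after Puigcerver et al., 2023).

    x:       (n, d) input tokens
    phi:     (d, e*s) learned slot parameters (e experts, s slots each)
    experts: list of e callables, each mapping (s, d) -> (s, d)
    """
    e = len(experts)
    s = phi.shape[1] // e
    logits = x @ phi                        # (n, e*s) token-slot affinities

    # Dispatch: every slot is a convex combination of ALL tokens
    # (softmax over the token axis), so routing is fully differentiable.
    dispatch = F.softmax(logits, dim=0)     # columns sum to 1
    slots = dispatch.T @ x                  # (e*s, d) token-mixed slots

    # Each expert processes its own s slots.
    outs = torch.cat([experts[i](slots[i * s:(i + 1) * s]) for i in range(e)])

    # Combine: every output token is a convex combination of ALL slot
    # outputs (softmax over the slot axis), the second "soft" mixing.
    combine = F.softmax(logits, dim=1)      # rows sum to 1
    return combine @ outs                   # (n, d)
```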
Youngseog Chung retweeted
Paul Liang @pliang279
The attention mechanism in Transformers just got a major upgrade 🧠 📄 arxiv.org/abs/2602.21371 Introducing Interleaved Head Attention - instead of keeping your attention heads independent, share information between attention heads for improved reasoning and compositionality. Pseudo-queries, pseudo-keys, and pseudo-values are learned as linear combinations of the original heads, so interaction across heads happens before attention is computed. IHA stays compatible with efficient implementations like FlashAttention ⚡ Significant improvements on reasoning and long-context tasks:
📊 +5.8% on GSM8K (Maj@16)
📊 +2.8% on MATH-500 (Maj@16)
📊 +27% at 4k on RULER
📊 +32% at 8k
📊 +112% at 16k
Theoretical results show that IHA strictly generalizes multi-head attention.
Chanakya Ekbote @thecekbote

🧵 [1/14]: Talking-Heads Attention by @NoamShazeer et al. showed something interesting: maybe attention heads shouldn’t be fully isolated. 🧠 That got us thinking: If communication across heads matters, what is the right way for heads to communicate, especially from a one-layer reasoning perspective? 🔗⚙️ That question led us to Interleaved Head Attention (IHA) ✨ 📄 Paper link: arxiv.org/pdf/2602.21371

7 replies · 48 retweets · 375 likes · 49.8K views
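The mechanism described in the announcement above is concrete enough to sketch, with the caveat that the paper's exact parameterization may differ; the mixing matrices m_q, m_k, m_v below are my naming, not the paper's. Pseudo-queries, pseudo-keys, and pseudo-values are learned linear combinations of the original heads, formed before attention runs, which is why fused kernels like FlashAttention can still be used.

```python
import torch
import torch.nn.functional as F

def interleaved_head_attention(q, k, v, m_q, m_k, m_v):
    """Hedged sketch of cross-head mixing as the IHA announcement
    describes it (not necessarily the paper's exact parameterization).

    q, k, v:       (batch, heads, seq, head_dim) per-head projections
    m_q, m_k, m_v: (heads, heads) learned mixing matrices (my naming)
    """
    # Pseudo-projections: each pseudo-head i is a learned linear
    # combination of all original heads j, computed BEFORE attention,
    # so information flows across heads up front.
    pseudo_q = torch.einsum('ij,bjsd->bisd', m_q, q)
    pseudo_k = torch.einsum('ij,bjsd->bisd', m_k, k)
    pseudo_v = torch.einsum('ij,bjsd->bisd', m_v, v)

    # Standard scaled dot-product attention on the mixed heads; since
    # the mixing already happened, a fused attention kernel can run
    # unchanged on the result.
    return F.scaled_dot_product_attention(pseudo_q, pseudo_k, pseudo_v)
```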
Youngseog Chung retweeted
Satya Nadella @satyanadella
We’re bringing our growing MAI model family to every developer in Foundry, including …
· MAI-Transcribe-1, the most accurate transcription model in the world across 25 languages
· MAI-Voice-1, natural, expressive speech generation
· MAI-Image-2, our most capable image model yet
Start building: microsoft.ai/news/today-wer…
[GIF]
220 replies · 286 retweets · 1.8K likes · 304.7K views
Youngseog Chung retweeted
Satya Nadella @satyanadella
Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports.
429 replies · 514 retweets · 4.2K likes · 1.4M views
Youngseog Chung retweeted
SK @Djoko_UTD
This Daniil Medvedev shot 😭😭🤣
23 replies · 117 retweets · 1.8K likes · 87.8K views
Youngseog Chung retweeted
Satya Nadella @satyanadella
Great to see our new image model from our Superintelligence team rolling out in Copilot and coming soon to Foundry for enterprise customers.
Mustafa Suleyman @mustafasuleyman

Our new image generator MAI-Image-2 is out! Available now on MAI Playground for everything from lifelike realism to detailed infographics. Our team has been pushing immensely hard for this release, and we are now among the top models out there: #3 family on @arena. Check out the details in our blog: microsoft.ai/news/introduci… It's shipping soon in Copilot and Bing Image Creator, as well as Microsoft Foundry. Really proud of our progress on models and products - stay tuned for new releases and come join us on our Superintelligence mission!

177 replies · 97 retweets · 811 likes · 154.5K views
Youngseog Chung retweeted
Mustafa Suleyman @mustafasuleyman
Our new image generator MAI-Image-2 is out! Available now on MAI Playground for everything from lifelike realism to detailed infographics. Our team has been pushing immensely hard for this release, and we are now among the top models out there: #3 family on @arena. Check out the details in our blog: microsoft.ai/news/introduci… It's shipping soon in Copilot and Bing Image Creator, as well as Microsoft Foundry. Really proud of our progress on models and products - stay tuned for new releases and come join us on our Superintelligence mission!
[4 images]
156 replies · 124 retweets · 598 likes · 239.7K views
Youngseog Chung retweeted
Microsoft AI @MicrosoftAI
Meet MAI‑Image‑2. Built with creatives, for real creative work. Ranked #5 on @arena’s text‑to‑image leaderboard. Available now: msft.it/6014QUCBe
61 replies · 123 retweets · 847 likes · 115.2K views
Youngseog Chung @YoungseogC
@minchoi I already told you, it's a Pareto front, buddy, where outward on all axes is considered good
1 reply · 0 retweets · 2 likes · 96 views
Min Choi @minchoi
@YoungseogC because you were focused on the red circle, while you missed the y-axis
1 reply · 0 retweets · 0 likes · 128 views
Youngseog Chung @YoungseogC
@minchoi What exactly is the crime, and how is it bad or misleading?
1 reply · 0 retweets · 1 like · 116 views
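For readers following the exchange above, the Pareto-front framing is simple to state: a point is on the front when no other point is at least as good on every axis and strictly better on at least one. A minimal sketch, assuming larger is better on all axes (the function and variable names are mine):

```python
import numpy as np

def pareto_front(points):
    """Return the points not dominated by any other point, assuming
    larger is better on every axis ("outward on all axes is good").

    points: (n, k) array of n candidates scored on k axes.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is >= on every axis
        # and strictly > on at least one.
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(i)
    return points[keep]

# Example with (accuracy, speed) pairs: (0.9, 120) dominates (0.8, 100),
# so the front is [[0.9, 120], [0.7, 300]].
print(pareto_front([[0.9, 120], [0.8, 100], [0.7, 300]]))
```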
Youngseog Chung retweeted
Zhikai Zhang @Zhikai273
🎾 Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data. Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills.
Project: zzk273.github.io/LATENT/
Code: github.com/GalaxyGeneralR…
162 replies · 637 retweets · 4.1K likes · 1.4M views
Youngseog Chung retweeted
YixuanEvenXu @YixuanEvenXu
Recent debates highlight a key issue: how do you actually prove distillation? If you want to claim a model was distilled from your outputs, scientifically and with rigorous statistical guarantees, you should consider Antidistillation Fingerprinting (ADFP). 👇
YixuanEvenXu @YixuanEvenXu

🧬 Distillation enables efficient emulation of LLMs, but verifying provenance remains a critical challenge. Introducing Antidistillation Fingerprinting (ADFP): A principled approach that aligns signals with student learning dynamics. 👇 (1/6)

0 replies · 3 retweets · 10 likes · 2.1K views
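To give a feel for the "rigorous statistical guarantees" being claimed, here is a toy sketch of the flavor of test involved; this is NOT ADFP's actual algorithm, and the probe setup, function name, and parameters are all my own illustration. If planted, low-probability "fingerprint" outputs would be matched by an unrelated model with some small per-probe chance, a one-sided binomial test bounds how surprising a suspect's match count is under the no-distillation hypothesis.

```python
from scipy.stats import binomtest

def fingerprint_pvalue(n_probes, n_matches, p_chance):
    """Toy provenance test (illustrative only, not ADFP itself).

    n_probes:  number of planted fingerprint prompts tried
    n_matches: how many the suspect model reproduced
    p_chance:  probability an UNRELATED model matches one probe
    """
    # One-sided test: how unlikely is this many matches if the suspect
    # never trained on the fingerprinted outputs?
    return binomtest(n_matches, n_probes, p_chance,
                     alternative='greater').pvalue

# Example: 1000 probes, a 3% chance match rate, and 90 observed matches.
print(fingerprint_pvalue(1000, 90, 0.03))  # tiny p-value: strong evidence
```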
Youngseog Chung retweeted
Rishub Jain @shubadubadub
🚀 1) Apply here to work with me (Feb-May 2026): sparai.org/projects/sp26/… !! After a successful Fall ‘25 round with SPAR (paper soon), we’re continuing to improve evaluation quality via Human-AI collaboration this Spring! And we're scaling up from 13->20 mentees. The pitch 👇
1 reply · 1 retweet · 2 likes · 156 views
Youngseog Chung retweeted
Stephanie Milani @steph_milani
📣 Honored to be selected as Honorable Mention for the @SCSatCMU Distinguished Dissertation Award!! Thanks to my advisor @fangf07 & committee Geoff Gordon, @hongshenus, @katjahofmann, & @OriolVinyalsML (+ other mentors and collaborators) for their support 🖤 & congrats to Juncheng, Tim, and Brian 🎉
[image]
9 replies · 5 retweets · 113 likes · 11.5K views
Youngseog Chung retweeted
Mark Chen @markchen90
Excited to start OpenAI for Physics w/ @ALupsasca @kevinweil @aleks_madry and @merettm! I sat with @ALupsasca when GPT-5 reproduced his latest research paper, and we both felt parallels to watching AlphaGo play move 37. It's nearly impossible to be a world class chess player without studying AI engines today. Soon, I believe the same will be true for academic research.
Alex Lupsasca @ALupsasca

After GPT-5 Pro launched, I gave it that same problem. To my utter shock, it rediscovered the result in <30min! See for yourself: chatgpt.com/share/68b006eb… It’s not flawless (it needs priming on the flat-space case before tackling the full problem) but the leap is incredible.

29 replies · 96 retweets · 1.1K likes · 191.7K views
Youngseog Chung retweeted
Andrej Karpathy @karpathy
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours of training surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g. a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned, or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
[image]
690 replies · 3.4K retweets · 24.2K likes · 5.8M views
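One item in the list above, inference with a KV cache and simple prefill/decode, is a general pattern worth making concrete. Below is a generic sketch of that loop; it is not nanochat's actual Engine API, and the model(ids, kv_cache) -> (logits, kv_cache) signature is my assumption:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens):
    """Generic KV-cache prefill/decode loop (a sketch of the pattern,
    not nanochat's Engine). Assumes model(ids, kv_cache) returns
    (logits, kv_cache), where kv_cache holds each layer's past keys
    and values so decoding processes one token per step instead of
    re-running the whole prefix.
    """
    # Prefill: one pass over the full prompt populates the cache.
    logits, kv_cache = model(prompt_ids, kv_cache=None)
    next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy for brevity
    out = [next_id]

    # Decode: each step feeds only the newest token; attention reads
    # the cached keys/values for everything that came before it.
    for _ in range(max_new_tokens - 1):
        logits, kv_cache = model(next_id, kv_cache=kv_cache)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        out.append(next_id)
    return torch.cat(out, dim=1)
```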
Youngseog Chung retweeted
Mustafa Suleyman @mustafasuleyman
Meet our third @MicrosoftAI model: MAI-Image-1. #9 on LMArena, striking an impressive balance of generation speed and quality. Excited to keep refining + climbing the leaderboard from here! We're just getting started. microsoft.ai/news/introduci…
[2 images]
36 replies · 78 retweets · 506 likes · 146.8K views
Youngseog Chung retweeted
Dylan Sam @dylanjsam
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
[image]
8 replies · 90 retweets · 357 likes · 62.5K views
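The thread above does not spell out the recipe, but as a toy illustration of what "embedding safety directly into pretraining" could look like in a data pipeline (one plausible instantiation only; safety_score, the thresholds, and the flag prefix are all hypothetical):

```python
def build_safety_filtered_corpus(docs, safety_score, threshold=0.5):
    """Toy illustration of moving safety interventions INTO the
    pretraining data pipeline (one plausible instantiation; the
    paper's actual recipe may differ). Score each document with a
    hypothetical classifier, drop clearly unsafe text, and tag
    borderline text so the model sees it with explicit context
    rather than as neutral training signal.
    """
    kept = []
    for doc in docs:
        s = safety_score(doc)      # hypothetical safety score in [0, 1]
        if s < threshold:
            continue               # exclude clearly unsafe documents
        if s < 0.8:                # borderline: keep, but contextualize
            doc = "[flagged-content] " + doc
        kept.append(doc)
    return kept
```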