Youngseog Chung
@YoungseogC
122 posts

PhD student at @mldcmu, @AutonLab | Jazz enthusiast | Tennis player
Pittsburgh, PA · Joined December 2020
649 Following · 455 Followers

Pinned Tweet
Youngseog Chung @YoungseogC
Soft MoE provides a unique take on mixture-of-experts by mixing tokens and mixing the expert outputs. Does this “soft” mixing cause implicit biases that affect representation power or expert specialization? What does expert specialization even mean for Soft MoE? 🧵(1/N)
[image]
1 reply · 10 retweets · 25 likes · 3.7K views
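For readers who want the mechanism behind this thread made concrete: below is a minimal PyTorch sketch of a Soft MoE layer in the spirit of Puigcerver et al.'s formulation (the variable names and the `experts` interface are mine, not from the thread). A softmax over the token axis mixes tokens into slots, and a softmax over the slot axis mixes expert outputs back into tokens; these two softmaxes are the "soft" mixing the thread asks about.

```python
import torch
import torch.nn.functional as F

def soft_moe(x, phi, experts):
    """Minimal Soft MoE sketch (after Puigcerver et al., 2023).

    x:       (n, d) input tokens
    phi:     (d, e*s) learned slot parameters (e experts, s slots each)
    experts: list of e callables, each mapping (s, d) -> (s, d)
    """
    e = len(experts)
    s = phi.shape[1] // e
    logits = x @ phi                        # (n, e*s) token-slot affinities

    # Dispatch: every slot is a convex combination of ALL tokens
    # (softmax over the token axis), so routing is fully differentiable.
    dispatch = F.softmax(logits, dim=0)     # columns sum to 1
    slots = dispatch.T @ x                  # (e*s, d) token-mixed slots

    # Each expert processes its own s slots.
    outs = torch.cat([experts[i](slots[i * s:(i + 1) * s]) for i in range(e)])

    # Combine: every output token is a convex combination of ALL slot
    # outputs (softmax over the slot axis), the second "soft" mixing.
    combine = F.softmax(logits, dim=1)      # rows sum to 1
    return combine @ outs                   # (n, d)
```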
Youngseog Chung retweeted
Paul Liang @pliang279
The attention mechanism in Transformers just got a major upgrade 🧠 📄 arxiv.org/abs/2602.21371 Introducing Interleaved Head Attention - instead of keeping your attention heads independent, share information between attention heads for improved reasoning and compositionality. Pseudo-queries, pseudo-keys, and pseudo-values are learned as linear combinations of the original heads, so interaction across heads happens before attention is computed. IHA stays compatible with efficient implementations like FlashAttention ⚡ Significant improvements on reasoning and long-context tasks:
📊 +5.8% on GSM8K (Maj@16)
📊 +2.8% on MATH-500 (Maj@16)
📊 +27% at 4k on RULER
📊 +32% at 8k
📊 +112% at 16k
Theoretical results show that IHA strictly generalizes multi-head attention.
Chanakya Ekbote @thecekbote

🧵 [1/14]: Talking-Heads Attention by @NoamShazeer et al. showed something interesting: maybe attention heads shouldn’t be fully isolated. 🧠 That got us thinking: If communication across heads matters, what is the right way for heads to communicate, especially from a one-layer reasoning perspective? 🔗⚙️ That question led us to Interleaved Head Attention (IHA) ✨ 📄 Paper link: arxiv.org/pdf/2602.21371

7 replies · 48 retweets · 375 likes · 49.8K views
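The mechanism described in the announcement above is concrete enough to sketch, with the caveat that the paper's exact parameterization may differ; the mixing matrices m_q, m_k, m_v below are my naming, not the paper's. Pseudo-queries, pseudo-keys, and pseudo-values are learned linear combinations of the original heads, formed before attention runs, which is why fused kernels like FlashAttention can still be used.

```python
import torch
import torch.nn.functional as F

def interleaved_head_attention(q, k, v, m_q, m_k, m_v):
    """Hedged sketch of cross-head mixing as the IHA announcement
    describes it (not necessarily the paper's exact parameterization).

    q, k, v:       (batch, heads, seq, head_dim) per-head projections
    m_q, m_k, m_v: (heads, heads) learned mixing matrices (my naming)
    """
    # Pseudo-projections: each pseudo-head i is a learned linear
    # combination of all original heads j, computed BEFORE attention,
    # so information flows across heads up front.
    pseudo_q = torch.einsum('ij,bjsd->bisd', m_q, q)
    pseudo_k = torch.einsum('ij,bjsd->bisd', m_k, k)
    pseudo_v = torch.einsum('ij,bjsd->bisd', m_v, v)

    # Standard scaled dot-product attention on the mixed heads; since
    # the mixing already happened, a fused attention kernel can run
    # unchanged on the result.
    return F.scaled_dot_product_attention(pseudo_q, pseudo_k, pseudo_v)
```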
Youngseog Chung retweeted
Satya Nadella @satyanadella
We’re bringing our growing MAI model family to every developer in Foundry, including …
· MAI-Transcribe-1, the most accurate transcription model in the world across 25 languages
· MAI-Voice-1, natural, expressive speech generation
· MAI-Image-2, our most capable image model yet
Start building: microsoft.ai/news/today-wer…
[GIF]
220 replies · 286 retweets · 1.8K likes · 304.7K views
Youngseog Chung retweeted
Satya Nadella @satyanadella
Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports.
429 replies · 514 retweets · 4.2K likes · 1.4M views
Youngseog Chung retweeted
SK @Djoko_UTD
This Daniil Medvedev shot 😭😭🤣
23 replies · 117 retweets · 1.8K likes · 87.8K views
Youngseog Chung retweeted
Satya Nadella @satyanadella
Great to see our new image model from our Superintelligence team rolling out in Copilot and coming soon to Foundry for enterprise customers.
Mustafa Suleyman @mustafasuleyman

Our new image generator MAI-Image-2 is out! Available now on MAI Playground for everything from lifelike realism to detailed infographics. Our team has been pushing immensely hard for this release, and we are now among the top models out there: #3 family on @arena. Check out the details in our blog: microsoft.ai/news/introduci… It's shipping soon in Copilot and Bing Image Creator, as well as Microsoft Foundry. Really proud of our progress on models and products - stay tuned for new releases and come join us on our Superintelligence mission!

177 replies · 97 retweets · 811 likes · 154.5K views
Youngseog Chung retweeted
Mustafa Suleyman @mustafasuleyman
Our new image generator MAI-Image-2 is out! Available now on MAI Playground for everything from lifelike realism to detailed infographics. Our team has been pushing immensely hard for this release, and we are now among the top models out there: #3 family on @arena. Check out the details in our blog: microsoft.ai/news/introduci… It's shipping soon in Copilot and Bing Image Creator, as well as Microsoft Foundry. Really proud of our progress on models and products - stay tuned for new releases and come join us on our Superintelligence mission!
[4 images]
156 replies · 124 retweets · 598 likes · 239.7K views
Youngseog Chung retweeted
Microsoft AI @MicrosoftAI
Meet MAI‑Image‑2. Built with creatives, for real creative work. Ranked #5 on @arena’s text‑to‑image leaderboard. Available now: msft.it/6014QUCBe
61 replies · 123 retweets · 847 likes · 115.2K views
Youngseog Chung @YoungseogC
@minchoi I already told you, it's a Pareto front, buddy, where outward on all axes is considered good
1 reply · 0 retweets · 2 likes · 96 views
Min Choi @minchoi
@YoungseogC because you were focused on the red circle, while you missed the y-axis
1 reply · 0 retweets · 0 likes · 128 views
Youngseog Chung @YoungseogC
@minchoi What exactly is the crime, and how is it bad or misleading?
1 reply · 0 retweets · 1 like · 116 views
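For readers following the exchange above, the Pareto-front framing is simple to state: a point is on the front when no other point is at least as good on every axis and strictly better on at least one. A minimal sketch, assuming larger is better on all axes (the function and variable names are mine):

```python
import numpy as np

def pareto_front(points):
    """Return the points not dominated by any other point, assuming
    larger is better on every axis ("outward on all axes is good").

    points: (n, k) array of n candidates scored on k axes.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is >= on every axis
        # and strictly > on at least one.
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(i)
    return points[keep]

# Example with (accuracy, speed) pairs: (0.9, 120) dominates (0.8, 100),
# so the front is [[0.9, 120], [0.7, 300]].
print(pareto_front([[0.9, 120], [0.8, 100], [0.7, 300]]))
```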
Youngseog Chung retweeted
Zhikai Zhang @Zhikai273
🎾 Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data. Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills.
Project: zzk273.github.io/LATENT/
Code: github.com/GalaxyGeneralR…
162 replies · 637 retweets · 4.1K likes · 1.4M views
Youngseog Chung retweeted
YixuanEvenXu @YixuanEvenXu
Recent debates highlight a key issue: how do you actually prove distillation? If you want to claim a model was distilled from your outputs, scientifically and with rigorous statistical guarantees, you should consider Antidistillation Fingerprinting (ADFP). 👇
YixuanEvenXu @YixuanEvenXu

🧬 Distillation enables efficient emulation of LLMs, but verifying provenance remains a critical challenge. Introducing Antidistillation Fingerprinting (ADFP): A principled approach that aligns signals with student learning dynamics. 👇 (1/6)

0 replies · 3 retweets · 10 likes · 2.1K views
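To give a feel for the "rigorous statistical guarantees" being claimed, here is a toy sketch of the flavor of test involved; this is NOT ADFP's actual algorithm, and the probe setup, function name, and parameters are all my own illustration. If planted, low-probability "fingerprint" outputs would be matched by an unrelated model with some small per-probe chance, a one-sided binomial test bounds how surprising a suspect's match count is under the no-distillation hypothesis.

```python
from scipy.stats import binomtest

def fingerprint_pvalue(n_probes, n_matches, p_chance):
    """Toy provenance test (illustrative only, not ADFP itself).

    n_probes:  number of planted fingerprint prompts tried
    n_matches: how many the suspect model reproduced
    p_chance:  probability an UNRELATED model matches one probe
    """
    # One-sided test: how unlikely is this many matches if the suspect
    # never trained on the fingerprinted outputs?
    return binomtest(n_matches, n_probes, p_chance,
                     alternative='greater').pvalue

# Example: 1000 probes, a 3% chance match rate, and 90 observed matches.
print(fingerprint_pvalue(1000, 90, 0.03))  # tiny p-value: strong evidence
```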
Youngseog Chung retweeted
Rishub Jain @shubadubadub
🚀 1) Apply here to work with me (Feb-May 2026): sparai.org/projects/sp26/… !! After a successful Fall ‘25 round with SPAR (paper soon), we’re continuing to improve evaluation quality via Human-AI collaboration this Spring! And we're scaling up from 13->20 mentees. The pitch 👇
1 reply · 1 retweet · 2 likes · 156 views
Youngseog Chung retweeted
Stephanie Milani @steph_milani
📣 Honored to be selected as Honorable Mention for the @SCSatCMU Distinguished Dissertation Award!! Thanks to my advisor @fangf07 & committee Geoff Gordon, @hongshenus, @katjahofmann, & @OriolVinyalsML (+ other mentors and collaborators) for their support 🖤 & congrats to Juncheng, Tim, and Brian 🎉
[image]
9 replies · 5 retweets · 113 likes · 11.5K views
Youngseog Chung retweeted
Mark Chen @markchen90
Excited to start OpenAI for Physics w/ @ALupsasca @kevinweil @aleks_madry and @merettm! I sat with @ALupsasca when GPT-5 reproduced his latest research paper, and we both felt parallels to watching AlphaGo play move 37. It's nearly impossible to be a world class chess player without studying AI engines today. Soon, I believe the same will be true for academic research.
Alex Lupsasca @ALupsasca

After GPT-5 Pro launched, I gave it that same problem. To my utter shock, it rediscovered the result in <30min! See for yourself: chatgpt.com/share/68b006eb… It’s not flawless (it needs priming on the flat-space case before tackling the full problem) but the leap is incredible.

29 replies · 96 retweets · 1.1K likes · 191.7K views
Youngseog Chung retweeted
Andrej Karpathy @karpathy
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours of training surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g. a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned, or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
[image]
690 replies · 3.4K retweets · 24.2K likes · 5.8M views
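One item in the list above, inference with a KV cache and simple prefill/decode, is a general pattern worth making concrete. Below is a generic sketch of that loop; it is not nanochat's actual Engine API, and the model(ids, kv_cache) -> (logits, kv_cache) signature is my assumption:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens):
    """Generic KV-cache prefill/decode loop (a sketch of the pattern,
    not nanochat's Engine). Assumes model(ids, kv_cache) returns
    (logits, kv_cache), where kv_cache holds each layer's past keys
    and values so decoding processes one token per step instead of
    re-running the whole prefix.
    """
    # Prefill: one pass over the full prompt populates the cache.
    logits, kv_cache = model(prompt_ids, kv_cache=None)
    next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy for brevity
    out = [next_id]

    # Decode: each step feeds only the newest token; attention reads
    # the cached keys/values for everything that came before it.
    for _ in range(max_new_tokens - 1):
        logits, kv_cache = model(next_id, kv_cache=kv_cache)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        out.append(next_id)
    return torch.cat(out, dim=1)
```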
Youngseog Chung retweeted
Mustafa Suleyman @mustafasuleyman
Meet our third @MicrosoftAI model: MAI-Image-1. #9 on LMArena, striking an impressive balance of generation speed and quality. Excited to keep refining + climbing the leaderboard from here! We're just getting started. microsoft.ai/news/introduci…
[2 images]
36 replies · 78 retweets · 506 likes · 146.8K views
Youngseog Chung retweeted
Dylan Sam @dylanjsam
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
[image]
8 replies · 90 retweets · 357 likes · 62.5K views
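The thread above does not spell out the recipe, but as a toy illustration of what "embedding safety directly into pretraining" could look like in a data pipeline (one plausible instantiation only; safety_score, the thresholds, and the flag prefix are all hypothetical):

```python
def build_safety_filtered_corpus(docs, safety_score, threshold=0.5):
    """Toy illustration of moving safety interventions INTO the
    pretraining data pipeline (one plausible instantiation; the
    paper's actual recipe may differ). Score each document with a
    hypothetical classifier, drop clearly unsafe text, and tag
    borderline text so the model sees it with explicit context
    rather than as neutral training signal.
    """
    kept = []
    for doc in docs:
        s = safety_score(doc)      # hypothetical safety score in [0, 1]
        if s < threshold:
            continue               # exclude clearly unsafe documents
        if s < 0.8:                # borderline: keep, but contextualize
            doc = "[flagged-content] " + doc
        kept.append(doc)
    return kept
```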