Sangdoo Yun

340 posts

Sangdoo Yun

@oodgnas

Research director @ Naver AI Lab

Katılım Ocak 2010

199 Takip Edilen251 Takipçiler

Sangdoo Yun retweetledi

AK@_akhaliq·17 Mar

Grounding World Simulation Models in a Real-World Metropolis paper: huggingface.co/papers/2603.15…

English

17.1K

Sangdoo Yun@oodgnas·17 Mar

IMO, this seems to be well aligned with why model merging works.

Yulu Gan@yule_gan

Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu

English

2.1K

Sangdoo Yun retweetledi

Junyoung Seo@jyseo_cv·17 Mar

What if a world model could render not an imagined place, but the actual city? We introduce Seoul World Model, the first world simulation model grounded in a real-world metropolis. TL;DR: We made a world model RAG over millions of street-views. proj: seoul-world-model.github.io

English

207

1.5K

169.8K

Sangdoo Yun retweetledi

Kimi.ai@Kimi_Moonshot·16 Mar

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…

English

333

2.1K

13.5K

4.9M

Sangdoo Yun retweetledi

Ross Wightman@wightmanr·6 Mar

Time flies. After almost 4 years at @huggingface , I’m moving on. A major part of that chapter was timm, which I sold to the company and continued to build. For anyone relying on it, I’ve agreed to collaborate on bug fixes and basic maintenance, but new feature development will likely cease. It was a meaningful chapter, and I’m thankful for the opportunity to grow timm over that time. AI is moving incredibly fast, and I’m excited to focus on new ideas and opportunities that feel like the right fit for this moment. There will be significant decisions for me ahead. I look forward to more of the serendipitous collaborations (e.g. OpenCLIP, ResNet Strikes Back, HTTY ViT) that I’ve enjoyed in the past. I’m currently working on a long overdue OpenCLIP refactoring that I hope will be useful for all and make it easier to add new model + objective combinations.

English

445

28.1K

Sangdoo Yun@oodgnas·30 Oca

Paper: arxiv.org/abs/2601.17668 Code: github.com/Janghyun1230/F… Project page: janghyun1230.github.io/fastkvzip/ @JangHyun_k, @karusun1, Sangdoo Yun

Filipino

Sangdoo Yun@oodgnas·30 Oca

Validated on Qwen2.5/3 and Gemma3 families across long-context & reasoning tasks, showing its efficiency.

English

Sangdoo Yun@oodgnas·30 Oca

We’re releasing *Fast KVzip*: Compress KV cache by up to 70% with negligible performance loss. The main idea is a lightweight "sink-attention" gating mechanism. During the forward pass, it decides which KV pairs are worth keeping.

English

Sangdoo Yun@oodgnas·22 Oca

@giffmana @jeremyphoward @wen_kaiyue @SanghyukChun @coallaoh good old days... but we are making a new optimizer one as well haha

English

Lucas Beyer (bl16)@giffmana·22 Oca

no, it unfortunately was underrated. It came out at a time where there was an overload of useless adam alternatives and people were tired. I looked at it only because I knew that team was goated from several previous great works they did. It was between BiT and ViT, when we (our team) were struggling with exploding weights at scale, so I really wanted to try it and was thinking if we show its benefits at scale it may take off. But then I got distracted by finishing a few other projects first and never got around to it, and it faded away...

English

312

Kaiyue Wen@wen_kaiyue·21 Oca

(1/n) Introducing Hyperball — an optimizer wrapper that keeps weight & update norm constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer.

English

118

685

195.5K

Sangdoo Yun@oodgnas·22 Oca

@giffmana @wen_kaiyue @SanghyukChun @coallaoh thanks for remembering our burried paper, lucas :)

English

117

Lucas Beyer (bl16)@giffmana·22 Oca

@wen_kaiyue isn't this pretty much the same as this paper from a few years ago? @SanghyukChun @coallaoh etal

English

6.9K

Sangdoo Yun retweetledi

Artificial Analysis@ArtificialAnlys·10 Ara

Motif Technologies, a 🇰🇷 Korean AI lab, has just launched Motif-2-12.7B-Reasoning, a 12.7B open weights reasoning model that scores 45 on the Artificial Analysis Intelligence Index and is now the leading model from Korea Key benchmarking takeaways: ➤ Open weights: Motif-2-12.7B-Reasoning is open weights and is a relatively small model at 12.7B parameters. This marks a shift for the Korean model ecosystem, which has historically been more closed relative to Chinese open weights releases ➤ Strengths in Instruction Following and Competition Math: Motif-2-12.7B-Reasoning scores 57% on IFBench and 80% on AIME2025, comparable to Claude 4.5 Haiku in performance in these two benchmarks and highlighting an emerging strength in mathematical reasoning and agentic capabilities ➤ AI activity is accelerating in South Korea: Motif Technologies’ Motif-2-12.7B-Reasoning sets a new high in intelligence for Korean AI labs and is the latest in a string of notable 2025 releases. The model compares favorably with LG Research’s EXAONE 4.0 32B (Intelligence Score: 43) and Upstage’s Solar Pro 2 (Intelligence Score: 38). The country’s pace of innovation continues to quicken, supported by government incentives and a rapidly expanding AI ecosystem ➤ High token usage: The model used the most tokens to run our Artificial Analysis Intelligence Index evaluations at 200M tokens. This has implications for cost and latency See below for further analysis:

English

186

54.2K

Sangdoo Yun@oodgnas·7 Kas

@chriswolfvision @WeinzaepfelP Beautiful figure :)

English

Christian Wolf (🦋🦋🦋)@chriswolfvision·4 Tem

In a new paper led by Gianluca Monaci, with @WeinzaepfelP and myself, we explore the relationship between rel pose estimation and image goal navigation and study diff. architectures: late fusion, channel cat, space2depth and cross-attention. arxiv.org/abs/2507.01667 🧵1/5

English

10.9K

Sangdoo Yun retweetledi

Tommaso Green@tommasogreen·4 Kas

Our #EMNLP2025 paper “Leaky Thoughts 🫗” uncovers that Large Reasoning Models (LRMs) can easily leak sensitive information hidden inside their “thoughts”. 📢 You can find our poster on Friday 7th at 10:30-12:00 in Hall C3! 📄 aclanthology.org/2025.emnlp-mai…

English

2.9K

Sangdoo Yun@oodgnas·29 Eki

Hmm.. is this a good time to learn diffusion models? 😁

Chieh-Hsin (Jesse) Lai@JCJesseLai

Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

English

172

Sangdoo Yun@oodgnas·27 Eki

@chanwoopark20 As always, the main problem is how to become an expert at *reviewing* stuff (text/code/image...) correctly. That would be more important question for future education.

English

Chanwoo Park@chanwoopark20·26 Eki

(1/4) An example from Ernest highlights an essential point. a) GPT or AI can undoubtedly accelerate scientific discovery — people already know this fact But more importantly, b) domain knowledge still matters. You must know what the right research question is

Ernest Ryu@ErnestRyu

I used ChatGPT to solve an open problem in convex optimization. *Part I* (1/N)

English

7.3K

Sangdoo Yun retweetledi

DailyPapers@HuggingPapers·26 Eki

RL makes MLLMs see better than SFT New research by NAVER AI Lab & KAIST shows that Reinforcement Learning fundamentally reshapes MLLMs' vision encoders. RL leads to stronger, precisely localized visual representations, boosting performance on vision-related tasks & even outperforming larger models!