Sangdoo Yun

340 posts

@oodgnas

Research director @ Naver AI Lab

Joined January 2010
199 Following · 251 Followers
Sangdoo Yun retweeted
Junyoung Seo @jyseo_cv
What if a world model could render not an imagined place, but the actual city? We introduce Seoul World Model, the first world simulation model grounded in a real-world metropolis. TL;DR: We made a world model RAG over millions of street-views. proj: seoul-world-model.github.io
42 replies · 207 reposts · 1.5K likes · 169.8K views
Sangdoo Yun retweeted
Kimi.ai @Kimi_Moonshot
Introducing Attention Residuals: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
333 replies · 2.1K reposts · 13.5K likes · 4.9M views
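The core mechanism described above amounts to replacing the fixed, uniform sum over previous layers with a learned, query-dependent weighting of their outputs. A minimal numpy sketch of one such aggregation step, assuming a simple dot-product attention over the layer history (the names `attn_residual_step`, `w_q`, `w_k` are illustrative, not from the released code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attn_residual_step(history, layer_out, w_q, w_k):
    """Aggregate previous layer outputs with learned attention
    instead of summing them uniformly (the standard residual)."""
    q = layer_out @ w_q                       # query from the current layer
    keys = np.stack([h @ w_k for h in history])
    scores = keys @ q / np.sqrt(len(q))       # one score per past layer
    weights = softmax(scores)                 # input-dependent mixing weights
    retrieved = (weights[:, None] * np.stack(history)).sum(axis=0)
    return layer_out + retrieved
```

A plain residual stack would use `layer_out + sum(history)`; here the network can instead down-weight stale representations, which is the "selective retrieval" the thread refers to.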
Sangdoo Yun retweeted
Ross Wightman @wightmanr
Time flies. After almost 4 years at @huggingface , I’m moving on. A major part of that chapter was timm, which I sold to the company and continued to build. For anyone relying on it, I’ve agreed to collaborate on bug fixes and basic maintenance, but new feature development will likely cease. It was a meaningful chapter, and I’m thankful for the opportunity to grow timm over that time. AI is moving incredibly fast, and I’m excited to focus on new ideas and opportunities that feel like the right fit for this moment. There will be significant decisions for me ahead. I look forward to more of the serendipitous collaborations (e.g. OpenCLIP, ResNet Strikes Back, HTTY ViT) that I’ve enjoyed in the past. I’m currently working on a long overdue OpenCLIP refactoring that I hope will be useful for all and make it easier to add new model + objective combinations.
36 replies · 20 reposts · 445 likes · 28.1K views
Sangdoo Yun @oodgnas
Validated on Qwen2.5/3 and Gemma3 families across long-context & reasoning tasks, showing its efficiency.
1 reply · 0 reposts · 1 like · 31 views
Sangdoo Yun @oodgnas
We’re releasing *Fast KVzip*: Compress KV cache by up to 70% with negligible performance loss. The main idea is a lightweight "sink-attention" gating mechanism. During the forward pass, it decides which KV pairs are worth keeping.
1 reply · 0 reposts · 2 likes · 43 views
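The gating described above amounts to scoring each KV pair during the forward pass and discarding the lowest-scoring ~70%. A minimal sketch of the pruning step, assuming the scores come from a sink-attention gate computed upstream (the function name `compress_kv` and its signature are hypothetical, not the released Fast KVzip API):

```python
import numpy as np

def compress_kv(keys, values, sink_scores, keep_ratio=0.3):
    """Drop the KV pairs with the lowest gating scores.
    `sink_scores[i]` rates how much position i is worth keeping
    (e.g. attention mass assigned to it by a designated sink token)."""
    n = len(keys)
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(sink_scores)[-k:]   # indices of the top-k scores
    keep.sort()                           # preserve original sequence order
    return keys[keep], values[keep], keep
```

With `keep_ratio=0.3` this realizes the "up to 70% compression" headline; the interesting part in practice is how the scores are produced cheaply enough to run inside the forward pass.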
Lucas Beyer (bl16) @giffmana
no, it unfortunately was underrated. It came out at a time when there was an overload of useless Adam alternatives and people were tired. I looked at it only because I knew that team was goated from several previous great works they did. It was between BiT and ViT, when we (our team) were struggling with exploding weights at scale, so I really wanted to try it and was thinking that if we showed its benefits at scale it might take off. But then I got distracted by finishing a few other projects first and never got around to it, and it faded away...
1 reply · 0 reposts · 3 likes · 312 views
Kaiyue Wen @wen_kaiyue
(1/n) Introducing Hyperball — an optimizer wrapper that keeps weight & update norm constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer.
27 replies · 118 reposts · 685 likes · 195.5K views
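Keeping both the weight norm and the update norm fixed while controlling the angular step directly can be realized as a rotation on the weight sphere: move in the tangential component of the descent direction by a fixed angle. A hedged numpy sketch of one plausible such step (not the authors' implementation; `hyperball_step` and this exact parameterization are assumptions):

```python
import numpy as np

def hyperball_step(w, grad, theta):
    """Rotate weight vector `w` by angle `theta` on its sphere, toward
    the descent direction. ||w|| is preserved exactly, and `theta` is
    the effective (angular) step size, controlled directly."""
    r = np.linalg.norm(w)
    w_hat = w / r
    d = -grad
    u = d - (d @ w_hat) * w_hat           # tangential descent component
    norm_u = np.linalg.norm(u)
    if norm_u < 1e-12:
        return w                          # gradient parallel to w: no rotation
    u_hat = u / norm_u
    # w_hat and u_hat are orthonormal, so the result has norm exactly r
    return r * (np.cos(theta) * w_hat + np.sin(theta) * u_hat)
```

Because the step is an angle rather than a raw learning rate times a gradient magnitude, it is scale-free, which is one way such a scheme could give the hyperparameter transfer the tweet claims.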
Sangdoo Yun retweeted
Artificial Analysis @ArtificialAnlys
Motif Technologies, a 🇰🇷 Korean AI lab, has just launched Motif-2-12.7B-Reasoning, a 12.7B open weights reasoning model that scores 45 on the Artificial Analysis Intelligence Index and is now the leading model from Korea. Key benchmarking takeaways:
➤ Open weights: Motif-2-12.7B-Reasoning is open weights and is a relatively small model at 12.7B parameters. This marks a shift for the Korean model ecosystem, which has historically been more closed relative to Chinese open weights releases.
➤ Strengths in Instruction Following and Competition Math: Motif-2-12.7B-Reasoning scores 57% on IFBench and 80% on AIME2025, comparable to Claude 4.5 Haiku on these two benchmarks and highlighting an emerging strength in mathematical reasoning and agentic capabilities.
➤ AI activity is accelerating in South Korea: Motif Technologies' Motif-2-12.7B-Reasoning sets a new high in intelligence for Korean AI labs and is the latest in a string of notable 2025 releases. The model compares favorably with LG Research's EXAONE 4.0 32B (Intelligence Score: 43) and Upstage's Solar Pro 2 (Intelligence Score: 38). The country's pace of innovation continues to quicken, supported by government incentives and a rapidly expanding AI ecosystem.
➤ High token usage: The model used the most tokens to run our Artificial Analysis Intelligence Index evaluations, at 200M tokens. This has implications for cost and latency.
See below for further analysis:
10 replies · 36 reposts · 186 likes · 54.2K views
Christian Wolf (🦋🦋🦋) @chriswolfvision
In a new paper led by Gianluca Monaci, with @WeinzaepfelP and myself, we explore the relationship between relative pose estimation and image-goal navigation, and study different architectures: late fusion, channel concatenation, space2depth, and cross-attention. arxiv.org/abs/2507.01667 🧵1/5
2 replies · 2 reposts · 23 likes · 10.9K views
Sangdoo Yun retweeted
Tommaso Green @tommasogreen
Our #EMNLP2025 paper “Leaky Thoughts 🫗” uncovers that Large Reasoning Models (LRMs) can easily leak sensitive information hidden inside their “thoughts”. 📢 You can find our poster on Friday 7th at 10:30-12:00 in Hall C3! 📄 aclanthology.org/2025.emnlp-mai…
1 reply · 3 reposts · 10 likes · 2.9K views
Sangdoo Yun @oodgnas
Hmm.. is this a good time to learn diffusion models? 😁
Chieh-Hsin (Jesse) Lai @JCJesseLai

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today's models work, why they work, and where they're heading. 🧵You'll find the link and a few highlights in the thread. We'd love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

0 replies · 0 reposts · 1 like · 172 views
Sangdoo Yun @oodgnas
@chanwoopark20 As always, the main problem is how to become an expert at *reviewing* stuff (text/code/image...) correctly. That would be a more important question for future education.
1 reply · 0 reposts · 1 like · 98 views
Sangdoo Yun retweeted
DailyPapers @HuggingPapers
RL makes MLLMs see better than SFT. New research by NAVER AI Lab & KAIST shows that Reinforcement Learning fundamentally reshapes MLLMs' vision encoders. RL leads to stronger, precisely localized visual representations, boosting performance on vision-related tasks & even outperforming larger models!
4 replies · 15 reposts · 64 likes · 6.8K views
Sangdoo Yun @oodgnas
Two popular post-training recipes for MLLMs: RL and SFT. What do they actually do for MLLMs? We reveal that RL makes MLLMs *see* better than SFT.
1 reply · 3 reposts · 15 likes · 693 views