Yongyi Yang

37 posts

@YongyiYang7

Joined October 2022
116 Following, 86 Followers
Pinned Tweet
Yongyi Yang
Yongyi Yang@YongyiYang7·
DeepSeek's recent mHC (Manifold-Constrained Hyper-Connections) proposes to stabilize residual hyper-connections via 20 Sinkhorn-Knopp (SK) iterations. However, this approach requires heavily customized CUDA kernels and does not guarantee the quality of stabilization, because a finite number of iterations leaves approximation error. Check out our new paper: "mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations". Based on the Birkhoff-von Neumann theorem, we introduce a neat solution that can be realized with standard operators while guaranteeing exact stability.
[image]
1
4
20
2.1K
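For readers unfamiliar with the projection step mentioned in the post above, here is a minimal pure-Python sketch of Sinkhorn-Knopp normalization. It is not DeepSeek's kernel: the function names, the choice to exponentiate raw logits first, and the sample 3x3 input are assumptions made for illustration. The sketch alternates row and column normalization and reports how far the row sums still are from 1 after a finite number of iterations.

```python
import math

def sinkhorn_knopp(logits, n_iters=20):
    """Alternately normalize rows and columns of exp(logits)."""
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(n_iters):
        # Row step: rescale every row so it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: rescale every column so it sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m

def max_row_error(m):
    """Largest deviation of any row sum from 1."""
    return max(abs(sum(row) - 1.0) for row in m)

# Hypothetical unconstrained parameters, chosen only for this example.
logits = [[4.0, 0.0, -3.0], [0.5, 2.0, 1.0], [-2.0, 3.0, 0.0]]
for k in (1, 5, 20):
    print(k, max_row_error(sinkhorn_knopp(logits, k)))
```

Because the loop ends on a column step, the columns sum to 1 up to rounding, while the rows are in general only approximately normalized. That residual is the approximation error the post refers to.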
carnival
carnival@carnivalaki·
[image]
9
76
1.4K
32K
Yongyi Yang retweeted
Jianyang Gao
Jianyang Gao@gaoj0017·
The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on OpenReview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

98
969
6.5K
1M
Yongyi Yang retweeted
MichiganAI
MichiganAI@michigan_AI·
Major milestone in Yongyi Yang's academic journey.🎓 TOMORROW Yongyi Yang will present a dissertation defense on "Structures in Deep Learning: Representations, Learning Dynamics, and Efficient Algorithms." MARCH 26 @ 3:00pm ET cse.engin.umich.edu/event/structur…
[image]
0
2
6
599
Yongyi Yang retweeted
Do2mi多多酱
Do2mi多多酱@Do2mi1·
This feels amazing
[image]
576
1.4K
23.5K
8.7M
Yongyi Yang
Yongyi Yang@YongyiYang7·
Why persist in projecting unconstrained matrices onto the constrained convex set? The Birkhoff-von Neumann theorem offers a simple and robust parameterization of doubly stochastic matrices. By using this parameterization directly, we skip the SK iterations entirely, avoid the hassle of specialized kernels, and ensure exact double stochasticity by construction. Experiments confirm that mHC-lite achieves the same (or better) stabilizing effect on training as mHC, while being significantly more efficient and easier to implement.
[image]
1
0
1
242
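A minimal sketch of the idea in the post above, assuming the parameterization is a softmax-weighted convex combination of all n! permutation matrices. The Birkhoff-von Neumann theorem states that every doubly stochastic matrix is such a convex combination. The function name `birkhoff_matrix` and the sample logits are hypothetical, and this is not the paper's code.

```python
import itertools
import math

def birkhoff_matrix(weight_logits, n):
    """Softmax-weighted convex combination of all n! permutation matrices."""
    perms = list(itertools.permutations(range(n)))
    assert len(weight_logits) == len(perms)
    # Softmax maps unconstrained logits to non-negative weights summing to 1.
    top = max(weight_logits)
    exps = [math.exp(w - top) for w in weight_logits]
    total = sum(exps)
    coeffs = [e / total for e in exps]
    m = [[0.0] * n for _ in range(n)]
    for c, perm in zip(coeffs, perms):
        for row, col in enumerate(perm):
            # The permutation matrix for `perm` has a 1 at (row, perm[row]).
            m[row][col] += c
    return m

n = 3
logits = [0.3, -1.2, 2.0, 0.0, 0.7, -0.5]  # one logit per permutation (3! = 6)
m = birkhoff_matrix(logits, n)
for row in m:
    print([round(v, 4) for v in row])
print("row sums:", [round(sum(r), 12) for r in m])
print("col sums:", [round(sum(m[i][j] for i in range(n)), 12) for j in range(n)])
```

Every row and every column receives each coefficient exactly once, so all sums equal 1 up to floating-point rounding with no iteration involved. The number of permutations grows factorially, so a sketch like this is only practical for small n.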
Yongyi Yang
Yongyi Yang@YongyiYang7·
📄 Read the full paper: arxiv.org/abs/2510.02670 Cheers to my amazing co-authors -- Tomaso Poggio, Isaac L. Chuang, and Liu Ziyin (@LiuZiyin10). We hope this work helps build a deeper understanding of training and sparks new bridges between topology, physics, and deep learning.
0
0
1
154
Yongyi Yang
Yongyi Yang@YongyiYang7·
🧩 This framework is architecture- and optimizer-agnostic. As long as the model is invariant to permutations of its neurons and the optimizer is gradient-based (e.g. SGD, Adam), the theory applies. This suggests topology as a universal tool for analyzing learning dynamics.
1
0
0
136
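As a small illustration of the permutation symmetry this post relies on (not code from the paper), the sketch below builds a tiny two-layer network and checks that relabeling its hidden neurons leaves the output unchanged. The network shape, the tanh activation, and all names are assumptions made for the example.

```python
import math
import random

def mlp(x, w1, b1, w2):
    """Two-layer network: sum_j w2[j] * tanh(w1[j] . x + b1[j])."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden))

random.seed(0)
d_in, d_hidden = 3, 5
w1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [random.gauss(0, 1) for _ in range(d_hidden)]
w2 = [random.gauss(0, 1) for _ in range(d_hidden)]
x = [0.5, -1.0, 2.0]

# Relabel the hidden neurons: apply the same permutation to the incoming
# weights, the biases, and the outgoing weights.
perm = list(range(d_hidden))
random.shuffle(perm)
w1_p = [w1[j] for j in perm]
b1_p = [b1[j] for j in perm]
w2_p = [w2[j] for j in perm]

diff = abs(mlp(x, w1, b1, w2) - mlp(x, w1_p, b1_p, w2_p))
print("output difference after permuting neurons:", diff)
```

The two outputs agree up to floating-point summation order, because the output is a sum over hidden neurons and a sum does not depend on the order of its terms.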
Yongyi Yang
Yongyi Yang@YongyiYang7·
Check out our new paper "Topological Invariance and Breakdown in Learning." We prove that training exhibits a topological phase transition, with a critical learning rate separating topology-preserving and topology-simplifying regimes. 🧵👇
[image]
1
0
5
626
Yi Ma
Yi Ma@YiMaTweets·
I conjecture that, in the foreseeable future, only two forms of (academic) publication will/should exist: first posting on arXiv, then writing an open-source book. All intermediate steps or forms can or should be forgone.
6
6
100
10.2K
Yongyi Yang retweeted
∀ugust
∀ugust@ModalMetamodel·
Them hoes was tryna figure out if erry compact Hausdorff space wit at least two points and no isolated points got a cardinality of at least 2^ℵ₀. 😸😸😸
[image]
10
187
1.6K
55.5K
Yongyi Yang retweeted
朝潮
朝潮@ashashio·
To the end of the world
[image]
9
981
4.4K
81.6K
Yongyi Yang
Yongyi Yang@YongyiYang7·
(4/5) The key idea behind our theory is Double Convergence: hidden states cluster by token identity as context grows (context-wise convergence), and these clusters evolve across layers toward a low-frequency signal over the input's graph structure (layer-wise convergence).
[image]
1
0
2
273
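One common way to make "low-frequency signal over a graph" concrete is the graph Laplacian quadratic form, which is small when a signal changes little across edges. The sketch below is only an illustration under that reading: the ring graph, the node count, and the cosine test signals are assumptions, not the setup used in the paper.

```python
import math

def ring_edges(n):
    """Edges of a cycle graph on n nodes."""
    return [(i, (i + 1) % n) for i in range(n)]

def dirichlet_energy(signal, edges):
    """x^T L x for the unnormalized graph Laplacian L,
    computed as the sum of squared differences across edges."""
    return sum((signal[i] - signal[j]) ** 2 for i, j in edges)

n = 12
edges = ring_edges(n)
energies = {}
for k in (1, 3, 6):
    # Cosine with k full oscillations around the ring.
    signal = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
    energies[k] = dirichlet_energy(signal, edges)
    print(k, round(energies[k], 4))
```

Signals with fewer oscillations around the ring give a smaller energy, which is the sense in which they count as low-frequency on this graph.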
Yongyi Yang
Yongyi Yang@YongyiYang7·
What drives in-context learning in LLMs? New paper: Provable Low-Frequency Bias of In-Context Learning of Representations. We show LLMs have a low-frequency bias when learning representations in context, offering a theoretical answer to several previously open questions. 🧵👇
1
8
28
5.7K