Yongyi Yang (@YongyiYang7) - Twitter Profile

Pinned Tweet

Yongyi Yang@YongyiYang7·12 Jan

DeepSeek's recent mHC (Manifold-Constrained Hyper-Connections) proposes to stabilize residual hyper-connections via 20 Sinkhorn-Knopp (SK) iterations. However, this approach requires heavily customized CUDA kernels and does not guarantee the quality of stabilization, because of approximation errors. Check out our new paper: "mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations". Based on the Birkhoff-von Neumann theorem, we introduce a neat solution that can be realized with standard operators while guaranteeing exact stability.

English

1

4

20

2.1K
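To make the approximation-error point in the pinned tweet concrete, here is a minimal pure-Python sketch of Sinkhorn-Knopp normalization. The function names, the exponential map from logits, and the example matrix are illustrative assumptions, not taken from mHC or mHC-lite. It alternates row and column normalization and reports how far the row sums still are from 1 after a fixed number of iterations.

```python
import math


def sinkhorn_knopp(logits, num_iters=20):
    """Alternately normalise rows and columns of exp(logits).

    Illustrative sketch only: the result is approximately doubly
    stochastic, and the residual error depends on num_iters.
    """
    n = len(logits)
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(num_iters):
        # Row step: make every row sum to 1.
        row_sums = [sum(row) for row in m]
        m = [[m[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column step: make every column sum to 1 (this perturbs the rows).
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def max_row_sum_error(m):
    """Largest deviation of any row sum from 1."""
    return max(abs(sum(row) - 1.0) for row in m)


# Hypothetical 3x3 logits chosen only for demonstration.
logits = [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0], [-1.0, 0.0, 1.0]]
err_1 = max_row_sum_error(sinkhorn_knopp(logits, num_iters=1))
err_20 = max_row_sum_error(sinkhorn_knopp(logits, num_iters=20))
print(f"row-sum error after 1 iteration:   {err_1:.3e}")
print(f"row-sum error after 20 iterations: {err_20:.3e}")
```

Because the loop ends on a column step, the columns sum to 1 up to rounding while the rows are only approximately normalized. More iterations shrink that gap; a fixed iteration count does not, in general, remove it.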

Yongyi Yang@YongyiYang7·27 Mar

@carnivalaki It really is (

Chinese

0

1.4K

carnival@carnivalaki·27 Mar

ZXX

9

76

1.4K

32K

Yongyi Yang retweeted

Jianyang Gao@gaoj0017·27 Mar

The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.

Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English

98

969

6.5K

1M

Yongyi Yang retweeted

MichiganAI@michigan_AI·25 Mar

Major milestone in Yongyi Yang's academic journey.🎓 TOMORROW Yongyi Yang will present a dissertation defense on "Structures in Deep Learning: Representations, Learning Dynamics, and Efficient Algorithms." MARCH 26 @ 3:00pm ET cse.engin.umich.edu/event/structur…

English

0

2

6

599

Yongyi Yang@YongyiYang7·5 Feb

@onimushi I like you🥰

Chinese

0

149

鬼虫兵庫 Hyogo Onimushi@Onimushi·4 Feb

The size! It's so small!?

Japanese

23

584

6.6K

112.1K

Yongyi Yang retweeted

Do2mi多多酱@Do2mi1·27 Jan

This feels so good

Japanese

576

1.4K

23.5K

8.7M

Yongyi Yang@YongyiYang7·12 Jan

📜Read the full paper: arxiv.org/abs/2601.05732 🥂 Cheers to my wonderful collaborator @gaoj0017, who has been working with me like crazy for the whole past week!

English

0

1

2

273

Yongyi Yang@YongyiYang7·12 Jan

Why persist in projecting unconstrained matrices onto the constrained convex set? The Birkhoff-von Neumann theorem offers a simple and robust parameterization of doubly stochastic matrices. By directly using this parameterization, we completely skip the SK iterations, avoid the hassle of specialized kernels, and ensure precise doubly stochasticity by construction. Experiments confirm that mHC-lite achieves the same (or better) stabilizing effect on training as mHC, while being significantly more efficient and easier to implement.

English

1

0

1

242
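The parameterization described in the tweet above can be sketched as follows. By the Birkhoff-von Neumann theorem, every doubly stochastic matrix is a convex combination of permutation matrices, so mixing all n! permutation matrices with softmax weights yields a doubly stochastic matrix without any iterative projection. This is a pure-Python illustration under that assumption. The function name and the use of one logit per permutation are hypothetical choices, and the paper's actual construction may differ.

```python
import itertools
import math


def birkhoff_matrix(logits, n):
    """Convex combination of all n! permutation matrices.

    Assumption for illustration: one free logit per permutation,
    turned into convex weights with a softmax.
    """
    perms = list(itertools.permutations(range(n)))
    if len(logits) != len(perms):
        raise ValueError(f"expected {len(perms)} logits, got {len(logits)}")
    # Numerically stable softmax: weights are positive and sum to 1.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Each permutation puts exactly one entry in every row and column,
    # so adding weight w to those entries adds w to each row and column sum.
    m = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for i, j in enumerate(perm):
            m[i][j] += w
    return m


# Hypothetical logits for n = 3 (3! = 6 permutations).
example = birkhoff_matrix([0.3, -1.2, 2.0, 0.0, 0.7, -0.5], 3)
for row in example:
    print([round(x, 4) for x in row])
```

Row and column sums equal the sum of the softmax weights, so they are 1 up to floating-point rounding regardless of the logits. The cost is n! weights, which is only practical for small n.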

Yongyi Yang@YongyiYang7·10 Dec

📄 Read the full paper: arxiv.org/abs/2510.02670 Cheers to my amazing co-authors: Tomaso Poggio, Isaac L. Chuang, and Liu Ziyin @LiuZiyin10. We hope this work helps build a deeper understanding of training and sparks new bridges between topology, physics, and deep learning.

English

0

1

154

Yongyi Yang@YongyiYang7·10 Dec

🧩This framework is architecture- and optimizer-agnostic. As long as the model has a permutation invariance on neurons and the optimizer is gradient-based (e.g. SGD, Adam), the theory applies. This suggests topology as a universal tool for analyzing learning dynamics.

English

1

0

136
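The permutation invariance on neurons mentioned in the tweet above can be illustrated with a small example. This is a generic sketch, not code from the paper: the two-layer tanh network, its weights, and the helper names are all hypothetical. Reordering the hidden units, together with the matching rows of the first layer and columns of the second, leaves the network's output unchanged.

```python
import math


def mlp(x, w1, b1, w2):
    """Two-layer network: output = w2 @ tanh(w1 @ x + b1)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden neurons: rows of w1 and b1, columns of w2."""
    w1_p = [w1[p] for p in perm]
    b1_p = [b1[p] for p in perm]
    w2_p = [[row[p] for p in perm] for row in w2]
    return w1_p, b1_p, w2_p


# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output.
w1 = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b1 = [0.1, -0.2, 0.05]
w2 = [[1.0, -2.0, 0.5]]
x = [0.7, -0.4]

original = mlp(x, w1, b1, w2)
permuted = mlp(x, *permute_hidden(w1, b1, w2, perm=[2, 0, 1]))
print(original, permuted)
```

The two outputs agree up to floating-point rounding, since the permutation only changes the order in which the hidden contributions are summed.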

Yongyi Yang@YongyiYang7·10 Dec

Check out our new paper "Topological Invariance and Breakdown in Learning." We prove that training exhibits a topological phase transition, with a critical learning rate separating topology-preserving and topology-simplifying regimes. 🧵👇

English

1

0

5

626

Yongyi Yang@YongyiYang7·18 Nov

@YiMaTweets Someone needs to initiate this process

English

0

109

Yi Ma@YiMaTweets·18 Nov

I conjecture that, in the foreseeable future, there would/should exist only two forms of (academic) publication: first posting on arXiv and then writing an open-source book. All intermediate steps or forms can be or should be forgone.

English

6

100

10.2K

Yongyi Yang retweeted

∀ugust@ModalMetamodel·9 Oct

Them hoes was tryna figure out if erry compact Hausdorff space wit at least two points and no isolated points got a cardinality of at least 2^ℵ₀. 😸😸😸

English

10

187

1.6K

55.5K

Yongyi Yang retweeted

朝潮@ashashio·4 Aug

To the end of the world

English

9

981

4.4K

81.6K

Yongyi Yang@YongyiYang7·30 Jul

(5/5) Read the full paper: arxiv.org/pdf/2507.13540 Joint work with @weihu_ @Hidenori8Tanaka

English

0

2

218

Yongyi Yang@YongyiYang7·30 Jul

(4/5) The key idea behind our theory is Double Convergence: hidden states cluster by token identity as context grows (context-wise convergence), and these clusters evolve across layers toward a low-frequency signal over the input's graph structure (layer-wise convergence).

English

1

0

2

273

Yongyi Yang@YongyiYang7·30 Jul

What drives in-context learning in LLMs? New paper: Provable Low-Frequency Bias of In-Context Learning of Representations. We show LLMs have a low-frequency bias when learning representations in context, offering a theoretical answer to several previously open questions. 🧵👇

English

1

8

28

5.7K
