Morris Yau

15 posts

@MorrisYau

@MIT @Google PhD candidate in Computer Science doing research in foundational aspects of ML and NLP.

Joined February 2022
70 Following · 314 Followers
Morris Yau @MorrisYau ·
Oral at NeurIPS 2025! The optimization of linear attention admits a nearly perfect theoretical characterization. Hope our work inspires new perspectives on Transformer learning and on scaling architectural choices. arxiv.org/abs/2410.10101
8 replies · 29 reposts · 235 likes · 18.9K views
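
For readers unfamiliar with the object in the linked paper: linear attention is attention without the softmax, which makes the computation associative. A minimal numpy sketch of that fact (my own illustration under standard conventions, not code from the paper):

```python
import numpy as np

# Minimal sketch of (non-causal) linear attention: attention without the softmax.
# Matrix multiplication is associative, so (Q K^T) V equals Q (K^T V), and the
# T x T score matrix never has to be materialized.
T, d = 6, 4                              # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

out_quadratic = (Q @ K.T) @ V            # O(T^2 d): explicit attention scores
out_linear = Q @ (K.T @ V)               # O(T d^2): only a d x d summary of K, V

assert np.allclose(out_quadratic, out_linear)
```
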
Morris Yau @MorrisYau ·
See our paper for why this training diagram is "all you need" to understand modern RNNs (parallelizable training, constant-time decode).
[image: training diagram]
1 reply · 1 repost · 8 likes · 1K views
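
The "parallelizable training, constant-time decode" claim refers to the fact that causal linear attention can be evaluated either in parallel over the whole sequence or as a recurrence with a fixed-size state. A hedged numpy sketch of that equivalence; the simple cumulative outer-product recurrence and the variable names are my own illustration, not the paper's exact parametrization:

```python
import numpy as np

# Causal linear attention written two ways (illustrative sketch only).
T, d = 6, 4
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form (training): mask the T x T score matrix to be causal.
mask = np.tril(np.ones((T, T)))
out_parallel = (mask * (Q @ K.T)) @ V

# Recurrent form (decoding): carry a d x d state, constant work per token.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    S = S + np.outer(K[t], V[t])      # state update: S_t = S_{t-1} + k_t v_t^T
    out_recurrent[t] = Q[t] @ S       # readout: o_t = q_t S_t

assert np.allclose(out_parallel, out_recurrent)
```
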
Morris Yau @MorrisYau ·
Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
3 replies · 38 reposts · 193 likes · 38.6K views
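
The compute-bound/memory-bound contrast can be made concrete by counting per-token decode work: softmax attention reads a KV cache whose length grows with position, while a fixed-size recurrent state costs the same at every step. A rough sketch under those assumptions (not the Transformer-PSM algorithm itself, which is described in the linked paper):

```python
# Rough per-token decode cost (multiply-adds) for one attention head,
# under simple assumptions; not the Transformer-PSM algorithm itself.
d = 64  # head dimension

def softmax_decode_cost(t):
    # Score a KV cache of length t, then take the weighted sum: work grows with t.
    return 2 * t * d

def recurrent_decode_cost(t):
    # Update and read a d x d state: the same work at every position.
    return 2 * d * d

for t in (128, 1024, 8192):
    print(f"position {t}: softmax {softmax_decode_cost(t):>7} vs recurrent {recurrent_decode_cost(t)}")
```
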
Morris Yau @MorrisYau ·
🤔 When increasing the number of parameters in a "real" transformer by a fixed budget, should we add more heads or more layers? Our synthetic data experiments reveal that adding heads is more effective than deepening the network.
1 reply · 0 reposts · 7 likes · 1.1K views
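
One way to read "a fixed budget" is to count what an extra head versus an extra layer costs in parameters. The sketch below assumes conventional per-head Q/K/V/O projections of size d_model x d_head and a 4x-expansion MLP with no biases; these conventions are my assumptions for illustration, not necessarily the paper's setup:

```python
# Back-of-the-envelope parameter counts for spending a fixed budget on heads
# versus layers. Conventions (per-head projection size, 4x MLP) are illustrative.
def attn_params(d_model, n_heads, d_head):
    return 4 * d_model * n_heads * d_head          # W_Q, W_K, W_V, W_O

def layer_params(d_model, n_heads, d_head):
    return attn_params(d_model, n_heads, d_head) + 8 * d_model ** 2  # + MLP

d_model, d_head = 512, 64
extra_per_head = attn_params(d_model, 1, d_head)       # cost of one more head
extra_per_layer = layer_params(d_model, 8, d_head)     # cost of one more layer
print(extra_per_head, extra_per_layer, extra_per_layer // extra_per_head)
```

Under these assumptions one extra layer costs roughly as many parameters as a couple of dozen extra heads, which is why the comparison at a fixed budget is nontrivial.
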
Morris Yau @MorrisYau ·
🧐 Is there a learning algorithm that rapidly finds the best fit transformer parameters to any dataset? arxiv.org/pdf/2410.10101
1 reply · 25 reposts · 191 likes · 31.8K views
Morris Yau @MorrisYau ·
I suppose this begins my shameless journey of self-promotion.
2 replies · 0 reposts · 5 likes · 769 views