Morris Yau

15 posts

@MorrisYau

@MIT @Google PhD candidate in Computer Science doing research in foundational aspects of ML and NLP.

Joined February 2022
70 Following · 314 Followers
Morris Yau @MorrisYau ·
Oral at NeurIPS 2025! The optimization of linear attention admits a nearly perfect theoretical characterization. Hope our work inspires new perspectives on Transformer learning and on scaling architectural choices. arxiv.org/abs/2410.10101
8 replies · 29 reposts · 235 likes · 18.9K views
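
For readers unfamiliar with the object in the linked paper: linear attention is attention without the softmax, which makes the computation associative. A minimal numpy sketch of that fact (my own illustration under standard conventions, not code from the paper):

```python
import numpy as np

# Minimal sketch of (non-causal) linear attention: attention without the softmax.
# Matrix multiplication is associative, so (Q K^T) V equals Q (K^T V), and the
# T x T score matrix never has to be materialized.
T, d = 6, 4                              # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

out_quadratic = (Q @ K.T) @ V            # O(T^2 d): explicit attention scores
out_linear = Q @ (K.T @ V)               # O(T d^2): only a d x d summary of K, V

assert np.allclose(out_quadratic, out_linear)
```
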
Morris Yau @MorrisYau ·
See our paper for why this training diagram is "all you need" to understand modern RNNs (parallelizable training, constant-time decode).
[image: training diagram]
1 reply · 1 repost · 8 likes · 1K views
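
The "parallelizable training, constant-time decode" claim refers to the fact that causal linear attention can be evaluated either in parallel over the whole sequence or as a recurrence with a fixed-size state. A hedged numpy sketch of that equivalence; the simple cumulative outer-product recurrence and the variable names are my own illustration, not the paper's exact parametrization:

```python
import numpy as np

# Causal linear attention written two ways (illustrative sketch only).
T, d = 6, 4
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form (training): mask the T x T score matrix to be causal.
mask = np.tril(np.ones((T, T)))
out_parallel = (mask * (Q @ K.T)) @ V

# Recurrent form (decoding): carry a d x d state, constant work per token.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    S = S + np.outer(K[t], V[t])      # state update: S_t = S_{t-1} + k_t v_t^T
    out_recurrent[t] = Q[t] @ S       # readout: o_t = q_t S_t

assert np.allclose(out_parallel, out_recurrent)
```
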
Morris Yau @MorrisYau ·
Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
3 replies · 38 reposts · 193 likes · 38.6K views
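
The compute-bound/memory-bound contrast can be made concrete by counting per-token decode work: softmax attention reads a KV cache whose length grows with position, while a fixed-size recurrent state costs the same at every step. A rough sketch under those assumptions (not the Transformer-PSM algorithm itself, which is described in the linked paper):

```python
# Rough per-token decode cost (multiply-adds) for one attention head,
# under simple assumptions; not the Transformer-PSM algorithm itself.
d = 64  # head dimension

def softmax_decode_cost(t):
    # Score a KV cache of length t, then take the weighted sum: work grows with t.
    return 2 * t * d

def recurrent_decode_cost(t):
    # Update and read a d x d state: the same work at every position.
    return 2 * d * d

for t in (128, 1024, 8192):
    print(f"position {t}: softmax {softmax_decode_cost(t):>7} vs recurrent {recurrent_decode_cost(t)}")
```
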
Morris Yau @MorrisYau ·
🤔 When increasing the number of parameters in a "real" transformer by a fixed budget, should we add more heads or more layers? Our synthetic data experiments reveal that adding heads is more effective than deepening the network.
1 reply · 0 reposts · 7 likes · 1.1K views
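
One way to read "a fixed budget" is to count what an extra head versus an extra layer costs in parameters. The sketch below assumes conventional per-head Q/K/V/O projections of size d_model x d_head and a 4x-expansion MLP with no biases; these conventions are my assumptions for illustration, not necessarily the paper's setup:

```python
# Back-of-the-envelope parameter counts for spending a fixed budget on heads
# versus layers. Conventions (per-head projection size, 4x MLP) are illustrative.
def attn_params(d_model, n_heads, d_head):
    return 4 * d_model * n_heads * d_head          # W_Q, W_K, W_V, W_O

def layer_params(d_model, n_heads, d_head):
    return attn_params(d_model, n_heads, d_head) + 8 * d_model ** 2  # + MLP

d_model, d_head = 512, 64
extra_per_head = attn_params(d_model, 1, d_head)       # cost of one more head
extra_per_layer = layer_params(d_model, 8, d_head)     # cost of one more layer
print(extra_per_head, extra_per_layer, extra_per_layer // extra_per_head)
```

Under these assumptions one extra layer costs roughly as many parameters as a couple of dozen extra heads, which is why the comparison at a fixed budget is nontrivial.
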
Morris Yau @MorrisYau ·
🧐 Is there a learning algorithm that rapidly finds the best fit transformer parameters to any dataset? arxiv.org/pdf/2410.10101
1 reply · 25 reposts · 191 likes · 31.8K views
Morris Yau @MorrisYau ·
I suppose this begins my shameless journey of self-promotion.
2 replies · 0 reposts · 5 likes · 769 views