
Excited to see our paper "Rethinking the expressive power of GNNs via graph biconnectivity" accepted as an oral presentation (notable-top 5%) at #ICLR2023! arxiv.org/abs/2301.09505 Joint work with @Roger98079446, Liwei Wang, and Di He 1/n
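For readers unfamiliar with the title's central notion, here is a small illustration of graph biconnectivity (cut vertices and bridges, the metrics the paper uses to probe GNN expressiveness) using networkx. This is background only, not code from the paper, and the example graphs are illustrative.

```python
# Biconnectivity basics: a graph is biconnected if it stays connected
# after removing any single vertex. Cut vertices (articulation points)
# and bridges are exactly the failures of biconnectivity.
import networkx as nx

# Two triangles glued at vertex 2: removing vertex 2 disconnects the
# graph, so 2 is a cut vertex and the graph is not biconnected.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])
print(sorted(nx.articulation_points(G)))  # [2]
print(nx.is_biconnected(G))               # False

# A single cycle has no cut vertices and no bridges: it is biconnected.
C = nx.cycle_graph(5)
print(list(nx.bridges(C)))                # []
print(nx.is_biconnected(C))               # True
```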




here are the most important points from today's ilya sutskever podcast:
- superintelligence in 5-20 years
- current scaling will stall hard; we're back to real research
- superintelligence = super-fast continual learner, not finished oracle
- models generalize 100x worse than humans, the biggest AGI blocker
- need completely new ML paradigm (i have ideas, can't share rn)
- AI impact will hit hard, but only after economic diffusion
- breakthroughs historically needed almost no compute
- SSI has enough focused research compute to win
- current RL already eats more compute than pre-training




🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.

📖 For more details, check out our paper here: arxiv.org/abs/2502.11089
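To make the two token-level ideas above concrete, here is a minimal sketch of coarse-grained compression (mean-pooling key blocks) followed by fine-grained selection (attending at full resolution only inside the top-scoring blocks). This is not the paper's algorithm or its hardware-aligned kernel; all names and parameters (sparse_attention_sketch, block_size, num_selected_blocks) are hypothetical, and batching, causal masking, and the trainable gating of the full method are omitted.

```python
# Illustrative sketch of hierarchical sparse attention: compress key
# blocks, score blocks with the query, then attend only over tokens in
# the selected blocks. Assumes a single query step and seq % block_size == 0.
import torch

def sparse_attention_sketch(q, k, v, block_size=16, num_selected_blocks=4):
    """q: (heads, 1, d) single query; k, v: (heads, seq, d)."""
    h, seq, d = k.shape
    n_blocks = seq // block_size

    # Coarse-grained compression: mean-pool each key block into one token.
    k_blocks = k[:, : n_blocks * block_size].reshape(h, n_blocks, block_size, d)
    k_coarse = k_blocks.mean(dim=2)                                # (h, n_blocks, d)

    # Score blocks against the query and keep the top-k blocks.
    block_scores = torch.einsum("hqd,hbd->hqb", q, k_coarse) / d**0.5
    top = block_scores.topk(num_selected_blocks, dim=-1).indices   # (h, 1, k)

    # Expand block indices to the token indices they cover.
    idx = (top.unsqueeze(-1) * block_size +
           torch.arange(block_size)).reshape(h, -1)                # (h, k*block_size)

    # Fine-grained selection: gather full-resolution keys/values from
    # the chosen blocks only.
    k_sel = torch.gather(k, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    v_sel = torch.gather(v, 1, idx.unsqueeze(-1).expand(-1, -1, d))

    # Standard softmax attention over the selected tokens.
    attn = torch.softmax(torch.einsum("hqd,hsd->hqs", q, k_sel) / d**0.5, dim=-1)
    return torch.einsum("hqs,hsd->hqd", attn, v_sel)

heads, seq, dim = 4, 128, 32
q = torch.randn(heads, 1, dim)
k, v = torch.randn(heads, seq, dim), torch.randn(heads, seq, dim)
out = sparse_attention_sketch(q, k, v)  # (4, 1, 32)
```

The point of the hierarchy is cost: the query scores n_blocks compressed tokens instead of seq raw ones, and full attention runs only over k*block_size selected tokens, which is where the long-context speedup comes from.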
