Akshay Kumar

51 posts


@aksh0135

PhD student @UMNews | Undergrad @IITKanpur. Research interests: deep learning theory, optimization

Joined November 2018
122 Following · 6 Followers
Pinned Tweet
Akshay Kumar @aksh0135 ·
Excited to share our recent work introducing 𝗡𝗲𝘂𝗿𝗼𝗻 𝗣𝘂𝗿𝘀𝘂𝗶𝘁 (𝗡𝗣) - a greedy algorithm for training neural networks. arxiv.org/abs/2509.12154
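The tweet itself doesn't spell out the procedure. As a rough illustration only, here is a minimal sketch of a generic pursuit-style greedy loop for a one-hidden-layer ReLU network: neurons are added one at a time by matching the current residual, and the linear output layer is refit after each addition. All function names and the random-candidate selection step are assumptions made for illustration, not the paper's algorithm; see arxiv.org/abs/2509.12154 for the actual method.

```python
import numpy as np

def pick_neuron(X, residual, n_candidates=512, rng=None):
    # Stand-in selection step: among random unit directions, keep the one
    # whose ReLU feature is most correlated with the current residual.
    rng = rng if rng is not None else np.random.default_rng(0)
    W = rng.standard_normal((n_candidates, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # directions only
    feats = np.maximum(X @ W.T, 0.0)                # (n_samples, n_candidates)
    scores = np.abs(feats.T @ residual)             # correlation with residual
    return W[np.argmax(scores)]

def greedy_pursuit(X, y, n_neurons=10):
    # Grow a one-hidden-layer ReLU network one neuron at a time,
    # refitting the output weights by least squares after each addition.
    neurons = []
    residual = y.astype(float).copy()
    for _ in range(n_neurons):
        neurons.append(pick_neuron(X, residual))
        H = np.maximum(X @ np.array(neurons).T, 0.0)   # hidden features so far
        a, *_ = np.linalg.lstsq(H, y, rcond=None)      # refit output layer
        residual = y - H @ a
    return np.array(neurons), a
```

In the paper, the selection of new neurons is tied to the gradient-flow dynamics at small initialization (see tweet 13/n below); the random search above is only a placeholder for that step.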
Akshay Kumar @aksh0135 ·
ii) Although NP is inspired by the GF dynamics in the small init regime, it is NOT equivalent to GF (the reasons are technical — see Sec. 5.3). Nonetheless, it could offer valuable insights into how neural networks build features during training. 13/n
Akshay Kumar @aksh0135 ·
A few important caveats: i) NP currently applies to homogeneous activations, which include (Leaky) ReLU and its higher powers, max(x, 0)^p. It also does not yet cover ResNets or Transformers, but we plan to extend NP to more general architectures. 12/n
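For context, an activation σ is positively homogeneous of degree p if σ(c·x) = c^p·σ(x) for every c > 0, which is exactly the property that max(x, 0)^p and Leaky ReLU (degree 1) satisfy. A quick numerical check of this property (the helper names here are illustrative, not from the paper):

```python
import numpy as np

def relu_pow(x, p=1):
    # ReLU raised to the p-th power: max(x, 0)^p.
    return np.maximum(x, 0.0) ** p

def leaky_relu(x, alpha=0.1):
    # Leaky ReLU: x for x >= 0, alpha * x otherwise.
    return np.where(x >= 0, x, alpha * x)

x = np.linspace(-2.0, 2.0, 101)
c = 3.7  # any positive scaling factor

# Degree-p homogeneity: f(c * x) == c**p * f(x) for all c > 0.
for p in (1, 2, 3):
    assert np.allclose(relu_pow(c * x, p), c ** p * relu_pow(x, p))
# Leaky ReLU is homogeneous of degree 1.
assert np.allclose(leaky_relu(c * x), c * leaky_relu(x))
```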
Akshay Kumar @aksh0135 ·
Finally, we leverage these theoretical insights to build a greedy algorithm for training deep networks. More about that in the next thread.
Akshay Kumar @aksh0135 ·
Extending the theory rigorously to ReLU is an important future direction. So is studying the training dynamics of non-homogeneous architectures such as ResNets and Transformers.
Akshay Kumar @aksh0135 ·
In a series of works, we study the dynamics of gradient flow arising from minimizing the training loss in supervised learning. We focus on homogeneous neural networks trained with small initialization. This (long) thread summarizes our key findings. 🧵
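For concreteness, the setup referenced throughout the thread can be written as follows (the notation below is mine, not quoted from the papers): gradient flow on the empirical loss L, started from a scaled-down initialization, for a network f whose output is positively homogeneous in its parameters.

```latex
% Gradient flow from a small initialization:
\[
  \dot{\theta}(t) = -\nabla L(\theta(t)), \qquad
  \theta(0) = \delta\, \theta_0, \quad 0 < \delta \ll 1 .
\]
% Homogeneity of degree k in the parameters; for example, k equals the
% depth for bias-free (Leaky) ReLU networks:
\[
  f(x;\, c\,\theta) = c^{k}\, f(x;\, \theta) \qquad \text{for all } c > 0 .
\]
```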