Yanlai Yang

@YanlaiYang
PhD student @nyuniversity @agentic_ai_lab

Research from CDS Asst Prof @mengyer and Courant PhD student @choang333 shows how the Midway Network learns object recognition and motion jointly from raw video, using motion latents and a gating unit to model real dynamics. nyudatascience.medium.com/watching-the-w…
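The post mentions combining motion latents with a gating unit; a generic sketch of what such gated fusion can look like (all names, shapes, and weights below are illustrative assumptions, not the Midway Network's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # feature dimension (illustrative)

# Hypothetical inputs: a per-frame appearance feature and a motion latent.
appearance = rng.normal(size=d)
motion_latent = rng.normal(size=d)

# Hypothetical "learned" weights for the gate and the motion projection.
W_gate = rng.normal(size=(d, 2 * d)) * 0.1
W_motion = rng.normal(size=(d, d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gating unit: a sigmoid gate decides, per dimension, how much the motion
# signal updates the appearance representation versus passing it through.
gate = sigmoid(W_gate @ np.concatenate([appearance, motion_latent]))
fused = gate * (W_motion @ motion_latent) + (1.0 - gate) * appearance

print(fused.shape)  # (16,)
```

The gate lets the model suppress the motion signal on static content and lean on it when real dynamics are present.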

🔍 New LLM Research 🔍

Conventional wisdom says that deep neural networks suffer from catastrophic forgetting when trained on a sequence of data with distribution shifts. But conventions are meant to be challenged! In our recent paper led by @YanlaiYang, we discovered a curious behavior in overparameterized networks, especially LLMs: as we train the network on a cyclic sequence of documents, it starts to anticipate the next document and reverses the forgetting trend! ⤴️

▶️ After 3-4 cycles, the network reverses over 90% of the forgetting right before seeing the original document again.
▶️ The amount of anticipation grows with the size of the network; LLMs ≤ 160M parameters show no anticipation.
▶️ We show that the effect can be reproduced in a toy network!

Check out more details in our arXiv preprint on anticipatory recovery: "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training."

🚀 arxiv.org/abs/2403.09613 🚀

#LLM #AI #Research
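The cyclic training protocol described above can be sketched with a toy overparameterized linear model standing in for the LLM (everything here, from sizes to hyperparameters, is an illustrative assumption, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's protocol: an overparameterized linear model
# trained sequentially on a fixed, repeating cycle of "documents"
# (here, small regression batches with far fewer samples than parameters).
n_docs, dim, n_cycles, steps_per_doc, lr = 4, 64, 4, 20, 0.01
docs = [(rng.normal(size=(8, dim)), rng.normal(size=8)) for _ in range(n_docs)]

w = np.zeros(dim)       # 64 parameters vs. 8 samples per document
loss_on_doc0 = []       # loss on document 0, tracked over training

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

for cycle in range(n_cycles):
    for d, (X, y) in enumerate(docs):   # fixed cyclic ordering of documents
        for _ in range(steps_per_doc):  # fine-tune on document d
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        # After each document, measure how much document 0 was "forgotten".
        loss_on_doc0.append(mse(w, *docs[0]))

# Anticipatory recovery would appear as loss_on_doc0 dipping back down
# *before* document 0 is revisited in later cycles.
print(loss_on_doc0)
```

This only reproduces the measurement protocol, not the effect itself; the paper's finding is that in sufficiently large networks the loss curve bends down in anticipation of the revisit.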
