Bryan Chan
@chanpyb
50 posts

PhD student @rlai_lab. Prev: @GoogleDeepMind, @OcadoTechnology, @kindredai, @UofTCompSci

Edmonton · Joined October 2020
524 Following · 191 Followers
Bryan Chan @chanpyb
@danielwurgaft This loss/complexity tradeoff has started bothering me---I think we can be okay with (slightly) worse loss at the cost of better generalization, e.g. physics models will fail to predict noise, but memorization can fit it. Any thoughts about this, e.g. regularization, architecture, etc.?
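A minimal sketch of the objective shape being gestured at here, assuming a generic L2 complexity penalty (`lam` and `params` are illustrative, not from any of the papers discussed):

```python
def regularized_loss(data_loss, params, lam=1e-2):
    """Trade a bit of data-fitting loss for simplicity: an L2 penalty
    stands in for model complexity, so the optimum may accept slightly
    worse loss in exchange for a simpler (better-generalizing) model."""
    complexity = sum(p * p for p in params)  # crude complexity proxy
    return data_loss + lam * complexity      # lam sets the tradeoff
```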
Daniel Wurgaft @danielwurgaft
We assume two well-known facts about neural nets as computational constraints (scaling laws and simplicity bias). This allows writing a closed-form expression for the posterior odds! 6/
[GIF]
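A hedged sketch of what such a closed-form comparison could look like; the functional forms below (loss-driven likelihood ratio, complexity-penalized prior) are assumptions for illustration, not the paper's exact expression:

```python
def log_posterior_odds(n_seen, loss_gen, loss_mem, c_gen, c_mem, beta=1.0):
    """Toy Bayesian comparison of a generalizing (G) vs. memorizing (M)
    predictor: log-odds = log-likelihood ratio (driven by loss on the
    data seen so far) + log-prior ratio (a simplicity bias penalizing
    complexity). Positive output favors generalization."""
    log_likelihood_ratio = -n_seen * (loss_gen - loss_mem)  # data amplifies loss gaps
    log_prior_ratio = -beta * (c_gen - c_mem)               # simpler predictor favored a priori
    return log_likelihood_ratio + log_prior_ratio
```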
Daniel Wurgaft @danielwurgaft
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without access to its weights! 🧵 1/
[GIF]
Bryan Chan @chanpyb
@puneeshdeora @bhavya_vasudeva Great work! Our work arxiv.org/abs/2410.23042 shows that asymptotically ICL will choose the best prediction mode, but we didn't address which mode is preferred when they're equally good---do you think it has to do with k=1 converging faster than k=3 in the Markov chain case?
Puneesh Deora @puneeshdeora
We also probe:
• model size 🏋️‍♂️
• skewed training mixtures ⚖️
• context length 📏
• LSTMs
For more details, check out our paper: 🔗📜arxiv.org/pdf/2506.19351
Work done with amazing collaborators: @bhavya_vasudeva, Tina Behnia, Christos Thrampoulidis.
Puneesh Deora @puneeshdeora
We train transformers on tasks from hierarchical complexity categories—simple ✖️ complex. Example #1 👉 order-1 vs. order-3 Markov chains. Result: The model identifies the order and switches between bigram and tetragram stats on the fly.
[image]
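For concreteness, a minimal sketch (not from the paper) of the two statistics being switched between; an order-k predictor counts next tokens conditioned on the previous k tokens:

```python
from collections import Counter, defaultdict

def ngram_stats(seq, order):
    """Next-token counts conditioned on the previous `order` tokens:
    order=1 gives bigram statistics, order=3 gives tetragram statistics."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts

# Usage: the same stream viewed through both predictors.
bigram = ngram_stats("abababab", order=1)
tetragram = ngram_stats("abababab", order=3)
```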
Bryan Chan @chanpyb
@Stone_Tao There are few, but this is what I have in mind: arxiv.org/pdf/1712.01275 I think with a smaller buffer size the data gets closer to on-policy data, and with a larger one it becomes more off-policy.
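A minimal sketch of the mechanism behind that intuition, assuming a plain FIFO buffer as in most SAC implementations (names here are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO buffer: a small capacity keeps only recent transitions, so
    sampled batches stay close to on-policy; a large capacity retains
    stale transitions, making the sampled data more off-policy."""
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest evicted first

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.storage), batch_size)
```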
Stone Tao @Stone_Tao
Is there any research on replay buffer sizes for off-policy (SAC) RL algorithms? How come we have just assumed that a 1M replay buffer size is reasonable for the majority of tasks we test on (in robotics, typically)? Why not smaller? Can we make it smaller?
Bryan Chan retweeted
Association for Computing Machinery
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
Bryan Chan @chanpyb
Thanks @m_wulfmeier ! We were surprised to see that SAC-X is just very robust. Something that was interesting to us that we didn’t further investigate: Learning from examples ended up being more efficient than using reward. Let’s chat at #NeurIPS2024 if there’s a chance?
Markus Wulfmeier @m_wulfmeier

Here's a fascinating paper by @domo_mr_roboto's group linking hierarchical reinforcement learning and cheaply-obtainable auxiliary tasks arxiv.org/abs/2407.03311 Better exploration with minimal engineering effort remains a critical challenge (even for RLHF/AIF) - reminiscent of our efforts on SAC-X and intrinsic rewards through representation learning (VAE, Transporter, etc.) arxiv.org/abs/2011.01758 Excited to see more progress in this space! #robotics #reinforcementlearning

Bryan Chan @chanpyb
@anianruoss One immediate observation I have is that there seems to be no boundary between two demonstration sequences (Listing 1). Would it not be problematic because the model can't tell they are different demonstrations without further training?
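A toy sketch of the concern, with a hypothetical separator token (not part of the LMAct setup) marking where one demonstration ends and the next begins:

```python
SEP = "<demo_sep>"  # hypothetical boundary token, not from the paper

def build_prompt(demonstrations):
    """Join demonstrations with an explicit separator so the model can
    tell where one ends and the next begins; without it, consecutive
    sequences run together."""
    return SEP.join(demonstrations)

# e.g. build_prompt(["obs ... act: up", "obs ... act: down"])
```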
Anian Ruoss @anianruoss
Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations (arxiv.org/abs/2412.01441). 🧵 1/N
[image]
Bryan Chan @chanpyb
@daibond_alpha @iclr_conf 3. Both scores 3 and 5 are somewhat due to experimental results like significance and benchmarks. (3) is interesting because the former provides no empirical insight, while the latter provides some, arguably claiming "maybe" the theory applies. Which one is more important as a contribution?
Bryan Chan @chanpyb
@daibond_alpha @iclr_conf Some interesting observations here: 1. It seems like the latter has shorter reviews that are imo generally of lower quality than those of the former 2. The expectations seem to be different, maybe due to different primary areas? 3. ...
Markus Wulfmeier @m_wulfmeier
Looking forward to my first #NeurIPS in four years! Massive progress in large-scale decision making over the last few years. LLMs are starting to look like robot control and vice versa! Ping me if you're around and want to chat! @NeurIPSConf
Bryan Chan retweeted
Mohamed Elsayed @mhmd_elsaye
Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data one sample at a time without storing any sample. Joint work with @Gautham529 and @rupammahmood.
[GIF]
Bryan Chan retweeted
Gautham Vasan @Gautham529
Our NeurIPS paper is now on arXiv: We introduce Action Value Gradient (AVG), a novel incremental deep RL method that learns in real-time, one sample at a time — no batch updates, target networks or a replay buffer! Co-authors @mhmd_elsaye @bellingerc @white_martha @rupammahmood
[GIF]
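To illustrate the incremental regime, here is a generic one-sample TD(0) sketch, not AVG itself (AVG's actual actor-critic updates differ):

```python
import torch

def one_sample_td_update(value_net, optimizer, obs, reward, next_obs, gamma=0.99):
    """Generic one-sample TD(0) update: each transition is consumed
    immediately and discarded; no replay buffer, target network, or
    batch updates. (Illustrative only.)"""
    with torch.no_grad():
        target = reward + gamma * value_net(next_obs)  # bootstrap from the same net
    loss = (target - value_net(obs)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```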
Bryan Chan @chanpyb
@c_voelcker @usmananwar391 What alternative are you using? I think I can see some limitations with the IQM approach, but I'm unsure what you'd propose to address them.
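For readers following along, IQM here is presumably the interquartile mean aggregate popularized for RL evaluation (Agarwal et al., 2021); a minimal version:

```python
import numpy as np
from scipy.stats import trim_mean

def iqm(scores):
    """Interquartile mean: drop the bottom and top 25% of run scores and
    average the middle 50%; more outlier-robust than the mean, less
    wasteful of data than the median."""
    return trim_mean(np.asarray(scores), proportiontocut=0.25)

# Usage: iqm([0.1, 0.4, 0.5, 0.6, 3.0]) ignores the extremes at both ends.
```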
Claas Voelcker @c_voelcker
@usmananwar391 "Horribly misapplied" is probably an overstatement; I'm still salty about ICLR reviews and discussing why IQM is a bad metric with people.
Claas Voelcker @c_voelcker
@chanpyb It’s proud ranting, what this platform was built for!
Claas Voelcker @c_voelcker
Fun update: I said "run an experiment for another domain since this phenomenon might be specific to the examples you picked". The LLM chastised me for not giving a concrete example. However, the next sentence in my comment, which the LLM didn't cite, was the concrete example.
Claas Voelcker @c_voelcker

Hey, @iclr_conf . Standard policy for experimenting would be to ask for consent from participants and explain the setup (e.g. what systems are being used exactly) thoroughly. I don't think we should be legitimizing the use of LLMs in the review process. blog.iclr.cc/2024/10/09/icl…

Bryan Chan @chanpyb
@XinyiChen2 I think this line of work will lead us to a better understanding of how LLMs work, and further to new ideas for designing training algorithms for various LLMs. N/N arXiv link: arxiv.org/abs/2410.23042
Bryan Chan @chanpyb
@XinyiChen2 Of course, we have also conducted experiments on a synthetic dataset, Omniglot, and fine-tuned an LLM with a small number of prompts to corroborate our theoretical findings. 7/N
Bryan Chan @chanpyb
LLMs can leverage context information for prediction, i.e., in-context learning (ICL), or memorize solutions, i.e., in-weight learning (IWL). But when does each happen? 1/N
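One way to make the distinction concrete (a hypothetical probe, not this paper's protocol; `model.predict` is assumed): flip the labels in the context and see whether the prediction follows the context or the weights.

```python
def probe_icl_vs_iwl(model, query, context, flipped_context):
    """Flip the labels in the context: if the prediction follows the
    flipped context, the model is using ICL; if it sticks with the label
    learned during training, it is relying on IWL."""
    pred = model.predict(context=context, query=query)
    pred_flipped = model.predict(context=flipped_context, query=query)
    return "ICL" if pred_flipped != pred else "IWL"
```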