Bryan Chan
@chanpyb
50 posts

PhD student @rlai_lab. Prev: @GoogleDeepMind, @OcadoTechnology, @kindredai, @UofTCompSci

Edmonton · Joined October 2020
524 Following · 191 Followers
Bryan Chan @chanpyb
@danielwurgaft This loss/complexity tradeoff has started bothering me---I think we can be okay with (slightly) worse loss at the cost of better generalization, e.g. physics models will fail to predict noise, but memorization can fit it. Any thoughts about this, e.g. regularization, architecture, etc.?
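A minimal sketch of the objective shape being gestured at here, assuming a generic L2 complexity penalty (`lam` and `params` are illustrative, not from any of the papers discussed):

```python
def regularized_loss(data_loss, params, lam=1e-2):
    """Trade a bit of data-fitting loss for simplicity: an L2 penalty
    stands in for model complexity, so the optimum may accept slightly
    worse loss in exchange for a simpler (better-generalizing) model."""
    complexity = sum(p * p for p in params)  # crude complexity proxy
    return data_loss + lam * complexity      # lam sets the tradeoff
```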
Daniel Wurgaft @danielwurgaft
We assume two well-known facts about neural nets as computational constraints (scaling laws and simplicity bias). This allows writing a closed-form expression for the posterior odds! 6/
[GIF]
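A hedged sketch of what such a closed-form comparison could look like; the functional forms below (loss-driven likelihood ratio, complexity-penalized prior) are assumptions for illustration, not the paper's exact expression:

```python
def log_posterior_odds(n_seen, loss_gen, loss_mem, c_gen, c_mem, beta=1.0):
    """Toy Bayesian comparison of a generalizing (G) vs. memorizing (M)
    predictor: log-odds = log-likelihood ratio (driven by loss on the
    data seen so far) + log-prior ratio (a simplicity bias penalizing
    complexity). Positive output favors generalization."""
    log_likelihood_ratio = -n_seen * (loss_gen - loss_mem)  # data amplifies loss gaps
    log_prior_ratio = -beta * (c_gen - c_mem)               # simpler predictor favored a priori
    return log_likelihood_ratio + log_prior_ratio
```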
Daniel Wurgaft @danielwurgaft
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without access to its weights! 🧵 1/
[GIF]
Bryan Chan @chanpyb
@puneeshdeora @bhavya_vasudeva Great work! Our work arxiv.org/abs/2410.23042 shows that asymptotically ICL will choose the best prediction mode, but we didn't address which mode is preferred when they're equally good---do you think it has to do with k=1 converging faster than k=3 in the Markov chain case?
Puneesh Deora @puneeshdeora
We also probe:
• model size 🏋️‍♂️
• skewed training mixtures ⚖️
• context length 📏
• LSTMs
For more details, check out our paper: 🔗📜arxiv.org/pdf/2506.19351
Work done with amazing collaborators: @bhavya_vasudeva, Tina Behnia, Christos Thrampoulidis.
Puneesh Deora @puneeshdeora
We train transformers on tasks from hierarchical complexity categories—simple ✖️ complex. Example #1 👉 order-1 vs. order-3 Markov chains. Result: The model identifies the order and switches between bigram and tetragram stats on the fly.
[image]
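For concreteness, a minimal sketch (not from the paper) of the two statistics being switched between; an order-k predictor counts next tokens conditioned on the previous k tokens:

```python
from collections import Counter, defaultdict

def ngram_stats(seq, order):
    """Next-token counts conditioned on the previous `order` tokens:
    order=1 gives bigram statistics, order=3 gives tetragram statistics."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts

# Usage: the same stream viewed through both predictors.
bigram = ngram_stats("abababab", order=1)
tetragram = ngram_stats("abababab", order=3)
```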
Bryan Chan @chanpyb
@Stone_Tao There are few, but this is what I have in mind: arxiv.org/pdf/1712.01275 I think with a smaller buffer size the data gets closer to on-policy data, and with a larger one it becomes more off-policy.
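A minimal sketch of the mechanism behind that intuition, assuming a plain FIFO buffer as in most SAC implementations (names here are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO buffer: a small capacity keeps only recent transitions, so
    sampled batches stay close to on-policy; a large capacity retains
    stale transitions, making the sampled data more off-policy."""
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest evicted first

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.storage), batch_size)
```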
Stone Tao @Stone_Tao
Is there any research on replay buffer sizes for off-policy (SAC) RL algorithms? How come we have just assumed that a 1M replay buffer size is reasonable for the majority of tasks we test on (in robotics, typically)? Why not smaller? Can we make it smaller?
Bryan Chan retweeted
Association for Computing Machinery
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
Bryan Chan @chanpyb
Thanks @m_wulfmeier ! We were surprised to see that SAC-X is just very robust. Something that was interesting to us that we didn’t further investigate: Learning from examples ended up being more efficient than using reward. Let’s chat at #NeurIPS2024 if there’s a chance?
Markus Wulfmeier @m_wulfmeier

Here's a fascinating paper by @domo_mr_roboto's group linking hierarchical reinforcement learning and cheaply-obtainable auxiliary tasks arxiv.org/abs/2407.03311 Better exploration with minimal engineering effort remains a critical challenge (even for RLHF/AIF) - reminiscent of our efforts on SAC-X and intrinsic rewards through representation learning (VAE, Transporter, etc.) arxiv.org/abs/2011.01758 Excited to see more progress in this space! #robotics #reinforcementlearning

Bryan Chan @chanpyb
@anianruoss One immediate observation I have is that there seems to be no boundary between two demonstration sequences (Listing 1). Would it not be problematic because the model can't tell they are different demonstrations without further training?
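A toy sketch of the concern, with a hypothetical separator token (not part of the LMAct setup) marking where one demonstration ends and the next begins:

```python
SEP = "<demo_sep>"  # hypothetical boundary token, not from the paper

def build_prompt(demonstrations):
    """Join demonstrations with an explicit separator so the model can
    tell where one ends and the next begins; without it, consecutive
    sequences run together."""
    return SEP.join(demonstrations)

# e.g. build_prompt(["obs ... act: up", "obs ... act: down"])
```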
Anian Ruoss @anianruoss
Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations (arxiv.org/abs/2412.01441). 🧵 1/N
[image]
Bryan Chan @chanpyb
@daibond_alpha @iclr_conf 3. Both scores 3 and 5 are somewhat due to experimental results like significance and benchmarks. (3) is interesting because the former provides no empirical insight, while the latter provides some, arguably claiming "maybe" the theory applies. Which one is more important as a contribution?
Bryan Chan @chanpyb
@daibond_alpha @iclr_conf Some interesting observations here: 1. It seems like the latter has shorter reviews that are imo generally of lower quality than those of the former 2. The expectations seem to be different, maybe due to different primary areas? 3. ...
Markus Wulfmeier @m_wulfmeier
Looking forward to my first #NeurIPS in four years! Massive progress in large-scale decision making over the last few years. LLMs are starting to look like robot control and vice versa! Ping me if you're around and want to chat! @NeurIPSConf
Bryan Chan retweeted
Mohamed Elsayed @mhmd_elsaye
Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data one sample at a time without storing any sample. Joint work with @Gautham529 and @rupammahmood.
[GIF]
Bryan Chan retweeted
Gautham Vasan @Gautham529
Our NeurIPS paper is now on arXiv: We introduce Action Value Gradient (AVG), a novel incremental deep RL method that learns in real-time, one sample at a time — no batch updates, target networks or a replay buffer! Co-authors @mhmd_elsaye @bellingerc @white_martha @rupammahmood
[GIF]
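To illustrate the incremental regime, here is a generic one-sample TD(0) sketch, not AVG itself (AVG's actual actor-critic updates differ):

```python
import torch

def one_sample_td_update(value_net, optimizer, obs, reward, next_obs, gamma=0.99):
    """Generic one-sample TD(0) update: each transition is consumed
    immediately and discarded; no replay buffer, target network, or
    batch updates. (Illustrative only.)"""
    with torch.no_grad():
        target = reward + gamma * value_net(next_obs)  # bootstrap from the same net
    loss = (target - value_net(obs)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```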
Bryan Chan @chanpyb
@c_voelcker @usmananwar391 What alternative are you using? I think I can see some limitations with the IQM approach, but I'm unsure what you'd propose to address them.
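For readers following along, IQM here is presumably the interquartile mean aggregate popularized for RL evaluation (Agarwal et al., 2021); a minimal version:

```python
import numpy as np
from scipy.stats import trim_mean

def iqm(scores):
    """Interquartile mean: drop the bottom and top 25% of run scores and
    average the middle 50%; more outlier-robust than the mean, less
    wasteful of data than the median."""
    return trim_mean(np.asarray(scores), proportiontocut=0.25)

# Usage: iqm([0.1, 0.4, 0.5, 0.6, 3.0]) ignores the extremes at both ends.
```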
Claas Voelcker @c_voelcker
@usmananwar391 "Horribly misapplied" is probably an overstatement; I'm still salty about ICLR reviews and discussing why IQM is a bad metric with people.
Claas Voelcker @c_voelcker
@chanpyb It’s proud ranting, what this platform was built for!
Claas Voelcker @c_voelcker
Fun update: I said "run an experiment for another domain since this phenomenon might be specific to the examples you picked". The LLM chastised me for not giving a concrete example. However, the next sentence in my comment, which the LLM didn't cite, was the concrete example.
Claas Voelcker @c_voelcker

Hey, @iclr_conf . Standard policy for experimenting would be to ask for consent from participants and explain the setup (e.g. what systems are being used exactly) thoroughly. I don't think we should be legitimizing the use of LLMs in the review process. blog.iclr.cc/2024/10/09/icl…

Bryan Chan @chanpyb
@XinyiChen2 I think this line of work will lead us to a better understanding of how LLMs work, and further to new ideas for designing training algorithms for various LLMs. N/N arXiv link: arxiv.org/abs/2410.23042
Bryan Chan @chanpyb
@XinyiChen2 Of course, we have also conducted experiments on a synthetic dataset, Omniglot, and fine-tuned an LLM with a small number of prompts to corroborate our theoretical findings. 7/N
Bryan Chan @chanpyb
LLMs can leverage context information for prediction, i.e., in-context learning (ICL), or memorize solutions, i.e., in-weight learning (IWL). But when does each happen? 1/N
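One way to make the distinction concrete (a hypothetical probe, not this paper's protocol; `model.predict` is assumed): flip the labels in the context and see whether the prediction follows the context or the weights.

```python
def probe_icl_vs_iwl(model, query, context, flipped_context):
    """Flip the labels in the context: if the prediction follows the
    flipped context, the model is using ICL; if it sticks with the label
    learned during training, it is relying on IWL."""
    pred = model.predict(context=context, query=query)
    pred_flipped = model.predict(context=flipped_context, query=query)
    return "ICL" if pred_flipped != pred else "IWL"
```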