Bo Dai

223 posts

@daibond_alpha

Assistant Professor at @gtcse, Research Scientist at @GoogleDeepMind | ex @googlebrain

California, USA · Joined October 2012
793 Following · 2.8K Followers
Bo Dai@daibond_alpha·
@Haolun_Wu0203 Hi Haolun, we have considered an even more aggressive setting: fine-tuning the model with access only to samples from the LLM arxiv.org/pdf/2402.08219. We consider the setting with last-layer logits as gray box.
Haolun Wu@Haolun_Wu0203·
🚀 New Research Alert: Logits are All We Need to Adapt Closed Models 🔒 Many commercial Large Language Models (LLMs), e.g., GPT-4, are closed-source, limiting developers' ability to steer content generation. 🤔 Can we adapt closed-source LLMs when fine-tuning or accessing their internal weights is not possible? Check out our work by @gaurushh, @Haolun_Wu0203, Subhojyoti, @sanmikoyejo from Stanford @stai_research. 1/n
Haolun Wu tweet media
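The idea in this thread — steering a closed model through its output logits alone — can be sketched as a small additive correction learned over the vocabulary. Everything below is a hypothetical toy illustration, not the paper's actual method: `closed_model_logits` is a stand-in for an API that exposes only logits, and the single shared `delta` vector is the crudest possible adapter.

```python
import numpy as np

VOCAB = 8  # toy vocabulary size

def closed_model_logits(prefix):
    """Stand-in for the closed model: returns fixed per-token logits.

    In practice this would be an API call exposing logits but no weights."""
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    return rng.normal(size=VOCAB)

delta = np.zeros(VOCAB)  # learned logit correction (the only trainable part)

def adapted_probs(prefix):
    z = closed_model_logits(prefix) + delta  # adapt in logit space only
    e = np.exp(z - z.max())                  # numerically stable softmax
    return e / e.sum()

def sgd_step(prefix, target, lr=0.5):
    """One cross-entropy SGD step on delta toward a desired next token."""
    global delta
    p = adapted_probs(prefix)
    g = p.copy()
    g[target] -= 1.0        # gradient of cross-entropy w.r.t. the logits
    delta -= lr * g

# Steer the adapted model toward emitting token 3 after a fixed prefix.
prefix = (1, 2)
for _ in range(100):
    sgd_step(prefix, target=3)
```

A context-independent `delta` only reweights the vocabulary globally; the gray-box setting mentioned above (access to last-layer logits) would let the correction depend on the prefix.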
Bo Dai@daibond_alpha·
@shaneguML 1. Gamify the problem with a clear reward; 2. RL
Shane Gu@shaneguML·
ChatGPT/RLHF was a distraction (though necessary). o3/gemini-2-flash-thinking/RL is the real game. Glad the community finally went over the local optimum. There's no looking back; it's all RL until AGI (for some domains). (Slide: Dave Silver)
Shane Gu tweet media
Bo Dai@daibond_alpha·
@denny_zhou The power of RL with a clear target
Denny Zhou@denny_zhou·
Any benchmark—including ARC-AGI—can be rapidly solved, as long as the task provides a clear evaluation metric that can be used as a reward signal during fine-tuning.
Bo Dai@daibond_alpha·
@shaneguML The value function in RL is also approximating gradients
Shane Gu@shaneguML·
I chose RL because MuProp couldn't make Arvind/Ilya/Quoc's neural programmer work better. So instead of approximating gradients, I decided to go RL, inspired by end-to-end training of tool-augmented language agents. Ofc another reason: Sergey Levine, and I'll talk about that another time.
Shane Gu@shaneguML·
Second thing Ilya told me was Tim Lillicrap, a Canadian neuroscientist turned DeepMind researcher, and his random synaptic feedback. He was imagining an end-to-end toolformer where you differentiate through both weights and discrete Google search, in 2015. This was why I wrote MuProp (mu used to be Greek mu for mean-field network, but inspired by MuZero, I later tell people it's actually mu meaning nothingness in Japanese).
- I wrote MuProp with him dustintran.com/blog/muprop-un… (concurrent work from John Schulman, accepted with zero experiments arxiv.org/abs/1506.05254)
- Arvind's work arxiv.org/abs/1511.04834. We were interns together who stayed up late. His work is the first non-toy work where I applied the newly developed MuProp (and failed)
- Ilya also had another intern (a 19-year-old Stanford undergrad) working on differentiating through Google search
- Tim's work nature.com/articles/ncomm…. And Tim is my manager right now at DeepMind. My condition for coming back to Google was to report to Tim Lillicrap, and that was the second-best decision I made in 2023.
Shane Gu@shaneguML

9 years ago I was interning under Ilya at Google Brain. First thing he told me is Solomonoff Induction and AIXI, i.e. why prediction leads to understanding. GPT came from sentiment neurons. The level of clarity he and Geoff had was always inspiring.

Yuchen Zhuang@yuchen_zhuang·
Excited to present HYDRA 🐉 at #NeurIPS2024! 🚀 Our novel model-factorization framework combines personal behavior patterns 👤 with global knowledge 🌐 for truly personalized LLM generation. Achieves 9%+ gains over SOTA across 5 tasks 🏆 using personalized RAG. Learn more: arxiv.org/pdf/2406.02888
Yuchen Zhuang tweet media
Bryan Chan@chanpyb·
@daibond_alpha @iclr_conf Some interesting observations here: 1. It seems like the latter has shorter reviews, which imo are generally of lower quality than those of the former 2. The expectations seem to be different, maybe due to different primary areas? 3. ...
Bo Dai@daibond_alpha·
@damekdavis @CsabaSzepesvari I see. The additional K steps of vanilla gradient descent updates will be helpful in this case. Interesting!
Damek@damekdavis·
@daibond_alpha @CsabaSzepesvari So the method in our work is dealing with the (2a) case in Theorem 1. Think of y^2 + x^4. Here \xi = 3/4, so linear convergence does not hold for GNGD according to the theorem.
Damek tweet media
Damek@damekdavis·
New work exponentially accelerates gradient descent with adaptive steps, which alternate small constant steps and exponentially large Polyak steps. Theory works for losses with quartic growth (and beyond). Examples include overparameterized matrix sensing and toy neural nets. Paper below.
Damek tweet media
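The alternation described above — a small constant step followed by a large Polyak step — can be sketched on the quartic-growth example y^2 + x^4 from upthread. This is a minimal illustration assuming the optimal value f* = 0 is known (the Polyak step needs it), with a naive even/odd alternation; it is not the paper's exact schedule.

```python
import numpy as np

def f(z):
    x, y = z
    return y**2 + x**4  # quartic growth in x: constant-step GD is slow here

def grad(z):
    x, y = z
    return np.array([4.0 * x**3, 2.0 * y])

def alternating_gd(z0, eta=0.01, iters=200):
    """Alternate a small constant step with a Polyak step (assumes f* = 0)."""
    z = np.array(z0, dtype=float)
    for t in range(iters):
        g = grad(z)
        gg = g @ g
        if gg < 1e-30:          # gradient vanished; we are at a minimizer
            break
        if t % 2 == 0:
            step = eta          # small constant step
        else:
            step = f(z) / gg    # Polyak step: (f(z) - f*) / ||grad f(z)||^2
        z = z - step * g
    return z

z = alternating_gd([1.0, 1.0])
```

Near the flat x^4 valley the gradient is tiny, so the Polyak step size f/||grad||^2 blows up to roughly 1/(16 x^2) and contracts x by a constant factor per step, which is where the exponential acceleration over a fixed step size comes from.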
Durk Kingma@dpkingma·
Personal news: I'm joining @AnthropicAI! 😄 Anthropic's approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic's mission of developing powerful AI systems responsibly. Can't wait to work with their talented team, including a number of great ex-colleagues from OpenAI and Google, and tackle the challenges ahead!