Bo Dai

223 posts

@daibond_alpha

Assistant Professor at @gtcse, Research Scientist at @GoogleDeepMind | ex @googlebrain

California, USA · Joined October 2012
793 Following · 2.8K Followers
Bo Dai@daibond_alpha·
@Haolun_Wu0203 Hi Haolun, we have considered an even more aggressive setting: fine-tuning the model with access only to samples from the LLM arxiv.org/pdf/2402.08219. We consider the setting with last-layer logits as gray box.
Haolun Wu@Haolun_Wu0203·
🚀 New Research Alert: Logits are All We Need to Adapt Closed Models 🔒 Many commercial Large Language Models (LLMs), e.g., GPT-4, are closed-source, limiting developers' ability to steer content generation. 🤔 Can we adapt closed-source LLMs when fine-tuning or accessing their internal weights is not possible? Check out our work by @gaurushh, @Haolun_Wu0203, Subhojyoti, @sanmikoyejo from Stanford @stai_research. 1/n
Haolun Wu tweet media
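The idea in this thread — steering a closed model through its output logits alone — can be sketched as a small additive correction learned over the vocabulary. Everything below is a hypothetical toy illustration, not the paper's actual method: `closed_model_logits` is a stand-in for an API that exposes only logits, and the single shared `delta` vector is the crudest possible adapter.

```python
import numpy as np

VOCAB = 8  # toy vocabulary size

def closed_model_logits(prefix):
    """Stand-in for the closed model: returns fixed per-token logits.

    In practice this would be an API call exposing logits but no weights."""
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    return rng.normal(size=VOCAB)

delta = np.zeros(VOCAB)  # learned logit correction (the only trainable part)

def adapted_probs(prefix):
    z = closed_model_logits(prefix) + delta  # adapt in logit space only
    e = np.exp(z - z.max())                  # numerically stable softmax
    return e / e.sum()

def sgd_step(prefix, target, lr=0.5):
    """One cross-entropy SGD step on delta toward a desired next token."""
    global delta
    p = adapted_probs(prefix)
    g = p.copy()
    g[target] -= 1.0        # gradient of cross-entropy w.r.t. the logits
    delta -= lr * g

# Steer the adapted model toward emitting token 3 after a fixed prefix.
prefix = (1, 2)
for _ in range(100):
    sgd_step(prefix, target=3)
```

A context-independent `delta` only reweights the vocabulary globally; the gray-box setting mentioned above (access to last-layer logits) would let the correction depend on the prefix.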
Bo Dai@daibond_alpha·
@shaneguML 1. Gamify the problem with a clear reward; 2. RL
Shane Gu@shaneguML·
ChatGPT/RLHF was a distraction (though necessary). o3/gemini-2-flash-thinking/RL is the real game. Glad the community finally went over the local optimum. There's no looking back; it's all RL until AGI (for some domains). (Slide: Dave Silver)
Shane Gu tweet media
Bo Dai@daibond_alpha·
@denny_zhou The power of RL with a clear target
Denny Zhou@denny_zhou·
Any benchmark—including ARC-AGI—can be rapidly solved, as long as the task provides a clear evaluation metric that can be used as a reward signal during fine-tuning.
Bo Dai@daibond_alpha·
@shaneguML The value function in RL is also approximating gradients
Shane Gu@shaneguML·
I chose RL because MuProp couldn't make Arvind/Ilya/Quoc's neural programmer work better. So instead of approximating gradients, I decided to go RL, inspired by end-to-end training of tool-augmented language agents. Ofc another reason: Sergey Levine, and I'll talk about that another time.
Shane Gu@shaneguML·
Second thing Ilya told me was Tim Lillicrap, a Canadian neuroscientist turned DeepMind researcher, and his random synaptic feedback. He was imagining an end-to-end toolformer where you differentiate through both weights and discrete Google search, in 2015. This was why I wrote MuProp (mu used to be Greek mu for mean-field network, but inspired by MuZero, I later tell people it's actually mu meaning nothingness in Japanese).
- I wrote MuProp with him dustintran.com/blog/muprop-un… (concurrent work from John Schulman, accepted with zero experiments arxiv.org/abs/1506.05254)
- Arvind's work arxiv.org/abs/1511.04834. We were interns together who stayed up late. His work is the first non-toy work where I applied the newly developed MuProp (and failed)
- Ilya also had another intern (a 19-year-old Stanford undergrad) working on differentiating through Google search
- Tim's work nature.com/articles/ncomm…. And Tim is my manager right now at DeepMind. My condition for coming back to Google was to report to Tim Lillicrap, and that was the second-best decision I made in 2023.
Shane Gu@shaneguML

9 years ago I was interning under Ilya at Google Brain. First thing he told me is Solomonoff Induction and AIXI, i.e. why prediction leads to understanding. GPT came from sentiment neurons. The level of clarity he and Geoff had was always inspiring.

Yuchen Zhuang@yuchen_zhuang·
Excited to present HYDRA 🐉 at #NeurIPS2024! 🚀 Our novel model-factorization framework combines personal behavior patterns 👤 with global knowledge 🌐 for truly personalized LLM generation. Achieves 9%+ gains over SOTA across 5 tasks 🏆 using personalized RAG. Learn more: arxiv.org/pdf/2406.02888
Yuchen Zhuang tweet media
Bryan Chan@chanpyb·
@daibond_alpha @iclr_conf Some interesting observations here: 1. It seems like the latter has shorter reviews, which imo are generally of lower quality than those of the former 2. The expectations seem to be different, maybe due to different primary areas? 3. ...
Bo Dai@daibond_alpha·
@damekdavis @CsabaSzepesvari I see. The additional K steps of vanilla gradient descent updates will be helpful in this case. Interesting!
Damek@damekdavis·
@daibond_alpha @CsabaSzepesvari So the method in our work is dealing with the (2a) case in Theorem 1. Think of y^2 + x^4. Here \xi = 3/4, so linear convergence does not hold for GNGD according to the theorem.
Damek tweet media
Damek@damekdavis·
New work exponentially accelerates gradient descent with adaptive steps, which alternate small constant steps and exponentially large Polyak steps. Theory works for losses with quartic growth (and beyond). Examples include overparameterized matrix sensing and toy neural nets. Paper below.
Damek tweet media
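The alternation described above — a small constant step followed by a large Polyak step — can be sketched on the quartic-growth example y^2 + x^4 from upthread. This is a minimal illustration assuming the optimal value f* = 0 is known (the Polyak step needs it), with a naive even/odd alternation; it is not the paper's exact schedule.

```python
import numpy as np

def f(z):
    x, y = z
    return y**2 + x**4  # quartic growth in x: constant-step GD is slow here

def grad(z):
    x, y = z
    return np.array([4.0 * x**3, 2.0 * y])

def alternating_gd(z0, eta=0.01, iters=200):
    """Alternate a small constant step with a Polyak step (assumes f* = 0)."""
    z = np.array(z0, dtype=float)
    for t in range(iters):
        g = grad(z)
        gg = g @ g
        if gg < 1e-30:          # gradient vanished; we are at a minimizer
            break
        if t % 2 == 0:
            step = eta          # small constant step
        else:
            step = f(z) / gg    # Polyak step: (f(z) - f*) / ||grad f(z)||^2
        z = z - step * g
    return z

z = alternating_gd([1.0, 1.0])
```

Near the flat x^4 valley the gradient is tiny, so the Polyak step size f/||grad||^2 blows up to roughly 1/(16 x^2) and contracts x by a constant factor per step, which is where the exponential acceleration over a fixed step size comes from.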
Durk Kingma@dpkingma·
Personal news: I'm joining @AnthropicAI! 😄 Anthropic's approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic's mission of developing powerful AI systems responsibly. Can't wait to work with their talented team, including a number of great ex-colleagues from OpenAI and Google, and tackle the challenges ahead!