Geoffrey Cideron

20 posts

Geoffrey Cideron

@CdrGeo

Research Engineer at Google DeepMind. Spent time at FAIR London, INRIA Lille, and Instadeep.

เข้าร่วม Mart 2019

411 กำลังติดตาม229 ผู้ติดตาม

Geoffrey Cideron@CdrGeo·10 Eki

S/O to my amazing collaborators!! @ramealexandre @OlivierBachem @sarah_perrin_ @_andrea_agos @johanferret Romuald Elie and Sertan Girgin

English

206

Geoffrey Cideron@CdrGeo·10 Eki

Happy to introduce our new paper "Diversity-Rewarded CFG Distillation". We combine distillation, a novel diversity reward, and model merging to improve the quality-diversity tradeoff of MusicLM. arxiv: arxiv.org/abs/2410.06084 More info:

Alexandre Ramé@ramealexandre

An AI will win a Nobel price someday✨. Yet currently, alignment reduces creativity. Our new @GoogleDeepMind paper "diversity-rewarded CFG distillation" improves quality AND diversity for music, via distillation of test-time compute, RL with a diversity reward, and model merging. arxiv: arxiv.org/abs/2410.06084 website: google-research.github.io/seanet/musiclm…

English

2.6K

Geoffrey Cideron รีทวีตแล้ว

Robert Dadashi@robdadashi·8 Nis

I am very happy to announce that Gemma 1.1 Instruct 2B and “7B” are out! Here are a few details about the new models: 1/11

English

362

324.6K

Geoffrey Cideron รีทวีตแล้ว

Robert Dadashi@robdadashi·21 Şub

I am so proud to see Gemma released today! I have had a fantastic time working on post-training and RLHF with an amazing team. Cannot wait to see what the community builds with these models!

Google DeepMind@GoogleDeepMind

Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1

English

8.5K

Geoffrey Cideron รีทวีตแล้ว

Johan Ferret@johanferret·8 Şub

Online feedback is crucial for alignment, so we propose a simple recipe to make any direct alignment method (think DPO / IPO / SLiC-HF) online using AI feedback 🧙‍♂️ In human evals, online methods yield on avg 66% wins, 28% ties and 6% losses vs offline methods (on TL;DR) 👀

AK@_akhaliq

Direct Language Model Alignment from Online AI Feedback paper page: huggingface.co/papers/2402.04… Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy. In this study, we posit that online feedback is key and improves DAP methods. Our method, online AI feedback (OAIF), uses an LLM as annotator: on each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback. Despite its simplicity, we demonstrate via human evaluation in several tasks that OAIF outperforms both offline DAP and RLHF methods. We further show that the feedback leveraged in OAIF is easily controllable, via instruction prompts to the LLM annotator.

English

4.3K

Geoffrey Cideron รีทวีตแล้ว

AK@_akhaliq·7 Şub

Google presents MusicRL Aligning Music Generation to Human Preferences paper page: huggingface.co/papers/2402.04… propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only this makes supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text-adherence and audio quality with the help from selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality only account for a part of it. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.

English

276

41.2K

Geoffrey Cideron รีทวีตแล้ว

Neil Zeghidour@neilzegh·7 Şub

Very proud of the work done by @CdrGeo , one of my last projects at Google. When we released MusicLM in May ’23, we incorporated a feedback system to realize the first ever large-scale, organic improvement of music generation through RLHF. 🎶🧵

Geoffrey Cideron@CdrGeo

Happy to introduce our paper MusicRL, the first music generation system finetuned with human preferences. Paper link: arxiv.org/abs/2402.04229

English

6.7K

Geoffrey Cideron@CdrGeo·7 Şub

Shoutout to my amazing collaborators: @_andrea_agos, @neilzegh, @leonardhussenot, @OlivierBachem, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, @zalanborsos, Brian McWilliams, Victor Ungureanu, Olivier Pietquin , Matthieu Geist.

Suomi

518

Geoffrey Cideron@CdrGeo·7 Şub

Samples can be found at google-research.github.io/seanet/musiclm….

English

531

Geoffrey Cideron@CdrGeo·7 Şub

Happy to introduce our paper MusicRL, the first music generation system finetuned with human preferences. Paper link: arxiv.org/abs/2402.04229

English

22.2K

Geoffrey Cideron รีทวีตแล้ว

Johan Ferret@johanferret·5 Haz

Our #ACL2023 paper "Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback" is now on arXiv! tl;dr - we improve the factuality of summaries via RL, without human feedback! 📜 arxiv.org/abs/2306.00186 Thread (1/10) 👇

English

30.8K

Geoffrey Cideron รีทวีตแล้ว

ëugene kharitonov 🏴‍☠️@n0mad_0·29 Mar

We* are looking for a Student Researcher** to work with us on a project in intersection of modeling/generating speech/audio, NLP, and representation learning. *AudioLM team @ Google Research (@zalanborsos, @neilzegh, myself and many others!) **not-last-year PhD student

English

17.8K

Geoffrey Cideron รีทวีตแล้ว

Robert Dadashi@robdadashi·23 Eyl

Very proud to contribute to making RL agents more accessible and reproducible!

Google DeepMind@GoogleDeepMind

Acme, a framework for distributed RL research, has been updated to be cleaner, more modular, and to support more agents - including offline & imitation. Try it yourself! GitHub: dpmd.ai/acme-github Quickstart: dpmd.ai/acme-quickstart V2 Paper: dpmd.ai/acme-paper 1/

English

Geoffrey Cideron รีทวีตแล้ว

Olivier Bachem@OlivierBachem·15 Eyl

A common belief is that text auto encoders produce badly structured latent spaces with holes. We were surprised to find that using round-trip translations (e.g. en->de->en) one can obtain nicely structured latent spaces. Check out arxiv.org/pdf/2209.06792….

English

Geoffrey Cideron รีทวีตแล้ว

Johan Ferret@johanferret·17 Mar

Excited to announce that our #AAMAS2022 paper "Lazy-MDPs: Towards Interpretable RL by Learning When to Act" is on arXiv! 🦥 tl;dr - we introduce lazy-MDPs, modified MDPs that allow agents to defer decision-making to a third-party policy 📜 arxiv.org/abs/2203.08542 🧵👇

English

Geoffrey Cideron@CdrGeo·30 Ara

It was great to work with @AmartyaSanyal, @_rockt, and @egrefen at FAIR London. This line of research is fascinating! Thank you for the opportunity! Additional gratitude to @RCalandra for the support and advice.

Edward Grefenstette@egrefen

I've been thinking a lot about this work recently, esp. the fascinating ML problems that emerge when you want to solve it without generating doc/env variants. Ongoing work on this with @AmartyaSanyal+@CdrGeo who I had the pleasure of remotely hosting as interns this year. [3/14]

English

Geoffrey Cideron รีทวีตแล้ว

Xuedong F.C.J.S Shang@AbsolutSamuel·8 Tem

Matteo Hessel and @OriolVinyalsML giving talks on Deep RL and games at #RLSS2019 @DeepMindAI

English

ค้นพบ

@ramealexandre @OlivierBachem @sarah_perrin_ @_andrea_agos @johanferret @neilzegh @zalanborsos @AmartyaSanyal