Leonard Berrada

25 posts

@LeonardBerrada

Senior Research Scientist @GoogleDeepMind

Joined June 2019
313 Following · 181 Followers
Leonard Berrada retweeted
Oriol Vinyals @OriolVinyalsML
Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the @lmarena_ai leaderboard by a drastic 40 ELO margin. Only a handful of model releases have leaped ahead so strongly in ELO. 📈 ELO score differences map directly to win rate: e.g. a 400 ELO difference yields a ~91% win rate. Incredible that since 1.5, just a year ago, we jumped 200 ELO (300 since 1.0). Here’s a fun example where Gemini 2.5 Pro writes code to create an animated swarm of colorful boids swimming in a rotating hexagon. 💫 Try the model for free today in AI Studio. It’s also available to Gemini Advanced users in @geminiapp. aistudio.google.com/app/prompts/ge… Blog: goo.gle/4c3NitO
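The ~91% figure follows from the standard Elo expected-score formula, P(win) = 1 / (1 + 10^(-ΔElo/400)). A quick illustrative calculation, not tied to any leaderboard data:

```python
# Expected win rate implied by an Elo gap: P(win) = 1 / (1 + 10**(-delta / 400)).
def elo_win_rate(delta: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

# Gaps mentioned in the tweet above (40, 200, 300) plus the 400 example.
for delta in (40, 200, 300, 400):
    print(f"+{delta} Elo -> {elo_win_rate(delta):.1%} expected win rate")
# +400 Elo -> 90.9%, matching the ~91% quoted above.
```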
Leonard Berrada retweeted
Demis Hassabis @demishassabis
Gemini 2.5 Pro is an awesome state-of-the-art model, no.1 on LMArena by a whopping +39 ELO points, with significant improvements across the board in multimodal reasoning, coding & STEM. You can try it out now in AI Studio ai.dev & @GeminiApp with Gemini Advanced
Google DeepMind @GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

Leonard Berrada retweeted
Logan Kilpatrick @OfficialLoganK
Introducing Gemini 2.5 Pro, the world's most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc.). Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon!
Leonard Berrada retweeted
Soham De @sohamde_
Releasing RecurrentGemma - one of the strongest 2B-param open models designed for fast inference on long sequences and massive throughput! Both pre-trained and IT checkpoints available + code - try them out here! Code: github.com/google-deepmin… Weights: kaggle.com/models/google/…
Leonard Berrada retweeted
Samuel L Smith @SamuelMLSmith
Announcing RecurrentGemma! github.com/google-deepmin…
- A 2B model with open weights based on Griffin
- Replaces transformer with mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
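The "gated linear recurrence" in that list can be illustrated with a toy scan. This is a generic simplification for intuition only, not the actual RG-LRU layer used in Griffin/RecurrentGemma (whose gating and normalisation differ):

```python
import numpy as np

def gated_linear_recurrence(x: np.ndarray, w_gate: np.ndarray) -> np.ndarray:
    """Toy gated linear scan: h_t = a_t * h_{t-1} + (1 - a_t) * x_t."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        a = 1.0 / (1.0 + np.exp(-x[t] @ w_gate))   # sigmoid gate in (0, 1), per channel
        h = a * h + (1.0 - a) * x[t]               # linear in h: no nonlinearity between steps
        out[t] = h
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))             # (sequence length, channels)
w_gate = 0.1 * rng.normal(size=(4, 4))
print(gated_linear_recurrence(x, w_gate).shape)   # (6, 4)
```

Because the state update is linear in h and the state has a fixed size, per-token sampling cost stays constant with sequence length, which is where the long-sequence throughput claim comes from.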
Leonard Berrada retweeted
Soham De @sohamde_
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! arxiv.org/abs/2402.19427 My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
Leonard Berrada retweeted
AK @_akhaliq
Google presents "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models". Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN
Leonard Berrada @LeonardBerrada
Very happy to see the paper finally out! Griffin is a really strong and fast contender in the competitive LLM space. Feeling lucky to work with this world-class team.
Aleksandar Botev @botev_mg

We present Griffin: a hybrid model mixing a gated linear recurrence with local attention. This combination is extremely effective: it preserves all the efficiency benefits of linear RNNs and the expressiveness of transformers. Scaled up to 14B! arxiv.org/abs/2402.19427

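The other half of the hybrid, local attention, is ordinary causal attention restricted to a sliding window. A minimal sketch of just the masking; the window size here is an arbitrary illustration, not a value from the paper:

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """mask[i, j] is True iff query i may attend to key j: causal and within the window."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

print(local_causal_mask(6, 3).astype(int))
# Each row attends to at most 3 positions: the token itself and its two predecessors.
```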
Leonard Berrada @LeonardBerrada
Excellent article by @judyhshen recapping our paper on differentially private image classification w/ high accuracy and low disparity. The fact that it is possible at all to obtain such low disparities was a surprise to me, and a highlight of her great internship work last year!
Judy Shen @judyhshen

We need to rethink the belief that all privacy-preserving models are inherently more discriminatory. I give a high-level overview of why - in this @mtlaiethics blog post based on my summer internship work at @GoogleDeepMind montrealethics.ai/unlocking-accu… 1/5

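Here "disparity" refers to gaps in accuracy between subgroups. A minimal way to measure it; the grouping and metric below are illustrative placeholders, not the paper's exact setup:

```python
import numpy as np

def accuracy_disparity(correct: np.ndarray, group: np.ndarray) -> float:
    """Gap between the best- and worst-performing subgroup's accuracy."""
    accs = [correct[group == g].mean() for g in np.unique(group)]
    return float(max(accs) - min(accs))

correct = np.array([1, 1, 0, 1, 0, 1, 1, 0])   # 1 = model prediction was correct
group   = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # subgroup label per example
print(accuracy_disparity(correct, group))       # 0.75 - 0.5 = 0.25
```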
Leonard Berrada retweeted
Samuel L Smith @SamuelMLSmith
ConvNets Match Vision Transformers at Scale: arxiv.org/abs/2310.16764 We scale NFNet pre-training on JFT-4B from 0.4k to 110k TPU-v4 core hours. After fine-tuning, our largest model achieves 90.4% ImageNet Top-1, competitive with ViTs pre-trained for similar compute budgets. 1/3
Leonard Berrada retweeted
Google DeepMind @GoogleDeepMind
Training with differential privacy (DP) prevents models from leaking sensitive training data, but it often incurs a large drop in accuracy. In recent work, our team substantially improved the performance of image classification with DP. Read more: dpmd.ai/dm-dp-sgd 1/
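The mechanism behind this kind of DP training is DP-SGD: per-example gradient clipping plus Gaussian noise. A minimal sketch of one such gradient estimate, with placeholder hyperparameters rather than the values used in the work above:

```python
import numpy as np

def dp_sgd_gradient(per_example_grads, clip_norm, noise_mult, rng):
    """Clip each example's gradient to `clip_norm`, sum, add Gaussian noise
    with std `noise_mult * clip_norm`, then average over the batch."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = rng.normal(scale=noise_mult * clip_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10))   # 32 per-example gradients of a 10-dim parameter
print(dp_sgd_gradient(grads, clip_norm=1.0, noise_mult=1.1, rng=rng).shape)   # (10,)
```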
Leonard Berrada retweeted
Google DeepMind @GoogleDeepMind
Excited to share #NeurIPS2020 papers on efficient and tight neural network verification, based on efficient solvers for LP and SDP relaxations. Implementations of these in JAX are also available as part of the new jax_verify library, described here: bit.ly/2TE1Qcc
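The LP and SDP relaxations in these papers are beyond a short sketch, but the simplest bound of this kind, interval (box) propagation through an affine layer, conveys what "bounding a network's outputs" means. This is a deliberately weaker, generic technique and not the jax_verify API:

```python
import numpy as np

def affine_interval_bounds(W, b, lower, upper):
    """Propagate elementwise bounds lower <= x <= upper through y = W @ x + b."""
    centre = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    y_centre = W @ centre + b
    y_radius = np.abs(W) @ radius      # worst case per output coordinate
    return y_centre - y_radius, y_centre + y_radius

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, 1.0])
lo, hi = affine_interval_bounds(W, b, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
print(lo, hi)   # sound (if loose) bounds on the layer outputs
```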
Leonard Berrada @LeonardBerrada
TLDR: if the loss of your deep learning task can go to zero, you might want to give ALI-G a try. It can spare you the hassle of tuning a learning-rate schedule. Overall, we hope that this is a useful step towards easier and more reliable optimisation algorithms for deep learning.
Leonard Berrada @LeonardBerrada
We provide experiments on a variety of architectures (Differentiable Neural Computer, ResNets, bi-LSTMs) and datasets (SNLI, SVHN, CIFAR-10/100, ImageNet). More details in the paper: proceedings.icml.cc/static/paper_f… (arxiv version to be updated soon as well).
Leonard Berrada @LeonardBerrada
This week at #ICML2020, I'll be presenting Adaptive Learning-rates for Interpolation with Gradients, aka ALI-G. ALI-G is designed to automatically adapt the learning-rate of SGD for deep learning when the loss can go to zero.
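The core of ALI-G is a Polyak-style step size that assumes the optimal loss is (close to) zero, capped at a maximal learning rate. A simplified sketch on a toy interpolating least-squares problem; no momentum or projection, and the hyperparameters are arbitrary:

```python
import numpy as np

def alig_step(w, grad, loss, max_lr=0.1, delta=1e-5):
    """Polyak-style step assuming the optimal loss is ~0, capped at max_lr."""
    step = min(max_lr, loss / (float(np.dot(grad, grad)) + delta))
    return w - step * grad

# Toy interpolating problem: a least-squares fit whose loss can reach zero.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
y = A @ rng.normal(size=5)
w = np.zeros(5)
for _ in range(200):
    r = A @ w - y
    w = alig_step(w, A.T @ r, 0.5 * float(np.dot(r, r)))
r = A @ w - y
print(f"final loss: {0.5 * np.dot(r, r):.2e}")   # driven towards zero, no LR schedule tuned
```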
Leonard Berrada @LeonardBerrada
@ID_AA_Carmack This can be seen as an example of a more general trade-off in optimization: how "good" the descent direction is vs. how easy it is to compute.
Leonard Berrada @LeonardBerrada
@ID_AA_Carmack Then one can locally model the problem up to second order, which takes into account some level of interaction between layers. This yields Newton-type methods, but these are more computationally expensive.
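A toy quadratic makes the trade-off concrete: the Newton direction uses (expensive) curvature information and lands on the minimiser in one step, while a plain gradient step does not. Numbers below are purely illustrative:

```python
import numpy as np

# f(x) = 0.5 * x^T A x - b^T x : an ill-conditioned 2-D quadratic.
A = np.array([[10.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
f = lambda v: 0.5 * v @ A @ v - b @ v

x = np.array([3.0, 3.0])
grad = A @ x - b                       # cheap: what backprop provides
newton_dir = np.linalg.solve(A, grad)  # expensive: needs curvature (here, a solve with A)

x_gd = x - 0.05 * grad                 # one small gradient step
x_newton = x - newton_dir              # one Newton step: exact minimiser for a quadratic
print(f(x), f(x_gd), f(x_newton))      # Newton reaches the minimum value in one step
```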
John Carmack @ID_AA_Carmack
It bugs me a little that the gradient calculated by backprop in a neural network isn't actually the "steepest descent", because the partial derivatives between layers interact. Of course, optimizers are adapting everything anyway, but I wonder if there might be a structural hint.