Leonard Berrada

25 posts

@LeonardBerrada

Senior Research Scientist @GoogleDeepMind

Joined June 2019
313 Following · 181 Followers
Leonard Berrada retweeted
Oriol Vinyals @OriolVinyalsML
Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the @lmarena_ai leaderboard by a drastic 40 ELO margin. Only a handful of model releases have leaped ahead so strongly in ELO. 📈 ELO score differences map directly to win rate: e.g. a 400 ELO difference yields a ~91% win rate. Incredible that since 1.5, just a year ago, we jumped 200 ELO (300 since 1.0). Here’s a fun example where Gemini 2.5 Pro writes code to create an animated swarm of colorful boids swimming in a rotating hexagon. 💫 Try the model for free today in AI Studio. It’s also available to Gemini Advanced users in @geminiapp. aistudio.google.com/app/prompts/ge… Blog: goo.gle/4c3NitO
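The ~91% figure follows from the standard Elo expected-score formula, P(win) = 1 / (1 + 10^(-ΔElo/400)). A quick illustrative calculation, not tied to any leaderboard data:

```python
# Expected win rate implied by an Elo gap: P(win) = 1 / (1 + 10**(-delta / 400)).
def elo_win_rate(delta: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

# Gaps mentioned in the tweet above (40, 200, 300) plus the 400 example.
for delta in (40, 200, 300, 400):
    print(f"+{delta} Elo -> {elo_win_rate(delta):.1%} expected win rate")
# +400 Elo -> 90.9%, matching the ~91% quoted above.
```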
Leonard Berrada retweeted
Demis Hassabis @demishassabis
Gemini 2.5 Pro is an awesome state-of-the-art model, no.1 on LMArena by a whopping +39 ELO points, with significant improvements across the board in multimodal reasoning, coding & STEM. You can try it out now in AI Studio ai.dev & @GeminiApp with Gemini Advanced
Google DeepMind @GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

Leonard Berrada retweeted
Logan Kilpatrick @OfficialLoganK
Introducing Gemini 2.5 Pro, the world's most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc.). Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon!
Leonard Berrada retweeted
Soham De @sohamde_
Releasing RecurrentGemma - one of the strongest 2B-param open models designed for fast inference on long sequences and massive throughput! Both pre-trained and IT checkpoints available + code - try them out here! Code: github.com/google-deepmin… Weights: kaggle.com/models/google/…
Leonard Berrada retweeted
Samuel L Smith @SamuelMLSmith
Announcing RecurrentGemma! github.com/google-deepmin…
- A 2B model with open weights based on Griffin
- Replaces transformer with mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
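The "gated linear recurrence" in that list can be illustrated with a toy scan. This is a generic simplification for intuition only, not the actual RG-LRU layer used in Griffin/RecurrentGemma (whose gating and normalisation differ):

```python
import numpy as np

def gated_linear_recurrence(x: np.ndarray, w_gate: np.ndarray) -> np.ndarray:
    """Toy gated linear scan: h_t = a_t * h_{t-1} + (1 - a_t) * x_t."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        a = 1.0 / (1.0 + np.exp(-x[t] @ w_gate))   # sigmoid gate in (0, 1), per channel
        h = a * h + (1.0 - a) * x[t]               # linear in h: no nonlinearity between steps
        out[t] = h
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))             # (sequence length, channels)
w_gate = 0.1 * rng.normal(size=(4, 4))
print(gated_linear_recurrence(x, w_gate).shape)   # (6, 4)
```

Because the state update is linear in h and the state has a fixed size, per-token sampling cost stays constant with sequence length, which is where the long-sequence throughput claim comes from.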
Leonard Berrada retweeted
Soham De @sohamde_
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! arxiv.org/abs/2402.19427 My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
Leonard Berrada retweeted
AK @_akhaliq
Google presents "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models". Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN
Leonard Berrada @LeonardBerrada
Very happy to see the paper finally out! Griffin is a really strong and fast contender in the competitive LLM space. Feeling lucky to work with this world-class team.
Aleksandar Botev @botev_mg

We present Griffin: a hybrid model mixing a gated linear recurrence with local attention. This combination is extremely effective: it preserves all the efficiency benefits of linear RNNs and the expressiveness of transformers. Scaled up to 14B! arxiv.org/abs/2402.19427

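The other half of the hybrid, local attention, is ordinary causal attention restricted to a sliding window. A minimal sketch of just the masking; the window size here is an arbitrary illustration, not a value from the paper:

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """mask[i, j] is True iff query i may attend to key j: causal and within the window."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

print(local_causal_mask(6, 3).astype(int))
# Each row attends to at most 3 positions: the token itself and its two predecessors.
```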
Leonard Berrada @LeonardBerrada
Excellent article by @judyhshen recapping our paper on differentially private image classification w/ high accuracy and low disparity. The fact that it is possible at all to obtain such low disparities was a surprise to me, and a highlight of her great internship work last year!
Judy Shen @judyhshen

We need to rethink the belief that all privacy-preserving models are inherently more discriminatory. I give a high-level overview of why - in this @mtlaiethics blog post based on my summer internship work at @GoogleDeepMind montrealethics.ai/unlocking-accu… 1/5

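Here "disparity" refers to gaps in accuracy between subgroups. A minimal way to measure it; the grouping and metric below are illustrative placeholders, not the paper's exact setup:

```python
import numpy as np

def accuracy_disparity(correct: np.ndarray, group: np.ndarray) -> float:
    """Gap between the best- and worst-performing subgroup's accuracy."""
    accs = [correct[group == g].mean() for g in np.unique(group)]
    return float(max(accs) - min(accs))

correct = np.array([1, 1, 0, 1, 0, 1, 1, 0])   # 1 = model prediction was correct
group   = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # subgroup label per example
print(accuracy_disparity(correct, group))       # 0.75 - 0.5 = 0.25
```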
Leonard Berrada retweeted
Samuel L Smith @SamuelMLSmith
ConvNets Match Vision Transformers at Scale: arxiv.org/abs/2310.16764 We scale NFNet pre-training on JFT-4B from 0.4k to 110k TPU-v4 core hours. After fine-tuning, our largest model achieves 90.4% ImageNet Top-1, competitive with ViTs pre-trained for similar compute budgets. 1/3
Leonard Berrada retweeted
Google DeepMind @GoogleDeepMind
Training with differential privacy (DP) prevents models from leaking sensitive training data, but it often incurs a large drop in accuracy. In recent work, our team substantially improved the performance of image classification with DP. Read more: dpmd.ai/dm-dp-sgd 1/
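The mechanism behind this kind of DP training is DP-SGD: per-example gradient clipping plus Gaussian noise. A minimal sketch of one such gradient estimate, with placeholder hyperparameters rather than the values used in the work above:

```python
import numpy as np

def dp_sgd_gradient(per_example_grads, clip_norm, noise_mult, rng):
    """Clip each example's gradient to `clip_norm`, sum, add Gaussian noise
    with std `noise_mult * clip_norm`, then average over the batch."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = rng.normal(scale=noise_mult * clip_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10))   # 32 per-example gradients of a 10-dim parameter
print(dp_sgd_gradient(grads, clip_norm=1.0, noise_mult=1.1, rng=rng).shape)   # (10,)
```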
Leonard Berrada retweeted
Google DeepMind @GoogleDeepMind
Excited to share #NeurIPS2020 papers on efficient and tight neural network verification, based on efficient solvers for LP and SDP relaxations. Implementations of these in JAX are also available as part of the new jax_verify library, described here: bit.ly/2TE1Qcc
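The LP and SDP relaxations in these papers are beyond a short sketch, but the simplest bound of this kind, interval (box) propagation through an affine layer, conveys what "bounding a network's outputs" means. This is a deliberately weaker, generic technique and not the jax_verify API:

```python
import numpy as np

def affine_interval_bounds(W, b, lower, upper):
    """Propagate elementwise bounds lower <= x <= upper through y = W @ x + b."""
    centre = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    y_centre = W @ centre + b
    y_radius = np.abs(W) @ radius      # worst case per output coordinate
    return y_centre - y_radius, y_centre + y_radius

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, 1.0])
lo, hi = affine_interval_bounds(W, b, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
print(lo, hi)   # sound (if loose) bounds on the layer outputs
```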
Leonard Berrada @LeonardBerrada
TLDR: if the loss of your deep learning task can go to zero, you might want to give ALI-G a try. It can spare you the hassle of tuning a learning-rate schedule. Overall, we hope that this is a useful step towards easier and more reliable optimisation algorithms for deep learning.
Leonard Berrada @LeonardBerrada
We provide experiments on a variety of architectures (Differentiable Neural Computer, ResNets, bi-LSTMs) and datasets (SNLI, SVHN, CIFAR-10/100, ImageNet). More details in the paper: proceedings.icml.cc/static/paper_f… (arxiv version to be updated soon as well).
Leonard Berrada @LeonardBerrada
This week at #ICML2020, I'll be presenting Adaptive Learning-rates for Interpolation with Gradients, aka ALI-G. ALI-G is designed to automatically adapt the learning-rate of SGD for deep learning when the loss can go to zero.
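The core of ALI-G is a Polyak-style step size that assumes the optimal loss is (close to) zero, capped at a maximal learning rate. A simplified sketch on a toy interpolating least-squares problem; no momentum or projection, and the hyperparameters are arbitrary:

```python
import numpy as np

def alig_step(w, grad, loss, max_lr=0.1, delta=1e-5):
    """Polyak-style step assuming the optimal loss is ~0, capped at max_lr."""
    step = min(max_lr, loss / (float(np.dot(grad, grad)) + delta))
    return w - step * grad

# Toy interpolating problem: a least-squares fit whose loss can reach zero.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
y = A @ rng.normal(size=5)
w = np.zeros(5)
for _ in range(200):
    r = A @ w - y
    w = alig_step(w, A.T @ r, 0.5 * float(np.dot(r, r)))
r = A @ w - y
print(f"final loss: {0.5 * np.dot(r, r):.2e}")   # driven towards zero, no LR schedule tuned
```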
Leonard Berrada @LeonardBerrada
@ID_AA_Carmack This can be seen as an example of a more general trade-off in optimization: how "good" the descent direction is vs. how easy it is to compute.
Leonard Berrada @LeonardBerrada
@ID_AA_Carmack Then one can locally model the problem up to second order, which takes into account some level of interaction between layers. This yields Newton-type methods, but these are more computationally expensive.
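A toy quadratic makes the trade-off concrete: the Newton direction uses (expensive) curvature information and lands on the minimiser in one step, while a plain gradient step does not. Numbers below are purely illustrative:

```python
import numpy as np

# f(x) = 0.5 * x^T A x - b^T x : an ill-conditioned 2-D quadratic.
A = np.array([[10.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
f = lambda v: 0.5 * v @ A @ v - b @ v

x = np.array([3.0, 3.0])
grad = A @ x - b                       # cheap: what backprop provides
newton_dir = np.linalg.solve(A, grad)  # expensive: needs curvature (here, a solve with A)

x_gd = x - 0.05 * grad                 # one small gradient step
x_newton = x - newton_dir              # one Newton step: exact minimiser for a quadratic
print(f(x), f(x_gd), f(x_newton))      # Newton reaches the minimum value in one step
```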
John Carmack @ID_AA_Carmack
It bugs me a little that the gradient calculated by backprop in a neural network isn't actually the "steepest descent", because the partial derivatives between layers interact. Of course, optimizers are adapting everything anyway, but I wonder if there might be a structural hint.