Sebastian Borgeaud
@borgeaud_s

Research Engineer @GoogleDeepMind · Lead for Gemini pre-training

36 posts
Joined July 2015
272 Following · 2.5K Followers
Sebastian Borgeaud retweeted
Dmitry (Dima) Lepikhin@lepikhin·
We have an amazing cadence of pushing the frontier forward! *hiring in Performance (the team is industry SOTA by a big margin)
Arena.ai@arena

Gemini 2.5 Pro is #1 across ALL categories: tied #1 with Grok-3/GPT-4.5 on Hard Prompts and Coding, and edged ahead across all the others to take the lead 🏇🏆

Sebastian Borgeaud retweeted
Joost van Amersfoort@joost_v_amersf·
Interested in helping us make Gemini Pro even better? The Gemini pre-training team is looking for a Research Scientist in London to push the boundaries of LLM scaling: understanding, predicting, and improving. ♊️🚀 Apply here: boards.greenhouse.io/deepmind/jobs/…
Google DeepMind@GoogleDeepMind

2.0 Pro Experimental is our best model yet for coding and complex prompts, refined with your feedback. 🤝 It has a better understanding of world knowledge and comes with our largest context window yet, 2 million tokens, meaning it can analyze large amounts of information.

Sebastian Borgeaud@borgeaud_s·
This also explains our very small confidence intervals. We initialize each bootstrap sample with the parameters from a full fit, but those fits would then terminate almost immediately because the tolerance hyper-parameters were off. Fixing those also gives us reasonable confidence bounds.
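A minimal sketch of the failure mode described here, using a toy curve fit rather than the paper's actual scaling-law fit (the data, model, and tolerance values below are all made up): warm-starting each bootstrap refit at the full-fit optimum, combined with a loose stopping tolerance, makes L-BFGS terminate almost immediately, so every resample returns nearly the full-fit parameters and the bootstrap confidence interval collapses.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 + 3.0 / x + rng.normal(0.0, 0.05, size=200)  # toy data, not the paper's

def loss(params, x, y):
    e, a = params
    return np.mean((y - (e + a / x)) ** 2)

full_fit = minimize(loss, x0=[1.0, 1.0], args=(x, y), method="L-BFGS-B")

def bootstrap_std(ftol):
    estimates = []
    for _ in range(200):
        idx = rng.integers(0, len(x), size=len(x))   # resample with replacement
        res = minimize(loss, x0=full_fit.x,          # warm start at the full fit
                       args=(x[idx], y[idx]), method="L-BFGS-B",
                       options={"ftol": ftol})
        estimates.append(res.x)
    return np.std(estimates, axis=0)

print(bootstrap_std(ftol=1e-2))    # loose tolerance: refits barely move, CI ~ 0
print(bootstrap_std(ftol=1e-12))   # tight tolerance: reasonable CI widths
```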
Sebastian Borgeaud@borgeaud_s·
Either using the sum of losses or changing the tolerance parameters fixes the issue. With this we can match the results you found. All 3 approaches now give the same estimates!
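To see why "sum of losses or changing the tolerance parameters" both work, here is a hedged illustration assuming a scipy-style L-BFGS-B, whose documented stopping rule is (f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= ftol. When the objective is a mean of tiny per-point losses it sits far below 1, the max(..., 1) clamp makes ftol effectively absolute, and the optimizer typically stops after fewer iterations at looser precision; summing makes the objective hundreds of times larger, so the same threshold corresponds to a much tighter relative precision.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=400)
y = 2.0 + 3.0 / x + rng.normal(0.0, 1e-3, size=400)  # tiny residuals -> tiny losses
per_point = lambda p: (y - (p[0] + p[1] / x)) ** 2

for name, objective in [("mean", lambda p: np.mean(per_point(p))),
                        ("sum",  lambda p: np.sum(per_point(p)))]:
    res = minimize(objective, x0=[1.0, 1.0], method="L-BFGS-B",
                   options={"ftol": 1e-9})
    print(f"{name}: {res.nit} iterations, params {res.x}")
```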
Sebastian Borgeaud@borgeaud_s·
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS and hence bad fits. After fixing this we can reproduce your findings! We're also open-sourcing the data in the paper, stay tuned :)
Tamay Besiroglu@tamaybes

The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)

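For context, the fit under debate ("approach 3" in the Chinchilla paper) models the final loss as L(N, D) = E + A/N^alpha + B/D^beta and minimizes a Huber loss between predicted and observed log-loss with L-BFGS from a grid of initializations. A rough sketch of that procedure; the initialization grid, delta, and the synthetic demo data are illustrative, not the paper's exact values:

```python
import itertools
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def predicted_log_loss(params, log_N, log_D):
    # params are (log A, log B, log E, alpha, beta);
    # log(E + A/N^alpha + B/D^beta) is computed stably in log space.
    a, b, e, alpha, beta = params
    return logsumexp([e * np.ones_like(log_N),
                      a - alpha * log_N,
                      b - beta * log_D], axis=0)

def huber(r, delta=1e-3):
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def objective(params, log_N, log_D, log_L):
    return np.sum(huber(predicted_log_loss(params, log_N, log_D) - log_L))

def fit(log_N, log_D, log_L):
    best = None
    for init in itertools.product([0.0, 5.0], [0.0, 5.0], [-1.0, 1.0],
                                  [0.0, 0.5], [0.0, 0.5]):
        res = minimize(objective, x0=init, args=(log_N, log_D, log_L),
                       method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Tiny synthetic demo, loosely inspired by the paper's fitted values:
rng = np.random.default_rng(0)
log_N = rng.uniform(np.log(1e7), np.log(1e10), 100)
log_D = rng.uniform(np.log(1e9), np.log(1e12), 100)
true = np.array([np.log(400.0), np.log(400.0), np.log(1.7), 0.34, 0.28])
log_L = predicted_log_loss(true, log_N, log_D) + rng.normal(0.0, 0.01, 100)
print(fit(log_N, log_D, log_L).x)
```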
Sebastian Borgeaud retweeted
Oriol Vinyals@OriolVinyalsML·
Gemini 1.5 has arrived. Pro 1.5 with 1M tokens is available as an experimental feature via AI Studio and Vertex AI in private preview.

Then there's this: in our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 🤯10M🤯 tokens for text.

From Shannon's 1950s bi-gram models (2 tokens), and after being mesmerized many years ago by LSTMs that could model 200 tokens, it feels almost impossible that I would be talking about hundreds of thousands of tokens in context length, let alone millions. ♊️💙

Tech report: goo.gle/GeminiV1-5
Demis Hassabis@demishassabis

In December we began the Gemini Era, and we’ve continued to make relentless progress since. Today we’re thrilled to introduce the next generation: Gemini 1.5 - hugely enhanced performance, highly efficient architecture & long-context length breakthrough blog.google/technology/ai/…

English
55
167
870
380.7K
Sebastian Borgeaud retweeted
Laurent Sifre@laurentsifre·
Join us at the Chinchilla poster tomorrow to discuss LLMs and compute-optimal scaling! Wed 30 Nov, 4:30–6:00 p.m. CST, Hall J #639 #NeurIPS2022
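For readers outside the loop, the poster's headline result, paraphrased from memory rather than quoted from the paper:

```latex
% Compute-optimal scaling (Hoffmann et al., paraphrased): with training
% compute C \approx 6ND for N parameters and D training tokens, minimizing
% loss at fixed C gives
N_{\mathrm{opt}}(C) \propto C^{a}, \qquad D_{\mathrm{opt}}(C) \propto C^{b},
\qquad a \approx b \approx 0.5
% i.e. model size and training tokens should grow in roughly equal proportion.
```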
Sebastian Borgeaud@borgeaud_s·
I'm at NeurIPS this week! Feel free to reach out if you'd like to talk about LLMs, the challenges of large-scale model training, or our work at @DeepMind (Chinchilla, Flamingo, RETRO, ...)
Sebastian Borgeaud retweeted
Arthur Mensch@arthurmensch·
We're presenting RETRO at 4:15pm @icmlconf with @borgeaud_s, and later today at the poster session. Add a retrieval DB to divide your model size by 10, don't miss out!
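A heavily simplified sketch of the retrieval side of RETRO, as I read the paper: chunk the text, embed chunks with a frozen encoder, fetch each chunk's nearest neighbours from a precomputed database, and let the decoder cross-attend to them. The toy `embed` below is a stand-in for the frozen BERT embedder, and the decoder cross-attention is omitted entirely:

```python
import numpy as np

def embed(chunk, dim=128):
    # Toy stand-in for the frozen BERT chunk embedder used in the paper:
    # a deterministic per-chunk random unit vector, with no real semantics.
    rng = np.random.default_rng(abs(hash(chunk)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def build_index(db_chunks):
    return np.stack([embed(c) for c in db_chunks])    # (num_db_chunks, dim)

def retrieve(index, db_chunks, query_chunks, k=2):
    q = np.stack([embed(c) for c in query_chunks])    # (num_queries, dim)
    scores = q @ index.T                              # inner-product similarity
    topk = np.argsort(-scores, axis=1)[:, :k]         # k nearest neighbours
    return [[db_chunks[j] for j in row] for row in topk]

db = ["the cat sat on the mat", "scaling laws for LMs", "retrieval helps"]
print(retrieve(build_index(db), db, ["a cat on a mat"], k=1))
```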
Sebastian Borgeaud retweeted
Aidan Clark@_aidan_clark_·
If you're groggily waking up at #ICML2022 and trying to figure out what to go see after the invited talk, check out the Deep Learning session (icml.cc/virtual/2022/s…) where we'll be presenting Unified Scaling Laws for Routed Language Models at 11!
Mitchell Gordon@MitchellAGordon·
"RETRO is so fast and cheap, in fact, that I cannot fathom why anyone would choose to do language modeling without retrieval." New blog post benchmarking RETRO's database! mitchgordon.me/ml/2022/07/01/…
Sebastian Borgeaud@borgeaud_s·
@MitchellAGordon Great blog post :) As you mention towards the end, embedding a chunk on CPU takes about 10ms, so it's likely cheaper to do the embedding pre-computation on CPUs rather than GPUs!
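Back-of-envelope on that 10 ms figure; the database size and core count below are made-up round numbers, not from the blog post or the RETRO paper:

```python
ms_per_chunk = 10              # CPU embedding time quoted above
chunks = 2_000_000_000         # assumed database size
cores = 1_000                  # assumed CPU cores available

total_core_hours = chunks * ms_per_chunk / 1000 / 3600
print(f"{total_core_hours:,.0f} core-hours, "
      f"~{total_core_hours / cores:.1f} h wall-clock on {cores} cores")
# -> ~5,556 core-hours, ~5.6 h wall-clock at these assumptions
```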
Sebastian Borgeaud retweeted
Ethan Perez@EthanJPerez·
We’re announcing the Inverse Scaling Prize: a $100k grand prize + $150k in additional prizes for finding an important task where larger language models do *worse*. Link to contest details: github.com/inverse-scalin… 🧵
Sebastian Borgeaud retweeted
Google DeepMind@GoogleDeepMind·
Real-world data contains complex patterns that play out over long contexts in space or time. By attending to more context with little computational overhead, Perceiver AR generates excellent results on images, text, and music: dpmd.ai/dm-perceiver-ar 1/
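The "little computational overhead" claim comes from the Perceiver trick of cross-attending a long input to a short latent array, so attention cost scales as M·N for M latents over N inputs rather than N·N, and the deep stack then operates only on the M latents. A toy numpy sketch of that single cross-attention; the shapes are illustrative and the causal masking Perceiver AR adds is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 8192, 512, 64             # long input, short latent array
inputs = rng.normal(size=(N, d))
latents = rng.normal(size=(M, d))

def cross_attend(q, kv):
    scores = q @ kv.T / np.sqrt(q.shape[-1])            # (M, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over inputs
    return weights @ kv                                 # (M, d)

compressed = cross_attend(latents, inputs)  # cost ~ M*N*d, not N*N*d
print(compressed.shape)                     # (512, 64) -> cheap to stack on
```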