Sebastian Borgeaud
@borgeaud_s

Research Engineer @GoogleDeepMind · Lead for Gemini pre-training

36 posts
Joined July 2015
272 Following · 2.5K Followers
Sebastian Borgeaud retweeted
Dmitry (Dima) Lepikhin@lepikhin·
We have an amazing cadence of pushing the frontier forward! *hiring in Performance (the team is industry SOTA by a big margin)
Arena.ai@arena

Gemini 2.5 Pro is #1 across ALL categories: tied #1 with Grok-3/GPT-4.5 on Hard Prompts and Coding, and edged ahead across all the others to take the lead 🏇🏆

Sebastian Borgeaud retweeted
Joost van Amersfoort@joost_v_amersf·
Interested in helping us make Gemini Pro even better? The Gemini pre-training team is looking for a Research Scientist in London to push the boundaries of LLM scaling: understanding, predicting, and improving. ♊️🚀 Apply here: boards.greenhouse.io/deepmind/jobs/…
Google DeepMind@GoogleDeepMind

2.0 Pro Experimental is our best model yet for coding and complex prompts, refined with your feedback. 🤝 It has a better understanding of world knowledge and comes with our largest context window yet, 2 million tokens, meaning it can analyze large amounts of information.

Sebastian Borgeaud@borgeaud_s·
This also explains our very small confidence intervals. We initialize each bootstrap sample with the parameters from a full fit, but those fits would then terminate almost immediately because the tolerance hyper-parameters were off. Fixing those also gives us reasonable confidence bounds.
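A minimal sketch of the failure mode described here, using a toy curve fit rather than the paper's actual scaling-law fit (the data, model, and tolerance values below are all made up): warm-starting each bootstrap refit at the full-fit optimum, combined with a loose stopping tolerance, makes L-BFGS terminate almost immediately, so every resample returns nearly the full-fit parameters and the bootstrap confidence interval collapses.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 + 3.0 / x + rng.normal(0.0, 0.05, size=200)  # toy data, not the paper's

def loss(params, x, y):
    e, a = params
    return np.mean((y - (e + a / x)) ** 2)

full_fit = minimize(loss, x0=[1.0, 1.0], args=(x, y), method="L-BFGS-B")

def bootstrap_std(ftol):
    estimates = []
    for _ in range(200):
        idx = rng.integers(0, len(x), size=len(x))   # resample with replacement
        res = minimize(loss, x0=full_fit.x,          # warm start at the full fit
                       args=(x[idx], y[idx]), method="L-BFGS-B",
                       options={"ftol": ftol})
        estimates.append(res.x)
    return np.std(estimates, axis=0)

print(bootstrap_std(ftol=1e-2))    # loose tolerance: refits barely move, CI ~ 0
print(bootstrap_std(ftol=1e-12))   # tight tolerance: reasonable CI widths
```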
Sebastian Borgeaud@borgeaud_s·
Either using the sum of losses or changing the tolerance parameters fixes the issue. With this we can match the results you found. All 3 approaches now give the same estimates!
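To see why "sum of losses or changing the tolerance parameters" both work, here is a hedged illustration assuming a scipy-style L-BFGS-B, whose documented stopping rule is (f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= ftol. When the objective is a mean of tiny per-point losses it sits far below 1, the max(..., 1) clamp makes ftol effectively absolute, and the optimizer typically stops after fewer iterations at looser precision; summing makes the objective hundreds of times larger, so the same threshold corresponds to a much tighter relative precision.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=400)
y = 2.0 + 3.0 / x + rng.normal(0.0, 1e-3, size=400)  # tiny residuals -> tiny losses
per_point = lambda p: (y - (p[0] + p[1] / x)) ** 2

for name, objective in [("mean", lambda p: np.mean(per_point(p))),
                        ("sum",  lambda p: np.sum(per_point(p)))]:
    res = minimize(objective, x0=[1.0, 1.0], method="L-BFGS-B",
                   options={"ftol": 1e-9})
    print(f"{name}: {res.nit} iterations, params {res.x}")
```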
Sebastian Borgeaud@borgeaud_s·
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS and hence bad fits. After fixing this we can reproduce your findings! We're also open-sourcing the data in the paper, stay tuned :)
Tamay Besiroglu@tamaybes

The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)

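For context, the fit under debate ("approach 3" in the Chinchilla paper) models the final loss as L(N, D) = E + A/N^alpha + B/D^beta and minimizes a Huber loss between predicted and observed log-loss with L-BFGS from a grid of initializations. A rough sketch of that procedure; the initialization grid, delta, and the synthetic demo data are illustrative, not the paper's exact values:

```python
import itertools
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def predicted_log_loss(params, log_N, log_D):
    # params are (log A, log B, log E, alpha, beta);
    # log(E + A/N^alpha + B/D^beta) is computed stably in log space.
    a, b, e, alpha, beta = params
    return logsumexp([e * np.ones_like(log_N),
                      a - alpha * log_N,
                      b - beta * log_D], axis=0)

def huber(r, delta=1e-3):
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def objective(params, log_N, log_D, log_L):
    return np.sum(huber(predicted_log_loss(params, log_N, log_D) - log_L))

def fit(log_N, log_D, log_L):
    best = None
    for init in itertools.product([0.0, 5.0], [0.0, 5.0], [-1.0, 1.0],
                                  [0.0, 0.5], [0.0, 0.5]):
        res = minimize(objective, x0=init, args=(log_N, log_D, log_L),
                       method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Tiny synthetic demo, loosely inspired by the paper's fitted values:
rng = np.random.default_rng(0)
log_N = rng.uniform(np.log(1e7), np.log(1e10), 100)
log_D = rng.uniform(np.log(1e9), np.log(1e12), 100)
true = np.array([np.log(400.0), np.log(400.0), np.log(1.7), 0.34, 0.28])
log_L = predicted_log_loss(true, log_N, log_D) + rng.normal(0.0, 0.01, 100)
print(fit(log_N, log_D, log_L).x)
```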
Sebastian Borgeaud retweeted
Oriol Vinyals@OriolVinyalsML·
Gemini 1.5 has arrived. Pro 1.5 with 1M tokens is available as an experimental feature via AI Studio and Vertex AI in private preview.

Then there's this: in our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 🤯10M🤯 tokens for text.

From Shannon's 1950s bi-gram models (2 tokens), and after being mesmerized many years ago by LSTMs that could model 200 tokens, it feels almost impossible that I would be talking about hundreds of thousands of tokens in context length, let alone millions. ♊️💙

Tech report: goo.gle/GeminiV1-5
Demis Hassabis@demishassabis

In December we began the Gemini Era, and we’ve continued to make relentless progress since. Today we’re thrilled to introduce the next generation: Gemini 1.5 - hugely enhanced performance, highly efficient architecture & long-context length breakthrough blog.google/technology/ai/…

English
55
167
870
380.7K
Sebastian Borgeaud retweeted
Laurent Sifre@laurentsifre·
Join us at the Chinchilla poster tomorrow to discuss LLMs and compute-optimal scaling! Wed 30 Nov, 4:30–6:00 p.m. CST, Hall J #639 #NeurIPS2022
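For readers outside the loop, the poster's headline result, paraphrased from memory rather than quoted from the paper:

```latex
% Compute-optimal scaling (Hoffmann et al., paraphrased): with training
% compute C \approx 6ND for N parameters and D training tokens, minimizing
% loss at fixed C gives
N_{\mathrm{opt}}(C) \propto C^{a}, \qquad D_{\mathrm{opt}}(C) \propto C^{b},
\qquad a \approx b \approx 0.5
% i.e. model size and training tokens should grow in roughly equal proportion.
```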
Sebastian Borgeaud@borgeaud_s·
I'm at NeurIPS this week! Feel free to reach out if you'd like to talk about LLMs, the challenges of large-scale model training, or our work at @DeepMind (Chinchilla, Flamingo, RETRO, ...)
Sebastian Borgeaud retweeted
Arthur Mensch@arthurmensch·
We're presenting RETRO at 4:15pm @icmlconf with @borgeaud_s, and later today at the poster session. Add a retrieval DB to divide your model size by 10, don't miss out!
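A heavily simplified sketch of the retrieval side of RETRO, as I read the paper: chunk the text, embed chunks with a frozen encoder, fetch each chunk's nearest neighbours from a precomputed database, and let the decoder cross-attend to them. The toy `embed` below is a stand-in for the frozen BERT embedder, and the decoder cross-attention is omitted entirely:

```python
import numpy as np

def embed(chunk, dim=128):
    # Toy stand-in for the frozen BERT chunk embedder used in the paper:
    # a deterministic per-chunk random unit vector, with no real semantics.
    rng = np.random.default_rng(abs(hash(chunk)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def build_index(db_chunks):
    return np.stack([embed(c) for c in db_chunks])    # (num_db_chunks, dim)

def retrieve(index, db_chunks, query_chunks, k=2):
    q = np.stack([embed(c) for c in query_chunks])    # (num_queries, dim)
    scores = q @ index.T                              # inner-product similarity
    topk = np.argsort(-scores, axis=1)[:, :k]         # k nearest neighbours
    return [[db_chunks[j] for j in row] for row in topk]

db = ["the cat sat on the mat", "scaling laws for LMs", "retrieval helps"]
print(retrieve(build_index(db), db, ["a cat on a mat"], k=1))
```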
Sebastian Borgeaud retweeted
Aidan Clark@_aidan_clark_·
If you're groggily waking up at #ICML2022 and trying to figure out what to go see after the invited talk, check out the Deep Learning session (icml.cc/virtual/2022/s…) where we'll be presenting Unified Scaling Laws for Routed Language Models at 11!
Mitchell Gordon@MitchellAGordon·
"RETRO is so fast and cheap, in fact, that I cannot fathom why anyone would choose to do language modeling without retrieval." New blog post benchmarking RETRO's database! mitchgordon.me/ml/2022/07/01/…
Sebastian Borgeaud@borgeaud_s·
@MitchellAGordon Great blog post :) As you mention towards the end, embedding a chunk on CPU takes about 10ms, so it's likely cheaper to do the embedding pre-computation on CPUs rather than GPUs!
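Back-of-envelope on that 10 ms figure; the database size and core count below are made-up round numbers, not from the blog post or the RETRO paper:

```python
ms_per_chunk = 10              # CPU embedding time quoted above
chunks = 2_000_000_000         # assumed database size
cores = 1_000                  # assumed CPU cores available

total_core_hours = chunks * ms_per_chunk / 1000 / 3600
print(f"{total_core_hours:,.0f} core-hours, "
      f"~{total_core_hours / cores:.1f} h wall-clock on {cores} cores")
# -> ~5,556 core-hours, ~5.6 h wall-clock at these assumptions
```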
Sebastian Borgeaud retweeted
Ethan Perez@EthanJPerez·
We’re announcing the Inverse Scaling Prize: a $100k grand prize + $150k in additional prizes for finding an important task where larger language models do *worse*. Link to contest details: github.com/inverse-scalin… 🧵
Sebastian Borgeaud retweeted
Google DeepMind@GoogleDeepMind·
Real-world data contains complex patterns that play out over long contexts in space or time. By attending to more context with little computational overhead, Perceiver AR generates excellent results on images, text, and music: dpmd.ai/dm-perceiver-ar 1/
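The "little computational overhead" claim comes from the Perceiver trick of cross-attending a long input to a short latent array, so attention cost scales as M·N for M latents over N inputs rather than N·N, and the deep stack then operates only on the M latents. A toy numpy sketch of that single cross-attention; the shapes are illustrative and the causal masking Perceiver AR adds is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 8192, 512, 64             # long input, short latent array
inputs = rng.normal(size=(N, d))
latents = rng.normal(size=(M, d))

def cross_attend(q, kv):
    scores = q @ kv.T / np.sqrt(q.shape[-1])            # (M, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over inputs
    return weights @ kv                                 # (M, d)

compressed = cross_attend(latents, inputs)  # cost ~ M*N*d, not N*N*d
print(compressed.shape)                     # (512, 64) -> cheap to stack on
```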