Anthony Chen (@_anthonychen) - Twitter Profili | Zamantika Mersobahis Locabet

Sabitlenmiş Tweet

Anthony Chen@_anthonychen·21 Haz

Lots of discourse around long-context language models (LCLMs) subsuming RAG and retrieval but how close are we to this paradigm shift? Introducing LOFT a 1 million token benchmark spanning 6 tasks & 35 datasets to test LCLMs’ ability to do in-context retrieval & reasoning [1/10]

Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

English

1

4

22

2.3K

Anthony Chen retweetledi

Samip@industriaalist·16 Mar

kinda wild that larry page had the bitter lesson figured out in ~2007. for context, sutton published his version in 2019

English

36

87

1.1K

152.8K

Anthony Chen retweetledi

Adam Brown@A_G_I_Joe·14 Oca

New paper out today, proving a novel theorem in algebraic geometry with an internal math-specialized version of Gemini. This was a collaboration between @GoogleDeepMind (Professor Freddie Manners and @GSalafatinos, hosted by the Blueshift team) and Professors Jim Bryan, Balazs Elek, and Ravi Vakil. arxiv.org/abs/2601.07222

English

55

329

1.7K

570.8K

Anthony Chen retweetledi

Google DeepMind@GoogleDeepMind·18 Kas

This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵

English

213

1.1K

6.5K

1.7M

Anthony Chen retweetledi

Orion Weller@orionweller·29 Ağu

Instructions/reasoning are now everywhere in retrieval - we want embeddings to do it all! 🚀 But... is it even possible? 🤔 Turns out, it's not possible for single-vector models 😱 theoretically and empirically! To make it obvious we OSS a simple eval SoTA models flop on! 🧵

English

15

82

322

33.6K

Anthony Chen retweetledi

Google DeepMind@GoogleDeepMind·5 Ağu

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

English

813

2.6K

13.4K

3.7M

Anthony Chen@_anthonychen·30 May

@yizhongwyz @UTAustin @UTCompSci Congratulations Yizhong!!

Filipino

1

0

1

211

Yizhong Wang@yizhongwyz·30 May

Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘

English

103

54

670

75.8K

Anthony Chen retweetledi

Google Gemini@GeminiApp·21 May

Bon appétit 🍝

Marques Brownlee@MKBHD

We're barely 2 years from Will Smith eating spaghetti...

Français

204

536

10.2K

1.1M

Anthony Chen@_anthonychen·11 Nis

@arvind_io @Google @GoogleDeepMind welcome to the team!

English

0

1

543

Arvind Neelakantan@arvind_io·11 Nis

thrilled to be back @Google in the @GoogleDeepMind team! The technical breadth and expertise across the whole stack (hardware->infra->deep learning->products) is truly mind-blowing. Great to see a lot of familiar faces and meet new friends. Look forward to learning a lot!

English

33

32

1.1K

97.3K

Anthony Chen retweetledi

Arena.ai@arena·25 Mar

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇

Google DeepMind@GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

English

72

398

2.3K

467.3K

Anthony Chen retweetledi

Google DeepMind@GoogleDeepMind·25 Mar

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

English

90

507

2.5K

1.1M

Anthony Chen@_anthonychen·20 Mar

Historic.

Computer History Museum@ComputerHistory

In partnership with Google, CHM is excited to announce the public release and long-term preservation of the source code for AlexNet, the neural network that revolutionized AI. Learn more: hubs.li/Q03cL8-40 CHM’s GitHub access to open-source code: hubs.li/Q03cL6Dp0

English

0

1

167

Anthony Chen@_anthonychen·12 Mar

GDM's work converting Gemini into a SOTA dual encoder is now out! SOTA results across all benchmarks including exceptionally strong coding performance. Check out the tech report for more details and some interesting ablations!

Jinhyuk Lee@leejnhk

🎉 Gemini Embedding is LIVE! 🎉 Try our state-of-the-art text embedding model for FREE on Vertex AI (text-embedding-large-exp-03-07; 120 QPM) & AI Studio (gemini-embedding-exp-03-07)! ➡️ APIs: bit.ly/gem-embed-vert…, bit.ly/gem-embed-aist… ➡️ Report: bit.ly/gem-embed-paper

English

0

6

381

Anthony Chen retweetledi

Jim Fan@DrJimFan·3 Oca

This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I'm in tears. AI is not supposed to be 200B weights of stress and pain. It used to be a place of coffee-infused eureka moments, of exciting late-night arxiv safaris, of wicked smart ideas that put smile on our faces. But all the incoming capital and attention seem to be forcing everyone to race to the bottom. Jensen always tells us not to use phrases like "beat this, crush that". I absolutely love this perspective. We are here to lift up an entire ecosystem, not to send anyone to oblivion. I like to think of my work as expanding the pie. We need to bake the pie first, together, the bigger the better, before dividing it. It gives me comfort knowing that our team's works moved the needle for robotics, even just by a tiny bit. AI is not a zero sum game. In fact, it is perhaps the most positive-sum game that humanity ever plays. And we as a community should act this way. Take care of each other. Send love to "competitors" - because in the grand schemes of things, we are all coauthors of an accelerated future. I never had the privilege to know Felix irl, but I loved his research taste and set up Google Scholar alert for every one of his new papers. His works in agents and VLMs had a big influence on mine. He would've been a great friend. I want to get to know him, but I couldn't any more. RIP Felix. May the next world have no wars to fight.

English

90

389

3.7K

860.3K

Anthony Chen@_anthonychen·22 Haz

Thanks for sharing our paper!

elvis@omarsar0

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Google DeepMind conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning. They first present a benchmark with real-world tasks requiring 1M token context. Report that long context LLMs can rival state-of-the-art retrieval and RAG systems, without any explicit training on the tasks. Suggests that compositional reasoning (required in SQL-like tasks) is still challenging for these LLMs. They also encourage the need for continued research on advanced prompting strategies as they noted significant boosts in performance when applying them for long context problems. Just imagine what will be possible when we can get to 10 million context windows. The promise of using long-context LLMs without relying on external tools like retrieval systems or databases is of huge interest but has a long way to go.

English

0

4

438

Anthony Chen@_anthonychen·22 Haz

@omarsar0 @srchvrs yup we focused on "is this possible" + emphasizing ease of dumping corpora into a long-context LM vs engineering complex RAG pipelines agreed still a ways to go! but IMO it's now a matter of time/effort before LCLMs are really fast via prefix caching & more efficiency work

English

1

0

4

140

elvis@omarsar0·21 Haz

@srchvrs This report was more on capability I believe. However, it would be interesting to have a comparison on cost, latency, and how much computational resources are used. IMO, I highly doubt the long-context LLMs of today can rival other LLM+external tool systems.

English

1

0

8

857

elvis@omarsar0·21 Haz

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Google DeepMind conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning. They first present a benchmark with real-world tasks requiring 1M token context. Report that long context LLMs can rival state-of-the-art retrieval and RAG systems, without any explicit training on the tasks. Suggests that compositional reasoning (required in SQL-like tasks) is still challenging for these LLMs. They also encourage the need for continued research on advanced prompting strategies as they noted significant boosts in performance when applying them for long context problems. Just imagine what will be possible when we can get to 10 million context windows. The promise of using long-context LLMs without relying on external tools like retrieval systems or databases is of huge interest but has a long way to go.

English

6

57

264

31.7K

Anthony Chen retweetledi

Zhuyun Dai@ZhuyunDai·21 Haz

Thrilled to unveil LOFT, our latest research showing how long-context language models like Gemini can subsume retrieval, RAG and more, right out of the box. This is the dream-retriever I've always wanted! No special training, amazing reasoning, few-shot promptable...🤯

Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

English

3

9

48

7.2K

Anthony Chen retweetledi

Yi Luan@YiLuan9·21 Haz

Very happy to contribute to the Multimodal benchmarking on the LOFT project! Very excited to see with fewshot prompting only, Gemini 1.5 pro can already outperform CLIP on Coco, Flickr, by a large margin on 1M context tokens!

Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

English

0

4

12

1.1K

Anthony Chen@_anthonychen·21 Haz

Thanks for sharing our work!

Aran Komatsuzaki@arankomatsuzaki

Google presents Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Long-context LM: - Often rivals SotA retrieval and RAG systems - But still struggles with areas like compositional reasoning repo: github.com/google-deepmin… abs: arxiv.org/abs/2406.13121

English

0

1

21

6.1K

Anthony Chen retweetledi

Sebastian Riedel (@riedelcastro@sigmoid.social)

Sebastian Riedel (@[email protected])@riedelcastro·21 Haz

"just put the corpus into the context"! Long context models can already match or beat various bespoke pipelines and infra in accuracy on non-trivial tasks! Hadn't expected this so soon, and honestly was hoping to milk RAG impact for a little longer 🤪

Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

English

4

16

49

9.5K

Anthony Chen retweetledi

Kelvin Guu@kelvin_guu·21 Haz

Do long-context LMs obsolete retrieval, RAG, SQL and more? Excited to share our answer! arxiv.org/abs/2406.13121 from the team at @GoogleDeepMind that wrote one of the 1st papers on RAG (REALM) and repeat SOTA on retrieval (Promptagator, Gecko). w/ Gemini 1.5 Pro, the answer is 🧵

Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

English

2

12

72

11.7K

Anthony Chen

Keşfet