Anthony Chen

191 posts


@_anthonychen

ai research phd @ucirvine

little worm in big apple

Joined May 2017
515 Following · 476 Followers

Pinned Tweet
Anthony Chen@_anthonychen·
Lots of discourse around long-context language models (LCLMs) subsuming RAG and retrieval, but how close are we to this paradigm shift? Introducing LOFT, a 1-million-token benchmark spanning 6 tasks & 35 datasets to test LCLMs’ ability to do in-context retrieval & reasoning [1/10]
Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

1 reply · 4 reposts · 22 likes · 2.3K views
Anthony Chen reposted
Samip@industriaalist·
kinda wild that larry page had the bitter lesson figured out in ~2007. for context, sutton published his version in 2019
36 replies · 87 reposts · 1.1K likes · 152.8K views
Anthony Chen reposted
Adam Brown@A_G_I_Joe·
New paper out today, proving a novel theorem in algebraic geometry with an internal math-specialized version of Gemini. This was a collaboration between @GoogleDeepMind (Professor Freddie Manners and @GSalafatinos, hosted by the Blueshift team) and Professors Jim Bryan, Balazs Elek, and Ravi Vakil. arxiv.org/abs/2601.07222
55 replies · 329 reposts · 1.7K likes · 570.8K views
Anthony Chen reposted
Google DeepMind@GoogleDeepMind·
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
213 replies · 1.1K reposts · 6.5K likes · 1.7M views
Anthony Chen reposted
Orion Weller@orionweller·
Instructions/reasoning are now everywhere in retrieval - we want embeddings to do it all! 🚀 But... is it even possible? 🤔 Turns out, it's not possible for single-vector models 😱 theoretically and empirically! To make it obvious we OSS a simple eval SoTA models flop on! 🧵
15 replies · 82 reposts · 322 likes · 33.6K views
Anthony Chen reposted
Google DeepMind@GoogleDeepMind·
What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
813 replies · 2.6K reposts · 13.4K likes · 3.7M views
Yizhong Wang@yizhongwyz·
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
103 replies · 54 reposts · 670 likes · 75.8K views
Arvind Neelakantan@arvind_io·
thrilled to be back @Google in the @GoogleDeepMind team! The technical breadth and expertise across the whole stack (hardware->infra->deep learning->products) is truly mind-blowing. Great to see a lot of familiar faces and meet new friends. Look forward to learning a lot!
33 replies · 32 reposts · 1.1K likes · 97.3K views
Anthony Chen reposted
Arena.ai@arena·
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇
Google DeepMind@GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

72 replies · 398 reposts · 2.3K likes · 467.3K views
Anthony Chen reposted
Google DeepMind@GoogleDeepMind·
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf
90 replies · 507 reposts · 2.5K likes · 1.1M views
Anthony Chen@_anthonychen·
GDM's work converting Gemini into a SOTA dual encoder is now out! It achieves state-of-the-art results across all benchmarks, including exceptionally strong coding performance. Check out the tech report for more details and some interesting ablations!
Jinhyuk Lee@leejnhk

🎉 Gemini Embedding is LIVE! 🎉 Try our state-of-the-art text embedding model for FREE on Vertex AI (text-embedding-large-exp-03-07; 120 QPM) & AI Studio (gemini-embedding-exp-03-07)! ➡️ APIs: bit.ly/gem-embed-vert…, bit.ly/gem-embed-aist… ➡️ Report: bit.ly/gem-embed-paper

0 replies · 0 reposts · 6 likes · 381 views
Anthony Chen reposted
Jim Fan@DrJimFan·
This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I'm in tears.

AI is not supposed to be 200B weights of stress and pain. It used to be a place of coffee-infused eureka moments, of exciting late-night arxiv safaris, of wicked smart ideas that put a smile on our faces. But all the incoming capital and attention seem to be forcing everyone to race to the bottom.

Jensen always tells us not to use phrases like "beat this, crush that". I absolutely love this perspective. We are here to lift up an entire ecosystem, not to send anyone to oblivion. I like to think of my work as expanding the pie. We need to bake the pie first, together, the bigger the better, before dividing it. It gives me comfort knowing that our team's work moved the needle for robotics, even just by a tiny bit.

AI is not a zero-sum game. In fact, it is perhaps the most positive-sum game that humanity has ever played. And we as a community should act this way. Take care of each other. Send love to "competitors", because in the grand scheme of things, we are all coauthors of an accelerated future.

I never had the privilege to know Felix irl, but I loved his research taste and set up a Google Scholar alert for every one of his new papers. His work in agents and VLMs had a big influence on mine. He would've been a great friend. I wanted to get to know him, but I can't anymore. RIP Felix. May the next world have no wars to fight.
90 replies · 389 reposts · 3.7K likes · 860.3K views
Anthony Chen@_anthonychen·
@omarsar0 @srchvrs yup, we focused on "is this possible" + emphasizing the ease of dumping corpora into a long-context LM vs. engineering complex RAG pipelines. agreed, still a ways to go! but IMO it's now a matter of time/effort before LCLMs are really fast via prefix caching & more efficiency work
1 reply · 0 reposts · 4 likes · 140 views
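The contrast in the reply above can be sketched in a few lines. This is an illustrative toy (not code from the LOFT paper, and the corpus, retriever, and function names are all hypothetical): a RAG-style pipeline runs a retriever to pick a few documents per query, while the corpus-in-context approach concatenates the whole corpus into a query-independent prefix, which is exactly the part prefix caching can encode once and reuse.

```python
# Toy sketch: RAG-style prompt construction vs. corpus-in-context.
# Everything here is hypothetical and for illustration only.

corpus = {
    "doc1": "LOFT is a benchmark with tasks spanning up to 1M tokens.",
    "doc2": "Prefix caching lets a model reuse the encoded corpus prefix.",
    "doc3": "Dual encoders embed queries and documents separately.",
}

def rag_prompt(query: str, top_k: int = 1) -> str:
    # Toy "retriever": rank documents by word overlap with the query,
    # then include only the top-k hits in the prompt.
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus.values(),
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return "\n".join(ranked[:top_k]) + f"\n\nQ: {query}"

def corpus_in_context_prompt(query: str) -> str:
    # Long-context approach: no retriever, just concatenate everything.
    # The prefix below is query-independent, so with prefix caching the
    # model can encode it once and reuse it across many queries.
    prefix = "\n".join(corpus.values())
    return prefix + f"\n\nQ: {query}"
```

The trade-off the thread debates: the RAG prompt stays short but requires building and tuning a retrieval stack, while the corpus-in-context prompt grows with the corpus but is a single concatenation whose shared prefix is amortized by caching.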
elvis@omarsar0·
@srchvrs This report was more on capability I believe. However, it would be interesting to have a comparison on cost, latency, and how much computational resources are used. IMO, I highly doubt the long-context LLMs of today can rival other LLM+external tool systems.
1 reply · 0 reposts · 8 likes · 857 views
elvis@omarsar0·
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Google DeepMind conducts a deep performance analysis of long-context LLMs on in-context retrieval and reasoning. They first present a benchmark with real-world tasks requiring 1M-token context, and report that long-context LLMs can rival state-of-the-art retrieval and RAG systems without any explicit training on the tasks. They suggest that compositional reasoning (required in SQL-like tasks) is still challenging for these LLMs. They also encourage continued research on advanced prompting strategies, as they noted significant performance boosts when applying them to long-context problems.

Just imagine what will be possible when we can get to 10 million context windows. The promise of using long-context LLMs without relying on external tools like retrieval systems or databases is of huge interest but has a long way to go.
6 replies · 57 reposts · 264 likes · 31.7K views
Anthony Chen reposted
Zhuyun Dai@ZhuyunDai·
Thrilled to unveil LOFT, our latest research showing how long-context language models like Gemini can subsume retrieval, RAG and more, right out of the box. This is the dream-retriever I've always wanted! No special training, amazing reasoning, few-shot promptable...🤯
Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

3 replies · 9 reposts · 48 likes · 7.2K views
Anthony Chen reposted
Yi Luan@YiLuan9·
Very happy to contribute the multimodal benchmarking on the LOFT project! Very excited to see that, with few-shot prompting only, Gemini 1.5 Pro can already outperform CLIP on COCO and Flickr by a large margin at 1M context tokens!
Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

0 replies · 4 reposts · 12 likes · 1.1K views
Anthony Chen reposted
Sebastian Riedel (@riedelcastro@sigmoid.social)
"just put the corpus into the context"! Long context models can already match or beat various bespoke pipelines and infra in accuracy on non-trivial tasks! Hadn't expected this so soon, and honestly was hoping to milk RAG impact for a little longer 🤪
Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

4 replies · 16 reposts · 49 likes · 9.5K views
Anthony Chen reposted
Kelvin Guu@kelvin_guu·
Do long-context LMs obsolete retrieval, RAG, SQL, and more? Excited to share our answer! arxiv.org/abs/2406.13121 from the team at @GoogleDeepMind that wrote one of the 1st papers on RAG (REALM) and repeated SOTA on retrieval (Promptagator, Gecko). w/ Gemini 1.5 Pro, the answer is 🧵
Jinhyuk Lee@leejnhk

Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more? Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks! arxiv.org/abs/2406.13121

2 replies · 12 reposts · 72 likes · 11.7K views