Ed H. Chi
@edchi
10.5K posts

Research VP @ GoogleDeepMind. ACM Fellow.

California · Joined October 2007
3.8K Following · 12.8K Followers
Ed H. Chi @edchi
@hyhieu226 @OpenAI @xai Good luck Hieu. Still remember the days when we worked together. Hope you recover well!
Hieu Pham @hyhieu226
I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental-health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.
Ed H. Chi reposted
Thang Luong @lmthang
Thrilled to share: #Aletheia, our math research agent, just solved 6/10 notoriously hard FirstProof problems autonomously, the best result in the inaugural challenge! To me, this is even bigger than our historic IMO-gold achievement last year; these problems challenge even top mathematicians. We share our results transparently, see paper and full thoughts in the thread. 👇
spacegrep🏳️‍🌈 @spacegrep
@edchi @denny_zhou @quocleix What do you think of the fact that the brain, the only gold-standard "proof" of AGI we have today, uses more information-dense sensory signals like visual signals, acts in a local environment, and performs some (probably a good) level of thinking beyond what is done in language?
Ed H. Chi @edchi
@spacegrep @denny_zhou @quocleix Yes, major bugs IMHO:
- the current models are generally fixed minds, and only learn and compress new knowledge during gradient descent.
- the other learning/memory mechanism is in-context learning with CoT, but the model forgets it right after.
Clearly insufficient.
Ed H. Chi @edchi
Ironically, in the social media era, kids actually feel more loneliness. As a former social computing researcher, I find this deeply depressing. freerangekids.com/surge-in-child… h/t Kristina Lerman's #WSDM 2026 keynote
Ed H. Chi @edchi
@denny_zhou @quocleix In my not-so-humble opinion:
- 1995–2015, the three most important ideas: reverse indexing with MapReduce, vector space models, and deep learning.
- 2015–2025, the three most important ideas: seq2seq learning/transduction with transformers, CoT fine-tuning, and refinement using RL.
Zichen Liu @zzlccc
Thrilled to share that I’ve joined @GoogleDeepMind to work on Gemini post-training! I feel incredibly fortunate to be cooking on this sunny island under @YiTayML's leadership, within @quocleix's broader organization. Looking forward to enjoying RL research and pushing the frontiers of Gemini alongside such a brilliant team!
Ed H. Chi @edchi
@bendee983 @denny_zhou Actually, in any large frontier lab, there are sufficient resources to do both. The question is incentives and allocation of energy.
Ben Dickson @bendee983
@denny_zhou Don't want to read too much into this. From your post, I suppose DeepMind believes in the second approach, which sounds exciting. But I thought David Silver left because he wanted to work on new approaches. Or am I missing something here?
Denny Zhou @denny_zhou
Two paths to AGI: fake it, or make it. Fake it by generating massive data to hack benchmarks. Make it by achieving breakthroughs in modeling and algorithms. Label yourself.
Ed H. Chi reposted
Google AI @GoogleAI
Announcing Personal Intelligence, a more personalized @GeminiApp designed just for you. How it works:
— Customized: With your permission, it reasons across your @Gmail, @YouTube, @GooglePhotos, and Search apps to share hyper-relevant and context-aware responses
— Secure: If enabled, you control which Google apps to connect to. This setting is off by default
— Useful: From travel plans based on your Google Photos to gym recommendations based on goals you’ve shared with Gemini, you get help tailored to your world
Personal Intelligence in beta is rolling out to Google AI Pro and AI Ultra subscribers in the U.S., with expansions to the free tier, more countries, and AI Mode in Search to come. Take a look at the Gemini app's personalized assistance in the clip below, then let us know what you would use it for!
Ed H. Chi reposted
News from Google @NewsFromGoogle
Joint Statement: Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a more personalized Siri coming this year. After careful evaluation, Apple determined that Google's AI technology provides the most capable foundation for Apple Foundation Models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple's industry-leading privacy standards.
Ed H. Chi @edchi
Hot take: Model capability gap and switching cost will determine much of the AI development race in 2026.
Chen Sun 🤖 @ChenSun92
Happy New Year from the Bay Area! Working at @GoogleDeepMind here for almost a year now has been the privilege of a lifetime. It is not only great science; occasionally you’re reminded how sublimely beautiful the place physically is 🌄 #HappyNewYear2026
Ed H. Chi reposted
Aakash Gupta @aakashgupta
Ilya said the quiet part out loud on Dwarkesh's pod, but most people still aren't processing what it means. Here's what's actually happening inside AI labs.

Research teams have entire divisions that do nothing but create new RL training environments specifically designed to boost benchmark scores. They treat AIME, SWE-bench, and MMLU like standardized tests. The model practices 10,000 hours on competitive programming problems until every proof technique is at its fingertips. Then it fails to fix a simple bug in production without introducing two new ones.

Sutskever used the perfect analogy. Student A grinds 10,000 hours of competitive programming. Memorizes every algorithm, every edge case, every proof technique. Becomes the #1 ranked competitive coder in the world. Student B practices 100 hours but has "it." Intuition. Taste. The ability to learn new things quickly. Who has the better career? Student B. Current AI models are all Student A.

The benchmark gaming runs deeper than most realize. Studies have shown data contamination inflates model scores by 20-80% on popular benchmarks. The training-test boundary is porous. Models memorize answers rather than learn concepts. And when you control for contamination, much of what looks like intelligence is pattern-matching on seen data.

This explains the economic puzzle Ilya pointed to. Models score 100% on AIME 2025. They hit 70%+ on GDPval, beating human professionals. Yet businesses still struggle to extract value. The benchmark performance says genius. The P&L says otherwise.

The sample efficiency gap tells you everything. A human teenager learns to drive any car after 10 hours. An AI model might need millions of examples and still fail on slight variations. A human learns a concept once and applies it everywhere. Models need to see the exact pattern thousands of times and still choke when the formatting changes slightly.

Sutskever's diagnosis: we're moving from the "age of scaling" (2020-2025) back to the "age of research." The belief that 100x more compute would transform everything is dying. His $3B company SSI is betting that the next breakthrough comes from solving generalization, not stacking more GPUs.

The labs know this. That's why the benchmark arms race is accelerating. It's easier to show impressive numbers than admit the fundamental approach might be plateauing.
Nek @Enscion25
Ilya is 100% correct. It's a pattern that keeps repeating, and it's very clear with GPT-5.2: overfit the model to produce impressive-looking benchmarks, have it excel in a few domains, but fall flat in many others. There's not enough generalization, and even if there is, the model has been so heavily reinforced that it becomes buried.
Ed H. Chi @edchi
The best thing about #NeurIPS is hearing the different takes on what the major new innovations are from the people who went. It's a giant elephant. :)
Ed H. Chi @edchi
@ChenhaoTan That's a good idea. Science is about both a private and a public discourse.
Chenhao Tan @ChenhaoTan
I vibe-coded a paper evaluator over the summer for the #agents4science conference: github.com/ChicagoHAI/pap… I find its reviews useful. Sharing it since it might be useful for the ongoing discussion about ICLR reviews. BTW, I never used it for my own paper reviewing (somehow it never crossed my mind).
Ed H. Chi reposted
Google DeepMind @GoogleDeepMind
We just dropped Nano Banana Pro, built on Gemini 3. 🍌 With state-of-the-art text rendering, vast world knowledge and studio-quality creative controls, Gemini 3 Pro Image can create and edit more complex visuals, infographics and more. Here’s what’s under the hood. 🧵
Andrej Karpathy @karpathy
I played with Gemini 3 yesterday via early access. A few thoughts.

First, I usually urge caution with public benchmarks because imo they can be quite possible to game. It comes down to the discipline and self-restraint of the team (who is meanwhile strongly incentivized otherwise) not to overfit test sets via elaborate gymnastics over test-set-adjacent data in the document embedding space. Realistically, because everyone else is doing it, the pressure to do so is high.

Go talk to the model. Talk to the other models (Ride the LLM Cycle - use a different LLM every day). I had a positive early impression yesterday across personality, writing, vibe coding, humor, etc., very solid daily-driver potential, clearly a tier 1 LLM, congrats to the team!

Over the next few days/weeks, I am most curious and on the lookout for an ensemble of private evals, which a lot of people/orgs now seem to build for themselves and occasionally report on here.