Ed H. Chi
@edchi
10.5K posts

Research VP @ GoogleDeepMind. ACM Fellow.

California · Joined October 2007
3.8K Following · 12.8K Followers
Ed H. Chi @edchi
@hyhieu226 @OpenAI @xai Good luck Hieu. Still remember the days when we worked together. Hope you recover well!
Hieu Pham @hyhieu226
I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental-health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.
Ed H. Chi reposted
Thang Luong @lmthang
Thrilled to share: #Aletheia, our math research agent, just solved 6/10 notoriously hard FirstProof problems autonomously, the best result in the inaugural challenge! To me, this is even bigger than our historic IMO-gold achievement last year; these problems challenge even top mathematicians. We share our results transparently, see paper and full thoughts in the thread. 👇
spacegrep🏳️‍🌈 @spacegrep
@edchi @denny_zhou @quocleix What do you think of the fact that the brain, the only gold-standard "proof" of AGI we have today, uses more information-dense sensory signals like visual signals, acts in a local environment, and performs some (probably a good) level of thinking beyond what is done in language?
Ed H. Chi @edchi
@spacegrep @denny_zhou @quocleix Yes, major bugs IMHO:
- the current models are generally fixed minds, and only learn and compress new knowledge during gradient descent.
- the other learning/memory mechanism is in-context learning with CoT, but the model forgets it right after.
Clearly insufficient.
Ed H. Chi @edchi
Ironically, in the social media era, kids actually feel more loneliness. As a former social computing researcher, I find this deeply depressing. freerangekids.com/surge-in-child… h/t Kristina Lerman's #WSDM 2026 keynote
Ed H. Chi @edchi
@denny_zhou @quocleix In my not-so-humble opinion:
- 1995–2015, the three most important ideas: reverse indexing with MapReduce, vector space models, and deep learning.
- 2015–2025, the three most important ideas: seq2seq learning/transduction with transformers, CoT fine-tuning, and refinement using RL.
Zichen Liu @zzlccc
Thrilled to share that I’ve joined @GoogleDeepMind to work on Gemini post-training! I feel incredibly fortunate to be cooking on this sunny island under @YiTayML's leadership, within @quocleix's broader organization. Looking forward to enjoying RL research and pushing the frontiers of Gemini alongside such a brilliant team!
Ed H. Chi @edchi
@bendee983 @denny_zhou Actually, in any large frontier lab, there are sufficient resources to do both. The question is incentives and allocation of energy.
Ben Dickson @bendee983
@denny_zhou Don't want to read too much into this. From your post, I suppose DeepMind believes in the second approach, which sounds exciting. But I thought David Silver left because he wanted to work on new approaches. Or am I missing something here?
Denny Zhou @denny_zhou
Two paths to AGI: fake it, or make it. Fake it by generating massive data to hack benchmarks. Make it by achieving breakthroughs in modeling and algorithms. Label yourself.
Ed H. Chi reposted
Google AI @GoogleAI
Announcing Personal Intelligence, a more personalized @GeminiApp designed just for you. How it works:
— Customized: With your permission, it reasons across your @Gmail, @YouTube, @GooglePhotos, and Search apps to share hyper-relevant and context-aware responses
— Secure: If enabled, you control which Google apps to connect to. This setting is off by default
— Useful: From travel plans based on your Google Photos to gym recommendations based on goals you’ve shared with Gemini, you get help tailored to your world
Personal Intelligence in beta is rolling out to Google AI Pro and AI Ultra subscribers in the U.S., with expansions to the free tier, more countries, and AI Mode in Search to come. Take a look at the Gemini app's personalized assistance in the clip below, then let us know what you would use it for!
Ed H. Chi reposted
News from Google @NewsFromGoogle
Joint Statement: Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a more personalized Siri coming this year. After careful evaluation, Apple determined that Google's AI technology provides the most capable foundation for Apple Foundation Models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple's industry-leading privacy standards.
Ed H. Chi @edchi
Hot take: Model capability gap and switching cost will determine much of the AI development race in 2026.
Chen Sun 🤖 @ChenSun92
Happy New Year from the Bay Area! Working at @GoogleDeepMind here for almost a year now has been the privilege of a lifetime. It is not only great science; occasionally you’re reminded how sublimely beautiful the place physically is 🌄 #HappyNewYear2026
Ed H. Chi reposted
Aakash Gupta @aakashgupta
Ilya said the quiet part out loud on Dwarkesh's pod, but most people still aren't processing what it means. Here's what's actually happening inside AI labs.

Research teams have entire divisions that do nothing but create new RL training environments specifically designed to boost benchmark scores. They treat AIME, SWE-bench, and MMLU like standardized tests. The model practices 10,000 hours on competitive programming problems until every proof technique is at its fingertips. Then it fails to fix a simple bug in production without introducing two new ones.

Sutskever used the perfect analogy. Student A grinds 10,000 hours of competitive programming. Memorizes every algorithm, every edge case, every proof technique. Becomes the #1 ranked competitive coder in the world. Student B practices 100 hours but has "it." Intuition. Taste. The ability to learn new things quickly. Who has the better career? Student B. Current AI models are all Student A.

The benchmark gaming runs deeper than most realize. Studies have shown data contamination inflates model scores by 20-80% on popular benchmarks. The training-test boundary is porous. Models memorize answers rather than learn concepts. And when you control for contamination, much of what looks like intelligence is pattern-matching on seen data.

This explains the economic puzzle Ilya pointed to. Models score 100% on AIME 2025. They hit 70%+ on GDPval, beating human professionals. Yet businesses still struggle to extract value. The benchmark performance says genius. The P&L says otherwise.

The sample efficiency gap tells you everything. A human teenager learns to drive any car after 10 hours. An AI model might need millions of examples and still fail on slight variations. A human learns a concept once and applies it everywhere. Models need to see the exact pattern thousands of times and still choke when the formatting changes slightly.

Sutskever's diagnosis: we're moving from the "age of scaling" (2020-2025) back to the "age of research." The belief that 100x more compute would transform everything is dying. His $3B company SSI is betting that the next breakthrough comes from solving generalization, not stacking more GPUs.

The labs know this. That's why the benchmark arms race is accelerating. It's easier to show impressive numbers than admit the fundamental approach might be plateauing.
Nek @Enscion25
Ilya is 100% correct. It's a pattern that keeps repeating, and it's very clear with GPT-5.2: overfit the model to produce impressive-looking benchmarks, have it excel in a few domains, but fall flat in many others. There's not enough generalization, and even if there is, the model has been so heavily reinforced that it becomes buried.
Ed H. Chi @edchi
The best thing about #NeurIPS is hearing the different takes on what the major new innovations are from the people who went. It's a giant elephant. :)
Ed H. Chi @edchi
@ChenhaoTan That's a good idea. Science is about both a private and a public discourse.
Chenhao Tan @ChenhaoTan
I vibe-coded a paper evaluator over the summer for the #agents4science conference: github.com/ChicagoHAI/pap… I find its reviews useful. Sharing it since it might be useful for the ongoing discussion about ICLR reviews. BTW, I never used it for my own paper reviewing (somehow it never crossed my mind).
Ed H. Chi reposted
Google DeepMind @GoogleDeepMind
We just dropped Nano Banana Pro, built on Gemini 3. 🍌 With state-of-the-art text rendering, vast world knowledge and studio-quality creative controls, Gemini 3 Pro Image can create and edit more complex visuals, infographics and more. Here’s what’s under the hood. 🧵
Andrej Karpathy @karpathy
I played with Gemini 3 yesterday via early access. A few thoughts.

First, I usually urge caution with public benchmarks because imo they can be quite possible to game. It comes down to the discipline and self-restraint of the team (who is meanwhile strongly incentivized otherwise) not to overfit test sets via elaborate gymnastics over test-set-adjacent data in the document embedding space. Realistically, because everyone else is doing it, the pressure to do so is high.

Go talk to the model. Talk to the other models (Ride the LLM Cycle - use a different LLM every day). I had a positive early impression yesterday across personality, writing, vibe coding, humor, etc., very solid daily-driver potential, clearly a tier 1 LLM, congrats to the team!

Over the next few days/weeks, I am most curious and on the lookout for an ensemble of private evals, which a lot of people/orgs now seem to build for themselves and occasionally report on here.