Pete Shaw

110 posts

@ptshaw2

Research Scientist @GoogleDeepmind

Seattle, WA · Joined January 2013
517 Following · 696 Followers
Pete Shaw reposted
Archiki Prasad
Archiki Prasad@ArchikiPrasad·
🚨 I’m on the 2026 Research Scientist Job Market! I am a PhD student at UNC Chapel Hill (advised by @mohitban47) and a recipient of the Apple Scholars in AI/ML PhD Fellowship. My research centers around:
🔸 Reasoning & RL/Post-Training: Evaluating and interpreting the reasoning process, and improving post-training and alignment through self-generated and reward-based signals (Intrinsic Dim., ReCEVAL, ScPO, LASeR).
🔸 Agents & Planning: Designing adaptive agent frameworks that use extra test-time compute & reasoning upon failure (ADaPT, System-1.x, PRInTS).
🔸 Reward & Skill Discovery in Code: Leveraging execution signals to build reliable rewards, automate debugging, and discover abstractions in code (UTGen, ReGAL).
Prev (Research Intern): Google DeepMind, Meta FAIR, Allen Institute for AI (AI2), and Adobe Research.
Feel free to reach out via DM or email if you’re interested, have leads, or would like to connect!
🌐 archiki.github.io 📧 archiki@cs.unc.edu #NLP #AI #JobSearch
Pete Shaw
Pete Shaw@ptshaw2·
@AdaptiveAgents Seems like learnability challenges are more relevant than expressivity limits in the context of approximating universal compressors?
Pedro A. Ortega
Pedro A. Ortega@AdaptiveAgents·
The fact alone that a universal compressor is at least as long as the longest program in the class it closes over should be enough to show that artificial intelligence will forever remain a moving goal post.
Pete Shaw reposted
François Chollet
François Chollet@fchollet·
The goal of AI should not be to replace human thought and human agency, but to expand them. Not everything needs to be automated.
Pete Shaw
Pete Shaw@ptshaw2·
@fchollet This view is often used to motivate symbolic representations, but DL models can in theory also learn optimal compression if we move past parameter counting as a description length measure: arxiv.org/abs/2509.22445 But either way, hard to optimize.
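The point about moving past parameter counting as a description length measure can be made concrete with a toy sketch. This is not the method from the linked paper; the function names (`raw_bits`, `coded_bits`) and the quantize-then-entropy-code scheme are illustrative assumptions only.

```python
import math
import random

def raw_bits(weights, bits_per_param=32):
    # Naive description length: parameter count times float precision.
    return len(weights) * bits_per_param

def coded_bits(weights, step=0.1):
    # A crude two-part-style code: quantize weights to a grid, then
    # charge each weight its Shannon code length under the empirical
    # distribution of quantized values.
    q = [round(w / step) for w in weights]
    counts = {}
    for v in q:
        counts[v] = counts.get(v, 0) + 1
    n = len(q)
    return sum(-math.log2(counts[v] / n) for v in q)

random.seed(0)
# Weights clustered near a few values compress far below 32 bits each,
# even though the raw parameter count is unchanged.
weights = [random.choice([0.0, 0.5, -0.5]) + random.gauss(0, 0.01)
           for _ in range(1000)]
print(raw_bits(weights))           # 32000 bits
print(round(coded_bits(weights)))  # far fewer bits for the same network
```

Both measures see the same 1000 parameters; only the coding-based one notices the redundancy.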
François Chollet
François Chollet@fchollet·
To perfectly understand a phenomenon is to perfectly compress it, to have a model of it that cannot be made any simpler. If a DL model requires millions of parameters to model something that can be described by a three-term differential equation, it has not really understood it; it has merely cached the data.
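The cached-vs-compressed distinction above can be illustrated with a toy comparison: data generated by a simple decay law (the solution of x' = -kx) costs three numbers to describe, versus 32 bits per sample to memorize. The specific constants here are assumptions for illustration.

```python
import math

# Samples from a simple exponential decay, the solution of x' = -k x.
def model(t, x0=2.0, k=0.5):
    return x0 * math.exp(-k * t)

samples = [model(0.1 * i) for i in range(1000)]

bits_cached = 32 * len(samples)  # memorize every value verbatim
bits_model = 32 * 3              # x0, k, and the time step
print(bits_cached, bits_model)   # 32000 vs 96
```

The gap only widens with more samples: the model's description length is constant while the cache grows linearly.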
Pete Shaw
Pete Shaw@ptshaw2·
Good time to plug our recent paper connecting the notion of Kolmogorov complexity to Transformers, inspired by the work of Schmidhuber and many others... 🧵
Jürgen Schmidhuber@SchmidhuberAI

Pete Shaw reposted
Google DeepMind
Google DeepMind@GoogleDeepMind·
Our new Gemini 2.5 Computer Use model can navigate browsers just like you do. 🌐 It builds on Gemini’s visual understanding and reasoning capabilities to power agents that can click, scroll and type for you online - setting a new standard on multiple benchmarks, with faster speed.
Google DeepMind tweet media
Pete Shaw reposted
Sundar Pichai
Sundar Pichai@sundarpichai·
Our new Gemini 2.5 Computer Use model is now available in the Gemini API, setting a new standard on multiple benchmarks with lower latency. These are early days, but the model’s ability to interact with the web – like scrolling, filling forms + navigating dropdowns – is an important next step in building general-purpose agents. Developers can try these capabilities via API in @googleaistudio + Vertex AI.
Sundar Pichai tweet media
Pete Shaw reposted
Rohan Paul
Rohan Paul@rohanpaul_ai·
The paper links Kolmogorov complexity to Transformers and proposes loss functions that become provably optimal as model resources grow. It treats learning as compression: minimize the bits to describe the model plus the bits to describe the labels. This provides a single training target that rewards simple, compressible solutions while staying mathematically grounded. It gives a principled way to aim models at simplicity and generalization, and it explains why optimization, not capacity, is the current bottleneck.

In Kolmogorov complexity, a "program" is just the shortest set of instructions that can recreate some data. A shorter program means the data or model is simpler. So when they say "a prior favoring shorter programs," it means a model is assumed to be more likely if it can be described with fewer bits.

As the Transformer gets deeper (more layers) and has more context (a bigger input window), its ability to represent complex programs grows. In that limit, the paper proves that this code length becomes the best possible measure of simplicity and fit, the same way Kolmogorov complexity works in theory. "Code length" here means how many bits it takes to describe both the model and how well it fits the data.

In simple words: if you keep increasing model size and context, this method of preferring shorter and better-fitting models gets as close as possible to the theoretical ideal of perfect compression and generalization.

Paper: arxiv.org/abs/2509.22445
Paper title: "Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers"
Rohan Paul tweet media
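The "bits to describe the model plus bits to describe the labels" idea in the post above is the classic two-part code, which can be sketched in a few lines. The model-cost figures (8 and 16 bits) and the 0.99/0.01 predictions are assumed toy numbers, not anything from the paper.

```python
import math

def two_part_dl(model_bits, probs, labels):
    # Two-part description length: bits to describe the model plus the
    # Shannon code length of the labels under the model's predictions.
    data_bits = sum(-math.log2(p if y == 1 else 1.0 - p)
                    for p, y in zip(probs, labels))
    return model_bits + data_bits

# Labels are 1 exactly when the feature exceeds 5.
xs = list(range(10))
labels = [1 if x > 5 else 0 for x in xs]

# Model A: ignore x and predict the base rate (cheap model, poor fit).
base = sum(labels) / len(labels)
dl_a = two_part_dl(model_bits=8, probs=[base] * len(xs), labels=labels)

# Model B: threshold rule "x > 5" with near-certain predictions
# (a few more model bits, near-zero data bits).
probs_b = [0.99 if x > 5 else 0.01 for x in xs]
dl_b = two_part_dl(model_bits=16, probs=probs_b, labels=labels)

print(dl_a > dl_b)  # True: the better-fitting simple rule wins overall
```

The objective naturally trades model complexity against fit: a bigger model is only worth its extra bits if it saves at least that many bits on the data.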
Pete Shaw reposted
Pete Shaw
Pete Shaw@ptshaw2·
We hope this work adds some conceptual clarity around how Kolmogorov complexity relates to neural networks, and provides a path towards identifying new complexity measures that enable greater compression and generalization.
Pete Shaw
Pete Shaw@ptshaw2·
Excited to share a new paper that aims to narrow the conceptual gap between the idealized notion of Kolmogorov complexity and practical complexity measures for neural networks.
Pete Shaw tweet media