Paul Soulos

198 posts


@paulsoulos

Technical Advisor to the CEO @MicrosoftAI. Previously Neurosymbolic PhD @JhuCogsci, intern @ibmresearch and @msftresearch, and software @fitbit and @Google.

Baltimore, MD · Joined March 2010
602 Following · 521 Followers
Pinned Tweet
Paul Soulos @paulsoulos
I successfully defended my Ph.D. dissertation and started working at @MicrosoftAI as a Technical Advisor to the CEO! I'm excited to help make Copilot the most empowering, empathetic, and useful AI companion.
23 replies · 11 reposts · 332 likes · 22.9K views
Paul Soulos @paulsoulos
You're comfortable with ambiguity, can write an exec memo on Monday and prototype a model evaluation on Tuesday, and communicate complex ideas clearly to any audience. You'd work across the entire AI stack: from data center infra through pre-training to evals.
1 reply · 0 reposts · 1 like · 92 views
Paul Soulos @paulsoulos
I'm hiring a Technical Advisor for Microsoft Superintelligence, working in the Office of the CEO of Microsoft AI. The right person has deep ML research experience and thinks as much about what AI means strategically as how to make it work technically. 🧵
1 reply · 0 reposts · 2 likes · 211 views
Paul Soulos @paulsoulos
Claude Code is the REPL. Agent SDK is the main.py. One is a tool; the other is infrastructure.
0 replies · 0 reposts · 0 likes · 104 views
Paul Soulos @paulsoulos
@felixrieseberg It's like skeuomorphic design for the AI age: the UX needs to start with the paradigms people are familiar with, and eventually the tools change the way people think
0 replies · 0 reposts · 1 like · 144 views
Felix Rieseberg @felixrieseberg
This is on purpose! The latest models are almost indistinguishable from a futuristic magic trick, a little wonder behind the screen. Making that useful for humans is not the best time to also attempt to teach them five other software innovations. Files and folders are so composable, they cover so much of what humans need to do. We can always throw in the holodeck later.
Julian Lehr @julianlehr

Claude Cowork feels very retro-futuristic to me: we’re summoning hyper-intelligent agents from the future, while at the same time returning to ancient file-and-folder rituals.

15 replies · 10 reposts · 262 likes · 28.2K views
Paul Soulos @paulsoulos
Before recursive self-improvement, we'll see recursive collaborative improvement — AI tools making AI researchers dramatically more productive. Given how good Claude Code + Opus 4.5 are, I expect Opus 5 to be a step change. Anthropic's research velocity is going to pull ahead.
1 reply · 0 reposts · 2 likes · 125 views
Paul Soulos @paulsoulos
"The very abilities that allow Claude to be used in these attacks also make it crucial for cyber defense." I’m a big fan of Anthropic’s work, but the incentives feel misaligned here. It creates a protection racket where you need to buy a shield from the person selling the sword.
Anthropic @AnthropicAI

We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…

0 replies · 0 reposts · 1 like · 132 views
Paul Soulos retweeted
Mustafa Suleyman @mustafasuleyman
MAI-Image-1 has shipped 🚢 Try it now at bing.com/create or the Bing app, plus it'll generate custom art for your Story Mode audio at copilot.microsoft.com/labs/audio-exp…
It really excels at:
- artistic lighting/photorealistic detail
- nature scenes
- food!
Drop your creations below ⤵️
29 replies · 46 reposts · 277 likes · 80.7K views
Paul Soulos @paulsoulos
❣️“Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models.”
Nando de Freitas @NandoDF

The only bitter lesson is that LLMs have succeeded beyond any expert expectations. Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models. We are still far from hitting the limits in this. We are still compute hungry because there is a ton more we could achieve if only we had more compute, from experimental ablations to data acquisition and curation.

Scaling is largely about data and evals. The models are now trained on almost all the web and equally large (but growing) self-generated synthetic data. Sifting through such vast quantities of data (the whole of human creation) requires formidable engineering and intelligent ideas. This is what differentiates most models. AI is finally in the hands of billions of users, and with it come billions of tasks -- every reasonable user need. This scaling in tasks and evaluations is many orders of magnitude larger than pre-LLMs.

Having the right architecture matters, but we know several alternatives could all work well, e.g. replacing attention in Transformers with RNNs and interleaving such layers with local layers. What matters is fine ablations to maximise hardware usage. This is the realm of sophisticated high-precision engineering. It encompasses semiconductor design, datacenter design, distributed systems, MFU, etc. There is fascinating work on flow matching, JEPA, sparser MoEs, etc., that is all consistent with scaling.

I'm terrible at predictions, but in this we have stayed the course. There have been pleasant surprises like the effectiveness of reasoning, which, while allowing for fewer parameters, still demands even more compute. Sparser multimodal MoEs will also allow for better continual learning. This is an old idea, e.g. arxiv.org/pdf/1108.3298, which is finally being done at scale.

Successful scaling is mostly about organising people into effective teams for research, development and production. They have to be teams of happy and ambitious people who put the team first. Yes, tech VCs and CEOs: work-life balance matters to achieve prolonged success, something I think @demishassabis did really well at @GoogleDeepMind and which I promote at @MicrosoftAI. Bitter lesson: it really is all about scaling and hard work by thousands of amazing people. Hardly bitter, but hopeful and inspiring.

0 replies · 0 reposts · 0 likes · 130 views
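The quoted point -- that scaling means pushing data through compute, not just adding parameters -- can be made concrete with the common approximation that training costs roughly 6 · N · D FLOPs for N parameters and D tokens. A rough sketch; the budget and model sizes below are illustrative, not from the thread:

```python
# Rough sketch: how many training tokens a fixed compute budget can
# ingest, using the common approximation total FLOPs ≈ 6 * params * tokens.

def tokens_for_budget(total_flops: float, params: float) -> float:
    """Training tokens affordable under `total_flops` at `params` weights."""
    return total_flops / (6 * params)

budget = 1e24  # illustrative total training budget in FLOPs
for n in (7e9, 70e9, 700e9):
    toks = tokens_for_budget(budget, n)
    print(f"{n / 1e9:>5.0f}B params -> {toks / 1e12:6.1f}T tokens")
```

Under this approximation, each 10× jump in parameters buys 10× fewer training tokens from the same budget, which is why reading "scaling" as "more parameters" misses the data-throughput half of the trade-off.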
Tom McCoy @RTomMcCoy
So much research is being done about LLMs that it's hard to stay on top of the literature. To help with this, I've made a list of all the most important papers from the past 8 years: rtmccoy.com/pubs/ I hope you enjoy!
8 replies · 13 reposts · 179 likes · 14.9K views
Paul Soulos @paulsoulos
While both robotics and language modeling can be cast as next-token prediction, the token distribution for computer agents looks more like abstract motor programs (robotics) than like language. That puts computer use on the trajectory of robotics, which has progressed more slowly than LLMs. 2/2
0 replies · 0 reposts · 1 like · 231 views
Paul Soulos @paulsoulos
Intriguing prediction from @TrentonBricken & @_sholtodouglas on @dwarkesh_sp's podcast: computer use agents "solved" in ~10 months 🖱️⌨️. This feels highly optimistic. I think that computer use is closer to robotics than language modeling. 1/2
1 reply · 0 reposts · 2 likes · 302 views
John Carmack @ID_AA_Carmack
TFW you flip so many matrices that you wind up with:
C = (A.t @ B.t).t
And realize you can just do:
C = B @ A
35 replies · 19 reposts · 630 likes · 68.7K views
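The simplification in the tweet is the transpose identity (AᵀBᵀ)ᵀ = BA, a direct consequence of (XY)ᵀ = YᵀXᵀ. A quick NumPy check on arbitrarily chosen shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # so A.T is (4, 3)
B = rng.standard_normal((5, 3))  # so B.T is (3, 5)

# (A^T @ B^T)^T equals B @ A, by (X @ Y)^T = Y^T @ X^T.
C1 = (A.T @ B.T).T   # (4, 5) transposed -> (5, 4)
C2 = B @ A           # (5, 3) @ (3, 4)   -> (5, 4)
print(np.allclose(C1, C2))  # True
```

Writing B @ A directly saves three transposes and an intermediate, and makes the intent of the expression obvious.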
Paul Soulos @paulsoulos
r/AmItheAsshole seems perfect as a resource for AI alignment: nuanced moral dilemmas, rich community debates highlighting cultural complexities (albeit U.S.-centric), and reasoning chains that explicitly end in a binary moral verdict.
0 replies · 0 reposts · 1 like · 264 views
Paul Soulos @paulsoulos
@TrentonBricken Did you look into how circuits change for adding two long numbers where Claude produces the wrong output? I'm curious if we can get any insight into the generalizability of the circuits you identified and where things go wrong.
0 replies · 0 reposts · 8 likes · 1.7K views
Trenton Bricken @TrentonBricken
My favorite figure from our new Circuits papers -- "How does Claude do math?" Claude simultaneously does:
1. a back-of-the-envelope calculation of the tens digits -- "the answer should be somewhere around 90".
2. an exact calculation of 6+9=15 using these super cool lookup-table features.
Anthropic @AnthropicAI

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

12 replies · 113 reposts · 1.1K likes · 126.7K views
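As a loose illustration only -- this is a hand-written analogy, not the actual circuit, and `ONES_LOOKUP` plus the coarse-tens path are invented for the example -- the two parallel paths the figure describes can be mimicked like this:

```python
# Toy analogy of the two parallel paths: a coarse tens estimate plus an
# exact ones-digit lookup, combined at the end.

# Lookup table for single-digit sums: (a, b) -> (carry, ones digit).
ONES_LOOKUP = {(a, b): divmod(a + b, 10) for a in range(10) for b in range(10)}

def coarse_tens_path(x: int, y: int) -> int:
    """Back-of-the-envelope tens estimate, ignoring any carry."""
    return (x // 10 + y // 10) * 10  # 36, 59 -> 80 ("somewhere around 90")

def exact_ones_path(x: int, y: int) -> tuple:
    """Lookup-table addition of the ones digits: 6 + 9 -> (carry 1, ones 5)."""
    return ONES_LOOKUP[(x % 10, y % 10)]

def add_two_paths(x: int, y: int) -> int:
    carry, ones = exact_ones_path(x, y)
    return coarse_tens_path(x, y) + 10 * carry + ones

print(add_two_paths(36, 59))  # 95
```

The point of the analogy is the decomposition: neither path alone gives the answer, but the coarse magnitude and the exact ones digit jointly determine it.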
Paul Soulos @paulsoulos
Can Google please implement a computer-using agent that navigates the unsubscribe web interface on my behalf? The problem space feels pretty well defined.
0 replies · 0 reposts · 1 like · 318 views
Hamid Palangi @hamidpalangi
Will pre-training as we know it end?
1 reply · 0 reposts · 9 likes · 1.6K views