Paul Soulos

198 posts

@paulsoulos

Technical Advisor to the CEO @MicrosoftAI. Previously Neurosymbolic PhD @JhuCogsci, intern @ibmresearch and @msftresearch, and software @fitbit and @Google.

Baltimore, MD · Joined March 2010

602 Following · 521 Followers

Pinned Tweet

Paul Soulos@paulsoulos·14 Jul

I successfully defended my Ph.D. dissertation and started working at @MicrosoftAI as a Technical Advisor to the CEO! I'm excited to help make Copilot the most empowering, empathetic, and useful AI companion.

Paul Soulos@paulsoulos·19 Feb

🗺️ Mountain View or Redmond microsoft.ai/job/technical-…

Paul Soulos@paulsoulos·19 Feb

You're comfortable with ambiguity, can write an exec memo on Monday and prototype a model evaluation on Tuesday, and communicate complex ideas clearly to any audience. You'd work across the entire AI stack: from data center infra through pre-training to evals.

Paul Soulos@paulsoulos·19 Feb

I'm hiring a Technical Advisor for Microsoft Superintelligence, working in the Office of the CEO of Microsoft AI. The right person has deep ML research experience and thinks as much about what AI means strategically as how to make it work technically. 🧵

Paul Soulos@paulsoulos·28 Jan

Claude Code is the REPL. Agent SDK is the main.py. One is a tool; the other is infrastructure.

Paul Soulos@paulsoulos·17 Jan

@felixrieseberg It’s like skeuomorphic design for the AI age, the UX needs to start with the paradigms people are familiar with and eventually the tools change the way people think

Felix Rieseberg@felixrieseberg·17 Jan

This is on purpose! The latest models are almost indistinguishable from a futuristic magic trick, a little wonder behind the screen. Making that useful for humans is not the best time to also attempt to teach them five other software innovations. Files and folders are so composable, they cover so much of what humans need to do. We can always throw in the holodeck later.

Julian Lehr@julianlehr

Claude Cowork feels very retro-futuristic to me: we’re summoning hyper-intelligent agents from the future, while at the same time returning to ancient file-and-folder rituals.

Paul Soulos@paulsoulos·16 Jan

Before recursive self-improvement, we'll see recursive collaborative improvement — AI tools making AI researchers dramatically more productive. Given how good Claude Code + Opus 4.5 are, I expect Opus 5 to be a step change. Anthropic's research velocity is going to pull ahead.

Paul Soulos@paulsoulos·19 Nov

"The very abilities that allow Claude to be used in these attacks also make it crucial for cyber defense." I’m a big fan of Anthropic’s work, but the incentives feel misaligned here. It creates a protection racket where you need to buy a shield from the person selling the sword.

Anthropic@AnthropicAI

We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…

Paul Soulos retweeted

Mustafa Suleyman@mustafasuleyman·4 Nov

MAI-Image-1 has shipped 🚢 Try it now at bing.com/create or the Bing app, plus it'll generate custom art for your Story Mode audio at copilot.microsoft.com/labs/audio-exp… It really excels at: -artistic lighting/photorealistic detail -nature scenes -food! Drop your creations below ⤵️

Paul Soulos@paulsoulos·28 Sep

❣️“Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models.”

Nando de Freitas@NandoDF

The only bitter lesson is that LLMs have succeeded beyond any expert expectations. Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models. We are still far from hitting the limits in this. We are still compute hungry because there is a ton more we could achieve if only we had more compute, from experimental ablations to data acquisition and curation. Scaling is largely about data and evals. The models are now trained on almost all the web and equally large (but growing) self generated synthetic data. Sifting through such vast quantities of data (the whole of the human creation) requires formidable engineering and intelligent ideas. This is what differentiates most models. AI is finally in the hands of billions of users, and with it come billions of tasks - every reasonable user need. This scaling in tasks and evaluations is many orders of magnitude larger than pre-LLMs. Having the right architecture matters, but we know several alternatives could all work well, eg replacing attention in Transformers for RNNs and interleaving such layers with local layers. What matters is fine ablations to maximise hardware usage. This is the realm of sophisticated high-precision engineering. It encompasses semiconductor design, datacenter design, distributed systems, MFU, etc. There is fascinating work on flow matching, JEPA, sparser MoEs, etc, that is all consistent with scaling. I’m terrible at predictions, but in this we have stayed the course. There’s been pleasant surprises like the effectiveness of reasoning, which while allowing for fewer parameters, still demands even more compute. Sparser multimodal MoEs also will allow for better continual learning. This is an old idea, eg arxiv.org/pdf/1108.3298, which is finally being done at scale.
Successful scaling is mostly about organising people into effective teams for research, development and production. They have to be teams of happy and ambitious people who put the team first. Yes, tech VCs and CEOs: work life balance matters to achieve prolonged success, something I think @demishassabis did really well at @GoogleDeepMind and which I promote at @MicrosoftAI. Bitter lesson: it really is all about scaling and hard work by thousands of amazing people. Hardly bitter, but hopeful and inspiring.

Paul Soulos@paulsoulos·16 Jul

@RTomMcCoy

QME

Tom McCoy@RTomMcCoy·16 Jul

So much research is being done about LLMs that it's hard to stay on top of the literature. To help with this, I've made a list of all the most important papers from the past 8 years: rtmccoy.com/pubs/ I hope you enjoy!

Paul Soulos@paulsoulos·30 May

While both robotics and LM can be cast as next-token prediction, the token distribution for computer agents seems more like abstract motor programs (robotics) vs. language. This puts computer use on the trajectory of robotics which is slower than LLMs. 2/2

Paul Soulos@paulsoulos·30 May

Intriguing prediction from @TrentonBricken & @_sholtodouglas on @dwarkesh_sp's podcast: computer use agents "solved" in ~10 months 🖱️⌨️. This feels highly optimistic. I think that computer use is closer to robotics than language modeling. 1/2

Paul Soulos@paulsoulos·16 Apr

@ID_AA_Carmack Einsum notation is a great way to avoid this!

John Carmack@ID_AA_Carmack·16 Apr

TFW you flip so many matrices that you wind up with: C = (A.t @ B.t).t And realize you can just do: C = B @ A
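
A minimal NumPy sketch of the exchange above, under stated assumptions: the arrays are 2-D, the shapes and the names `C_flipped`, `C_direct`, and `C_einsum` are illustrative and not from the thread, and NumPy spells the transpose `.T` where the tweet writes `.t`. It checks numerically that transposing both operands, multiplying, and transposing back gives the same result as `B @ A`, and shows how an einsum expression names each index so the intended contraction is explicit without any transposes.

```python
import numpy as np

# Illustrative shapes only (an assumption, not from the thread).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # indices (i, j)
B = rng.standard_normal((5, 3))  # indices (k, i)

# Roundabout form from the tweet: transpose both, multiply, transpose back.
C_flipped = (A.T @ B.T).T        # (4, 3) @ (3, 5) -> (4, 5), then .T -> (5, 4)

# Simplified form, using the identity (A^T B^T)^T = B A.
C_direct = B @ A                 # (5, 3) @ (3, 4) -> (5, 4)

# einsum spells out the contraction: sum over the shared index i.
C_einsum = np.einsum("ki,ij->kj", B, A)

assert np.allclose(C_flipped, C_direct)
assert np.allclose(C_direct, C_einsum)
```

Because the einsum string states which axis is summed and which axes survive, the operand order and layout stop mattering: `np.einsum("ij,ki->kj", A, B)` describes the same product.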

Paul Soulos@paulsoulos·3 Apr

r/AmItheAsshole seems perfect as a resource for AI alignment: nuanced moral dilemmas, rich community debates highlighting cultural complexities (albeit U.S.-centric), and reasoning chains that explicitly end in a binary moral verdict.

Paul Soulos@paulsoulos·28 Mar

@TrentonBricken Did you look into how circuits change for adding two long numbers where Claude produces the wrong output? I'm curious if we can get any insight into the generalizability of the circuits you identified and where things go wrong.

Trenton Bricken@TrentonBricken·27 Mar

My favorite figure from our new Circuits papers -- "How does Claude do math?" Claude simultaneously does: 1. a back of the envelope calculation of the tens digits -- "the answer should be somewhere around 90". 2. an exact calculation of 6+9=15 using these super cool look up table features.

Anthropic@AnthropicAI

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Paul Soulos@paulsoulos·25 Mar

Can Google please implement a computer-using agent that navigates the unsubscribe web interface on my behalf. The problem space feels pretty well defined.

Paul Soulos@paulsoulos·19 Feb

The speed improvements from NSA are very impressive, and I am trying to figure out why their sparse implementation actually outperforms full attention. My intuition is that the hierarchical computation is more expressive than standard full attention. Do you have any guesses?

DeepSeek@deepseek_ai

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning. 📖 For more details, check out our paper here: arxiv.org/abs/2502.11089

Paul Soulos@paulsoulos·14 Dec

@hamidpalangi Was this the test of time talk?

Hamid Palangi@hamidpalangi·14 Dec

Will pre-training as we know it end?

Paul Soulos retweeted

Csordás Róbert@robert_csordas·12 Dec

Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 pm in East Exhibit Hall A-C #1907 on #NeurIPS2024. With Kazuki Irie, @SchmidhuberAI, @ChrisGPotts and @chrmanning.
