Devin Shah

54 posts


@DevinShah16

llms @otter_ai, prev: cofounder @octane_security, bme+cs @dukeu, gene editing @stanfordmed

San Francisco, CA · Joined October 2019
786 Following · 179 Followers
Pinned Tweet
Devin Shah
Devin Shah@DevinShah16·
If you give a frontier model the complete ruleset for a strategy game, can it derive a winning strategy from first principles? I wanted to test @claudeai Sonnet 4.6's ability to play the 2009 strategy game Small World. Three identical instances with the same instructions and compute budget played against each other. The games surfaced a reasoning pattern around action bias and locality that I think applies broadly to long-horizon software engineering and knowledge work beyond just strategy games. Full blogpost: dshah.dev/blog/smallworld
4
0
12
425
Devin Shah retweeted
Harry Partridge
Harry Partridge@part_harry_·
One interesting point: a fixed KV cache is an MLP. Collectively, the keys form an up projection, and the values form a down projection. The softmax is the nonlinearity. Therefore, we can view KV compression as a new way of producing ‘weights’. Instead of using backpropagation to refine our MLPs, we can learn to produce them directly from context. This is perhaps more analogous to human learning and has the potential to be far more sample efficient.
Charlie O'Neill@oneill_c

1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly. At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model. Here's how we did it 👇

4
20
390
45.3K
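The "a fixed KV cache is an MLP" observation in the post above can be checked numerically. The following is a minimal sketch, not taken from either post: the toy keys, values, and query are made-up numbers, and the helper names (`attention`, `mlp`) are hypothetical. It assumes single-head attention with the usual 1/sqrt(d) scaling, which is folded into the up projection.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Single-head attention of one query over a fixed KV cache.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def mlp(x, w_up, w_down):
    # Two-layer MLP: up projection, softmax nonlinearity, down projection.
    hidden = softmax([sum(xi * wi for xi, wi in zip(x, row)) for row in w_up])
    return [sum(h * row[j] for h, row in zip(hidden, w_down)) for j in range(len(w_down[0]))]

# Toy cache with three entries (illustrative values only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [0.5, -0.25]

# The cached keys become the up-projection rows (hidden width = cache length),
# and the cached values become the down-projection rows.
scale = 1.0 / math.sqrt(len(query))
w_up = [[k * scale for k in key] for key in keys]
w_down = values

attn_out = attention(query, keys, values)
mlp_out = mlp(query, w_up, w_down)
print(attn_out, mlp_out)
```

Under these assumptions the two outputs agree up to floating-point rounding, so compressing the cache amounts to producing a smaller set of "weights" for this MLP.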
Devin Shah retweeted
Thinking Machines
Thinking Machines@thinkymachines·
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
465
2K
15.8K
7.8M
Devin Shah
Devin Shah@DevinShah16·
Given two branches with concrete projections, it reliably picks the better one. Its weakness is option generation: left to its own devices, it generates one option (the action-forward one) and never surfaces the alternative. The template's entire contribution is making option generation mandatory, which turns out to be enough to close most of the gap. Full post with much more detail: dshah.dev/blog/smallworld Link to code: github.com/dshah3/smallwo…
0
0
0
151
Devin Shah
Devin Shah@DevinShah16·
The general finding is about what I'm calling strategic attention. It is the reflex to pull the right reasoning framework into active context at the right moment. The model has the knowledge: if you ask it "when should you decline in Small World?" it gives a correct answer. It just doesn't activate that knowledge unprompted at the decision point. The template interrupts the default action-first reasoning loop long enough for the model's own strategic thinking to engage. This maps directly to a pattern @FrontierSWE found in software engineering: Opus 4.6 solved a Pyright optimization in 11 minutes, then kept iterating for seven more hours across 95 builds, at one point losing the fix entirely before rediscovering it. If it had stopped at minute 11, it would have scored the same.
1
0
0
144
Devin Shah retweeted
kalomaze
kalomaze@kalomaze·
REINFORCEMENT LEARNING FOR KNOWLEDGE AWARENESS
17
47
701
52K
Devin Shah
Devin Shah@DevinShah16·
An underrated aspect of language models is that their skills practically do not degrade over time (without inference-time quantization and assuming stable compute). Humans, by contrast, have to actively practice a skill just to stay on the capability frontier.
0
0
3
185
Devin Shah
Devin Shah@DevinShah16·
@__tensorcore__ Congrats, and best of luck! Thanks for the amazing work building CUTLASS
0
0
1
488
Vijay
Vijay@__tensorcore__·
As of last week, I am no longer at NVIDIA 🧵 Leaving the CUTLASS team was extremely hard. I will dearly miss my incredible colleagues and the extremely compelling mission statement of creating the world's best accelerator programming model w/ hardware software codesign 💚
16
18
370
27.2K
Devin Shah
Devin Shah@DevinShah16·
Thanks to @modal for the easy vLLM container setup and @cursor_ai for the problem and Cursor Tab inspiration.
0
0
1
274
Devin Shah
Devin Shah@DevinShah16·
The team at @cursor_ai posed the problem of character prefix conditioning at the beginning of the year. Today I'm releasing a short blog post and some code walking through my attempt. It was fun to learn some creative ways of sampling from language models.
2
0
3
387
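For readers unfamiliar with character prefix conditioning, here is a minimal sketch of the naive baseline: at each step, mask out tokens that are inconsistent with the characters still to be matched, renormalize, and sample. This is not the approach from the blog post or from Cursor. The toy vocabulary, its fixed probabilities, and the function names are all hypothetical, and per-step renormalization is a greedy approximation that does not in general reproduce the exact conditional distribution.

```python
import random

# Hypothetical toy vocabulary with fixed next-token probabilities. A real
# model would recompute these from the context at every step.
TOY_PROBS = {"app": 0.3, "apple": 0.2, "a": 0.2, "le": 0.1, "p": 0.1, "banana": 0.1}

def allowed_tokens(remaining, probs):
    # Keep tokens consistent with the characters still to be matched:
    # either the token fits inside the remaining prefix, or it covers
    # the rest of the prefix and extends past it.
    if not remaining:
        return dict(probs)
    return {t: p for t, p in probs.items()
            if remaining.startswith(t) or t.startswith(remaining)}

def sample_with_prefix(prefix, probs, rng, max_tokens=8):
    # Sample tokens until the character prefix is fully covered.
    out, remaining = "", prefix
    for _ in range(max_tokens):
        candidates = allowed_tokens(remaining, probs)
        if not candidates:
            break  # dead end: no token is consistent with the prefix
        tokens = list(candidates)
        weights = [candidates[t] for t in tokens]
        token = rng.choices(tokens, weights=weights, k=1)[0]  # renormalizes implicitly
        out += token
        remaining = remaining[len(token):]  # empty once the prefix is covered
        if not remaining:
            break
    return out

result = sample_with_prefix("appl", TOY_PROBS, random.Random(0))
print(result)
```

With this toy vocabulary every allowed path for the prefix "appl" ends in the same string, but the token boundaries differ from path to path, which is the crux of the problem: the prefix can end in the middle of a token.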