Devin Shah

54 posts

@DevinShah16

llms @otter_ai, prev: cofounder @octane_security, bme+cs @dukeu, gene editing @stanfordmed

San Francisco, CA · Joined October 2019

786 Following · 179 Followers

Pinned Tweet

Devin Shah@DevinShah16·10 May

If you give a frontier model the complete ruleset for a strategy game, can it derive a winning strategy from first principles? I wanted to test @claudeai Sonnet 4.6's ability to play the 2009 strategy game Small World. Three identical instances with the same instructions and compute budget played against each other. The games surfaced a reasoning pattern around action bias and locality that I think applies broadly to long-horizon software engineering and knowledge work beyond just strategy games. Full blogpost: dshah.dev/blog/smallworld

Devin Shah@DevinShah16·6d

Very cool, pretty interested in how they get around impedance mismatch

Midjourney@midjourney

A technical dive inside our new "Midjourney Scanner"

Devin Shah retweeted

Harry Partridge@part_harry_·10 Jun

One interesting point: a fixed KV cache is an MLP. Collectively, the keys form an up projection, and the values form a down projection. The softmax is a nonlinearity. Therefore, we can view KV compression as a new way of producing ‘weights’. Instead of using backpropagation to refine our MLPs, we can learn to produce them directly from context. This is perhaps more analogous to human learning and has the potential to be far more sample efficient.

Charlie O'Neill@oneill_c

1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly. At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model. Here's how we did it 👇
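
The "fixed KV cache is an MLP" point can be checked numerically. The sketch below is a toy illustration under stated assumptions, not code from either thread: single-head, single-query attention with standard 1/sqrt(d) scaling, and made-up keys, values, and query. The names `attention`, `mlp`, `w_up`, and `w_down` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Single-query, single-head attention over a fixed KV cache."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(k * q for k, q in zip(key, query)) for key in keys]
    weights = softmax(scores)
    dim_out = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)]

def mlp(x, w_up, w_down):
    """Two-layer MLP, no biases, with softmax as the hidden nonlinearity."""
    hidden = softmax([sum(w * xi for w, xi in zip(row, x)) for row in w_up])
    dim_out = len(w_down[0])
    return [sum(h * row[j] for h, row in zip(hidden, w_down)) for j in range(dim_out)]

# Toy cache with three cached positions (values invented for illustration).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [0.5, -1.0]

# Keys (with the attention scale folded in) act as the up projection,
# values act as the down projection.
scale = 1.0 / math.sqrt(len(query))
w_up = [[scale * k for k in key] for key in keys]
w_down = values

attn_out = attention(query, keys, values)
mlp_out = mlp(query, w_up, w_down)
max_diff = max(abs(a - b) for a, b in zip(attn_out, mlp_out))
print(max_diff)  # expected to be zero up to floating-point rounding
```

The hidden width of this "MLP" equals the number of cached positions, so under this view compressing the cache amounts to shrinking the hidden layer.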

Devin Shah@DevinShah16·19 May

So basically this trades 8 separate KV caches and decode latency for param efficiency. And two 16-layer transformers loop over each other (L for 3x, H for 1x, repeat 2 cycles) before decode. Curious how this scales.

Sapient Intelligence@Sapient_Int

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
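
The loop described in the reply above (L for 3x, H for 1x, repeated for 2 cycles) can be written out as a schedule. This is a minimal sketch of that description only, not HRM-Text's implementation; `run_schedule` and its parameter names are hypothetical. Under the assumption that every pass keeps its own KV cache, the pass count matches the "8 separate KV caches" in the reply.

```python
def run_schedule(cycles=2, low_steps=3, high_steps=1):
    """Return the order in which the two modules would run before decoding."""
    order = []
    for _ in range(cycles):
        order.extend(["L"] * low_steps)   # low-level module runs 3 times
        order.extend(["H"] * high_steps)  # high-level module runs once
    return order

schedule = run_schedule()
num_passes = len(schedule)
print(schedule)    # ['L', 'L', 'L', 'H', 'L', 'L', 'L', 'H']
print(num_passes)  # 8
```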

Devin Shah retweeted

Thinking Machines@thinkymachines·11 May

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Devin Shah@DevinShah16·10 May

Given two branches with concrete projections, it reliably picks the better one. Its weakness is option generation: left to its own devices, it generates one option (the action-forward one) and never surfaces the alternative. The template's entire contribution is making option generation mandatory, which turns out to be enough to close most of the gap. Full post with many more details: dshah.dev/blog/smallworld Link to code: github.com/dshah3/smallwo…

Devin Shah@DevinShah16·10 May

The general finding is about what I'm calling strategic attention. It is the reflex to pull the right reasoning framework into active context at the right moment. The model has the knowledge: if you ask it "when should you decline in Small World?" it gives a correct answer. It just doesn't activate that knowledge unprompted at the decision point. The template interrupts the default action-first reasoning loop long enough for the model's own strategic thinking to engage. This maps directly to a pattern @FrontierSWE found in software engineering: Opus 4.6 solved a Pyright optimization in 11 minutes, then kept iterating for seven more hours across 95 builds, at one point losing the fix entirely before rediscovering it. If it had stopped at minute 11, it would have scored the same.

Devin Shah retweeted

kalomaze@kalomaze·7 May

REINFORCEMENT LEARNING FOR KNOWLEDGE AWARENESS

Devin Shah@DevinShah16·7 Apr

Mythos might be the first case I’ve seen where reward hacking could cause a digital infrastructure meltdown

Jack Lindsey@Jack_W_Lindsey

In one episode, the model needed to edit files it lacked permissions for. After searching for workarounds, it found a way to inject code into a config file that would run with elevated privileges, and designed the exploit to delete itself after running. (4/14)

Devin Shah@DevinShah16·24 Feb

An underrated aspect of language models is that their skills practically do not degrade over time (absent inference-time quantization and assuming stable compute). We, by contrast, have to actively practice a skill just to stay on the capability frontier.

Devin Shah@DevinShah16·23 Jan

@__tensorcore__ Congrats, and best of luck! Thanks for the amazing work building CUTLASS

Vijay@__tensorcore__·23 Jan

As of last week, I am no longer at NVIDIA 🧵 Leaving the CUTLASS team was extremely hard. I will dearly miss my incredible colleagues and the extremely compelling mission statement of creating the world's best accelerator programming model w/ hardware software codesign 💚

Devin Shah@DevinShah16·14 Jan

@itsalfredw Congrats, this is awesome @florian_jue @itsalfredw

Alfred Wahlforss@itsalfredw·14 Jan

Today, Listen crossed $100M in funding. Building is easy now. Knowing what to build isn't. Our AI finds and talks to your users so you don't have to guess. See how Sweetgreen, Microsoft, and Replit use it:

Devin Shah@DevinShah16·30 Dec

Thanks to @modal for the easy vLLM container setup and @cursor_ai for the problem and Cursor Tab inspiration.

Devin Shah@DevinShah16·30 Dec

Blog Post: dshah.dev/blog/cpc GitHub: github.com/dshah3/cpc Original problem statement: cursor.com/blog/cpc

Devin Shah@DevinShah16·30 Dec

The team at @cursor_ai posed the problem of character prefix conditioning at the beginning of the year - today I'm releasing a short blog post and some code walking through my attempt. It was fun to learn some creative ways of sampling from language models.
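
For readers unfamiliar with character prefix conditioning, here is a minimal sketch of the simplest token-filtering approach. It is not the method from the blog post or repository: `toy_model` is a hypothetical stand-in for a real language model with an invented vocabulary, and the per-step mask-and-renormalize rule is a local approximation that does not, in general, reproduce the exact conditional distribution.

```python
import random

def is_compatible(token, remaining):
    """A token fits if it covers the rest of the prefix or is a piece of it."""
    return token.startswith(remaining) or remaining.startswith(token)

def constrained_step(probs, remaining):
    """Mask tokens that contradict the remaining prefix, then renormalize."""
    allowed = {t: p for t, p in probs.items() if t and is_compatible(t, remaining)}
    total = sum(allowed.values())
    if total == 0:
        raise ValueError("no token is compatible with the prefix")
    return {t: p / total for t, p in allowed.items()}

def sample_with_prefix(next_token_probs, prefix, rng):
    """Sample tokens until their concatenation covers the character prefix."""
    tokens = []
    remaining = prefix
    while remaining:
        dist = constrained_step(next_token_probs(tokens), remaining)
        choices = list(dist)
        token = rng.choices(choices, weights=[dist[t] for t in choices])[0]
        tokens.append(token)
        # Consume the matched characters; empty once the prefix is covered.
        remaining = remaining[len(token):]
    return tokens

def toy_model(tokens_so_far):
    # Hypothetical model: a fixed, context-independent next-token distribution.
    return {"app": 0.2, "apple": 0.3, "a": 0.1, "p": 0.1, "le": 0.1, "banana": 0.2}

rng = random.Random(0)
sampled = sample_with_prefix(toy_model, "appl", rng)
text = "".join(sampled)
print(sampled, text)
```

The key detail is that a token may extend past the end of the prefix ("apple" for the prefix "appl"), which is what lets the result land on natural token boundaries instead of forcing a split exactly at the cursor.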
