Lee Butterman
3.4K posts
@leebutterman
🏳️‍🌈 · 🚴 · 🏞️ · 🏜️ · ⦿⪥ · 🤖🧠
he's on unceded Ohlone shores · Joined April 2009
476 Following · 478 Followers
Drew Breunig@dbreunig·
Time for another SF DSPy meetup, this time focusing on DSPy in production use cases and RLMs. Engineers from @Dropbox and @Shopify will be sharing case studies and answering questions, and @isaacbmiller1 will walk through DSPy's RLM module. Come! luma.com/je6ewmkx
Jeff Huber@jeffreyhuber·
context for agents is a hot topic
> fs for everything!
> we just need grep!
the API is irrelevant (virtual FS exist) - the interesting part is data locality
this 2x2 explains what actually needs to exist
a local-remote sync engine for agent data (not unlike the Linear sync engine)
Jeff Huber tweet media
Lee Butterman reposted
Farouk@FaroukAdeleke3·
Open sourcing Microcode! Microcode is a context-efficient, general-purpose terminal agent fully powered by a packaged `dspy.RLM()` program. Set your own OpenRouter API key and choice of models. Supports MCP servers too with @MaximeRivest mcp2py. Because the CI/CD of the RLM engine is exposed through @modaicdev, per-user or per-codebase prompt optimization is plug and play. If you set the --verbose flag, it pretty-prints the RLM's trajectories as reasoning or code within its REPL.
Lee Butterman@leebutterman·
@nbaschez We’ve also found that giving Claude access to start and stop an app with multiple daemons and access to read logs is crucial for longer self directed exploration. Any reason you like pm2 over docker?
Nathan Baschez@nbaschez·
Underrated claude code trick - use pm2 to run the dev server Makes it super easy for claude code to search logs while efficiently managing the context window
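A minimal sketch of the setup this tweet describes, assuming a Node project whose dev server starts with `npm run dev`; the process name is illustrative. Running the server under pm2 captures stdout/stderr to log files on disk, so a coding agent can grep recent output instead of holding a streaming log in its context window:

```javascript
// ecosystem.config.js — pm2 process file for the dev server.
module.exports = {
  apps: [
    {
      name: "dev",      // illustrative process name
      script: "npm",
      args: "run dev",
      autorestart: true, // restart the server if it crashes mid-session
    },
  ],
};
```

Then `pm2 start ecosystem.config.js` launches it, `pm2 logs dev --lines 200 --nostream` dumps recent output and exits (handy for an agent), and `pm2 stop dev` shuts it down.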
Lee Butterman@leebutterman·
@lateinteraction 💯. Python is often my object code, and my human intention from reading and writing in the multi turn code agent setup is the only important part and is often lost in PR comments. I know I especially don’t record praise of what the agent got right that I was worried about
Omar Khattab@lateinteraction·
tl;dr I want to program, not ask someone else—something else?—to program for me. But my programs should focus on the serious, higher leverage aspects of my intent, not the minutiae.
Omar Khattab@lateinteraction·
I will be so disappointed if the way we build software with AI remains vibe coding rather than a genuinely higher level of abstraction. I want to express the “code” of the system—but shorter and more pleasant—not manage an agent to write 100,000 lines of low-level slop for me.
Omar Khattab@lateinteraction

There’s a growing time and place for ‘vibe coding’, but it’s disastrous that it doesn’t record the produced system at the level of abstraction it was built. Imagine if the only way to use C was to compile isolated snippets of assembly. We need higher-level programming languages.

Lee Butterman reposted
Misha@drethelin·
I’ve heard of boba tea, but there’s a new place in the neighborhood that sells something they call Kiki tea, so I got some to try
Misha tweet media
Lee Butterman reposted
Salah Alzu'bi@salahalzubi401·
Also a big fan of GEPA! If your optimization runs are taking a long time you should take a look at our k-way proposal function called GEPA+. It reduces metric runs by 30-40% while improving relative accuracy by 2-5% depending on tasks! x.com/sala88232/stat…
DSPy@DSPyOSS

DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”

Lee Butterman reposted
DSPy@DSPyOSS·
DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”
DSPy tweet media
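The loop behind numbers like these can be illustrated with a toy GEPA-style sketch: score candidate prompts on a small trainset, "reflect" on a failing example to propose a mutated prompt, and keep the best candidate. Everything below (`evaluate`, `reflect`, the keyword-coverage metric, the dataset) is an illustrative stand-in for this sketch, not the actual `dspy.GEPA` API:

```python
# Toy stand-in for a GEPA-style reflective prompt-optimization loop.

def evaluate(prompt, dataset):
    """Fraction of items whose required facts all appear in the prompt."""
    return sum(all(k in prompt for k in item["needs"]) for item in dataset) / len(dataset)

def reflect(prompt, dataset):
    """Toy 'reflection': inspect one failing example and fold a missing fact in."""
    for item in dataset:
        missing = [k for k in item["needs"] if k not in prompt]
        if missing:
            return prompt + " " + missing[0]
    return prompt

# Tiny trainset: each item lists facts the prompt must mention to score.
dataset = [
    {"needs": ["schema", "read-only"]},
    {"needs": ["schema", "one row per author"]},
]

candidate = "Translate chat to SQL."
best = (evaluate(candidate, dataset), candidate)
for _ in range(8):  # small rollout budget: GEPA's pitch is sample efficiency
    candidate = reflect(candidate, dataset)
    score = evaluate(candidate, dataset)
    if score >= best[0]:
        best = (score, candidate)
```

The point the tweets make is that because each reflection step reads an actual failure, few rollouts are wasted, which is why a single pass over a <300-item dataset can already move the metric.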
Lee Butterman@leebutterman·
@chrisbolas @noah_vandal @DSPyOSS As I’d explained a bit farther down, you want a dense model if you only have 8GB before you hit slow disk (micro sd), and gpt-oss is a MoE, most of your experts aren’t going to mix in, so a dense model like Qwen3 4B Thinking 2507 utilizes memory more
Lee Butterman tweet media
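The memory arithmetic behind this can be sketched. Assuming roughly 4.5 effective bits per parameter after quantization overhead (an assumption for illustration, not exact GGUF figures), all of a MoE's expert weights must stay resident even though only a few experts fire per token, so it is total parameter count, not active parameters, that has to fit in RAM:

```python
BITS_PER_PARAM = 4.5  # assumed effective bits/param after quantization overhead

def weight_gb(params_billions: float) -> float:
    """Approximate quantized weight footprint in GB."""
    return params_billions * 1e9 * BITS_PER_PARAM / 8 / 1e9

gpt_oss_20b = weight_gb(20.9)  # MoE: ALL experts resident, ~12 GB, matching the tweet
qwen3_4b = weight_gb(4.0)      # dense, ~2.2 GB: fits an 8 GB machine with KV-cache headroom
qwen3_06b = weight_gb(0.6)     # well under 1 GB
```

On an 8 GB board the 20B MoE spills onto slow microSD, while a dense 4B keeps every byte it loads doing useful work, which is the trade-off the tweet is pointing at.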
Lee Butterman@leebutterman·
@JoshPurtell I have found that GEPA is extremely sample efficient. I got from 7% to 50% accurate on chat to sql for qwen3 0.6B on a MacBook in a day. And 7% to 28% in one pass through my <300 item dataset with a Qwen3 4B reflection LLM x.com/dspyoss/status…
DSPy@DSPyOSS

DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”

Josh@JoshPurtell·
We ran a few benchmarks of GEPA and MIPROv2 across different implementations, on LangProBe datasets, at small scale. Would love to hear early thoughts / qualms / requested datasets for folks who want to dig deeper. TLDR: they work! docs.usesynth.ai/blog/prompt-op…
constantin@luckenco

has anyone of you used DSPy with GEPA in a production setting? the tutorials state around >10% improvements for most examples. seems like free performance, so i am a bit skeptical.

Lee Butterman@leebutterman·
@noah_vandal @DSPyOSS With under 1GB to spare! But gpt-oss:20b (in Ollama) has weights of ~12GB and takes another few gigs of inference memory so I mostly had my four cores going full time on the 16GB Pi 5. Qwen 4B optimizing Qwen 4B would be even smaller. On a MacBook it is within an afternoon :)
Lee Butterman@leebutterman·
@karanjagtiani04 @DSPyOSS No manual tuning! That’s the whole point :) I just made a pretty basic prompt with some bland facts about the data (multiple rows per paper, one per author, read only, call out content policy violations) and GEPA did the rest, and DSPy was super easy to use to separate concerns
Karan Jagtiani@karanjagtiani04·
@DSPyOSS Sixteen hours on a Pi for that jump is pretty impressive, especially with a 0.6B model. Curious how much tuning was manual vs automated?
Lee Butterman@leebutterman·
@lmeyerov Those piles of boring features that are getting automated are all closer to each other than making the initial proofs of concept so it’s easy to imagine that coding agents will get better at them, and that iterative polish is an iceberg of work, esp re compliance
lmeyerov@lmeyerov·
Misconceptions on how vibes engineering works on the ground leads to product leaders misunderstanding velocity, tech leads ignoring a superpower, etc. A lot comes to one simple but useful mental shift in development economics:
- yes you can have a 10K loc vibed prototype etc next day, but that's not what most of us do or are talking about
- more practical is to recast production-grade coding from latency-centric to throughput-centric:
  * latency-centric: each feature still must go through QA etc, and today's AIs are often even slower per-step. even items that can be faster might go slower b/c we can put more into them.
  * throughput-centric: the bigger shift is large swathes of a feature's work items can be automated away by parallel agents, so you can 2X+ your 'boring' feature count
- result is individual items on roadmap can be faster or slower, but sheer volume should be up
i think we'll get to latency-centric too, cutting in say 50% over next year (!) for good teams, but that's still dwarfed by the throughput-centric shift which can be many more multiples
Paul Xue@pxue

A frustrated startup founder came to me recently complaining about the devs he's been using for the past 10+ years. He's shocked at how fast we're moving on product with just a couple of people. His VP of eng and devs refuse to use AI, still build on waterfall. Every feature takes 6-8 weeks between ideation, PRD, design, dev and testing. It blows my mind these devs still exist. But times are catching up and we're entering the "adopt or get left behind" phase. The CEO and VP have been together for 10+ years, so transition isn't as clear cut. But they're now at the point where it's irresponsible to the investors not to do something. Take this however you will.

Charles 🎉 Frye@charles_irl·
what's some of your favorite technical documentation? any domain, any medium, broad sense. for example, some of mine: the Uline catalogs, the FastAPI docs, the IETF RFCs, Byte magazine ads