Lee Butterman
3.4K posts
@leebutterman
🏳️‍🌈 · 🚴 · 🏞️ · 🏜️ · ⦿⪥ · 🤖🧠
he's on unceded Ohlone shores · Joined April 2009
476 Following · 478 Followers
Drew Breunig@dbreunig·
Time for another SF DSPy meetup, this time focusing on DSPy in production use cases and RLMs. Engineers from @Dropbox and @Shopify will be sharing case studies and answering questions, and @isaacbmiller1 will walk through DSPy's RLM module. Come! luma.com/je6ewmkx
Jeff Huber@jeffreyhuber·
context for agents is a hot topic
> fs for everything!
> we just need grep!
the API is irrelevant (virtual FS exist) - the interesting part is data locality
this 2x2 explains what actually needs to exist
a local-remote sync engine for agent data (not unlike the Linear sync engine)
Jeff Huber tweet media
Lee Butterman reposted
Farouk@FaroukAdeleke3·
Open sourcing Microcode! Microcode is a context-efficient, general-purpose terminal agent fully powered by a packaged `dspy.RLM()` program. Set your own OpenRouter API key and choice of models. Supports MCP servers too with @MaximeRivest mcp2py. Because the CI/CD of the RLM engine is exposed through @modaicdev, per-user or per-codebase prompt optimization is plug and play. If you set the --verbose flag, it pretty-prints the RLM's trajectories as reasoning or code within its REPL.
Lee Butterman@leebutterman·
@nbaschez We’ve also found that giving Claude access to start and stop an app with multiple daemons and access to read logs is crucial for longer self directed exploration. Any reason you like pm2 over docker?
Nathan Baschez@nbaschez·
Underrated claude code trick - use pm2 to run the dev server Makes it super easy for claude code to search logs while efficiently managing the context window
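A minimal sketch of the setup this tweet describes, assuming a Node project whose dev server starts with `npm run dev`; the process name is illustrative. Running the server under pm2 captures stdout/stderr to log files on disk, so a coding agent can grep recent output instead of holding a streaming log in its context window:

```javascript
// ecosystem.config.js — pm2 process file for the dev server.
module.exports = {
  apps: [
    {
      name: "dev",      // illustrative process name
      script: "npm",
      args: "run dev",
      autorestart: true, // restart the server if it crashes mid-session
    },
  ],
};
```

Then `pm2 start ecosystem.config.js` launches it, `pm2 logs dev --lines 200 --nostream` dumps recent output and exits (handy for an agent), and `pm2 stop dev` shuts it down.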
Lee Butterman@leebutterman·
@lateinteraction 💯. Python is often my object code, and my human intention from reading and writing in the multi turn code agent setup is the only important part and is often lost in PR comments. I know I especially don’t record praise of what the agent got right that I was worried about
Omar Khattab@lateinteraction·
tl;dr I want to program, not ask someone else—something else?—to program for me. But my programs should focus on the serious, higher leverage aspects of my intent, not the minutiae.
Omar Khattab@lateinteraction·
I will be so disappointed if the way we build software with AI remains vibe coding rather than a genuinely higher level of abstraction. I want to express the “code” of the system—but shorter and more pleasant—not manage an agent to write 100,000 lines of low-level slop for me.
Omar Khattab@lateinteraction

There’s a growing time and place for ‘vibe coding’, but it’s disastrous that it doesn’t record the produced system at the level of abstraction it was built. Imagine if the only way to use C was to compile isolated snippets of assembly. We need higher-level programming languages.

Lee Butterman reposted
Misha@drethelin·
I’ve heard of boba tea, but there’s a new place in the neighborhood that sells something they call Kiki tea, so I got some to try
Misha tweet media
Lee Butterman reposted
Salah Alzu'bi@salahalzubi401·
Also a big fan of GEPA! If your optimization runs are taking a long time you should take a look at our k-way proposal function called GEPA+. It reduces metric runs by 30-40% while improving relative accuracy by 2-5% depending on tasks! x.com/sala88232/stat…
DSPy@DSPyOSS

DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”

Lee Butterman reposted
DSPy@DSPyOSS·
DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”
DSPy tweet media
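The loop behind numbers like these can be illustrated with a toy GEPA-style sketch: score candidate prompts on a small trainset, "reflect" on a failing example to propose a mutated prompt, and keep the best candidate. Everything below (`evaluate`, `reflect`, the keyword-coverage metric, the dataset) is an illustrative stand-in for this sketch, not the actual `dspy.GEPA` API:

```python
# Toy stand-in for a GEPA-style reflective prompt-optimization loop.

def evaluate(prompt, dataset):
    """Fraction of items whose required facts all appear in the prompt."""
    return sum(all(k in prompt for k in item["needs"]) for item in dataset) / len(dataset)

def reflect(prompt, dataset):
    """Toy 'reflection': inspect one failing example and fold a missing fact in."""
    for item in dataset:
        missing = [k for k in item["needs"] if k not in prompt]
        if missing:
            return prompt + " " + missing[0]
    return prompt

# Tiny trainset: each item lists facts the prompt must mention to score.
dataset = [
    {"needs": ["schema", "read-only"]},
    {"needs": ["schema", "one row per author"]},
]

candidate = "Translate chat to SQL."
best = (evaluate(candidate, dataset), candidate)
for _ in range(8):  # small rollout budget: GEPA's pitch is sample efficiency
    candidate = reflect(candidate, dataset)
    score = evaluate(candidate, dataset)
    if score >= best[0]:
        best = (score, candidate)
```

The point the tweets make is that because each reflection step reads an actual failure, few rollouts are wasted, which is why a single pass over a <300-item dataset can already move the metric.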
Lee Butterman@leebutterman·
@chrisbolas @noah_vandal @DSPyOSS As I’d explained a bit farther down, you want a dense model if you only have 8GB before you hit slow disk (micro sd), and gpt-oss is a MoE, most of your experts aren’t going to mix in, so a dense model like Qwen3 4B Thinking 2507 utilizes memory more
Lee Butterman tweet media
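The memory arithmetic behind this can be sketched. Assuming roughly 4.5 effective bits per parameter after quantization overhead (an assumption for illustration, not exact GGUF figures), all of a MoE's expert weights must stay resident even though only a few experts fire per token, so it is total parameter count, not active parameters, that has to fit in RAM:

```python
BITS_PER_PARAM = 4.5  # assumed effective bits/param after quantization overhead

def weight_gb(params_billions: float) -> float:
    """Approximate quantized weight footprint in GB."""
    return params_billions * 1e9 * BITS_PER_PARAM / 8 / 1e9

gpt_oss_20b = weight_gb(20.9)  # MoE: ALL experts resident, ~12 GB, matching the tweet
qwen3_4b = weight_gb(4.0)      # dense, ~2.2 GB: fits an 8 GB machine with KV-cache headroom
qwen3_06b = weight_gb(0.6)     # well under 1 GB
```

On an 8 GB board the 20B MoE spills onto slow microSD, while a dense 4B keeps every byte it loads doing useful work, which is the trade-off the tweet is pointing at.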
Lee Butterman@leebutterman·
@JoshPurtell I have found that GEPA is extremely sample efficient. I got from 7% to 50% accurate on chat to sql for qwen3 0.6B on a MacBook in a day. And 7% to 28% in one pass through my <300 item dataset with a Qwen3 4B reflection LLM x.com/dspyoss/status…
DSPy@DSPyOSS

DSPy on a Pi: Cheap Prompt Optimization with GEPA and Qwen3 “It took me about sixteen hours on a Raspberry Pi to boost performance of chat-to-SQL using Qwen3 0.6B from 7.3% to 28.5%. Using gpt-oss:20b, to boost performance from ~60% to ~85% took 5 days.”

Josh@JoshPurtell·
We ran a few benchmarks of GEPA and MIPROv2 across different implementations, on LangProBe datasets, at small scale. Would love to hear early thoughts / qualms / requested datasets for folks who want to dig deeper. TLDR: they work! docs.usesynth.ai/blog/prompt-op…
constantin@luckenco

has anyone of you used DSPy with GEPA in a production setting? the tutorials state around >10% improvements for most examples. seems like free performance, so i am a bit skeptical.

Lee Butterman@leebutterman·
@noah_vandal @DSPyOSS With under 1GB to spare! But gpt-oss:20b (in Ollama) has weights of ~12GB and takes another few gigs of inference memory so I mostly had my four cores going full time on the 16GB Pi 5. Qwen 4B optimizing Qwen 4B would be even smaller. On a MacBook it is within an afternoon :)
Lee Butterman@leebutterman·
@karanjagtiani04 @DSPyOSS No manual tuning! That’s the whole point :) I just made a pretty basic prompt with some bland facts about the data (multiple rows per paper, one per author, read only, call out content policy violations) and GEPA did the rest, and DSPy was super easy to use to separate concerns
Karan Jagtiani@karanjagtiani04·
@DSPyOSS Sixteen hours on a Pi for that jump is pretty impressive, especially with a 0.6B model. Curious how much tuning was manual vs automated?
Lee Butterman@leebutterman·
@lmeyerov Those piles of boring features that are getting automated are all closer to each other than making the initial proofs of concept so it’s easy to imagine that coding agents will get better at them, and that iterative polish is an iceberg of work, esp re compliance
lmeyerov@lmeyerov·
Misconceptions on how vibes engineering works on the ground leads to product leaders misunderstanding velocity, tech leads ignoring a superpower, etc. A lot comes to one simple but useful mental shift in development economics:
- yes you can have a 10K loc vibed prototype etc next day, but that's not what most of us do or are talking about
- more practical is to recast production-grade coding from latency-centric to throughput-centric:
  * latency-centric: each feature still must go through QA etc, and today's AIs are often even slower per-step. even items that can be faster might go slower b/c we can put more into them.
  * throughput-centric: the bigger shift is large swathes of a feature's work items can be automated away by parallel agents, so you can 2X+ your 'boring' feature count
- result is individual items on roadmap can be faster or slower, but sheer volume should be up
i think we'll get to latency-centric too, cutting in say 50% over next year (!) for good teams, but that's still dwarfed by the throughput-centric shift which can be many more multiples
Paul Xue@pxue

A frustrated startup founder came to me recently complaining about the devs he's been using for the past 10+ years. He's shocked at how fast we're moving on product with just a couple of people. His VP of eng and devs refuse to use AI, still build on waterfall. Every feature takes 6-8 weeks between ideation, PRD, design, dev and testing. It blows my mind these devs still exist. But times are catching up and we're entering the "adopt or get left behind" phase. The CEO and VP have been together for 10+ years, so transition isn't as clear cut. But they're now at the point where it's irresponsible to the investors not to do something. Take this however you will.

Charles 🎉 Frye@charles_irl·
what's some of your favorite technical documentation? any domain, any medium, broad sense. for example, some of mine: the Uline catalogs, the FastAPI docs, the IETF RFCs, Byte magazine ads