DSPy
@DSPyOSS
3K posts

An open-source declarative framework for building modular AI software. Programming—not prompting—LLMs via higher-level abstractions & optimizers.

Joined April 2025
59 Following · 13K Followers
DSPy retweeted
Omar Khattab @lateinteraction
late interaction model (150M) beats the 54x larger Qwen3-8B-Embedding by... hmm, looks like up to 34% relative increase :D also really funny that the entire top section of the BC+ leaderboard, sorted by Recall, is just late interaction models by @LightOnIO and @mixedbreadai
Quoting Antoine Chaffin @antoine_chaffin.

DSPy retweeted
Antoine Chaffin @antoine_chaffin
BrowseComp-Plus, perhaps the hardest popular deep research task, is now solved at nearly 90%... and all it took was a 150M model ✨ Thrilled to announce that Reason-ModernColBERT did it again and outperforms all models (including models 54× bigger) on all metrics.
DSPy retweeted
Omar Khattab @lateinteraction
Antoine and team had trained a nice ColBERT late interaction model last year... Now they decided to try it on BrowseComp+, the canonical "deep research" task. Guess what, it's not only the strongest method by far but also basically solved the task (~90%). Who would have thunk!
Quoting Antoine Chaffin @antoine_chaffin.

DSPy @DSPyOSS
@Dropbox thanks for writing this! It will be quite informative for the community.
DSPy retweeted
Dropbox @Dropbox
How we used DSPy to turn our relevance judge into a measurable optimization loop, making it more reliable and scalable in Dropbox Dash.
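The core idea in the Dropbox post is making a judge measurable so it can be optimized against labeled examples. Below is a toy, stdlib-only sketch of that loop; it is not Dropbox's code and not DSPy's real API (in DSPy you would define the judge as a module and pass the metric to an optimizer), and the stub judge, the `threshold` knob, and the tiny labeled set are all invented for illustration.

```python
# Toy sketch of an LLM-as-judge turned into a measurable optimization loop.
# The "judge" here is a deterministic stand-in for an LLM call, so it runs
# without any API; every name in this block is illustrative.

def make_judge(threshold):
    """A hypothetical judge parameterized by one tunable knob."""
    def judge(query, doc):
        # Stand-in for an LLM relevance call: word-overlap scoring.
        overlap = len(set(query.split()) & set(doc.split()))
        return overlap >= threshold
    return judge

# Small labeled set of (query, doc, human_label) examples.
labeled = [
    ("quarterly sales report", "sales figures for the quarter", True),
    ("quarterly sales report", "team offsite photos", False),
    ("vacation policy", "time off and vacation policy doc", True),
    ("vacation policy", "quarterly sales figures", False),
]

def agreement(judge):
    """Metric: fraction of examples where the judge matches human labels."""
    return sum(judge(q, d) == y for q, d, y in labeled) / len(labeled)

# The optimization loop: score candidate configurations, keep the best.
best = max(range(1, 4), key=lambda t: agreement(make_judge(t)))
print(best, agreement(make_judge(best)))  # → 1 1.0
```

Once agreement with human labels is the metric, "is the judge reliable?" becomes a number you can track and improve rather than a vibe.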
DSPy retweeted
Ramiro Salas @ramirosalas
My Obsidian vault became a mess over time, so I wrote a program that uses @DSPyOSS and RLM to completely refactor it into PARA + Zettelkasten, a perfect use for RLM since many notes were very large. Remarkable results. Link ⬇
DSPy retweeted
Wesley Smith @neowes2025
I really don't understand this karpathy/autoresearch hype. I mean, it's a cool project, but haven't we been doing this kind of thing for a while now? What is different from DSPy, GEPA and that whole area of tools? What am I missing?
DSPy retweeted
Omar Khattab @lateinteraction
I've been eagerly awaiting this release from the @mixedbreadai folks. They're world-leading experts in late interaction retrieval. And today they remind us that late interaction done well makes all your favorite embedding models look like they don't work.
Quoting Mixedbread @mixedbreadai:

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

DSPy retweeted
Drew Breunig @dbreunig
I've always wanted a doc for @DSPyOSS that's akin to @rails' excellent "Getting Started", one that covers the basics while building a CMS. I think this is half there...
DSPy retweeted
Drew Breunig @dbreunig
A DSPy tutorial: learn the fundamentals while building a deep research agent. Covers Signatures & Modules: what they are and why they matter. No DSPy experience required. cmpnd.ai/blog/learn-dsp…
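The two concepts the tutorial names, Signatures and Modules, can be caricatured in a few lines of plain Python: a signature declares *what* goes in and out, and a module decides *how* to get there. This is NOT DSPy's real API (DSPy's `dspy.Signature`, `dspy.Predict`, and `dspy.ChainOfThought` are far richer and drive an actual LM); the classes and the stub LM below are invented purely to convey the idea without an API key.

```python
# Toy distillation of "Signature vs. Module" (not DSPy's actual classes).

class Signature:
    """Declarative I/O contract: named inputs and outputs, no prompt text."""
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs

class Predict:
    """A module: turns a signature plus an LM into a callable step."""
    def __init__(self, signature, lm):
        self.signature = signature
        self.lm = lm  # any callable mapping prompt text -> completion text

    def __call__(self, **kwargs):
        # The module, not the user, decides how the prompt is assembled.
        prompt = "\n".join(f"{k}: {kwargs[k]}" for k in self.signature.inputs)
        prompt += "\n" + "\n".join(f"{k}:" for k in self.signature.outputs)
        return self.lm(prompt)

def stub_lm(prompt):
    # Stand-in for a real LM call, so this sketch runs offline.
    return "answer: Paris"

qa = Signature(inputs=["question"], outputs=["answer"])
module = Predict(qa, stub_lm)
print(module(question="What is the capital of France?"))  # → answer: Paris
```

The payoff of the split is that an optimizer can rewrite how the module prompts the LM while the signature, the part your program depends on, stays fixed.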
DSPy retweeted
Omar Khattab @lateinteraction
love to see more people feel the vibe of having LLM-driven learning algorithms optimize your systems! :D
Quoting Andrej Karpathy @karpathy:

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I do daily, for two decades now. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
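The workflow described above — propose a change, evaluate the metric, keep it only if it helps, repeat — is, at its skeleton, iterative improvement against a validation metric. The block below is a stdlib caricature of that loop, not Karpathy's autoresearch code: the fake `val_loss`, the two made-up hyperparameters, and the random proposal step are all invented stand-ins for "train a small model and measure."

```python
# Caricature of the propose -> evaluate -> keep-if-better loop (toy only).
import random

random.seed(0)

def val_loss(config):
    # Stand-in for "train a small model, report validation loss":
    # a quadratic bowl over two fake hyperparameters.
    return (config["wd"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2

config = {"wd": 0.5, "beta2": 0.8}   # the "manually tuned" starting point
best = val_loss(config)
history = []

for step in range(200):
    # Propose a small change to one knob.
    knob = random.choice(list(config))
    candidate = dict(config)
    candidate[knob] += random.uniform(-0.05, 0.05)
    loss = val_loss(candidate)
    history.append((knob, loss))
    # Keep only changes that improve the metric; discard the rest.
    if loss < best:
        config, best = candidate, loss

print(round(best, 4))
```

What an LLM-driven agent adds over this blind loop is exactly the part the post highlights: reading the `history` of experiments and using it to plan informed next proposals, rather than perturbing at random.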

DSPy retweeted
Lakshya A Agrawal @LakshyAAAgrawal
. @gepa_ai + @DSPyOSS used for agent self-evolution and skill optimization by @NousResearch @Teknium to get +39.5% gains! Check out the full report to see how to build self-optimizing agents using GEPA.
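Skill optimization of the kind reported here is evolutionary at heart: keep a pool of candidate instructions, score them on the task, and evolve the best ones. The sketch below is NOT the real GEPA algorithm (GEPA uses LLM-written reflections on execution traces and Pareto-based candidate selection); the `score` and `mutate` functions are deterministic toys invented so the loop runs offline.

```python
# Toy sketch of evolutionary instruction/skill optimization, in the spirit
# of (but much simpler than) GEPA. Everything here is illustrative.

def score(instruction):
    # Stand-in for "run the agent with this instruction, measure success":
    # reward instructions containing certain useful behaviors.
    wanted = {"cite", "verify", "step-by-step"}
    return len(wanted & set(instruction.split()))

def mutate(instruction):
    # Stand-in for an LLM reflecting on failures and editing the instruction.
    for word in ["cite", "verify", "step-by-step"]:
        if word not in instruction.split():
            return instruction + " " + word
    return instruction

pool = ["answer the question"]
for generation in range(4):
    parent = max(pool, key=score)     # select the current best candidate
    pool.append(mutate(parent))       # evolve it into a new candidate

best = max(pool, key=score)
print(best, score(best))  # → answer the question cite verify step-by-step 3
```

The real system replaces both stand-ins with LLM calls: execution traces feed a reflective rewrite step, and the metric is actual task performance, which is where the reported gains come from.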
DSPy retweeted