DSPy

3.2K posts

DSPy

@DSPyOSS

An open-source declarative framework for building modular AI software. Programming—not prompting—LLMs via higher-level abstractions & optimizers.

Joined April 2025
61 Following · 13.8K Followers
DSPy reposted
Noah Ziems @NoahZiems
Your RL algorithm makes *LLMs* smarter. Our RL algorithm makes *LLM Training* smarter. These are not the same
[image]
3 replies · 3 reposts · 41 likes · 2K views
DSPy reposted
Omar Khattab @lateinteraction
ICYMI: read the blog on Pedagogical RL. Instead of sampling blindly from your LLM, leverage the label used for RLVR! Learn to directly approximate the distribution of your LLM's plausible rollouts that are actually correct. Then sample from *that*! noahziems.com/pedagogical-rl
Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 12 reposts · 83 likes · 6.5K views
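A rough sketch of the idea above, assuming hypothetical names throughout (the `llm` object, its `generate` method, the prompt format, and the toy `verify` check are all stand-ins, not the paper's or DSPy's API). The blog describes learning to approximate the distribution of correct rollouts; the crudest version of that is to condition generation on the privileged RLVR label and keep only verified rollouts:

```python
# Hedged sketch, not the paper's method: use the privileged label to
# *find* rollouts (by conditioning generation on it), not just to score
# them afterward. All names here are hypothetical stand-ins.

def verify(rollout: str, gold_label: str) -> bool:
    # Toy stand-in for the RLVR reward check.
    return rollout.strip().endswith(gold_label)

def pedagogical_sample(llm, problem: str, gold_label: str, n: int = 8) -> list[str]:
    # Privileged info goes into the *sampling* prompt, not only the reward.
    prompt = (
        f"Problem: {problem}\n"
        f"The final answer is known to be {gold_label}.\n"
        "Write a step-by-step solution that arrives at this answer."
    )
    rollouts = [llm.generate(prompt) for _ in range(n)]
    # Correct rollouts sampled this way approximate the distribution the
    # tweet describes: plausible rollouts that are actually correct.
    return [r for r in rollouts if verify(r, gold_label)]
```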
DSPy reposted
Souradip Chakraborty @SOURADIPCHAKR18
If the model can’t already stumble into success or at least useful states, on-policy methods stall. Instead, we argue that an ideal sampler would give our LLM its “nearest successes”: rollouts that are correct, but where every step makes sense to the LLM and thus can be learned
[image]
3 replies · 4 reposts · 24 likes · 4.4K views
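One plausible way to make "every step makes sense to the LLM" concrete, sketched under assumptions (the `student.step_logprob` interface and the threshold are invented for illustration; the paper's actual criterion may differ): among rollouts already known to be correct, keep only those whose least-likely step is still reasonably probable under the student's current policy.

```python
# Hedged sketch of a "nearest successes" filter: keep correct rollouts in
# which no step is a leap the student can't follow.
# `student.step_logprob(context, step)` is an assumed interface returning
# the mean token log-prob of `step` given `context`; -2.0 is an arbitrary
# threshold chosen for illustration.

def nearest_successes(student, prompt: str,
                      correct_rollouts: list[list[str]],
                      threshold: float = -2.0) -> list[list[str]]:
    keep = []
    for steps in correct_rollouts:
        context, worst = prompt, float("inf")
        for step in steps:
            # Track the least plausible step under the student's policy.
            worst = min(worst, student.step_logprob(context, step))
            context += step
        if worst > threshold:  # every step is plausible to the student
            keep.append(steps)
    return keep
```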
DSPy @DSPyOSS
@MaximeRivest wow, one year of Maxime in this community :D, we have been so lucky to have you!
0 replies · 0 reposts · 8 likes · 228 views
DSPy reposted
Noah Ziems @NoahZiems
Extremely excited about our recent work in Pedagogical RL. I’m optimistic approaches like this are going to completely shift how data collection is done for hard agentic tasks like coding
Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

5 replies · 13 reposts · 49 likes · 5.2K views
DSPy reposted
will brown @willccbb
@lateinteraction constant factor, close enough :) veeeery nice approach, really excited to dig into it further!
3 replies · 3 reposts · 34 likes · 6.3K views
DSPy reposted
Amrit Singh Bedi @amritsinghbedi3
Are we really using all the available information when we post-train with RL on hard tasks? Maybe not. 🧠 New framework — Pedagogical RL: teaching models to teach themselves to sample trajectories that are both correct AND actually learnable. Up to 40% gains over GRPO and OPD 🚀 👇
Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

1 reply · 6 reposts · 25 likes · 3.4K views
DSPy reposted
Braden Hancock @bradenjhancock
In other words: Humans are teaching teacher models how to teach other models the way good human teachers teach other humans, so we can make smarter models that can teach humans to be smarter.

Intuition: A good teacher model will not only lead to the right answer--it will do so following a sequence of steps that the student can follow. Teacher models are penalized for taking leaps that feel like they came out of nowhere.

More cool work out of @lateinteraction's CSAIL lab!
Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

2 replies · 15 reposts · 55 likes · 6.8K views
DSPy reposted
Omar Khattab @lateinteraction
End the tyranny of on-policy algorithms in LLM post-training! Maybe the key thing isn't whether your rollouts are purely "on-policy" or not, but the extent to which they’re pedagogically useful. Early explorations into newer paradigms for RL by @SOURADIPCHAKR18* @NoahZiems*:
Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 16 reposts · 130 likes · 10.9K views
DSPy reposted
Souradip Chakraborty @SOURADIPCHAKR18
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL
[image]
15 replies · 75 reposts · 421 likes · 90.8K views
DSPy reposted
Databricks @databricks
Databricks is proud to be a Founding Gold Sponsor of @TheOfficialACM Conference on AI and Agentic Systems—the first ACM conference dedicated to compound AI and agentic systems, with our co-founder @matei_zaharia on the organizing committee. Join us May 26–29 in San Jose for the premier event for rigorous, reproducible research in compound AI architectures, optimization, and deployment. Register today: caisconf.org
[image]
2 replies · 9 reposts · 37 likes · 1.9K views
DSPy reposted
Lakshya A Agrawal @LakshyAAAgrawal
In San Diego / UCSD today & tomorrow — would love to grab a coffee or say hi to anyone around 👋 please reach out!
0 replies · 2 reposts · 18 likes · 1.8K views