Derek Pham

83 posts

@pham_derek

Research Engineer @SnorkelAI • Frontier model eval • Expert datasets & reasoning benchmarks • ML systems • Data-centric AI

Joined April 2015
284 Following · 92 Followers
Derek Pham @pham_derek
@karpathy We're still far from AGI. But agentic benchmarks and RLVR already unlock massive real-world leverage in science, coding, ops, and research workflows. The impact doesn't require generality. #AIEvaluation
0
0
5
73
Derek Pham @pham_derek
@karpathy That's why benchmark design matters more than leaderboard position. If the signal is hard to game, RLVR drives real capability. If it leaks, you get benchmark theater
1
0
6
89
Derek Pham @pham_derek
A pattern I'm noticing in newer agentic benchmarks isn't about scores, but about what's verifiable. As RL from verifiable rewards (RLVR) becomes more dominant, benchmarks increasingly define where models can meaningfully improve
1
1
5
346
Derek Pham @pham_derek
#NeurIPS2025 reflections
After a week of posters, hallway chats, and workshop deep dives, here are a few themes that stood out...
Training strategies and RL's evolving role: @YejinChoinka's keynote pushed past the "scale-is-all" narrative with a focus on RL--not just as a fine-tuning method, but as part of the pre-training story. The idea of eliciting reasoning behaviors in smaller models through RL is gaining traction.
Benchmark design and meta-evaluation: From Terminal-Bench to agent self-evaluation setups, we're seeing a shift toward evaluating not just outputs but internal reasoning and feedback loops. There's an emerging science of how we measure capability, not just completion.
LLMs in software environments: Papers like SWE-smith and SWE-rebench explored how LLMs perform in interactive dev environments. Robustness to tool changes and realistic regression testing stood out--especially relevant for agent-based use cases.
From scale to specificity: A subtle but important theme... quality, provenance, and realism in data pipelines (both human-labeled and synthetic) are being treated as first-class problems. That feels like a healthy turn from quantity-first mindsets.
Takeaway: NeurIPS isn't just about bigger models anymore--it's about sharper questions.
#NeurIPS #LLMs #AIresearch
2
10
59
10.4K
Derek Pham @pham_derek
@Drewch @Shopify Likewise—thanks again for the great chat and for the Sidekick insights!
0
0
1
26
Derek Pham @pham_derek
@Drewch @Shopify Perfect, I’ll come by then and say hi. Looking forward to the Sidekick talk!
1
0
1
34
Andrew McNamara @Drewch
@pham_derek @Shopify Yes definitely come by! There’s a sidekick talk at the booth at 11 so I’ll be around for that, come find me then?
1
0
0
65
rajan agarwal @_rajanagarwal
i'll be at neurips with Amazon AGI next tues-sat! lmk if you want to chat about RL, browser-use or world models (or just have fun)
7
2
94
8.9K
Derek Pham retweeted
Snorkel AI @SnorkelAI
Continuing our #NeurIPS2025 highlights — looking forward to a great conference this year. 📄 Papers exploring expert data, benchmark quality, RL environments, and evaluation design 🧪 The SEA Workshop (Dec 7), bringing together researchers pushing the boundaries of scalable agent evaluation 👥 A strong group of Snorkel researchers in San Diego sharing insights + connecting with the community
1
4
17
1.3K
Derek Pham @pham_derek
Scaling AI quality without compromise? Snorkel says yes! In Part 4 of our rubric series, I break down Snorkel's multi-stage process for using rubrics in our data pipeline. In one case, we saw human/LLMaJ alignment increase from 37% → 94%. snorkel.ai/blog/scaling-t…
0
4
11
840
Derek Pham retweeted
KB24 NFT @KB24NFT
Powerful story from Doug Young, Kobe’s high school teammate & Lower Merion High School assistant coach. Join our Discord for special AMA guests sharing their personal Kobe stories 💜💛🐍 Discord - discord.gg/KB24NFT
19
106
185
0