Derek Pham
83 posts

Derek Pham
@pham_derek
Research Engineer @SnorkelAI • Frontier model eval • Expert datasets & reasoning benchmarks • ML systems • Data-centric AI
Joined April 2015
284 Following · 92 Followers

@karpathy We're still far from AGI.
But agentic benchmarks and RLVR already unlock massive real-world leverage in science, coding, ops, and research workflows.
The impact doesn't require generality.
#AIEvaluation

@karpathy That's why benchmark design matters more than leaderboard position.
If the signal is hard to game, RLVR drives real capability. If it leaks, you get benchmark theater.

#NeurIPS2025 reflections
After a week of posters, hallway chats, and workshop deep dives, here are a few themes that stood out...
Training strategies and RL's evolving role: @YejinChoinka's keynote pushed past the "scale-is-all" narrative with a focus on RL--not just as a fine-tuning method, but as part of the pre-training story. The idea of eliciting reasoning behaviors in smaller models through RL is gaining traction.
Benchmark design and meta-evaluation: From Terminal-Bench to agent self-evaluation setups, we're seeing a shift toward evaluating not just outputs but internal reasoning and feedback loops. There's an emerging science of how we measure capability, not just completion.
LLMs in software environments: Papers like SWE-smith and SWE-rebench explored how LLMs perform in interactive dev environments. Robustness to tool changes and realistic regression testing stood out--especially relevant for agent-based use cases.
From scale to specificity: A subtle but important theme: quality, provenance, and realism in data pipelines (both human-labeled and synthetic) are being treated as first-class problems. That feels like a healthy turn from quantity-first mindsets.
Takeaway: NeurIPS isn't just about bigger models anymore--it's about sharper questions.
#NeurIPS #LLMs #AIresearch

If you're at NeurIPS, come by the @Shopify booth. Let’s talk agents!
Allie @alspee
Come meet @Drewch and the Sidekick team at booth #1713 at NeurIPS 2025 ✨

@pham_derek @Shopify Yes, definitely come by! There’s a Sidekick talk at the booth at 11, so I’ll be around for that. Come find me then?

Excited to be at NeurIPS with the Snorkel research team—great conversations so far on eval, reasoning, and agents
Snorkel AI @SnorkelAI
NeurIPS lunch crew → Snorkel researchers + the always-great @Walshe_tech. If you’re at #NeurIPS2025, come say hi, and see everything else we’re doing this week (papers, workshops, events): snorkel.ai/neurips-event/
Derek Pham retweeted

Continuing our #NeurIPS2025 highlights — looking forward to a great conference this year.
📄 Papers exploring expert data, benchmark quality, RL environments, and evaluation design
🧪 The SEA Workshop (Dec 7), bringing together researchers pushing the boundaries of scalable agent evaluation
👥 A strong group of Snorkel researchers in San Diego sharing insights + connecting with the community


Scaling AI quality without compromise? Snorkel says yes!
In Part 4 of our rubric series, I break down Snorkel's multi-stage process for using rubrics in our data pipeline.
In one case, we saw human/LLM-as-a-judge (LLMaJ) alignment increase from 37% → 94%.
snorkel.ai/blog/scaling-t…
Derek Pham retweeted

🚨GIVEAWAY🚨
⭐️ 1/1 Kareem Abdul-Jabbar @SportStarsNFT
⭐️ 1 Icons #001 @SportStarsNFT
🐍 @KB24NFT Whitelist Spot
How To Enter 👇
❤️♻️ This Post
➡️ Follow @SportStarsNFT @KB24NFT
✅ Join Discord.gg/G5u7SUN5B8
✅ Join Discord.gg/KB24NFT


Derek Pham retweeted

Powerful story from Doug Young, Kobe’s high school teammate & Lower Merion High School assistant coach
Join our Discord for special AMA guests sharing their personal Kobe stories 💜💛🐍
Discord - discord.gg/KB24NFT

Enter to get the Yeezy Boost 750 'Glow in the Dark' and more at retail price. #GOATSUMMER goat.app.link/fEbYYYNL5E

