Derek Pham

83 posts

@pham_derek

Research Engineer @SnorkelAI • Frontier model eval • Expert datasets & reasoning benchmarks • ML systems • Data-centric AI

Joined April 2015

284 Following · 92 Followers

Derek Pham reposted

vincent sunn chen @vincentsunnchen · Feb 11

x.com/i/article/2021…

Derek Pham @pham_derek · Dec 26

@karpathy We're still far from AGI. But agentic benchmarks and RLVR already unlock massive real world leverage in science, coding, ops, and research workflows. The impact doesn't require generality. #AIEvaluation

Derek Pham @pham_derek · Dec 26

@karpathy That's why benchmark design matters more than leaderboard position. If the signal is hard to game, RLVR drives real capability. If it leaks, you get benchmark theater.

Derek Pham @pham_derek · Dec 26

A pattern I'm noticing in newer agentic benchmarks isn't about scores, but about what's verifiable. As RL from verifiable rewards (RLVR) becomes more dominant, benchmarks increasingly define where models can meaningfully improve.

Derek Pham @pham_derek · Dec 10

#NeurIPS2025 reflections: after a week of posters, hallway chats, and workshop deep dives, here are a few themes that stood out.

Training strategies and RL's evolving role: @YejinChoinka's keynote pushed past the "scale-is-all" narrative with a focus on RL, not just as a fine-tuning method but as part of the pre-training story. The idea of eliciting reasoning behaviors in smaller models through RL is gaining traction.

Benchmark design and meta-evaluation: From Terminal-Bench to agent self-evaluation setups, we're seeing a shift toward evaluating not just outputs but internal reasoning and feedback loops. There's an emerging science of how we measure capability, not just completion.

LLMs in software environments: Papers like SWE-smith and SWE-rebench explored how LLMs perform in interactive dev environments. Robustness to tool changes and realistic regression testing stood out, which is especially relevant for agent-based use cases.

From scale to specificity: A subtle but important theme is that quality, provenance, and realism in data pipelines (both human-labeled and synthetic) are being treated as first-class problems. That feels like a healthy turn away from quantity-first mindsets.

Takeaway: NeurIPS isn't just about bigger models anymore; it's about sharper questions. #NeurIPS #LLMs #AIresearch

Derek Pham @pham_derek · Dec 5

@Drewch @Shopify Likewise—thanks again for the great chat and for the Sidekick insights!

Andrew McNamara @Drewch · Dec 4

@pham_derek @Shopify Enjoyed chatting! Stay in touch

Andrew McNamara @Drewch · Dec 3

If you are at neurips, come by the @Shopify booth, let’s talk agents!

Allie @alspee

come meet @Drewch and the sidekick team at booth #1713 at NeurIPS 2025 ✨

Derek Pham @pham_derek · Dec 3

@Drewch @Shopify Perfect, I’ll come by then and say hi. Looking forward to the sidekick talk!

Andrew McNamara @Drewch · Dec 3

@pham_derek @Shopify Yes definitely come by! There’s a sidekick talk at the booth at 11 so I’ll be around for that, come find me then?

Derek Pham @pham_derek · Dec 3

Excited to be at NeurIPS with the Snorkel research team—great conversations so far on eval, reasoning, and agents

Snorkel AI @SnorkelAI

NeurIPS lunch crew → Snorkel researchers + the always-great @Walshe_tech If you’re at #NeurIPS2025, come say hi — and see everything else we’re doing this week (papers, workshops, events): snorkel.ai/neurips-event/

Derek Pham @pham_derek · Nov 26

@_rajanagarwal Would love to connect and have fun! 😎

rajan agarwal @_rajanagarwal · Nov 26

i'll be at neurips with Amazon AGI next tues-sat! lmk if you want to chat about RL, browser-use or world models (or just have fun)

Derek Pham reposted

Snorkel AI @SnorkelAI · Nov 20

Continuing our #NeurIPS2025 highlights — looking forward to a great conference this year. 📄 Papers exploring expert data, benchmark quality, RL environments, and evaluation design 🧪 The SEA Workshop (Dec 7), bringing together researchers pushing the boundaries of scalable agent evaluation 👥 A strong group of Snorkel researchers in San Diego sharing insights + connecting with the community

Derek Pham @pham_derek · Oct 17

Scaling AI quality without compromise? Snorkel says yes! In Part 4 of our rubric series, I break down Snorkel's multi-stage process for using rubrics in our data pipeline. In one case, we saw human/LLMaJ (LLM-as-a-judge) alignment increase from 37% → 94%. snorkel.ai/blog/scaling-t…

Derek Pham reposted

KB24 NFT @KB24NFT · Oct 18

🚨GIVEAWAY🚨 ⭐️ 1/1 Kareem Abdul-Jabbar @SportStarsNFT ⭐️ 1 Icons #001 @SportStarsNFT 🐍 @KB24NFT Whitelist Spot How To Enter 👇 ❤️♻️ This Post ➡️ Follow @SportStarsNFT @KB24NFT ✅ Join Discord.gg/G5u7SUN5B8 ✅ Join Discord.gg/KB24NFT

Derek Pham reposted

KB24 NFT @KB24NFT · Oct 12

Powerful story from Doug Young, Kobe's high school teammate and Lower Merion High School assistant coach. Join our Discord for special AMA guests sharing their personal Kobe stories 💜💛🐍 Discord - discord.gg/KB24NFT

Derek Pham @pham_derek · Feb 17

#sweepstakes twitter.com/xbox/status/96…

Xbox @Xbox

RT for a chance to win an Xbox One X console inspired by the #AirJordan III. NoPurchNec. Ends 02/21/18. #Sweepstakes rules: xbx.lv/2CpRp0E

Derek Pham @pham_derek · Jul 26

Enter to get the Yeezy Boost 750 'Glow in the Dark' and more at retail price. #GOATSUMMER goat.app.link/fEbYYYNL5E