Devin White

83 posts

@DevinWhiteAI

ML Researcher @USAEOP | Reinforcement learning, RLHF & LLMs

Joined February 2024
182 Following · 20 Followers
Pinned Tweet
Devin White@DevinWhiteAI·
Ever wanted to learn the basics of fine-tuning an LLM? I just built a complete, single-GPU-friendly, end-to-end pipeline for RLHF fine-tuning using @huggingface TRL! Here is the two-stage process I used: 1⃣ Train a "Judge" (reward model) on human preferences (a subset of the @NVIDIAAI HelpSteer3 dataset) 2⃣ Align @GoogleDeepMind Gemma 3 with the judge using RLOO! Try out the code below👇 #RLHF #ML #AI (Image generated by @NanoBanana)
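The two stages described in the pinned tweet can be illustrated with a minimal, dependency-free sketch. This is not the author's code and it does not use TRL's API. It assumes the reward model is trained with the common pairwise (Bradley-Terry) preference loss, and it shows the leave-one-out baseline that gives RLOO its name. The function names `pairwise_reward_loss` and `rloo_advantages` are hypothetical, chosen only for illustration.

```python
import math


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Stage 1 (assumed loss): -log(sigmoid(chosen_score - rejected_score)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it does the opposite.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rloo_advantages(rewards: list[float]) -> list[float]:
    """Stage 2: leave-one-out advantages for k sampled responses to one prompt.

    Each response's baseline is the mean reward of the other k - 1 responses,
    so the advantage is the reward minus that baseline.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Reward model agrees with the human preference: low loss.
    print(pairwise_reward_loss(3.0, 0.0))
    # Reward model disagrees with the human preference: high loss.
    print(pairwise_reward_loss(0.0, 3.0))
    # Three sampled responses scored by the judge.
    print(rloo_advantages([1.0, 2.0, 3.0]))  # [-1.5, 0.0, 1.5]
```

In a real pipeline the scores would come from a trained reward model and the advantages would weight the policy's log-probabilities in a policy-gradient update; here plain floats stand in for both.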
Devin White@DevinWhiteAI·
@wesroth If true, this could be huge for research!
Wes Roth@WesRoth·
Grok 4.20 is, by all accounts, going to be insane... Early-preview researchers out of UCI said "Grok 4.20 found a new Bellman function." This is a credible report that Grok 4.20 is capable of automated theorem discovery. Mathematicians usually have to guess the formula based on intuition... it's fuzzy (not a technical term). The solution Grok 4.20 gave was "sharp" (sharp *is* a technical term, meaning it's precise and accurate).
Devin White@DevinWhiteAI·
@NewsFromGoogle This is great! With how good Gemini 3 Pro is, I'm excited to see what updates Siri gets!
News from Google@NewsFromGoogle·
Joint Statement: Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a more personalized Siri coming this year. After careful evaluation, Apple determined that Google's AI technology provides the most capable foundation for Apple Foundation Models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple's industry-leading privacy standards.
Devin White@DevinWhiteAI·
@DanKornas Thanks for sharing! Just checked out the repo and it has tons of great resources! The finetuning guide in particular looks super useful! Great resource for LLMs, RL and AI/ML in general!
Devin White@DevinWhiteAI·
⚙️ Why this project? The goal is to provide an easy-to-use template for RLHF experiments, research, or just fun that also fits on a consumer GPU!
Devin White@DevinWhiteAI·
🚀Excited to share my latest project: A simple implementation of RLHF for Language Models using @GoogleDeepMind Gemma 3! It demos the complete training loop from reward modeling to policy training, all powered by @huggingface TRL! Ideal for learning RLHF basics without the complexity. Thread 👇 #AI #ML #RLHF #HuggingFace #Gemma
Devin White@DevinWhiteAI·
I’m especially thankful to all my collaborators, mentors, friends and the researchers I had the chance to meet, learn from, and exchange ideas with.
Devin White@DevinWhiteAI·
As 2025 comes to a close, I wanted to say thank you to everyone for their support throughout the year! Some research highlights are in the thread 🧵 Looking forward to what 2026 has in store and excited for some updates coming soon! #AI #ML #ReinforcementLearning #RLHF
Devin White@DevinWhiteAI·
Tomorrow at #NeurIPS2025 I’ll be presenting “Human-Inspired Multi-Level Reinforcement Learning” at the ARLET workshop. 🕤 Poster Session 2 - 3:30 PM 📍 Upper Level Room 31ABC Working on RL/RLHF? Come say hi 👋 #ReinforcementLearning #RLHF
Devin White retweeted
Manling Li@ManlingLi_·
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year".
I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still deep in the honeymoon phase. The only reason is the students are so amazing, making my transition so much easier. One year in, they have already collected paper awards, orals, spotlights, etc.
What makes me proudest is that they are vividly alive: curious, playful, confident in their own weird way, lighting up when talking about ideas, and never afraid to explore "the thing that might fail". Everyone is just… themselves. And somehow, that version of themselves keeps shipping amazing work. In today's anxious academic world, this kind of aliveness is what I will try my best to protect.
Maybe the best part of being an advisor is that every student is so different and unique lol. Interestingly, coming into their second year, they've got their own passions, and I can't just plug my ideas into their heads. So when I get excited about something new, my first thought is: "Okay, time to find some fresh first-years who will be thrilled about this!"
MLL lab is 1 year old; we started right in Oct 2024. We are growing and looking for more PhDs to join us! 1. Why our lab? (1/2) 2. Why @northwesterncs? (2/2) In 2025 alone: NU has 7 faculty as Sloan Fellows, plus a Nobel winner! Check more below
Devin White@DevinWhiteAI·
🚨Exciting news! Our paper “Human-Inspired Multi-Level Reinforcement Learning” was accepted to the ARLET Workshop @ NeurIPS 2025! In this paper we not only do Rating-based Reinforcement Learning but use the ratings as demonstrations for learning as well! Excited to share more at the poster session! 🚀 #NeurIPS2025 #ReinforcementLearning #AIResearch @arlet_workshop