Rafael Pardinas

667 posts

@muchomuchacho

RL @ServiceNowRSRCH

London · Joined July 2007

177 Following · 309 Followers

Rafael Pardinas@muchomuchacho·5d

@vivek_2332 I feel like PipelineRL should be mentioned louder here :)

Vivek@vivek_2332·6d

-> the authors ablate every major design choice one at a time on an 8b model. pipelinerl beats ppo-off-policy on efficiency while hitting a similar ceiling.
-> cispo and gspo both crush dapo on asymptotic pass rate, with cispo being more robust to hyperparameter choices.
-> numerical mismatch between inference and training kernels was corrupting the importance sampling ratio. switching the lm head to fp32 leads to huge gains.
-> dropping zero-variance prompts from the batch and permanently removing prompts above 0.9 pass rate both lift the asymptote. easy problems and dead gradient signals are just wasting compute.
-> takeaway: loss type, the fp32 fix and the off-policy algorithm actually raise the ceiling.
-> everything else, like aggregation, normalization and curriculum, mostly just makes you climb faster. always prioritize A (the asymptote) over B (the compute-efficiency exponent).
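
The two data filters in the notes above (drop zero-variance prompts from the batch, permanently retire prompts above a 0.9 pass rate) can be sketched as follows. This is a minimal sketch, not ScaleRL's implementation: the `PromptPool` class and its methods are hypothetical, rewards are assumed to be binary pass/fail per sample, and the pass rate is estimated from the current group only, whereas a real pipeline would presumably track it across epochs.

```python
from dataclasses import dataclass, field


@dataclass
class PromptPool:
    """Hypothetical prompt pool that retires prompts the policy already solves."""
    prompts: list[str]
    retire_threshold: float = 0.9  # pass rate above which a prompt is removed for good
    retired: set[str] = field(default_factory=set)

    def active(self) -> list[str]:
        """Prompts that can still be sampled in future batches."""
        return [p for p in self.prompts if p not in self.retired]

    def filter_batch(self, rewards: dict[str, list[float]]) -> dict[str, list[float]]:
        """Drop zero-variance groups and retire prompts above the threshold.

        `rewards` maps each prompt to the rewards of its sampled group
        (assumed binary: 1 = pass, 0 = fail).
        """
        kept = {}
        for prompt, group in rewards.items():
            pass_rate = sum(group) / len(group)
            if pass_rate > self.retire_threshold:
                self.retired.add(prompt)  # never sampled again
            if min(group) == max(group):
                # Every sample got the same reward, so the group-relative
                # advantage is zero for all of them: no gradient signal.
                continue
            kept[prompt] = group
        return kept


pool = PromptPool(["a", "b", "c"])
batch = {"a": [1, 1, 1, 1], "b": [0, 1, 0, 1], "c": [0, 0, 0, 0]}
print(pool.filter_batch(batch))  # only "b" carries signal
print(pool.active())             # "a" is retired, "c" stays in the pool
```

Note that an all-fail prompt like `"c"` is dropped from this batch but not retired, since it may become learnable later.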

Vivek@vivek_2332·6d

notes on the scaleRL paper.
-> had read it before but never went this deep
-> way more insights packed in here than i expected.
-> one of the best structured rl papers out right now. some amazing findings. check it out!! 🧵(1/n)

Rafael Pardinas reposted

Alexandre Lacoste@alex_lacoste_·19 Mar

We're sitting on a gold mine of data for evaluation and post-training. Hundreds of agentic benchmarks, rich structured environments, verifiable signal. Most of it is sitting idle. Not because nobody wants it, but because the engineering to use it is brutal. 🧵

Rafael Pardinas@muchomuchacho·10 Mar

@amilabs Why not London?

AMI Labs@amilabs·10 Mar

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe. We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world. We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz AMI - Real world. Real intelligence.

Rafael Pardinas@muchomuchacho·28 Feb

@growing_daniel Cooked

Daniel@growing_daniel·27 Şub

if the US military can't use AI to kill people then what the hell is the point of all this? improving the human condition?? wake the fuck up bucko

Rafael Pardinas@muchomuchacho·20 Feb

@InternetH0F Wait until you find out there's people being born in 2026

internet hall of fame@InternetH0F·20 Feb

I have emails that are older than lil dude

Rafael Pardinas@muchomuchacho·9 Feb

@simonw pretty metal exhaustion indeed 🤘

Simon Willison@simonw·9 Feb

Interesting research in HBR today about how the productivity boost you can get from AI tools can lead to burnout or general metal exhaustion, something I've noticed in my own work simonwillison.net/2026/Feb/9/ai-…

Rafael Pardinas reposted

Emiliano Penaloza@emilianopp_·6 Feb

Remember all the self-distillation papers that came out last week? Well, we also propose it 😅, but… But alongside something better 😎 π-Distill. We show that with this method, you can distill closed-source frontier models even tho their traces are hidden 🔒. Both our methods can reach and even surpass the performance of the industry-standard SFT + RL with access to reasoning traces 🤯. 🔬And we spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10

Rafael Pardinas reposted

ServiceNow AI Research@ServiceNowRSRCH·2 Feb

Incredibly proud of the PipelineRL team!!!

Rafael Pardinas@muchomuchacho

PipelineRL got accepted to TMLR 🎉 ~2x faster on-policy RL training through in-flight weight updates. Making LLM agent training fly at @ServiceNowRSRCH @alexpiche_ @DBahdanau @ehsk0 Paper: arxiv.org/abs/2509.19128 Code: github.com/ServiceNow/Pip…

Rafael Pardinas@muchomuchacho·30 Jan

Rafael Pardinas@muchomuchacho·20 Jan

@nisten @open_erv This is biased generalisation

nisten🇨🇦e/acc@nisten·19 Jan

@open_erv Impoverished mentality of grasping at the kWh instead of just making more...

Open_ERV@open_erv·19 Jan

This would be about $265 in electricity, though, at 16 cents per kWh, on low. You'd have to factor in interest to calculate the equivalent upfront cost, but it would be a lot more than $25, because of the money you'd save on electricity. A Lasko 3733 on low gives about 450 CFM CADR and uses 50-55 watts. The current BQF prototypes at 300 rpm use 12 watts and give 680 CFM CADR with the same filters. Turn it down even lower and yeah, it would be <1/5 the electrical consumption. So it is possible to make something even cheaper still, through better engineering, and also better in every other way. On high it would be about 33,000*96/1000*0.16 = 506.9 bucks. That's 900 CFM CADR or so. A BQF gives 1030 CFM CADR with the same filters and uses only 26 watts, so the difference is $369. You could pay $394 ($25 + $369) for the fan, ignoring interest, and it would be the same actual net expenditure over that operating period. Again, not including interest, which would budge that number down.
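
The electricity arithmetic in the reply above can be checked in a few lines. A minimal sketch: `energy_cost` is a hypothetical helper, and the inputs (33,000 hours, $0.16/kWh, 96 W vs 26 W on high, the $25 fan) are the figures quoted in these tweets; interest is ignored, as in the original.

```python
def energy_cost(hours: float, watts: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD: hours * watts / 1000 gives kWh, times the price."""
    return hours * watts / 1000 * usd_per_kwh


HOURS = 33_000  # runtime quoted for the Lasko fan
PRICE = 0.16    # USD per kWh, as quoted

lasko_high = energy_cost(HOURS, 96, PRICE)  # Lasko on high
bqf_high = energy_cost(HOURS, 26, PRICE)    # BQF prototype at similar CADR
savings = lasko_high - bqf_high             # electricity saved over the period
breakeven_price = 25 + savings              # fan price with equal net spend (no interest)

print(f"{lasko_high:.2f} {bqf_high:.2f} {savings:.2f} {breakeven_price:.2f}")
```

This prints `506.88 137.28 369.60 394.60`, matching the $506.9, $369 and $394 in the tweet. At 50 W the same formula gives $264, close to the "about $265" quoted for low.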

Liesl McConchie@Liesl4CleanAir

💔RIP Lasko Fan💔 Based on my estimations, this $25 Lasko fan provided my family with over 33,000 hours of clean air. Yes, there were a few filter changes over those 33,000 hours but a CR box is still the most affordable way to access clean indoor air. @CRFoundationUS

Rafael Pardinas@muchomuchacho·20 Jan

@FitFusion__ No

Fit_Fusion@FitFusion__·19 Jan

I learned this trick from a friend in Italy! Now I only make pasta this way

Rafael Pardinas@muchomuchacho·17 Jan

@eliebakouch You should probably read the PipelineRL paper: arxiv.org/abs/2509.19128

elie@eliebakouch·16 Jan

i'm in my RL training at scale era, what are the best paper/tech report i should read? so far on my reading list > inclusion ai ring 1T paper > longcat flash thinking (first time reading it, what a banger, will make a thread later) > scale RL > minimax M1

Rafael Pardinas@muchomuchacho·26 Nov

Train reasoning models 3x faster without sacrificing on-policy learning github.com/ServiceNow/Pip…

Rishabh Agarwal@agarwl_

Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from the trainer to the generators (to generate data from the latest policy being trained). (Conventional PPO-off-policy) A naive approach would be to "start generators on a batch, wait for all sequences to complete, update the model weights for both trainers and generators, and repeat." Unfortunately, this approach leads to idle generators and low pipeline efficiency due to heterogeneous completion times. (PipelineRL) Instead, whenever we need to do a weight update, we simply let the generators keep generating tokens, without discarding or finishing ongoing generations: an "in-flight" weight update. As such, the KV caches for these generations will be stale (they were computed by earlier copies of the weights), but this is ok (see below).
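
The difference between the two schemes described in the quoted thread can be seen in a toy simulation. This is an illustrative sketch under strong assumptions, not PipelineRL's code: every token takes one time step, there is no trainer, `conventional` and `pipelined` are hypothetical helpers, and the weights are assumed to change every `update_every` steps. It reports generator utilisation and, for the pipelined case, the largest number of weight updates a single sequence lived through while being generated (a proxy for how stale its earliest KV-cache entries can get).

```python
from collections import deque


def conventional(lengths: list[int], slots: int) -> float:
    """Batch-synchronous generation: every batch waits for its slowest sequence."""
    busy = total = 0
    for i in range(0, len(lengths), slots):
        batch = lengths[i:i + slots]
        busy += sum(batch)            # slot-steps that produced a token
        total += max(batch) * slots   # slot-steps available, idle ones included
        # (weights would be updated here, between batches)
    return busy / total


def pipelined(lengths: list[int], slots: int, update_every: int) -> tuple[float, int]:
    """In-flight updates: free slots refill at once and weights change mid-sequence."""
    queue = deque(lengths)
    active: list[list[int]] = []  # each entry: [tokens_left, weight_version_at_start]
    busy = total = step = version = max_span = 0
    while queue or active:
        # Refill free slots immediately instead of waiting for the whole batch.
        while len(active) < slots and queue:
            active.append([queue.popleft(), version])
        for seq in active:
            seq[0] -= 1  # one token per active slot, produced by the current weights
        busy += len(active)
        total += slots
        for seq in active:
            if seq[0] == 0:
                # Number of weight updates this sequence lived through.
                max_span = max(max_span, version - seq[1])
        active = [seq for seq in active if seq[0] > 0]
        step += 1
        if step % update_every == 0:
            version += 1  # in-flight update: unfinished sequences keep going
    return busy / total, max_span


lengths = [1, 8] * 4  # heterogeneous completion times
print(conventional(lengths, slots=2))
print(pipelined(lengths, slots=2, update_every=4))
```

With these lengths and 2 slots, the batch-synchronous loop keeps the generators busy 36/64 (about 56%) of the time, while the pipelined loop reaches 36/38 (about 95%), at the cost of some sequences living through 2 weight updates.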

Rafael Pardinas@muchomuchacho·25 Dec

@tntsports Rafa Nadal is the GOAT

TNT Sports@tntsports·24 Dec

Rafa Nadal’s reaction as a plaque of his footprint is unveiled on Court Philippe Chatrier says it all 🥹❤️

Rafael Pardinas@muchomuchacho·26 Nov

image on the left: rewards image on the right: clamp ratio

Rafael Pardinas@muchomuchacho·26 Nov

You can now train reasoning models with GSPO in PipelineRL: sequence-level optimisation + async weight updates = faster, more stable RL training. Can you guess which is which? @ServiceNowRSRCH

Rafael Pardinas@muchomuchacho·26 Nov

Blue = GSPO, Purple = GRPO Paper: arxiv.org/abs/2509.19128 Code: github.com/ServiceNow/Pip…
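
To make the GSPO vs GRPO comparison concrete, here is a minimal sketch of the two importance ratios on one sampled response. Assumptions: the function names are hypothetical, the log-probabilities are invented, the clip range `eps=0.2` is illustrative only, and "clamp ratio" is read as the fraction of ratios that fall outside the clip range, which the thread does not define. GRPO clips one ratio per token; GSPO uses a single length-normalised sequence ratio, exp of the mean token log-ratio, and clips that.

```python
import math


def token_ratios(logp_new: list[float], logp_old: list[float]) -> list[float]:
    """GRPO/PPO-style importance ratios: one per token."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_ratio(logp_new: list[float], logp_old: list[float]) -> float:
    """GSPO-style ratio: exp of the mean token log-ratio (length-normalised)."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def clipped(ratio: float, advantage: float, eps: float) -> float:
    """PPO clipped surrogate for a single ratio."""
    return min(ratio * advantage, min(max(ratio, 1 - eps), 1 + eps) * advantage)


def grpo_objective(logp_new, logp_old, advantage, eps):
    # Clip each token separately, then average over the sequence.
    ratios = token_ratios(logp_new, logp_old)
    return sum(clipped(r, advantage, eps) for r in ratios) / len(ratios)


def gspo_objective(logp_new, logp_old, advantage, eps):
    # Clip once, on the whole sequence.
    return clipped(sequence_ratio(logp_new, logp_old), advantage, eps)


def clamp_fraction(ratios: list[float], eps: float) -> float:
    """Assumed meaning of 'clamp ratio': share of ratios outside [1 - eps, 1 + eps]."""
    return sum(abs(r - 1) > eps for r in ratios) / len(ratios)


# Made-up log-probabilities for one 3-token response.
new, old = [-1.0, -0.5, -2.0], [-1.2, -0.5, -1.7]
print(clamp_fraction(token_ratios(new, old), eps=0.2))      # 2 of 3 tokens clipped
print(clamp_fraction([sequence_ratio(new, old)], eps=0.2))  # sequence not clipped
```

In this toy case the per-token log-ratios (+0.2, 0, -0.3) partly cancel when averaged, so the sequence ratio stays near 1 and is not clipped, while two of the three token ratios are.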

Rafael Pardinas reposted

🇺🇦 Dzmitry Bahdanau@DBahdanau·7 Nov

i've been waiting for this moment since our initial PipelineRL blog post in May :) 🕺🕺🕺

Hamish Ivison@hamishivi

to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance (combined with some other threading etc. updates). Here's IFEval perf for an internal model (same data, same starting model, same bsz). Same number of training steps, same end perf, but PipelineRL is much faster.

Rafael Pardinas@muchomuchacho·7 Nov

Such a nice feeling to see it flying out there

Hamish Ivison@hamishivi
