Rafael Pardinas

667 posts

@muchomuchacho

RL @ServiceNowRSRCH

London · Joined July 2007
177 Following · 309 Followers
Vivek
Vivek@vivek_2332·
-> the authors ablate every major design choice one at a time on an 8B model. PipelineRL beats PPO-off-policy on efficiency while hitting a similar ceiling.
-> CISPO and GSPO both crush DAPO on asymptotic pass rate, with CISPO being more robust to hyperparameter choices.
-> a numerical mismatch between inference and training kernels was corrupting the importance sampling ratio. switching the LM head to FP32 leads to huge gains.
-> dropping zero-variance prompts from the batch and permanently removing prompts above a 0.9 pass rate both lift the asymptote. easy problems and dead gradient signals are just wasting compute.
-> takeaway: loss type, the FP32 fix, and the off-policy algorithm actually raise the ceiling.
-> everything else, like aggregation, normalization, and curriculum, mostly just makes you climb faster. always prioritize the former over the latter.
Vivek tweet mediaVivek tweet mediaVivek tweet media
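The two data-filtering rules in the thread can be sketched in a few lines (a toy illustration; the function name and dict-based bookkeeping are mine, not from the paper):

```python
def filter_batch(prompt_rewards, pass_rates, retired, threshold=0.9):
    """Apply the two filtering rules described above.

    prompt_rewards: {prompt_id: [reward per rollout]} for the current batch
    pass_rates:     {prompt_id: running pass rate over training}
    retired:        set of prompt_ids permanently removed (mutated in place)
    Returns the prompt_ids that still contribute a gradient signal.
    """
    kept = []
    for pid, rewards in prompt_rewards.items():
        if pid in retired:
            continue
        if pass_rates.get(pid, 0.0) > threshold:
            retired.add(pid)        # too easy: remove permanently
            continue
        if len(set(rewards)) <= 1:  # all rollouts got the same reward ->
            continue                # zero-variance, dead gradient; skip batch
        kept.append(pid)
    return kept

retired = set()
batch = {"p1": [1, 1, 1], "p2": [0, 1, 0], "p3": [1, 0, 1]}
rates = {"p1": 0.95, "p2": 0.40, "p3": 0.60}
print(filter_batch(batch, rates, retired))  # p1 retired, p2/p3 kept
```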
Vivek
Vivek@vivek_2332·
notes on the ScaleRL paper. -> had read it before but never went this deep -> way more insights packed in here than i expected. -> one of the best-structured RL papers out right now. some amazing findings. check it out!! 🧵(1/n)
Rafael Pardinas reposted
Alexandre Lacoste
Alexandre Lacoste@alex_lacoste_·
We're sitting on a gold mine of data for evaluation and post-training. Hundreds of agentic benchmarks, rich structured environments, verifiable signal. Most of it is sitting idle. Not because nobody wants it, but because the engineering to use it is brutal. 🧵
AMI Labs
AMI Labs@amilabs·
Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe.

We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world.

We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one.

Read more: amilabs.xyz

AMI - Real world. Real intelligence.
Daniel
Daniel@growing_daniel·
if the US military can't use AI to kill people then what the hell is the point of all this? improving the human condition?? wake the fuck up bucko
internet hall of fame
internet hall of fame@InternetH0F·
I have emails that are older than lil dude
Simon Willison
Simon Willison@simonw·
Interesting research in HBR today about how the productivity boost you can get from AI tools can lead to burnout or general mental exhaustion, something I've noticed in my own work simonwillison.net/2026/Feb/9/ai-…
Rafael Pardinas reposted
Emiliano Penaloza
Emiliano Penaloza@emilianopp_·
Remember all the self-distillation papers that came out last week? Well, we also propose it 😅, but alongside something better 😎 π-Distill. We show that with this method, you can distill closed-source frontier models even though their traces are hidden 🔒. Both our methods can reach and even surpass the performance of the industry-standard SFT + RL with access to reasoning traces 🤯. 🔬 And we spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10
Open_ERV
Open_ERV@open_erv·
This would be about $265 in electricity though, at 16 cents per kWh, on low. You'd have to factor in interest to calculate the equivalent upfront cost, but it would be a lot more than $25, because of the money you'd save on electricity.

A Lasko 3733 on low gives about 450 CFM CADR and uses 50-55 watts. The current BQF prototypes at 300 rpm use 12 watts and give 680 CFM CADR with the same filters. Turn it down even lower and yeah, it would be <1/5 the electrical consumption. So it is possible to make something even cheaper still, through better engineering, and also better in every other way.

On high it would be about 33,000 × 96 / 1000 × 0.16 = $506.9. That's 900 CFM CADR or so. A BQF gives 1030 CFM CADR with the same filters, and uses only 26 watts. So the difference is $369. You could pay $394 for the fan, ignoring interest, and it would be the same actual net expenditure over that operating period. Again, not including interest, which would budge that number down.
Liesl McConchie@Liesl4CleanAir

💔RIP Lasko Fan💔 Based on my estimations, this $25 Lasko fan provided my family with over 33,000 hours of clean air. Yes, there were a few filter changes over those 33,000 hours but a CR box is still the most affordable way to access clean indoor air. @CRFoundationUS

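The cost arithmetic above can be re-derived in a few lines (a sketch; the 96 W high setting, 26 W BQF draw, 33,000 hours, and 16 ¢/kWh rate are all taken from the post):

```python
def run_cost(hours, watts, usd_per_kwh=0.16):
    """Electricity cost of running a fan: kWh consumed times the rate."""
    return hours * watts / 1000 * usd_per_kwh

lasko_high = run_cost(33_000, 96)   # Lasko 3733 on high
bqf = run_cost(33_000, 26)          # BQF prototype, same filters
print(round(lasko_high, 2))         # ~506.88, the "about $506.9" figure
print(round(lasko_high - bqf, 2))   # ~369.60, the "$369 difference"
```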
Fit_Fusion
Fit_Fusion@FitFusion__·
I learned this trick from a friend in Italy! Now I only make pasta this way
elie
elie@eliebakouch·
i'm in my RL training at scale era, what are the best papers/tech reports i should read? so far on my reading list:
> Inclusion AI Ring 1T paper
> LongCat Flash Thinking (first time reading it, what a banger, will make a thread later)
> ScaleRL
> MiniMax M1
TNT Sports
TNT Sports@tntsports·
Rafa Nadal’s reaction as a plaque of his footprint is unveiled on Court Philippe Chatrier says it all 🥹❤️
Rafael Pardinas
Rafael Pardinas@muchomuchacho·
image on the left: rewards; image on the right: clamp ratio
Rafael Pardinas
Rafael Pardinas@muchomuchacho·
You can now train reasoning models with GSPO in PipelineRL: sequence-level optimisation + async weight updates = faster, more stable RL training. Can you guess which is which? @ServiceNowRSRCH
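The "sequence-level optimisation" mentioned above refers to GSPO's core change from PPO-style token-level clipping: one importance ratio per sequence, the length-normalized probability ratio, i.e. exp of the mean per-token log-ratio. A toy sketch of that ratio and the clipped objective (my own code, not PipelineRL's implementation):

```python
import math

def gspo_ratio(new_logps, old_logps):
    """Sequence-level importance ratio:
    (pi_new(y|x) / pi_old(y|x)) ** (1/|y|), via mean token log-ratio."""
    assert len(new_logps) == len(old_logps) and new_logps
    mean_log_ratio = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_log_ratio)

def clipped_objective(new_logps, old_logps, advantage, eps=0.2):
    """PPO-style pessimistic clip, applied once per sequence, not per token."""
    r = gspo_ratio(new_logps, old_logps)
    r_clipped = max(min(r, 1 + eps), 1 - eps)
    return min(r * advantage, r_clipped * advantage)
```

Because the per-token log-ratios are averaged before the single clamp, one noisy token cannot blow up the whole sequence's update, which is the stability argument for sequence-level ratios.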
Rafael Pardinas reposted
🇺🇦 Dzmitry Bahdanau
🇺🇦 Dzmitry Bahdanau@DBahdanau·
i've been waiting for this moment since our initial PipelineRL blog post in May :) 🕺🕺🕺
Hamish Ivison@hamishivi

to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance (combined with some other threading etc. updates). Here's IFEval perf for an internal model (same data, same starting model, same bsz). Same number of training steps, same end perf, but PipelineRL is much faster.

Rafael Pardinas
Rafael Pardinas@muchomuchacho·
Such a nice feeling to see it flying out there
Hamish Ivison@hamishivi