


Weather Report

@ReporterWeather




This is really cool. It got me thinking more deeply about personalized RL: what's the real point of personalizing a model in a world where base models become obsolete so quickly? The reality in AI is that new models ship every few weeks, each better than the last, and the pace is only accelerating, as we see on the Hugging Face Hub. We are not far from better base models dropping daily.

There's a research gap in RL here that almost no one is working on. Most LLM personalization research assumes a fixed base model, but very few ask what happens to that personalization when you swap the base model. Think about going from Llama 3 to Llama 4: all the tuned preferences, reward signals, and LoRAs are suddenly tied to yesterday's model. As a user or a team, you don't want to reteach every new model your preferences. But you also don't want to be stuck on an older one just because it knows you.

We could call this "RL model transferability": how can an RL trace, a reward signal, or a preference representation trained on model N be distilled, stored, and automatically reapplied to model N+1 without too much user involvement? We solved this for SFT, where a training dataset can be stored and reused to train a future model. RLHF pipelines tackle a version of it too, but the general case — RL deployed in the real world — remains unclear. There are some related threads (RLTR for transferable reasoning traces, P-RLHF and PREMIUM for model-agnostic user representations, HCP for portable preference protocols), but the full loop seems under-studied to me.

Some of these questions are about off-policy learning, but others are about capabilities versus personalization: which of the old customizations/fixes does the new model already handle out of the box, and which ones are genuinely user/team-specific and will never be solved by default? For now you'd store those in a skill, but RL would allow extending them beyond the level of written guidance.
I have surely missed some work, so please post any good work you've seen on this topic in the comments.
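The SFT analogy in the post can be made concrete. A minimal sketch, assuming nothing beyond the post itself: store preferences as model-agnostic (prompt, chosen, rejected) records, then on model N+1 filter out the ones the new model already handles out of the box, keeping only the genuinely user-specific ones to re-apply. All names here (`PreferenceRecord`, `load_transferable`, the `tag` field) are hypothetical illustrations, not an existing library.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical model-agnostic preference record. Storing preferences at this
# level (text in, preferred/rejected text out) ties nothing to model N's
# weights, mirroring how an SFT dataset outlives the model it trained.
@dataclass
class PreferenceRecord:
    prompt: str
    chosen: str      # response the user preferred
    rejected: str    # response the user rejected
    tag: str = "user-specific"  # vs. "capability-gap" (may vanish in model N+1)

def save_preferences(records, path):
    """Persist preferences independently of any base model, one JSON per line."""
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(asdict(r)) + "\n")

def load_transferable(path, new_model_already_handles):
    """On model N+1, keep only records the new model doesn't already satisfy.

    `new_model_already_handles` is a caller-supplied predicate (e.g. "does the
    new model's default answer to `prompt` already match `chosen`?").
    """
    kept = []
    with open(path) as f:
        for line in f:
            r = PreferenceRecord(**json.loads(line))
            if not new_model_already_handles(r.prompt, r.chosen):
                kept.append(r)
    return kept
```

The filtering step is the capabilities-versus-personalization split from the post: records the predicate drops were capability gaps the new model closed on its own; the survivors are the ones worth distilling back in via RL or a skill.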

Ezra Klein: "Having AI summarize a book or paper for me is a disaster. It has no idea what I really wanted to know and wouldn't have made the connections I would've made. I'm interested in the thing I will see that other people wouldn't have seen, and I think AI typically sees what everybody else would see. I'm not saying that AI can't be useful, but I'm pretty against shortcuts. And obviously, you have to limit the amount of work you're doing. You can't read literally everything. But in some ways, I think it's more dangerous to think you've read something that you haven't than to not read it at all. I think the time you spend with things is pretty important." @ezraklein

New blog post: "A sufficiently detailed spec is code" I wrote this because I was tired of people claiming that the future of agentic coding is thoughtful specification work. As I show in the post, the reality devolves into slop pseudocode haskellforall.com/2026/03/a-suff…

This will have ZERO predictive power. Let me explain. I love 'living in the future', and that's why when I saw DALL·E 2 in April 2022, I could already predict that before long we'd get realistic images and video, songs, and eventually even all of it in real time, which is still to be seen. It was easy and plausible to draw that line. But the idea that a toy trying to simulate groups of humans in an Asimov-style psychohistory way could actually have predictive power is mathematically impossible, for a very simple reason: chaos theory. Small changes in the input produce unpredictable changes in the output. Simulating a bunch of Sims won't get you anywhere close to predicting the real world. As a toy or a science-fiction premise, though, it's great! Highly recommend Asimov's "Foundation" and the series "Devs".
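The chaos-theory claim above is easy to demonstrate on the textbook example of a chaotic system, the logistic map x → r·x·(1−x) at r = 4 (chosen here as an illustration; the post itself names no specific system). Two trajectories starting 10⁻¹⁰ apart diverge to order-1 differences within a few dozen steps, which is why a tiny error in the initial state of any such simulation swamps the forecast.

```python
# Sensitive dependence on initial conditions via the logistic map at r = 4,
# a standard chaotic regime: nearby trajectories separate exponentially fast.
def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-10)  # perturb the start by one part in 10^10

# The initial gap is negligible, but somewhere along the trajectory the two
# runs end up on opposite sides of the unit interval.
max_gap = max(abs(x - y) for x, y in zip(a, b))
```

A simulated society is this map times a few billion coupled variables: even a perfect model with a measurably imperfect initial state loses all predictive power on short horizons.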


Sam Altman: "I bet there is another new architecture to find." Sam Altman believes we are on the verge of discovering a new underlying architecture that will be as big a leap forward as Transformers were over LSTMs. He noted that we finally have AI models that are smart enough to help conduct this level of research (GPT 5.4 and above 👀). His direct advice to builders looking for the next major leap is to look for a "mega breakthrough" and use current models to help them find it.


The persisting importance of prompt engineering -- and now harness engineering -- is one of the best indicators of how far we are from AGI. A general system doesn't need a task-specific harness. And when provided with instructions, it is robust to phrasing variations.


Introducing a new method to teach LLMs to reason like Bayesians. By training models to mimic optimal probabilistic inference, we improved their ability to update their predictions and generalize across new domains. Learn more: goo.gle/4ue4eqj
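The "optimal probabilistic inference" the post trains models to mimic has a closed form in simple settings. A minimal sketch, not taken from the linked work: the Beta-Binomial conjugate update, where a trained model's stated probability should shift by the same amount the exact posterior does when new evidence arrives.

```python
# Exact Bayesian updating in the simplest conjugate case: a Beta(alpha, beta)
# prior over a coin's heads probability, updated on observed flips. This is
# the kind of calibrated belief-shift a "reason like a Bayesian" objective
# would reward.
def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing `heads` and `tails`."""
    return alpha + heads, beta + tails

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
a, b = beta_update(1, 1, heads=7, tails=3)
```

The appeal of training against such targets is that the "correct" prediction update is computable exactly, so the supervision signal is noise-free, and behavior learned on coins can be probed for generalization to domains with no closed form.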



