Self
5.9K posts

Self
@SelfInfinity
builder. exploring the void. On a quest for the sublime and eternal




Can you imagine being a "frontier" lab that's raised like a billion dollars and now you can't release your latest model because it can't beat deepseek? 🐳 Sota can be a bitch if thats your target







Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: arxiv.org/pdf/2509.24372 Code: github.com/VsonicV/es-fin…


sora is number 1 in the app store! it's been epic to see what the collective creativity of humanity is capable of so far. team is iterating fast and listening to feedback. feel free to drop us feature requests! (we're sending more invite codes soon, i promise!)


.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew. 0:00:00 – Are LLMs a dead-end? 0:13:51 – Do humans do imitation learning? 0:23:57 – The Era of Experience 0:34:25 – Current architectures generalize poorly out of distribution 0:42:17 – Surprises in the AI field 0:47:28 – Will The Bitter Lesson still apply after AGI? 0:54:35 – Succession to AI


🥸 Many new Qwen models are coming soon, all empowered with enhanced code capabilities.






















