

Saurav Jha

@saurav_j_
@IVADO_Qc postdoc @Mila_Quebec; interned @TencentGlobal, @SonyAI_global, @Inria_Nancy; ex-MLE @FactSet; PhD @UNSWComputing



Diffusion world models can help test and improve robot policies before running them on real robots. But can the choice of latent space make the WM more faithful? We show that semantic spaces beat reconstruction spaces on task-relevant metrics. hskalin.github.io/semantic-wm

Extremely excited to share our recent work on diffusion world models. We ask a simple question: which space best supports diffusion world modeling, and how do we evaluate that? Turns out representation is the answer, with JEPA space yielding the strongest diffusion world models!





📣 Announcing the CoLLAs Seminars: a year-long exploration of one of the central challenges in AI: building systems that can learn continually, adapt in real time, and improve over their lifetime. Join us on May 13th at 11 am ET as we kick off the series with Pulkit Agrawal speaking on “Rethinking Post training”.
ℹ️ Learn more: lnkd.in/erEdDxgP
✉️ Join our mailing list: lnkd.in/eEGwH-3E
🔗 Zoom link for the talk: lnkd.in/ekkHE5nX



📢 Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026





Introversion that occurs during SF winter is on another level: it’s like your body just wants to hibernate and stay warm indoors.







Introducing linear scaling of reasoning: 𝐓𝐡𝐞 𝐌𝐚𝐫𝐤𝐨𝐯𝐢𝐚𝐧 𝐓𝐡𝐢𝐧𝐤𝐞𝐫 Reformulate RL so thinking scales 𝐎(𝐧) 𝐜𝐨𝐦𝐩𝐮𝐭𝐞, not O(n^2), with O(1) 𝐦𝐞𝐦𝐨𝐫𝐲, architecture-agnostic. Train R1-1.5B into a markovian thinker with 96K thought budget, ~2X accuracy 🧵



LLMs can now self-optimize. A new method allows an AI to rewrite its own prompts to achieve up to 35x greater efficiency, outperforming both Reinforcement Learning and Fine-Tuning for complex reasoning.

UC Berkeley, Stanford, and Databricks introduce a new method called GEPA (Genetic-Pareto), an autonomous system for prompt optimization. The researchers tested this across diverse tasks like multi-hop Q&A and instruction following. They demonstrated gains using proprietary models like GPT-4.1 Mini and open-source models like Qwen3 8B.

Here's a look at how it works:

GEPA treats prompt optimization as a genetic evolution problem. It starts with a diverse "pool" of prompt candidates.

It uses Pareto optimization to select the "fittest" prompts. It finds the ones that offer the best tradeoff between high performance on a task and low computational cost (measured in "rollouts").

It "evolves" new, better prompts using two key mechanisms:

Crossover: Intelligently combining the best parts of two successful "parent" prompts to create a new "child" prompt.

Reflective Mutation: This is the self-optimization engine. The system tasks an LLM to analyze its own detailed execution trace (its successes and failures) and then intelligently rewrite its own instructions to fix the flaws.

How GEPA fits into your AI strategy:

This method provides a powerful new tool without replacing existing ones. Here’s the distinction:

GEPA works on its own. You can apply it directly to any base LLM to achieve significant performance gains just by optimizing the prompt.

Fine-Tuning teaches the model what (domain knowledge), while GEPA optimizes how the model uses that knowledge (its reasoning process). This makes them powerful complements. You can use GEPA to supercharge a base model, OR you can apply it to an already fine-tuned model to get the absolute best performance from your expert AI.

It's a new, flexible layer in the optimization toolkit that allows AI to optimize itself.
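
To make the loop described above more concrete, here is a minimal Python sketch of its three pieces as this post describes them: Pareto selection over (performance, rollout cost), crossover, and reflective mutation. It is not the GEPA authors' implementation. Every name in it (`Candidate`, `pareto_front`, `crossover`, `evolve`, `toy_evaluate`, `toy_reflect`) is hypothetical, the crossover is a naive sentence splice, and the two `toy_` functions are stand-ins for what would in practice be real task rollouts and an LLM call that reads the execution trace.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    prompt: str
    score: float   # task performance, higher is better
    rollouts: int  # evaluation cost, lower is better
    trace: str     # execution feedback used by reflective mutation


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.rollouts <= b.rollouts
    strictly_better = a.score > b.score or a.rollouts < b.rollouts
    return no_worse and strictly_better


def pareto_front(pool: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in pool if not any(dominates(o, c) for o in pool)]


def crossover(parent_a: str, parent_b: str) -> str:
    """Naive stand-in: first half of A's sentences plus second half of B's."""
    a = [s for s in parent_a.split(". ") if s]
    b = [s for s in parent_b.split(". ") if s]
    return ". ".join(a[: (len(a) + 1) // 2] + b[len(b) // 2:])


def evolve(
    prompts: List[str],
    evaluate: Callable[[str], Tuple[float, int, str]],
    reflect: Callable[[str, str], str],
    generations: int = 3,
    seed: int = 0,
) -> List[Candidate]:
    """Grow the pool for a few generations and return its Pareto front."""
    rng = random.Random(seed)
    pool = [Candidate(p, *evaluate(p)) for p in prompts]
    for _ in range(generations):
        front = pareto_front(pool)
        parent, mate = rng.choice(front), rng.choice(front)
        children = [
            reflect(parent.prompt, parent.trace),     # reflective mutation
            crossover(parent.prompt, mate.prompt),    # crossover
        ]
        pool.extend(Candidate(c, *evaluate(c)) for c in children)
    return pareto_front(pool)


# Assumed toy task: a prompt scores higher when it contains these phrases.
KEYWORDS = ("step by step", "cite sources")


def toy_evaluate(prompt: str) -> Tuple[float, int, str]:
    missing = [k for k in KEYWORDS if k not in prompt.lower()]
    score = 1 - len(missing) / len(KEYWORDS)
    trace = "missing: " + ", ".join(missing) if missing else "ok"
    return score, len(prompt.split()), trace


def toy_reflect(prompt: str, trace: str) -> str:
    """Stand-in for an LLM that rewrites the prompt based on its trace."""
    if trace.startswith("missing: "):
        first_missing = trace[len("missing: "):].split(", ")[0]
        return f"{prompt}. Please {first_missing}"
    return prompt
```

Calling `evolve(["Answer the question", "Think step by step"], toy_evaluate, toy_reflect)` returns the non-dominated prompts found after three generations. Passing `evaluate` and `reflect` in as callables keeps the selection logic separate from whichever model and benchmark are actually used.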