Julien Pourcel @ NeurIPS

353 posts

@PourcelJulien

PhD student at @inria (@flowersinria team) working on LLM4code | Google PhD Fellow 2025 | @ENS_ParisSaclay (MVA)

Joined February 2014
1K Following · 253 Followers
Pinned Tweet
Julien Pourcel @ NeurIPS (@PourcelJulien)
Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!). It brings LLMs from just a few percent on ARC-AGI-1 up to 52%. We’re releasing the fine-tuned LLMs, a dataset of 5M generated programs, and the code. 🧵
8 replies · 38 reposts · 192 likes · 31.8K views
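The alternation the thread describes (search for programs with an LLM, then fine-tune that same LLM on its own traces) can be sketched roughly as follows. This is a hypothetical skeleton, not SOAR's actual code: `sample_program`, `refine_program`, and `finetune` are stand-ins for the LLM calls.

```python
def search_and_learn(tasks, sample_program, refine_program, finetune,
                     rounds=2, samples_per_task=2):
    """SOAR-style loop sketch: a search phase that samples and refines
    candidate programs, and a learning phase that fine-tunes the same
    model on the traces the search produced (hypothetical interface)."""
    solved = {}
    for _ in range(rounds):
        traces = []
        # search phase: draw candidate programs, refine the ones that fail
        for task in tasks:
            for _ in range(samples_per_task):
                prog = sample_program(task)
                if prog(task["input"]) != task["output"]:
                    prog = refine_program(task, prog)  # one refinement pass
                if prog(task["input"]) == task["output"]:
                    solved[task["id"]] = prog
                traces.append((task, prog))
        # learning phase: fine-tune sampler and refiner on their own traces
        sample_program, refine_program = finetune(traces)
    return solved
```

The key point of the design is that the same model plays both roles, so each fine-tuning round improves both the initial samples and the refinements of the next round.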
Julien Pourcel @ NeurIPS reposted
Demis Hassabis (@demishassabis)
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!
317 replies · 884 reposts · 8K likes · 926.5K views
Julien Pourcel @ NeurIPS reposted
Qwen (@Alibaba_Qwen)
🚀 Introducing the Qwen 3.5 Small Model Series: Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B
✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models
And yes — we’re also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation.
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
922 replies · 2.9K reposts · 21.4K likes · 8.9M views
Silvia Sapora (@silviasapora)
Accepted at #ICLR2026! 1/🧵 Inverse Reinforcement Learning typically produces opaque black-box rewards that are impossible to debug. But what if we could learn rewards as executable, human-readable Python code instead? 🐍 Introducing GRACE: Generating Rewards As CodE. 👇
2 replies · 6 reposts · 67 likes · 6.8K views
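A toy illustration of the core idea — a reward expressed as executable, human-readable Python rather than an opaque network. This function is purely illustrative; it is not a reward produced by GRACE itself, and the state/action names are made up.

```python
def learned_reward(state, action):
    """A reward as plain Python: every term can be read, unit-tested,
    and debugged, unlike a black-box reward model.
    (Illustrative only; not output from the GRACE paper.)"""
    reward = 0.0
    if state["goal_reached"]:
        reward += 10.0                             # terminal bonus
    reward -= 0.1 * state["distance_to_goal"]      # distance shaping term
    if action == "noop":
        reward -= 0.5                              # discourage idling
    return reward
```

The appeal is debuggability: if an agent exploits the reward, you can read the offending term directly instead of probing a neural network.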
Axel Darmouni (@ADarmouni)
Self-distillation, on top of being good in setups outside of RLVR, is in fact also very good in RLVR setups! In « Reinforcement Learning via Self-Distillation », this is what @jonashuebotter et al. from ETH Zurich demonstrate.
The setup is similar to the other self-distillation paper:
1. Sample from the student
2. Sample from the teacher, given additionally this time *feedback* from the environment → the feedback can be the environment return, but what works best is one of the correct solutions from the rollout batch
3. Compute the KL divergence from student to teacher to align the student
They test it on Science Q&A, tool use, and LiveCodeBench and compare with an optimized GRPO (so actually rigorous, giving GRPO the fairest of chances). And it works very, very well, usually quite a bit better than GRPO, which is honestly an incredible result.
Just like in the other setup, they make training more stable by updating the teacher through either EMA or weight interpolation, and simplify the KL to compute only over the top-K tokens from the student rather than the complete vocabulary.
A few more tidbits:
→ Models trained with self-distillation output far fewer tokens than those trained with GRPO
→ The strength of self-distillation scales with model strength
→ Logit-level SDPO (top-K compute for each token) beats token-level (top-1 compute for each token) and sequence-level (top-1 compute for each token, averaged over the sentence)
→ The teacher also becomes better at the problem, but gets caught up by the student
→ Less forgetting of other tasks not trained for
→ Can be combined with GRPO for a slightly higher performance increase
The most amazing thing: *it can also be used for test-time learning in verifiable setups.* How? Make a generation → get environment feedback → perform self-distillation against the teacher given generation + environment feedback. And it helps small models solve the hard tasks of LCB to a tremendous degree! Both better than best-of-k or multi-turn.
A very, very cool work, and if self-distillation gets implemented quickly into mainstream libs, I’m getting the feeling that SD studies have just begun =) Seems like a breakthrough, congrats to the authors!
9 replies · 13 reposts · 116 likes · 14.7K views
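The two stabilization tricks mentioned in the thread — computing the KL only over the student's top-K tokens, and moving the teacher by EMA — can be sketched in a few lines of plain Python. The dict-of-log-probabilities interface is a hypothetical simplification, not the paper's actual implementation:

```python
import math

def topk_kl(student_logp, teacher_logp, k=2):
    """KL(student || teacher), restricted to the student's top-k tokens
    instead of the full vocabulary (the cheap approximation the thread
    describes). Inputs map token -> log-probability."""
    top = sorted(student_logp, key=student_logp.get, reverse=True)[:k]
    return sum(math.exp(student_logp[t]) * (student_logp[t] - teacher_logp[t])
               for t in top)

def ema_update(teacher, student, decay=0.99):
    """Stabilize training by moving the teacher slowly toward the student."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]
```

Restricting the KL to the top-K tokens keeps the per-token cost independent of vocabulary size, which matters when the vocabulary runs to six figures.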
ARC Prize (@arcprize)
ARC Prize 2025 Winners Interviews: Paper Award 2nd Place. @PourcelJulien, @cedcolas, @pyoudeyer discuss SOAR, a self-improving evolutionary program synthesis framework that fine-tunes an LLM on its own search traces, without human-engineered DSLs or solution datasets.
7 replies · 16 reposts · 90 likes · 8.7K views
Julien Pourcel @ NeurIPS reposted
Cédric (@cedcolas)
Our self-improving genetic algorithm received the 2nd place paper award for the @arcprize! Congrats in particular to @PourcelJulien the experiments wizard! We proposed a simple, general algorithm ⬇️
Quoting ARC Prize (@arcprize):

ARC Prize 2025 Winners Interviews: Paper Award 2nd Place. @PourcelJulien, @cedcolas, @pyoudeyer discuss SOAR, a self-improving evolutionary program synthesis framework that fine-tunes an LLM on its own search traces, without human-engineered DSLs or solution datasets.

1 reply · 3 reposts · 20 likes · 980 views
Julien Pourcel @ NeurIPS reposted
François Chollet (@fchollet)
Congrats to the ARC Prize 2025 winners! The Grand Prize remains unclaimed, but nevertheless 2025 saw remarkable progress on LLM-driven refinement loops, both with "local" models and with commercial frontier models. We also saw the rise of zero-pretraining DL approaches like HRM and TRM. Lots of new learnings!
Quoting ARC Prize (@arcprize):

Announcing the ARC Prize 2025 Top Score & Paper Award winners. The Grand Prize remains unclaimed. Our analysis of AGI progress marks 2025 as the year of the refinement loop.

17 replies · 53 reposts · 528 likes · 74.1K views
Julien Pourcel @ NeurIPS reposted
ARC Prize (@arcprize)
ARC Prize 2025 Paper Award Winners
1st / "Less is More: Recursive Reasoning with Tiny Networks" (TRM) / A. Jolicoeur-Martineau / $50k
2nd / "Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI" (SOAR) / J. Pourcel et al. / $20k
3rd / "ARC-AGI Without Pretraining" / I. Liao et al. / $5k
4 replies · 30 reposts · 282 likes · 133.3K views
Julien Pourcel @ NeurIPS reposted
Greg Kamradt (@GregKamradt)
ARC Prize 2025 concluded today: the year of refinements. Our goal is to bring meaningful open-source research into the community, and today we awarded $137K to 14 teams.
Benchmarks matter, but their true value comes from the progress they catalyze. ARC Prize 2025 was designed to inspire the community to publish research aimed at building more generalized systems. The grand prize remains unclaimed, but the leaderboard reflects strong advances, and all submissions and solutions are now open sourced. Here is a recap of the winners; for more, check out the great recap by @mikeknoop (link below).
** Paper Prizes **
1/ Alexia Jolicoeur-Martineau (@jm_alexia) - TRM
Tiny Recursive Model (TRM) is a tiny 2-layer network that does recursive reasoning: it keeps a latent state z and a current answer y, repeatedly updates z using the puzzle and y, then refines y from z over many "deep supervision" steps, so it can gradually fix its own mistakes without needing a huge model. It simplifies the Hierarchical Reasoning Model (HRM).
2/ Julien Pourcel (@PourcelJulien) - Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
SOAR is a self-improving evolutionary program synthesis system: it uses an LLM to sample and refine Python programs for ARC tasks (Sample & Refine phase), then turns all those attempts, both successes and failures, into new problem–solution pairs via hindsight relabeling, and fine-tunes the same LLM so it gets better at both sampling and refinement next time.
3/ Isaac Liao (@LiaoIsaac91893) - ARC-AGI Without Pretraining
CompressARC shows that lossless information compression alone can produce intelligent behavior on ARC-AGI: for each puzzle, it builds a randomly initialized neural network and uses gradient descent at inference time to find a compact representation (via a VAE-style loss: cross-entropy + KL) that best "compresses" all the given example grids.
** Top Scores **
1/ NVARC (@JFPuget, Ivan Sorokin)
The NVIDIA team built a huge synthetic dataset of ARC-AGI puzzles, then turned those summaries into Python programs that produce consistent input/output grid pairs. They used test-time fine-tuning (TTFT) plus a fast depth-first search decoding process to adapt each model to the hidden test puzzles.
2/ the ARChitects (@dvhrtm, Daniel Franzen, @JDisselh)
The ARChitects fine-tune an LLM on ARC-style grids and then use it at test time in two roles: 1) as a generator that, via depth-first search (DFS) over token probabilities, systematically explores the space of high-probability candidate solutions (not just random samples), and 2) as a scorer that evaluates how likely each complete solution is.
3/ MindsAI @ Tufa Labs (@MindsAI_Jack, @DriesSmit1, @MohamedOsmanML, @bayesilicon)
They trained a trimmed CodeT5 encoder-decoder model for years on the massive ARC-AGI Mega dataset (100M+ examples) using span corruption, reversals, and BPE dropout so it learned structure, not surface patterns. At inference, they ran large-scale test-time training (TTT) on thousands of permuted and augmented versions of the test set, then applied AIRV.
4/ Lonnie
Lonnie reused the 2024 ARChitects pipeline but treated the random seed as a hyperparameter, systematically exploring seeds to exploit variance on the small 240-task evaluation set, which pushed an otherwise baseline-style system up to 5th place on the private leaderboard.
5/ Guillermo Barbadillo @ Veridas (@guille_bar)
Guillermo believes that ARC will ultimately be solved by a search-and-learn approach that combines program synthesis with test-time training (TTT) and hindsight relabeling, so the system can search over code, learn from failed attempts, and steadily refine its solutions.
We're going bigger in 2026! Let's go!!
Quoting ARC Prize (@arcprize):

Announcing the ARC Prize 2025 Top Score & Paper Award winners. The Grand Prize remains unclaimed. Our analysis of AGI progress marks 2025 as the year of the refinement loop.

4 replies · 16 reposts · 66 likes · 8K views
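The TRM loop described in the recap (keep a latent state z and a current answer y, repeatedly update z from the puzzle and y, then refine y from z over several deep-supervision steps) can be sketched schematically. `update_z` and `refine_y` are stand-ins for the tiny 2-layer network, and the scalar toy below only illustrates the control flow:

```python
def trm_solve(puzzle, y, z, update_z, refine_y, outer_steps=3, inner_steps=4):
    """Schematic TRM-style recursive reasoning: alternate between
    refreshing the latent state and refining the answer, so each
    outer (deep-supervision) step can fix earlier mistakes."""
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            z = update_z(puzzle, y, z)   # "think": update latent state
        y = refine_y(y, z)               # "act": improve the current answer
    return y
```

With a toy puzzle where z tracks the remaining error and y absorbs half of it per step, the answer converges toward the target over the outer iterations, mirroring how TRM gradually corrects itself without a large model.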
Julien Pourcel @ NeurIPS reposted
ARC Prize (@arcprize)
Announcing the ARC Prize 2025 Top Score & Paper Award winners. The Grand Prize remains unclaimed. Our analysis of AGI progress marks 2025 as the year of the refinement loop.
25 replies · 48 reposts · 314 likes · 221.5K views
Julien Pourcel @ NeurIPS reposted
Cédric (@cedcolas)
In San Diego for #NeurIPS. Happy to chat about open-endedness, self goal-generation, intrinsic motivations, self-improvement, and human-machine collective intelligence. Open to hearing about research scientist opportunities too. Don't hesitate to reach out!
3 replies · 3 reposts · 29 likes · 2.4K views
Julien Pourcel @ NeurIPS (@PourcelJulien)
Big news: I’m officially a 2025 Google PhD Fellow! 🎓✨ I’m also heading to #NeurIPS2025 in San Diego! Happy to chat about LLMs, code gen, evolutionary algos, open-endedness, self-improvement, enhancing LLM diversity, ARC-AGI, and other subjects. Open to hearing about summer internships. ☀️
2 replies · 4 reposts · 17 likes · 945 views