Alexandre Brown 🇨🇦

769 posts


@AlexandreBrown0

PhD student, researcher at @Mila_Quebec ~ RL , robotics and stuff

Canada · Joined September 2017
1.2K Following · 185 Followers
Pinned Tweet
Alexandre Brown 🇨🇦@AlexandreBrown0·
🚀 I'm excited to share our new paper: SegDAC: Segmentation-Driven Actor-Critic for Visual Reinforcement Learning
🧠 SegDAC combines large vision models with online RL to reason about its environment at the object and sub-object level, avoiding noisy pixel-level reasoning.
🛠️ Using YOLO-World and SAM, SegDAC breaks the scene into semantically meaningful segments and learns to attend to a variable number of segments and proprioceptive signals, focusing on the most relevant information to complete the task.
⚡ Trained purely with online RL, without human labels or demonstrations.
🏆 Outperforms previous online RL state-of-the-art methods across all difficulty levels on our challenging visual generalization benchmark, with up to 2x better visual generalization in the hardest setting.
📄 Paper: arxiv.org/pdf/2508.09325
🌐 Project Page: segdac.github.io
Work done with @GlenBerseth at @Mila_Quebec
#ReinforcementLearning #RobotLearning #ArtificialIntelligence #Robotics
5 replies · 17 reposts · 80 likes · 10.8K views
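The "attend to a variable number of segments" idea above can be sketched as masked attention pooling. This is a hypothetical NumPy illustration, not the paper's actual architecture; the function name `segment_attention` and the shapes are assumptions:

```python
import numpy as np

def segment_attention(query, segments, mask):
    """Pool a variable number of segment embeddings with masked attention.

    query:    (d,) query vector (e.g. derived from proprioception)
    segments: (n_max, d) segment embeddings, zero-padded up to n_max
    mask:     (n_max,) True for real segments, False for padding
    """
    d = query.shape[0]
    scores = segments @ query / np.sqrt(d)         # (n_max,) similarity scores
    scores = np.where(mask, scores, -np.inf)       # padding can never be attended
    weights = np.exp(scores - scores[mask].max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ segments                      # (d,) attention-pooled feature
```

Because padded rows get exactly zero weight, the same network handles scenes with any number of detected segments.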
Alexandre Brown 🇨🇦 retweeted
Roger Creus Castanyer@creus_roger·
🚀 I vibecoded yet another autoresearch tool. This one works with SLURM clusters and lets Claude Code agents run experiments completely hands-off for weeks. It's called xgenius — open source, works with ANY codebase. github.com/roger-creus/xg…
2 replies · 6 reposts · 28 likes · 1.1K views
Alexandre Brown 🇨🇦 retweeted
François Fleuret@francoisfleuret·
BTW are hyper-networks a thing of the past?
13 replies · 2 reposts · 71 likes · 14.8K views
Alexandre Brown 🇨🇦 retweeted
Stone Tao@Stone_Tao·
I will be giving a talk at UPenn @GRASPlab tomorrow 3-4PM EST on my research in sim/robotics. I’ll be discussing how sim integrated robot learning can drive and accelerate robotics progress. If you are in the area let’s meet up! Link with more details in the thread
2 replies · 2 reposts · 32 likes · 2.8K views
Alexandre Brown 🇨🇦 retweeted
Sonia Joseph@soniajoseph_·
We came away pleasantly surprised by how tractable these models are to decode, how different video encoders feel from language (many techniques won’t transfer), and how brain-like these representations look, reviving old “petri dish” conversations about studying the brain.
1 reply · 2 reposts · 13 likes · 1K views
Alexandre Brown 🇨🇦 retweeted
Jesse Silverberg@SilverbergJesse·
I spent $0 and a weekend vibecoding an @openclaw setup that I text to run experiments for me. In the process, I ended up with bespoke software for self-managing a personal cluster. Also, it now comes up with its own experiments if I don’t have enough running. Blog post link ⬇️
Karel@KarelDoostrlnck

x.com/i/article/2018…

1 reply · 1 repost · 7 likes · 1.7K views
Federico Vaggi@F_Vaggi·
@rasbt @rryssf_ I follow a pretty rigid rule where if someone who doesn't work in ML in a technical role tweets hyped up stuff that's clearly designed to go viral to build up clout, I mute them.
2 replies · 0 reposts · 55 likes · 3.6K views
Robert Youssef@rryssf_·
DeepMind just did the unthinkable. They built an AI that doesn't need RAG and it has perfect memory of everything it's ever read. It's called Recursive Language Models, and it might mark the death of traditional context windows forever. Here's how it works (and why it matters way more than it sounds) ↓
303 replies · 1.1K reposts · 7.9K likes · 952.5K views
Alexandre Brown 🇨🇦 retweeted
Adam Patni@adam_patni·
4/ Results & What Worked

Delay Target performed best, with the most stable learning curve, followed by Rainbow DQN and SAC, which showed comparable performance. PPO significantly underperformed.

The pattern is clear: off-policy Q-learning methods dominated. When you can't parallelize environments and data arrives at real-time speed, the ability to reuse past experiences via replay buffers is critical.
2 replies · 1 repost · 11 likes · 772 views
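The replay-buffer reuse this thread credits for the off-policy win can be shown with a minimal sketch (hypothetical code, not the thread's actual implementation): transitions collected at real-time speed remain sampleable long after the policy that generated them has changed.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO experience replay for off-policy Q-learning."""

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive real-time transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

An on-policy method like PPO must discard this data after each update, which is exactly the disadvantage the thread observes when environments cannot be parallelized.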
Alexandre Brown 🇨🇦@AlexandreBrown0·
@Yuchenj_UW You can "read" papers faster now, but do you still understand a paper as deeply if you don't read it yourself? That's the question.
0 replies · 0 reposts · 0 likes · 16 views
Yuchen Jin@Yuchenj_UW·
Tbh, if I had Claude Code, Gemini, and ChatGPT during my PhD, I’d probably have graduated in 1 year instead of 5.5 years. My PhD was ~50% coding, 25% writing/polishing my papers, 25% reading others' papers. AI now accelerates each by at least 10×. Nothing will ever be the same.
268 replies · 369 reposts · 5.6K likes · 793K views
Stone Tao@Stone_Tao·
GPU parallelized envs have accelerated RL, but most implementations exhibit critical instability when running on-policy RL with short rollouts. We present Staggered Environment Resets. A few lines of code are all you need! Presenting today, 4:30PM poster 310 #NeurIPS2025 🧵(1/8)
7 replies · 8 reposts · 153 likes · 29.9K views
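The staggering idea the tweet describes can be sketched in a few lines: give each parallel env a different initial episode progress so resets spread out over the rollout instead of synchronizing. This is an assumed illustration of the concept; the paper's exact scheme may differ.

```python
import numpy as np

def staggered_initial_steps(num_envs, episode_len):
    """Evenly offset each env's episode clock so resets are spread
    uniformly across steps instead of all envs resetting at once."""
    return (np.arange(num_envs) * episode_len) // num_envs

# With 8 envs and 100-step episodes, resets land roughly 12 steps apart.
offsets = staggered_initial_steps(8, 100)
```

Without such offsets, every env resets on the same step of a short on-policy rollout, so whole batches consist of early-episode states, which is one plausible source of the instability the thread mentions.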
Keyshawn Ebanks@ebanks_keyshawn·
@arnie_hacker from what I've seen, anything imitation + RL. For tasks like this you might want to throw an LLM in the loop
1 reply · 0 reposts · 0 likes · 152 views
Arnie Ramesh@arnie_hacker·
Crazy to me how far away we are from generalizable robotics. Nvidia Gr00t N1.5, a SOTA model:
- trained on 1K H100 GPUs
- with data augmentation
- and future latents alignment (& human data)
- and 1K teleop demos
drops success rate by ~14% when there are new objects in the observation. How will the humanoids react when they're deployed in real worlds with humans walking around, for instance (haven't seen any in the datasets so far)?
11 replies · 8 reposts · 160 likes · 15.6K views
Alexandre Brown 🇨🇦 retweeted
Shuran Song@SongShuran·
Everyone uses DAgger in robot learning, but most papers barely mention how they do it … 😕 We’ve found a few subtle details that make a big difference (which differ from the standard practice today): 💡

- How to collect corrections? Standard human “take-over” corrections create discontinuities. A compliant interface that gently nudges the on-policy roll-out works much better.
- How to update the policy? Instead of standard finetuning, we found that learning residual networks works better — more stable and flexible, and lets you plug in new modalities (e.g., force on top of a position-only base policy).

Our paper Compliant Residual DAgger summarizes these and other interesting findings in detail 👇
Yifan Hou@YifanHou2

Can we quickly improve a pre-trained robot policy by learning from real-world human corrections? Introducing Compliant Residual DAgger (CR-DAgger), a system that improves policy performance to close to 100% on challenging contact-rich manipulation problems, using as few as 50~100 episodes of human corrections.

Co-led by @XiaomengXu11 and I, CR-DAgger quickly learns a force-aware residual policy even when the base policy is position-only. CR-DAgger already won the best paper award at the Human2robot workshop at CoRL 2025, and will be presented at NeurIPS tomorrow, Dec 3, at poster #2314. Come talk to us if you are interested!

- NeurIPS paper: arxiv.org/abs/2506.16685
- Extended version with more experiments & learnings: compliant-residual-dagger.github.io/files/CR_DAgge…
- Full code and instructions: github.com/yifan-hou/cr-d…

6 replies · 23 reposts · 263 likes · 23K views
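The residual-update idea in these threads (a frozen base policy plus a small learned correction, possibly fed extra modalities like force) could be composed roughly as follows. The function names and the 0.1 scale are assumptions for illustration, not CR-DAgger's actual interface:

```python
import numpy as np

def compose_residual_action(base_action, residual_net, residual_obs, scale=0.1):
    """Add a bounded learned correction on top of a frozen base policy.

    base_action:  action from the pre-trained (e.g. position-only) policy
    residual_net: learned network; may consume modalities the base never saw
    residual_obs: extra observation (e.g. force/torque readings)
    scale:        keeps corrections small so the base behavior dominates
    """
    correction = np.tanh(residual_net(residual_obs))  # bounded in [-1, 1]
    return base_action + scale * correction
```

Since only the residual is trained, a poorly fitted correction can at worst perturb the pre-trained behavior by `scale`, never erase it, which matches the stability argument in the thread.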
Alexandre Brown 🇨🇦 retweeted
François Fleuret@francoisfleuret·
I do not think you can pursue meaningful research without (1) some grandiose delusion about your abilities (2) a sense of esthetics and harmony to judge ideas still free of experimental confirmation (3) an unreasonable taste for the required tangible work (e.g. programming)
36 replies · 140 reposts · 1.8K likes · 189.6K views
Alexandre Brown 🇨🇦 retweeted
Siddarth Venkatraman@siddarthv66·
> Be AI PhD student
> Submit paper to conference
> LLM slop reviews
> Rejected
> Concurrent paper with same method accepted
> Resubmit to next conference
> Reviewer points to concurrent paper which was accepted by last conference
> Lack of novelty
> Rejected
32 replies · 59 reposts · 1.7K likes · 89.6K views
Alexandre Brown 🇨🇦 retweeted
Peter Richtarik@peter_richtarik·
People are giving up on AI conferences due to nonsensical / unprofessional reviews. It's time to stop this madness.
4 replies · 8 reposts · 111 likes · 8.9K views
Scholarship for PhD@ScholarshipfPhd·
Say hi and I’ll recommend a research topic that perfectly fits your profile.
45.2K replies · 2.7K reposts · 61.7K likes · 6M views