Alexandre Brown 🇨🇦

769 posts


@AlexandreBrown0

PhD student, researcher at @Mila_Quebec ~ RL , robotics and stuff

Canada · Joined September 2017
1.2K Following · 185 Followers
Pinned Tweet
Alexandre Brown 🇨🇦@AlexandreBrown0·
🚀 I'm excited to share our new paper: SegDAC: Segmentation-Driven Actor-Critic for Visual Reinforcement Learning
🧠 SegDAC combines large vision models with online RL to reason about its environment at the object and sub-object level, avoiding noisy pixel-level reasoning.
🛠️ Using YOLO-World and SAM, SegDAC breaks the scene into semantically meaningful segments and learns to attend to a variable number of segments and proprioceptive signals, focusing on the most relevant information to complete the task.
⚡ Trained purely with online RL, without human labels or demonstrations.
🏆 Outperforms previous online RL state-of-the-art methods across all difficulty levels on our challenging visual generalization benchmark, with up to 2x better visual generalization in the hardest setting.
📄 Paper: arxiv.org/pdf/2508.09325
🌐 Project Page: segdac.github.io
Work done with @GlenBerseth at @Mila_Quebec
#ReinforcementLearning #RobotLearning #ArtificialIntelligence #Robotics
5 replies · 17 reposts · 80 likes · 10.8K views
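The "attend to a variable number of segments" idea above can be sketched as masked attention pooling. This is a hypothetical NumPy illustration, not the paper's actual architecture; the function name `segment_attention` and the shapes are assumptions:

```python
import numpy as np

def segment_attention(query, segments, mask):
    """Pool a variable number of segment embeddings with masked attention.

    query:    (d,) query vector (e.g. derived from proprioception)
    segments: (n_max, d) segment embeddings, zero-padded up to n_max
    mask:     (n_max,) True for real segments, False for padding
    """
    d = query.shape[0]
    scores = segments @ query / np.sqrt(d)         # (n_max,) similarity scores
    scores = np.where(mask, scores, -np.inf)       # padding can never be attended
    weights = np.exp(scores - scores[mask].max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ segments                      # (d,) attention-pooled feature
```

Because padded rows get exactly zero weight, the same network handles scenes with any number of detected segments.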
Alexandre Brown 🇨🇦 retweeted
Roger Creus Castanyer@creus_roger·
🚀 I vibecoded yet another autoresearch tool. This one works with SLURM clusters and lets Claude Code agents run experiments completely hands-off for weeks. It's called xgenius — open source, works with ANY codebase. github.com/roger-creus/xg…
2 replies · 6 reposts · 28 likes · 1.1K views
Alexandre Brown 🇨🇦 retweeted
François Fleuret@francoisfleuret·
BTW are hyper-networks a thing of the past?
13 replies · 2 reposts · 71 likes · 14.8K views
Alexandre Brown 🇨🇦 retweeted
Stone Tao@Stone_Tao·
I will be giving a talk at UPenn @GRASPlab tomorrow 3-4PM EST on my research in sim/robotics. I’ll be discussing how sim integrated robot learning can drive and accelerate robotics progress. If you are in the area let’s meet up! Link with more details in the thread
2 replies · 2 reposts · 32 likes · 2.8K views
Alexandre Brown 🇨🇦 retweeted
Sonia Joseph@soniajoseph_·
We came away pleasantly surprised by how tractable these models are to decode, how different video encoders feel from language (many techniques won’t transfer), and how brain-like these representations look, reviving old “petri dish” conversations about studying the brain.
1 reply · 2 reposts · 13 likes · 1K views
Alexandre Brown 🇨🇦 retweeted
Jesse Silverberg@SilverbergJesse·
I spent $0 and a weekend vibecoding an @openclaw setup that I text to run experiments for me. In the process, I ended up with bespoke software for self-managing a personal cluster. Also, it now comes up with its own experiments if I don’t have enough running. Blog post link ⬇️
Karel@KarelDoostrlnck

x.com/i/article/2018…

1 reply · 1 repost · 7 likes · 1.7K views
Federico Vaggi@F_Vaggi·
@rasbt @rryssf_ I follow a pretty rigid rule where if someone who doesn't work in ML in a technical role tweets hyped up stuff that's clearly designed to go viral to build up clout, I mute them.
2 replies · 0 reposts · 55 likes · 3.6K views
Robert Youssef@rryssf_·
DeepMind just did the unthinkable. They built an AI that doesn't need RAG and it has perfect memory of everything it's ever read. It's called Recursive Language Models, and it might mark the death of traditional context windows forever. Here's how it works (and why it matters way more than it sounds) ↓
303 replies · 1.1K reposts · 7.9K likes · 952.5K views
Alexandre Brown 🇨🇦 retweeted
Adam Patni@adam_patni·
4/ Results & What Worked

Delay Target performed best, with the most stable learning curve, followed by Rainbow DQN and SAC, which showed comparable performance. PPO significantly underperformed.

The pattern is clear: off-policy Q-learning methods dominated. When you can't parallelize environments and data arrives at real-time speed, the ability to reuse past experiences via replay buffers is critical.
2 replies · 1 repost · 11 likes · 772 views
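The replay-buffer reuse this thread credits for the off-policy win can be shown with a minimal sketch (hypothetical code, not the thread's actual implementation): transitions collected at real-time speed remain sampleable long after the policy that generated them has changed.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO experience replay for off-policy Q-learning."""

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive real-time transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

An on-policy method like PPO must discard this data after each update, which is exactly the disadvantage the thread observes when environments cannot be parallelized.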
Alexandre Brown 🇨🇦@AlexandreBrown0·
@Yuchenj_UW You can "read" papers faster now, but do you still understand a paper as deeply if you don't read it yourself? That's the question.
0 replies · 0 reposts · 0 likes · 16 views
Yuchen Jin@Yuchenj_UW·
Tbh, if I had Claude Code, Gemini, and ChatGPT during my PhD, I’d probably have graduated in 1 year instead of 5.5 years. My PhD was ~50% coding, 25% writing/polishing my papers, 25% reading others' papers. AI now accelerates each by at least 10×. Nothing will ever be the same.
268 replies · 369 reposts · 5.6K likes · 793K views
Stone Tao@Stone_Tao·
GPU parallelized envs have accelerated RL, but most implementations exhibit critical instability when running on-policy RL with short rollouts. We present Staggered Environment Resets. A few lines of code are all you need! Presenting today, 4:30PM poster 310 #NeurIPS2025 🧵(1/8)
7 replies · 8 reposts · 153 likes · 29.9K views
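The staggering idea the tweet describes can be sketched in a few lines: give each parallel env a different initial episode progress so resets spread out over the rollout instead of synchronizing. This is an assumed illustration of the concept; the paper's exact scheme may differ.

```python
import numpy as np

def staggered_initial_steps(num_envs, episode_len):
    """Evenly offset each env's episode clock so resets are spread
    uniformly across steps instead of all envs resetting at once."""
    return (np.arange(num_envs) * episode_len) // num_envs

# With 8 envs and 100-step episodes, resets land roughly 12 steps apart.
offsets = staggered_initial_steps(8, 100)
```

Without such offsets, every env resets on the same step of a short on-policy rollout, so whole batches consist of early-episode states, which is one plausible source of the instability the thread mentions.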
Keyshawn Ebanks@ebanks_keyshawn·
@arnie_hacker from what I've seen, anything imitation + RL. For tasks like this you might want to throw an LLM in the loop
1 reply · 0 reposts · 0 likes · 152 views
Arnie Ramesh@arnie_hacker·
Crazy to me how far away we are from generalizable robotics. Nvidia Gr00t N1.5, a SOTA model:
- trained on 1K H100 GPUs
- with data augmentation
- and future latents alignment (& human data)
- and 1K teleop demos
drops success rate by ~14% when there are new objects in the observation. How will the humanoids react when they're deployed in real worlds with humans walking around, for instance (haven't seen any in the datasets so far)?
11 replies · 8 reposts · 160 likes · 15.6K views
Alexandre Brown 🇨🇦 retweeted
Shuran Song@SongShuran·
Everyone uses DAgger in robot learning, but most papers barely mention how they do it … 😕 We’ve found a few subtle details that make a big difference (which differ from the standard practice today): 💡

- How to collect corrections? Standard human “take-over” corrections create discontinuities. A compliant interface that gently nudges the on-policy roll-out works much better.
- How to update the policy? Instead of standard finetuning, we found that learning residual networks works better — more stable and flexible, and lets you plug in new modalities (e.g., force on top of a position-only base policy).

Our paper Compliant Residual DAgger summarizes these and other interesting findings in detail 👇
Yifan Hou@YifanHou2

Can we quickly improve a pre-trained robot policy by learning from real-world human corrections? Introducing Compliant Residual DAgger (CR-DAgger), a system that improves policy performance to close to 100% on challenging contact-rich manipulation problems, using as few as 50~100 episodes of human corrections.

Co-led by @XiaomengXu11 and I, CR-DAgger quickly learns a force-aware residual policy even when the base policy is position-only. CR-DAgger already won the best paper award at the Human2robot workshop at CoRL 2025, and will be presented at NeurIPS tomorrow, Dec 3, at poster #2314. Come talk to us if you are interested!

- NeurIPS paper: arxiv.org/abs/2506.16685
- Extended version with more experiments & learnings: compliant-residual-dagger.github.io/files/CR_DAgge…
- Full code and instructions: github.com/yifan-hou/cr-d…

6 replies · 23 reposts · 263 likes · 23K views
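The residual-update idea in these threads (a frozen base policy plus a small learned correction, possibly fed extra modalities like force) could be composed roughly as follows. The function names and the 0.1 scale are assumptions for illustration, not CR-DAgger's actual interface:

```python
import numpy as np

def compose_residual_action(base_action, residual_net, residual_obs, scale=0.1):
    """Add a bounded learned correction on top of a frozen base policy.

    base_action:  action from the pre-trained (e.g. position-only) policy
    residual_net: learned network; may consume modalities the base never saw
    residual_obs: extra observation (e.g. force/torque readings)
    scale:        keeps corrections small so the base behavior dominates
    """
    correction = np.tanh(residual_net(residual_obs))  # bounded in [-1, 1]
    return base_action + scale * correction
```

Since only the residual is trained, a poorly fitted correction can at worst perturb the pre-trained behavior by `scale`, never erase it, which matches the stability argument in the thread.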
Alexandre Brown 🇨🇦 retweeted
François Fleuret@francoisfleuret·
I do not think you can pursue meaningful research without (1) some grandiose delusion about your abilities (2) a sense of esthetics and harmony to judge ideas still free of experimental confirmation (3) an unreasonable taste for the required tangible work (e.g. programming)
36 replies · 140 reposts · 1.8K likes · 189.6K views
Alexandre Brown 🇨🇦 retweeted
Siddarth Venkatraman@siddarthv66·
> Be AI PhD student
> Submit paper to conference
> LLM slop reviews
> Rejected
> Concurrent paper with same method accepted
> Resubmit to next conference
> Reviewer points to concurrent paper which was accepted by last conference
> Lack of novelty
> Rejected
32 replies · 59 reposts · 1.7K likes · 89.6K views
Alexandre Brown 🇨🇦 retweeted
Peter Richtarik@peter_richtarik·
People are giving up on AI conferences due to nonsensical / unprofessional reviews. It's time to stop this madness.
4 replies · 8 reposts · 111 likes · 8.9K views
Scholarship for PhD@ScholarshipfPhd·
Say hi and I’ll recommend a research topic that perfectly fits your profile.
45.2K replies · 2.7K reposts · 61.7K likes · 6M views