Mark Ibrahim @ICLR 2026

85 posts

@marksibrahim

Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab

everywhere · Joined December 2012
1.6K Following · 508 Followers
Mark Ibrahim @ICLR 2026 retweeted
Reyhane Askari (@ReyhaneAskari)
Don’t miss @dohmatobelvis presenting our latest work, “Why less is more (sometimes): A theory of data curation” at #ICLR2026! Swing by our poster at the main conference to chat: 📅 Saturday, April 25 🕒 3:15pm–5:45pm 📍 Pavilion 3, P3-#1816
Rohan Paul (@rohanpaul_ai)

New @AIatMeta paper explains when a smaller, curated dataset beats using everything. Standard training wastes effort because many examples are redundant or wrong. They formalize a label generator, a pruning oracle, and a learner. From this, they derive exact error laws and sharp regime switches. With a strong generator and plenty of data, keeping hard examples works best. With a weak generator or small data, keeping easy examples or keeping more helps. They analyze 2 modes: label-agnostic, which prunes by features, and label-aware, which first filters wrong labels. ImageNet and LLM math results match the theory, and pruning also prevents collapse in self-training.

Paper: arxiv.org/abs/2511.03492
Paper title: "Why Less is More (Sometimes): A Theory of Data Curation"

Mark Ibrahim @ICLR 2026 retweeted
Dr. Karen Ullrich (@karen_ullrich)
I am soon heading to Rio for #ICLR2026! It is going to be a packed week: including an oral presentation of OpenApps, our work on measuring how reliable UI agents really are when the apps they interact with change.
Mark Ibrahim @ICLR 2026 retweeted
Sharut Gupta (@sharut_gupta)
1/n Can LLMs learn to reason on hard benchmarks like AIME and GPQA purely through context, without SFT, RL, or any weight updates? Turns out… Yes! And it can have strong performance while being highly efficient Paper: arxiv.org/pdf/2602.02366 Blog: reasoncache.github.io
Mark Ibrahim @ICLR 2026 retweeted
Jack Morris (@jxmnop)
at long last, the final paper of my phd 🧮 Learning to Reason in 13 Parameters 🧮 we develop TinyLoRA, a new ft method. with TinyLoRA + RL, models learn well with dozens or hundreds of params example: we use only 13 parameters to train 7B Qwen model from 76 to 91% on GSM8K 🤯
Eric W. Tramel (@fujikanaeda)
The presence of a leading whitespace leaks the correct choice selection in the MMLU-Pro benchmark. Am I missing something? Seems to impact Chemistry, Physics, and Math. HF Issue in reply.
Mark Ibrahim @ICLR 2026 retweeted
Basile Terver (@BasileTerv987)
My first PhD paper is out! 🎓 "What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?" tl;dr: JEPA-WMs for robotics: learn dynamics on top of visual encoders, optimize actions towards a goal 👇 w/ @JimmyTYYang1, Jean Ponce, @AdrienBardes, @ylecun
Mark Ibrahim @ICLR 2026 retweeted
Dr. Karen Ullrich (@karen_ullrich)
Release Day 🎉 Meet OpenApps — a pure-Python, open-source ecosystem for stress-testing UI agents at scale. Runs on a single CPU. Generates thousands of unique UI variations. And it reveals just how fragile today’s SOTA agents are. (Yes, even GPT-4 and Claude struggle.)
Mark Ibrahim @ICLR 2026 (@marksibrahim)
Want to teach AI agents to use apps like humans? Get started with digital agents research using OpenApps, our new Python-based environment.
Mark Ibrahim @ICLR 2026 retweeted
Dr. Karen Ullrich (@karen_ullrich)
Stop by the Meta booth tomorrow, Wednesday Dec 3rd at #NeurIPS in San Diego! 🤖📱 We demo our new research environment, OpenApps, for digital agents. Generate thousands of app versions to train and evaluate multimodal agents to use apps like humans do. Not attending? Stay tuned
Mark Ibrahim @ICLR 2026 retweeted
Randall Balestriero (@randall_balestr)
With LeJEPA (arxiv.org/abs/2511.08544) it has never been easier to train JEPAs! And this matters A LOT because JEPAs have numerous provable benefits over the good-old reconstruction based methods (arxiv.org/abs/2505.12477). NeurIPS spotlight: Wed, 11 a.m. PST, Hall C,D,E #2613
Hugues Van Assel (@hugues_va)

Lots of discussion around JEPA and why latent space prediction works better than input space (e.g., LLMs) for certain modalities. But no one has formalized WHY. The answer lies in whether statistically dominant features are semantically meaningful. @NeurIPSConf spotlight 🧵👇

Mark Ibrahim @ICLR 2026 (@marksibrahim)
Although leading models saturate single-image perception, Common-O remains a challenging new multimodal benchmark: the best-performing model achieves only 35% on Common-O and only 1% on Common-O Complex, which consists of more complex scenes. 🧵2/3
Mark Ibrahim @ICLR 2026 (@marksibrahim)
We introduce Common-O, a new multimodal benchmark for hallucination when reasoning across scenes. We find that leading multimodal LLMs can reliably identify objects, yet hallucinate when reasoning across scenes. 🧵1/3
Mark Ibrahim @ICLR 2026 retweeted
Sarthak Mittal (@sarthmit)
Meta on meta: thrilled to share our work on Meta-learning… at Meta! 🔥🧠 We make two major contributions: 1️⃣ Unified framework revealing insights into various amortizations 🧠 2️⃣ Greedy belief-state updates to handle long context-lengths 🚀