Virginie Do

138 posts

@gini_do

gêne AI researcher 😬 @AIatMeta

London, England · Joined October 2017
418 Following · 731 Followers
Virginie Do retweeted
Deepak Nathani @deepaknathani11
🎉 Excited to share 🍐 PARE and PARE-Bench - a framework and benchmark for evaluating proactive assistants through active user simulation in mobile environments.

Current LM agents are reactive: they wait for you to tell them what to do. Proactive agents flip this. They observe what you're doing and figure out how to help. Imagine your assistant notices you got a text from your roommate saying "we're out of soap" while you're editing your shopping list, and adds soap to your list.

🚧 Evaluating these agents is challenging because they must observe realistic user behavior to infer goals. You can't do this with static benchmarks or passive users.

Our key contributions:
🍐 PARE: an active user simulation framework where users navigate apps through Finite State Machine (FSM) based stateful interfaces, just like on a real phone
📱 Asymmetric design: users and assistants observe different information and interact through different interfaces, matching real-world deployment
👀 Observe-Execute architecture: lightweight observer monitors continuously, executor acts only after user approval
📋 PARE-Bench: 143 tasks across 9 app categories testing goal inference, intervention timing, and multi-app orchestration
📊 Evaluation of 7 LLMs reveals that even frontier models achieve only a 42% success rate

PARE is built on top of Meta's Agent Research Environment (ARE) and enables scalable, repeatable evaluation of proactive agents. In PARE, the simulated user goes about their day on the phone: accomplishing goals, navigating between apps, and responding to notifications. The proactive agent watches all of this unfold and uses the user's actions and environment signals to build context about what the user might need help with.

Huge thanks to my advisors @xwang_lk @WilliamWangNLP and my amazing collaborators @JasonZ118707 @HuanCC2002 Jiaming Shan @yinfeiy Alkesh Patel @zhegan4 @m2saxon 🙏
Virginie Do retweeted
Grégoire Mialon @mialon_gregoire
🎤 Gaia2 and ARE got an oral @iclr_conf! We can't wait to show what it unlocks for agent research and frontier evaluation -- although at NeurIPS 2025 we were honestly surprised by how many of you were already using it 😄 arxiv.org/abs/2509.17158
Virginie Do retweeted
Basile Terver @BasileTerv987
My first PhD paper is out! 🎓 "What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?" tl;dr: JEPA-WMs for robotics: learn dynamics on top of visual encoders, optimize actions towards the goal 👇 w/ @JimmyTYYang1, Jean Ponce, @AdrienBardes, @ylecun
Wassim (Wes) Bouaziz @_Vassim
Big milestone 🎓✨ I've successfully defended my PhD thesis at @Polytechnique in collaboration with @AIatMeta! "Towards Secure and Trustworthy Machine Learning: From Data Poisoning to Ownership Verification" Grateful to my advisors, jury, and everyone who supported me 🙏
Virginie Do retweeted
Romain Froger @froger_romain
Gaia2 Leaderboard Update: DeepSeek is leading OSS models! We’ve added fresh models (DeepSeek v3.1, Qwen 235B thinking, GPT-OSS 120B) and uncovered cool insights on cost, reasoning, and efficiency. Blogpost 👉 tinyurl.com/ybvxtmny
Virginie Do retweeted
clem 🤗 @ClementDelangue
We need better agent evaluations! Glad to have collaborated with @Meta Super Intelligence Lab to release Gaia2 and ARE! GPT-5 (high) from @OpenAI is leading on execution, search, ambiguity, adaptability and noise. Kimi-K2 from @Kimi_Moonshot is leading open weight. Full blogpost: huggingface.co/blog/gaia2
Virginie Do retweeted
elvis @omarsar0
Very cool work from Meta Superintelligence Lab. They are open-sourcing Meta Agents Research Environments (ARE), the platform they use to create and scale agent environments. Great resource to stress-test agents in environments closer to real apps. Read on for more:
Virginie Do retweeted
Rohan Paul @rohanpaul_ai
🧠 Great research from @Meta Superintelligence Labs. Proposes Meta Agents Research Environments (ARE) for scaling up agent environments and evaluations.

ARE lets researchers build realistic agent environments, run agents asynchronously, and verify them cleanly. On top of it they release Gaia2, a 1,120-scenario benchmark that stresses search, execution, ambiguity, time pressure, collaboration, and noise, and the results show sharp tradeoffs between raw reasoning and speed or cost.

⚙️ The Core Concepts

ARE treats the world as a clocked simulation where everything is an event, the agent runs separately, and interactions flow through tools and notifications. Apps are the tools, environments bundle the apps plus rules, and scenarios package starting state, scheduled events, and a verifier.

Traditional benchmarks froze the world while a model was "thinking." That made results look clean but ignored the real costs of inference time. In ARE, the world keeps ticking asynchronously. Time passes even while the model is generating, apps can trigger notifications, and other actors may act. So if a model is slow, it directly shows up as missed deadlines in the benchmark.

That is exactly why GPT-5 (high) got 79.6 on Search but 0 on Time in default mode. The reasoning quality was excellent, but ARE exposed its inference slowness as a concrete failure mode. When ARE switched to instant mode, stripping out the latency, the model suddenly performed well, proving the bottleneck wasn't reasoning but raw response time.

@AIatMeta 🧵 Read on 👇
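The clocked-simulation idea in the thread above, where time advances while the model generates, can be illustrated with a toy loop. Everything here (`run_scenario`, `agent_latency`, `deadline`) is a made-up name for illustration, not ARE's real interface; the only point is that latency counts against the deadline, so a correct-but-slow agent still fails.

```python
# Toy sketch of a clocked, asynchronous evaluation (not ARE's real API):
# simulated time advances by the agent's own generation latency, so a
# slow model can miss a deadline even when its answer is correct.

def run_scenario(deadline: float, agent_latency: float,
                 agent_answer_correct: bool) -> bool:
    """Succeed only if the agent answers correctly *and* in time."""
    clock = 0.0
    clock += agent_latency            # time passes while the model "thinks"
    missed_deadline = clock > deadline
    return agent_answer_correct and not missed_deadline
```

Under this rule, excellent reasoning with high latency scores zero on time-pressured scenarios, which is exactly the GPT-5 (high) behavior the thread describes; an "instant mode" corresponds to setting the latency to zero.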
Virginie Do @gini_do
I am at #ICLR and honored to present this work on Saturday afternoon at the poster session. Thanks @jade_lei_yu @mahnerak @nicola_cancedda for this wonderful collaboration! I am also happy to chat about Llama / agents / safety 👋
Lei Yu @jade_lei_yu

New paper! 🎊 We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!

Virginie Do retweeted
Roberta Raileanu @robertarail
Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬

We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks.

The key contributions of our work are:
🕹️ Enables the exploration of different training algorithms for AI Research Agents such as RL
🛠️ Provides a flexible evaluation framework that can accommodate different artifacts such as models, algorithms, or predictions
🤖 Allows researchers to evaluate any model without the need to develop a custom agentic harness
🎯 Introduces 13 diverse open-ended AI Research tasks for evaluating AI Research Agents on a wide range of domains such as computer vision, natural language processing, reinforcement learning, game theory, and logical reasoning
📈 Proposes a new evaluation metric for AI Research Agents

MLGym makes it easy to:
1) Add new tasks
2) Evaluate new models
3) Integrate new agents

Check out a video of the MLGym Agent to see how it performs the full pipeline of idea generation 💡, implementation 👩‍💻, experimentation 👩‍🔬, and iteration 🔄 to improve on ML tasks.

Huge thanks to the exceptionally talented @deepaknathani11 who led this work and to all the other amazing collaborators who made this possible 🙏🫶🚀
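The "Gym environment" framing above boils down to a `reset()`/`step()` contract that any agent harness can drive, which is what removes the need for a custom harness per model. The toy class below is an illustrative sketch under that contract, not MLGym's actual code: the "research task" is reduced to a score the agent tries to improve, and the reward is the improvement over the best score so far.

```python
# Illustrative Gym-style research-task environment (hypothetical, not
# MLGym's code): reset() returns an initial observation, step() takes the
# score of a submitted artifact and rewards the improvement it brings.

class ResearchTaskEnv:
    def __init__(self, target: float):
        self.target = target          # metric the agent must reach
        self.best = 0.0

    def reset(self) -> dict:
        """Start a fresh episode and return the initial observation."""
        self.best = 0.0
        return {"best_score": self.best}

    def step(self, submitted_score: float):
        """Agent submits an artifact's score; reward = improvement."""
        reward = max(0.0, submitted_score - self.best)
        self.best = max(self.best, submitted_score)
        done = self.best >= self.target
        return {"best_score": self.best}, reward, done
```

Because the interface is uniform, swapping in a new task means writing a new scoring rule, and swapping in a new agent means nothing at all, which is the flexibility the tweet highlights.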
Mikayel Samvelyan @_samvelyan
Presenting our 🌈 Rainbow Teaming paper today at #NeurIPS2024 with @_andreilupu. 📅 December 11 🕚 11 am — 2 pm 📍 East Exhibit Hall A-C, Poster #1906 Stop by to learn how open-endedness can enhance LLM safety—or to see the most colorful poster in town!
Mikayel Samvelyan @_samvelyan
#PhDone I am happy to share that I successfully defended my PhD thesis yesterday! 🎓 This wraps up an amazing journey, and I’m beyond grateful to everyone who’s been there for me along the way.
Oana-Maria Camburu @oanacamb
🥳🎉Incredibly happy that our paper received an ✨Outstanding Paper Award✨ at #EMNLP2024! Huge congrats particularly to @maximek3, who did an incredible job! So lucky to have you as my student! Check out our paper below ⬇️
Oana-Maria Camburu @oanacamb

🚨🌟Very excited to share our #EMNLP2024 work on XAI for healthcare 🩺 We did a large user study with 💫85 medics💫 to find out how natural language explanations and saliency maps influence clinical decision-making Check out the details below ⬇️ #XAI #Medical #UserStudy

Virginie Do retweeted
Adrien Bardes @AdrienBardes
Job alert 🚨 My team @AIatMeta is looking for a PhD intern to join us in 2025 in Paris. We are working on self-supervised learning from video, world modelling and JEPA! Apply here or reach out directly: metacareers.com/jobs/168411027…