Virginie Do

138 posts

@gini_do

gêne AI researcher 😬 @AIatMeta

London, England · Joined October 2017
418 Following · 731 Followers
Virginie Do retweeted
Deepak Nathani @deepaknathani11
🎉 Excited to share 🍐 PARE and PARE-Bench - a framework and benchmark for evaluating proactive assistants through active user simulation in mobile environments.

Current LM agents are reactive: they wait for you to tell them what to do. Proactive agents flip this. They observe what you're doing and figure out how to help. Imagine your assistant notices you got a text from your roommate saying "we're out of soap" while you're editing your shopping list, and adds soap to your list.

🚧 Evaluating these agents is challenging because they must observe realistic user behavior to infer goals. You can't do this with static benchmarks or passive users.

Our key contributions:
🍐 PARE: an active user simulation framework where users navigate apps through Finite State Machine (FSM) based stateful interfaces, just like on a real phone
📱 Asymmetric design: users and assistants observe different information and interact through different interfaces, matching real-world deployment
👀 Observe-Execute architecture: lightweight observer monitors continuously, executor acts only after user approval
📋 PARE-Bench: 143 tasks across 9 app categories testing goal inference, intervention timing, and multi-app orchestration
📊 Evaluation of 7 LLMs reveals that even frontier models achieve only a 42% success rate

PARE is built on top of Meta's Agent Research Environment (ARE) and enables scalable, repeatable evaluation of proactive agents. In PARE, the simulated user goes about their day on the phone: accomplishing goals, navigating between apps, and responding to notifications. The proactive agent watches all of this unfold and uses the user's actions and environment signals to build context about what the user might need help with.

Huge thanks to my advisors @xwang_lk @WilliamWangNLP and my amazing collaborators @JasonZ118707 @HuanCC2002 Jiaming Shan @yinfeiy Alkesh Patel @zhegan4 @m2saxon 🙏
Virginie Do retweeted
Grégoire Mialon @mialon_gregoire
🎤 Gaia2 and ARE got an oral @iclr_conf! We can't wait to show what it unlocks for agent research and frontier evaluation -- although at NeurIPS 2025 we were honestly surprised by how many of you were already using it 😄 arxiv.org/abs/2509.17158
Virginie Do retweeted
Basile Terver @BasileTerv987
My first PhD paper is out! 🎓 "What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?" tl;dr: JEPA-WMs for robotics: learn dynamics on top of visual encoders, optimize actions towards the goal 👇 w/ @JimmyTYYang1, Jean Ponce, @AdrienBardes, @ylecun
Wassim (Wes) Bouaziz @_Vassim
Big milestone 🎓✨ I've successfully defended my PhD thesis at @Polytechnique in collaboration with @AIatMeta! "Towards Secure and Trustworthy Machine Learning: From Data Poisoning to Ownership Verification" Grateful to my advisors, jury, and everyone who supported me 🙏
Virginie Do retweeted
Romain Froger @froger_romain
Gaia2 Leaderboard Update: DeepSeek is leading OSS models! We’ve added fresh models (DeepSeek v3.1, Qwen 235B thinking, GPT-OSS 120B) and uncovered cool insights on cost, reasoning, and efficiency. Blogpost 👉 tinyurl.com/ybvxtmny
Virginie Do retweeted
clem 🤗 @ClementDelangue
We need better agent evaluations! Glad to have collaborated with @Meta Super Intelligence Lab to release Gaia2 and ARE! GPT-5 (high) from @OpenAI is leading on execution, search, ambiguity, adaptability and noise. Kimi-K2 from @Kimi_Moonshot is leading open weight. Full blogpost: huggingface.co/blog/gaia2
Virginie Do retweeted
elvis @omarsar0
Very cool work from Meta Superintelligence Lab. They are open-sourcing Meta Agents Research Environments (ARE), the platform they use to create and scale agent environments. Great resource to stress-test agents in environments closer to real apps. Read on for more:
Virginie Do retweeted
Rohan Paul @rohanpaul_ai
🧠 Great research from @Meta Superintelligence Labs. Proposes Meta Agents Research Environments (ARE) for scaling up agent environments and evaluations.

ARE lets researchers build realistic agent environments, run agents asynchronously, and verify them cleanly. On top of it they release Gaia2, a 1,120-scenario benchmark that stresses search, execution, ambiguity, time pressure, collaboration, and noise, and the results show sharp tradeoffs between raw reasoning and speed or cost.

⚙️ The Core Concepts

ARE treats the world as a clocked simulation where everything is an event, the agent runs separately, and interactions flow through tools and notifications. Apps are the tools, environments bundle the apps plus rules, and scenarios package starting state, scheduled events, and a verifier.

Traditional benchmarks froze the world while a model was "thinking." That made results look clean but ignored the real costs of inference time. In ARE, the world keeps ticking asynchronously. Time passes even while the model is generating, apps can trigger notifications, and other actors may act. So if a model is slow, it directly shows up as missed deadlines in the benchmark.

That is exactly why GPT-5 (high) got 79.6 on Search but 0 on Time in default mode. The reasoning quality was excellent, but ARE exposed its inference slowness as a concrete failure mode. When ARE switched to instant mode, stripping out the latency, the model suddenly performed well, proving the bottleneck wasn't reasoning but raw response time.

@AIatMeta 🧵 Read on 👇
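The clocked-simulation idea in the thread above, where time advances while the model generates, can be illustrated with a toy loop. Everything here (`run_scenario`, `agent_latency`, `deadline`) is a made-up name for illustration, not ARE's real interface; the only point is that latency counts against the deadline, so a correct-but-slow agent still fails.

```python
# Toy sketch of a clocked, asynchronous evaluation (not ARE's real API):
# simulated time advances by the agent's own generation latency, so a
# slow model can miss a deadline even when its answer is correct.

def run_scenario(deadline: float, agent_latency: float,
                 agent_answer_correct: bool) -> bool:
    """Succeed only if the agent answers correctly *and* in time."""
    clock = 0.0
    clock += agent_latency            # time passes while the model "thinks"
    missed_deadline = clock > deadline
    return agent_answer_correct and not missed_deadline
```

Under this rule, excellent reasoning with high latency scores zero on time-pressured scenarios, which is exactly the GPT-5 (high) behavior the thread describes; an "instant mode" corresponds to setting the latency to zero.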
Virginie Do @gini_do
I am at #ICLR and honored to present this work on Saturday afternoon at the poster session. Thanks @jade_lei_yu @mahnerak @nicola_cancedda for this wonderful collaboration! I am also happy to chat about Llama / agents / safety 👋
Lei Yu @jade_lei_yu

New paper! 🎊 We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!

Virginie Do retweeted
Roberta Raileanu @robertarail
Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬

We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks.

The key contributions of our work are:
🕹️ Enables the exploration of different training algorithms for AI Research Agents such as RL
🛠️ Provides a flexible evaluation framework that can accommodate different artifacts such as models, algorithms, or predictions
🤖 Allows researchers to evaluate any model without the need to develop a custom agentic harness
🎯 Introduces 13 diverse open-ended AI Research tasks for evaluating AI Research Agents on a wide range of domains such as computer vision, natural language processing, reinforcement learning, game theory, and logical reasoning
📈 Proposes a new evaluation metric for AI Research Agents

MLGym makes it easy to:
1) Add new tasks
2) Evaluate new models
3) Integrate new agents

Check out a video of the MLGym Agent to see how it performs the full pipeline of idea generation 💡, implementation 👩‍💻, experimentation 👩‍🔬, and iteration 🔄 to improve on ML tasks.

Huge thanks to the exceptionally talented @deepaknathani11 who led this work and to all the other amazing collaborators who made this possible 🙏🫶🚀
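The "Gym environment" framing above boils down to a `reset()`/`step()` contract that any agent harness can drive, which is what removes the need for a custom harness per model. The toy class below is an illustrative sketch under that contract, not MLGym's actual code: the "research task" is reduced to a score the agent tries to improve, and the reward is the improvement over the best score so far.

```python
# Illustrative Gym-style research-task environment (hypothetical, not
# MLGym's code): reset() returns an initial observation, step() takes the
# score of a submitted artifact and rewards the improvement it brings.

class ResearchTaskEnv:
    def __init__(self, target: float):
        self.target = target          # metric the agent must reach
        self.best = 0.0

    def reset(self) -> dict:
        """Start a fresh episode and return the initial observation."""
        self.best = 0.0
        return {"best_score": self.best}

    def step(self, submitted_score: float):
        """Agent submits an artifact's score; reward = improvement."""
        reward = max(0.0, submitted_score - self.best)
        self.best = max(self.best, submitted_score)
        done = self.best >= self.target
        return {"best_score": self.best}, reward, done
```

Because the interface is uniform, swapping in a new task means writing a new scoring rule, and swapping in a new agent means nothing at all, which is the flexibility the tweet highlights.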
Mikayel Samvelyan @_samvelyan
Presenting our 🌈 Rainbow Teaming paper today at #NeurIPS2024 with @_andreilupu. 📅 December 11 🕚 11 am — 2 pm 📍 East Exhibit Hall A-C, Poster #1906 Stop by to learn how open-endedness can enhance LLM safety—or to see the most colorful poster in town!
Mikayel Samvelyan @_samvelyan
#PhDone I am happy to share that I successfully defended my PhD thesis yesterday! 🎓 This wraps up an amazing journey, and I’m beyond grateful to everyone who’s been there for me along the way.
Oana-Maria Camburu @oanacamb
🥳🎉Incredibly happy that our paper received an ✨Outstanding Paper Award✨ at #EMNLP2024! Huge congrats particularly to @maximek3, who did an incredible job! So lucky to have you as my student! Check out our paper below ⬇️
Oana-Maria Camburu @oanacamb

🚨🌟Very excited to share our #EMNLP2024 work on XAI for healthcare 🩺 We did a large user study with 💫85 medics💫 to find out how natural language explanations and saliency maps influence clinical decision-making Check out the details below ⬇️ #XAI #Medical #UserStudy

Virginie Do retweeted
Adrien Bardes @AdrienBardes
Job alert 🚨 My team @AIatMeta is looking for a PhD intern to join us in 2025 in Paris. We are working on self-supervised learning from video, world modelling and JEPA! Apply here or reach out directly: metacareers.com/jobs/168411027…