Andrei Lupu
@_andreilupu

277 posts

DPhil student @FLAIR_Ox and @AIatMeta. Previously @Mila_Quebec and @rllabmcgill. Theory of Mind / Coordination / Rainbow Teaming 🌈 Opinions my own.

Joined December 2016
359 Following · 774 Followers

Pinned Tweet
Andrei Lupu @_andreilupu
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
4 replies · 26 reposts · 104 likes · 23.2K views
Andrei Lupu retweeted
Harry Mayne @HarryMayne5
New paper. A Positive Case for Faithfulness. When asked to explain their decisions, LLMs can give highly plausible self-explanations. But are these explanations actually faithful, or are they just post-hoc rationalizations? We measure faithfulness via simulatability.
2 replies · 12 reposts · 51 likes · 2.4K views
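The faithfulness-via-simulatability setup in this tweet can be sketched as follows. Everything here is an illustrative assumption, not the paper's actual protocol: `simulator` stands in for any predictor of the model's decisions, `toy_sim` is a made-up example, and the gain metric is simply accuracy-with-explanations minus accuracy-without.

```python
def simulatability_gain(simulator, inputs, model_outputs, explanations):
    """Faithfulness via simulatability: how much better does a simulator
    predict the model's decisions when it also sees the model's own
    self-explanations, versus seeing the inputs alone?"""
    # Baseline: simulator predicts from the input only (no explanation).
    base = sum(simulator(x, None) == y for x, y in zip(inputs, model_outputs))
    # With explanations: simulator also sees the model's self-explanation.
    with_expl = sum(simulator(x, e) == y
                    for x, y, e in zip(inputs, model_outputs, explanations))
    return (with_expl - base) / len(inputs)

# Hypothetical toy simulator: guesses label 0 without an explanation,
# and simply follows the explanation when one is given.
def toy_sim(x, explanation):
    return 0 if explanation is None else explanation
```

Under this reading, an explanation is faithful to the extent that it raises the simulator's accuracy at predicting what the model actually did; a plausible post-hoc rationalization that carries no predictive signal yields a gain near zero.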
Andrei Lupu retweeted
MattStaniek @MattStaniek
Windermere vs Buttermere.

Windermere: since 2017, over fifty billion litres of treated sewage and more than 30,000 hours of untreated sewage have made their way into the lake.

Buttermere: a lake where the water company is not allowed to put any sewage at all.

Images taken on 24 and 25 April 2025. The only way to protect Windermere is to end the sewage pollution once and for all. savewindermere.com
42 replies · 1.6K reposts · 6.1K likes · 198.3K views
Andrei Lupu retweeted
Jakob Foerster @j_foerst
My Oxford lab (@FLAIR_Ox) is hiring PhD students! If you are thinking of doing a PhD in blue-sky and (sort of crazy) ambitious ML, have a technically strong background, and love to work with others, please consider all options for joining us:
1) Direct entry - deadline is the 1st of Dec AOE (ox.ac.uk/admissions/gra…)
2) AIMS CDT (ox.ac.uk/admissions/gra…) - deadline on the 27th of Jan 2026 AOE
3) EIT CDT (ox.ac.uk/admissions/gra…) - deadline on the 7th of Jan 2026 AOE
Student funding is a real constraint/concern in the UK (especially for overseas students), and by applying for all three programmes you can maximize your chances of ending up in a very, very special place.
3 replies · 30 reposts · 162 likes · 14K views
Andrei Lupu @_andreilupu
This resonates deeply. It is sad to read, and unfortunately the norm in recent years. The most original papers or those at the intersection of different sub-fields get the worst of it too, since it is harder to grasp the value of their contribution at a glance.
Peter Richtarik @peter_richtarik

I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response, explaining why the reviewers failed at their job. I agree with most of their comments. The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we are doing with AI conference reviews these last few years is, basically, madness.

---

Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples (several claims are simply fabricated):

Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results appear in many tables, such as Table 2 (Section 4.2) and Appendix G.
Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method; the reviewer clearly mistook our method for the baselines (Section 2, Table 1).
Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6).
Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this.

Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. In earlier submissions, the paper usually received borderline or weak-accept scores.

Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. It is a gross disrespect to the authors and a flagrant desecration of the reviewer's sacred duty, and it fundamentally undermines the integrity of the entire peer-review process.

Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. These negative reviews are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.

0 replies · 2 reposts · 7 likes · 1.4K views
Andrei Lupu retweeted
Minqi Jiang @MinqiJiang
What if you kept asking an LLM to "make it better"?

In some recent work at FAIR, we investigate how we can efficiently use RL to fine-tune LLMs to iteratively self-improve on their previous solutions at inference time.

Training for iterated self-improvement can be costly: the naive approach to training for K self-improvement steps leads to K times the number of rollout steps per episode.

We introduce Exploratory Iteration (ExIt), an RL-based automatic curriculum method that bootstraps diverse training distributions of self-improvement tasks by upcycling the LLM's own responses at previous turns as the starting points for both self-improvement and *self-divergence*.

To decide what task to train on next, the curriculum prioritizes sampling of partial turn histories that led to higher return variance in their GRPO group (a learnability score that comes for free). This automatic curriculum over the bootstrapped task space teaches the model how to perform iterated self-improvement while only ever training it on single-step self-improvement tasks.

We evaluate ExIt in both single-turn (contest math problems) and multi-turn (BFCLv3 multi-turn tasks) settings, as well as on MLE-bench, where the LLM is run in a search scaffold to produce solutions to real Kaggle competitions. Across these eval settings, we find ExIt produces models with greater capacity for inference-time self-improvement than GRPO, and ExIt models can self-improve on test tasks for many more steps than the typical solution depth encountered during training, including a 22% improvement in MLE-bench performance compared to GRPO.
15 replies · 72 reposts · 405 likes · 40.9K views
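The curriculum step described in the tweet above (prioritizing partial turn histories by return variance within their GRPO group) can be sketched as follows. The buffer layout, function names, and proportional-sampling rule are my illustrative assumptions, not the actual ExIt implementation.

```python
import random

def learnability(rewards):
    """Variance of rewards within one GRPO group. High variance means the
    model sometimes succeeds and sometimes fails from this starting point,
    i.e. the task is currently learnable."""
    mean = sum(rewards) / len(rewards)
    return sum((r - mean) ** 2 for r in rewards) / len(rewards)

def sample_task(buffer, rng=random):
    """Pick a partial turn history with probability proportional to its
    learnability score; `buffer` holds (history, group_rewards) pairs."""
    scores = [learnability(rewards) for _, rewards in buffer]
    total = sum(scores)
    if total == 0:
        # Every group is all-success or all-failure: fall back to uniform.
        return rng.choice(buffer)[0]
    pick = rng.uniform(0, total)
    acc = 0.0
    for (history, _), score in zip(buffer, scores):
        acc += score
        if score > 0 and pick <= acc:
            return history
    return buffer[-1][0]
```

A history the model always solves (or always fails) scores zero variance and is rarely replayed, while mixed outcomes dominate training, which is the "free" learnability signal the tweet mentions.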
Joelle Pineau @jpineau1
I’m thrilled to be joining @cohere in the role of Chief AI Officer, helping advance cutting-edge research and product development. Cohere has an incredible team and mission. Exciting new chapter for me!
Cohere @cohere

We’re excited to announce $500M in new funding to accelerate our global expansion and build the next generation of enterprise AI technology! We are also welcoming two additions to our leadership team: Joelle Pineau as Chief AI Officer and Francois Chadwick as Chief Financial Officer. cohere.com/blog/august-20…

123 replies · 71 reposts · 1.7K likes · 180.3K views
Andrei Lupu retweeted
Keith Sakata, MD @KeithSakata
I’m a psychiatrist. In 2025, I’ve seen 12 people hospitalized after losing touch with reality because of AI. Online, I’m seeing the same pattern. Here’s what “AI psychosis” looks like, and why it’s spreading fast: 🧵
1.5K replies · 13.3K reposts · 92.9K likes · 7.7M views
Andrei Lupu @_andreilupu
@shlomifruchter Thanks for the link, but I don't think it's settled! Genie 3 seems to excel at realism and consistency for objects out of frame, but most examples are quite static. I would love to see how it handles more complex interactions (shooting, grabbing or throwing objects, etc.)
0 replies · 0 reposts · 1 like · 94 views
Andrei Lupu @_andreilupu
@alexUnder_sky Great theory of mind on your behalf! 😉 And of course, you know I also reached out to @kaggle and offered to port Decrypto over to Game Arena!
1 reply · 0 reposts · 1 like · 37 views
sacha🥝 @alexUnder_sky
@_andreilupu yeah, I expected (and looked forward to) you replying with your benchmark xd
1 reply · 0 reposts · 0 likes · 22 views
Andrei Lupu @_andreilupu
Games isolate key aspects of intelligence and make for fantastic evergreen benchmarks. Thrilled to see them come back in style! And if you're excited about LLM Theory of Mind, how about a game of Decrypto with your favourite LLM? 👀👇
Google DeepMind @GoogleDeepMind

We have a long history of using games to measure progress in AI. 🎮 That’s why we’re helping unveil the @Kaggle Game Arena: an open-source platform where models go head-to-head in complex games to help us gauge their capabilities. 🧵

1 reply · 1 repost · 5 likes · 656 views
Andrei Lupu @_andreilupu
Sloppy authors write sloppy reviews. Conferences should publish rejection rates for first and last authors. This cuts down the number of submissions, improves submission quality, and limits the number of sloppy authors being forced to review. Make it retroactive, too! 🧹
0 replies · 0 reposts · 7 likes · 390 views
Andrei Lupu retweeted
Alex Goldie @AlexDGoldie
1/ 🕵️ Algorithm discovery could lead to huge AI breakthroughs! But what is the best way to learn or discover new algorithms? I'm so excited to share our brand new @rl_conference paper which takes a step towards answering this! 🧵
3 replies · 36 reposts · 213 likes · 25.2K views