Magda Dubois

95 posts


@DubMagda

Research Scientist @AISecurityInst working on LLM evaluations 🤖 | PhD in computational cognitive neuroscience @MPC_CompPsych 🧠

London, England · Joined December 2016
453 Following · 376 Followers
Magda Dubois retweeted
Cozmin Ududec @CUdudec
New from the Science of Evaluation Team at @AISafetyInst: a pipeline for rigorous transcript analysis. I think transcript analysis is still underrated, especially as model horizons are getting longer and task environments more complex.
Magda Dubois retweeted
Arvindh Arun @arvindh__a
Why does horizon length grow exponentially as shown in the METR plot? Our new paper investigates this by isolating the execution capabilities of LLMs. Here's why you shouldn't be fooled by slowing progress on typical short-task benchmarks... 🧵
Magda Dubois retweeted
Konrad Rieck 🌈 @mlsec
We're excited to announce the Call for Papers for SaTML 2026, the premier conference on secure and trustworthy machine learning @satml_conf. We seek papers on secure, private, and fair learning algorithms and systems. 👉 satml.org/call-for-paper… ⏰ Deadline: Sept 24
Magda Dubois retweeted
Sahar Abdelnabi 🕊 @sahar_abdelnabi
The Hawthorne effect describes how study participants modify their behavior if they know they are being observed. In our paper 📢, we study whether LLMs exhibit analogous patterns 🧠. Spoiler: they do ⚠️ 🧵1/n
Magda Dubois retweeted
summerfieldlab @summerfieldlab.bsky.social
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
Magda Dubois retweeted
AI Security Institute @AISecurityInst
Evaluating AI models is essential for improving their performance and understanding their risks. Increasingly, researchers are using “autograders” – having Large Language Models (LLMs) grade model outputs. But how do we know if these autograders are reliable? 🧵
Magda Dubois @DubMagda
New paper introducing a framework to better quantify uncertainty in LLM evaluations (led by @LLuettgau🙌). A beta Python package (developed by @HarryCoppock🚀) is available if you want to try it out. ➡️Get in touch if you have any Qs/feedback! Paper: arxiv.org/abs/2505.05602
AI Security Institute @AISecurityInst

Advanced AI systems require complex evaluations to measure abilities, but conventional analysis techniques often fall short. Introducing HiBayES: a flexible, robust statistical modelling framework that accounts for the nuances & hierarchical structure of advanced evaluations.

Magda Dubois retweeted
AI Security Institute @AISecurityInst
🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.
Magda Dubois retweeted
Lennart Luettgau @LLuettgau
Excited to share our brand-new work shedding some light on the neural mechanisms behind one of humans' coolest cognitive feats: compositional generalization of structural knowledge! A Tweeprint-Thread 🧵 1/n
Magda Dubois retweeted
Alexandr Wang @alexandr_wang
1/ New paper in Nature shows model collapse as successive model generations are recursively trained on synthetic data. This is an important result. While many researchers today view synthetic data as an AI philosopher’s stone, there is no free lunch. Read more 👇
Dr Aislinn Bowler @AshBowler
Happy to announce I've passed my viva with minor corrections! Thanks to my examiners @OOssmy and Kate Langley for a great discussion! And to my supervisors @Gelironald and @PascoFearon!
Magda Dubois retweeted
Lennart Luettgau @LLuettgau
Preprint alert🚨! In this new paper we study how humans decompose dynamical subprocesses and leverage the abstracted subprocesses for compositional reuse of experience in new situations. psyarxiv.com/sxn4a/ Tweeprint to follow soon!
Magda Dubois retweeted
Marcelo Mattar @marcelomattar
In our lab's latest paper, we introduce a novel modeling approach using RNNs to reveal the cognitive algorithms behind animal decision-making. Check out our preprint, led by UCSD PhD student @Ji_An_Li and co-authored by Marcus Benna: biorxiv.org/content/10.110…
Julia Griem @julia_griem
Successfully defended my PhD today - what a great feeling! Thank you to my wonderful examiners Essi Viding & @BaskinSommers, and thank you to @forensicrg for an amazing 4.5 years!
Magda Dubois @DubMagda
Congratulations to my academic sibling @AlisaLoosen for those three (very) well-deserved shiny balloons!
Magda Dubois @DubMagda
Postdoc position in Boston ⭐️ Great place and amazing person to work with!
Magda Dubois retweeted
Tobias Hauser @TobiasUHauser
A while ago we published this #RegisteredReport in @NatureComms - but was this format of pre-registration really useful? Find some answers in this Q&A with us and one of the reviewers: nature.com/articles/s4146…
Magda Dubois @DubMagda

Our #RegisteredReport with @TobiasUHauser is now out in @NatureComms 🤓 We asked how people differ in their exploration and found that impulsive and anxious subjects explore using different strategies! 1/ nature.com/articles/s4146…
