Arthur Chatton

160 posts

@ArthurChatton

Assistant professor in Biostatistics at Université de Montréal. Causal inference - Casual chess (or perhaps the reverse)

Joined January 2020
450 Following · 147 Followers
Arthur Chatton retweeted
Andrew Vickers@VickersBiostats·
"LLMs .... overfit to surface patterns; struggle to abandon bad hypotheses even when evidence contradicts them; confuse correlation for causation; hallucinate explanations when experiments fail; optimize for plausibility, not truth". In short, LLMs act like human scientists.
Alex Prompter@alex_prompter

This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly: Can LLMs actually discover science, or are they just good at talking about it?

The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder: can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?

Here’s what the authors did differently 👇
• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence

What they found is sobering. LLMs are decent at suggesting hypotheses, but brittle at everything that follows.
✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They confuse correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth

Most striking result: high benchmark scores do not correlate with scientific discovery ability. Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.

Why this matters: real science is not one-shot reasoning. It’s feedback, failure, revision, and restraint. LLMs today:
• Talk like scientists
• Write like scientists
• But don’t think like scientists yet

The paper’s core takeaway: scientific intelligence is not language intelligence. It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.” Until models can reliably do that, claims about “AI scientists” are mostly premature.

This paper doesn’t hype AI. It defines the gap we still need to close. And that’s exactly why it’s important.

Arthur Chatton retweeted
Alejandro Schuler@UnibusPluram·
do you guys know that you can *design* experiments with TMLE/DML/etc. in mind for the analysis? Here's the power calculation. Smaller sample size for free if you can guess how much your R2 will go down with ML adjustment. degruyter.com/document/doi/1…
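The intuition behind the tweet can be sketched with the classical two-sample power formula: covariate adjustment that explains a fraction R² of the outcome variance shrinks the residual variance by (1 − R²), and the required sample size shrinks proportionally. This is an illustrative stdlib sketch under that assumption, not the linked paper's actual procedure; `n_per_arm` and its defaults are invented for the example.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.8, r2=0.0):
    """Per-arm sample size for a two-arm comparison of means.

    delta: minimal detectable mean difference
    sigma: outcome standard deviation (unadjusted)
    r2:    anticipated proportion of outcome variance explained by
           (ML) covariate adjustment; the residual variance
           sigma^2 * (1 - r2) drives the requirement
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    resid_var = sigma ** 2 * (1 - r2)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * resid_var / delta ** 2)

# Guessing that ML adjustment will explain 30% of outcome variance
# buys a smaller trial for the same power:
print(n_per_arm(delta=0.5, sigma=1.0))          # unadjusted
print(n_per_arm(delta=0.5, sigma=1.0, r2=0.3))  # adjusted: smaller n
```

The "for free" in the tweet is this (1 − R²) factor: the better the adjustment model predicts the outcome, the fewer participants are needed.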
Arthur Chatton retweeted
Valeriy M., PhD, MBA, CQF@predict_addict·
The groundbreaking paper demonstrating that neural networks are often miscalibrated and tend to produce overconfident predictions has already garnered over 6,000 citations. Despite this, some still occasionally make ludicrous claims on social media that 'calibration doesn’t matter.' It’s important for the community to recognize the value of proper calibration to ensure reliable and trustworthy AI systems. #calibration
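A standard way to quantify the miscalibration the tweet refers to is the expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its empirical accuracy. A minimal stdlib sketch for binary classification (the function name and binning choice are this example's, not from the cited paper):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: size-weighted average of |accuracy - confidence|.

    probs:  predicted probabilities of the positive class
    labels: observed 0/1 outcomes
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Assign each prediction to an equal-width confidence bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)  # mean confidence in bin
        acc = sum(y for _, y in b) / len(b)   # empirical accuracy in bin
        ece += len(b) / n * abs(acc - conf)
    return ece
```

An overconfident model (say, 90% confidence but only 50% accuracy) gets a large ECE; a well-calibrated one scores near zero, which is what post-hoc recalibration methods such as temperature scaling aim for.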
Arthur Chatton retweeted
Achim Zeileis @zeileis@fosstodon.org
PSA: All #rstats packages on #cran will get an official DOI! This will facilitate bibliometrics and give credit to R package authors. Registering all 20,000+ packages will still take a few more days, but the first couple of thousand are already live. Example:
Arthur Chatton@ArthurChatton·
Now published in AMPPS (link below, open access)! Some additions include a box on non-binary treatments, an expanded discussion on recent works for measurement errors, and a box on alternative forms of IPW. Thanks again to @dingding_peng, who does a fantastic job!
Arthur Chatton retweeted
Neuroskeptic 🇺🇦@Neuro_Skeptic·
In 1599 the church used a placebo controlled trial to test if a French girl was possessed by a demon. Holy objects and identical, non-blessed objects were shown to the girl. "She reacted similarly when exposed to both genuine and sham religious objects" journals.sagepub.com/doi/full/10.11…
Arthur Chatton retweeted
Dr Kareem Carr@kareem_carr·
it's extremely powerful when all your projects fall under one overarching goal such that they feed into and enhance each other.
Arthur Chatton retweeted
Triad sou.@triadsou·
Do machine learning methods lead to similar individualized treatment rules? A comparison study on real data. Florie Bouvier, Etienne Peyrot, Alan Balendran, Corentin Ségalas, Ian Roberts, François Petit, Raphaël Porcher. Statistics in Medicine. onlinelibrary.wiley.com/doi/10.1002/si…
Arthur Chatton retweeted
Iván Díaz@ildiazm·
Testing for assumptions is not great practice. The test will lead to model changes if the assumption is violated, ruining the validity of uncertainty quantification procedures (e.g., CIs). We should use models whose assumptions are reasonably known to be correct a priori.
Jonathan Bartlett@TheStatsGeek

Why test for proportional hazards - or any other models assumptions? Sjölander & @pauldickman doi.org/10.1093/aje/kw…

Arthur Chatton retweeted
TRIPODStatement@TRIPODStatement·
Good news: the TRIPOD+AI #reporting recommendations for prediction model studies using either regression or #machinelearning methods have been accepted for publication 🥳🎉 #ArtificialIntelligence #transparency #reportingstandards #standardsforAI
TRIPODStatement@TRIPODStatement

Started 4 years ago (tinyurl.com/5n7me3bs), but we've finally submitted the TRIPOD+AI #reporting recommendations for prediction model studies using either regression or #machinelearning methods. Original 2015 guidance here tinyurl.com/bdd6e2at and tinyurl.com/2tvdydfc

Arthur Chatton retweeted
🄼🄴🄴🄷🄰🅆🄻 ⭕
Three vaccines (MMR, Flu, and HPV) and their risks, compared against becoming infected with the disease itself. nytimes.com/2020/01/09/opi…
Prof Peter Hotez MD PhD DSc(hon)@PeterHotez

FYI: I provided the data and cited refs for the @nytimes, but the man behind the graphics was Bill Marsh @billmarshnyt, their amazing graphics editor for the Sunday Review, Opinion, and other sections. He was really great to work with on this article, attached to this tweet.
