Arthur Chatton

160 posts

@ArthurChatton

Assistant professor in Biostatistics at Université de Montréal. Causal inference - Casual chess (or perhaps the reverse)

Joined January 2020
450 Following · 147 Followers
Arthur Chatton retweeted
Andrew Vickers@VickersBiostats·
"LLMs .... overfit to surface patterns; struggle to abandon bad hypotheses even when evidence contradicts them; confuse correlation for causation; hallucinate explanations when experiments fail; optimize for plausibility, not truth". In short, LLMs act like human scientists.
Alex Prompter@alex_prompter

This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly: Can LLMs actually discover science, or are they just good at talking about it?

The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder: can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?

Here’s what the authors did differently 👇
• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence

What they found is sobering. LLMs are decent at suggesting hypotheses, but brittle at everything that follows.
✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They confuse correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth

Most striking result: high benchmark scores do not correlate with scientific discovery ability. Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.

Why this matters: real science is not one-shot reasoning. It’s feedback, failure, revision, and restraint. LLMs today:
• Talk like scientists
• Write like scientists
• But don’t think like scientists yet

The paper’s core takeaway: scientific intelligence is not language intelligence. It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.” Until models can reliably do that, claims about “AI scientists” are mostly premature.

This paper doesn’t hype AI. It defines the gap we still need to close. And that’s exactly why it’s important.

Arthur Chatton retweeted
Alejandro Schuler@UnibusPluram·
do you guys know that you can *design* experiments with TMLE/DML/etc. in mind for the analysis? Here's the power calculation. Smaller sample size for free if you can guess how much your R2 will go down with ML adjustment. degruyter.com/document/doi/1…
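The intuition behind the tweet can be sketched with the classical two-sample power formula: covariate adjustment that explains a fraction R² of the outcome variance shrinks the residual variance by (1 − R²), and the required sample size shrinks proportionally. This is an illustrative stdlib sketch under that assumption, not the linked paper's actual procedure; `n_per_arm` and its defaults are invented for the example.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.8, r2=0.0):
    """Per-arm sample size for a two-arm comparison of means.

    delta: minimal detectable mean difference
    sigma: outcome standard deviation (unadjusted)
    r2:    anticipated proportion of outcome variance explained by
           (ML) covariate adjustment; the residual variance
           sigma^2 * (1 - r2) drives the requirement
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    resid_var = sigma ** 2 * (1 - r2)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * resid_var / delta ** 2)

# Guessing that ML adjustment will explain 30% of outcome variance
# buys a smaller trial for the same power:
print(n_per_arm(delta=0.5, sigma=1.0))          # unadjusted
print(n_per_arm(delta=0.5, sigma=1.0, r2=0.3))  # adjusted: smaller n
```

The "for free" in the tweet is this (1 − R²) factor: the better the adjustment model predicts the outcome, the fewer participants are needed.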
Arthur Chatton retweeted
Valeriy M., PhD, MBA, CQF@predict_addict·
The groundbreaking paper demonstrating that neural networks are often miscalibrated and tend to produce overconfident predictions has already garnered over 6,000 citations. Despite this, some still occasionally make ludicrous claims on social media that 'calibration doesn’t matter.' It’s important for the community to recognize the value of proper calibration to ensure reliable and trustworthy AI systems. #calibration
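A standard way to quantify the miscalibration the tweet refers to is the expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its empirical accuracy. A minimal stdlib sketch for binary classification (the function name and binning choice are this example's, not from the cited paper):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: size-weighted average of |accuracy - confidence|.

    probs:  predicted probabilities of the positive class
    labels: observed 0/1 outcomes
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Assign each prediction to an equal-width confidence bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)  # mean confidence in bin
        acc = sum(y for _, y in b) / len(b)   # empirical accuracy in bin
        ece += len(b) / n * abs(acc - conf)
    return ece
```

An overconfident model (say, 90% confidence but only 50% accuracy) gets a large ECE; a well-calibrated one scores near zero, which is what post-hoc recalibration methods such as temperature scaling aim for.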
Arthur Chatton retweeted
Achim Zeileis @zeileis@fosstodon.org
PSA: All #rstats packages on #cran will get an official DOI! This will facilitate bibliometrics and give credit to R package authors. Registering all 20,000+ packages will still take a few more days, but the first couple of thousand are already live. Example:
Arthur Chatton@ArthurChatton·
Now published in AMPPS (link below, open access)! Some additions include a box on non-binary treatments, an expanded discussion on recent works for measurement errors, and a box on alternative forms of IPW. Thanks again to @dingding_peng, who does a fantastic job!
Arthur Chatton retweeted
Neuroskeptic 🇺🇦@Neuro_Skeptic·
In 1599 the church used a placebo controlled trial to test if a French girl was possessed by a demon. Holy objects and identical, non-blessed objects were shown to the girl. "She reacted similarly when exposed to both genuine and sham religious objects" journals.sagepub.com/doi/full/10.11…
Arthur Chatton retweeted
Dr Kareem Carr@kareem_carr·
it's extremely powerful when all your projects fall under one overarching goal such that they feed into and enhance each other.
Arthur Chatton retweeted
Triad sou.@triadsou·
Do machine learning methods lead to similar individualized treatment rules? A comparison study on real data. Florie Bouvier, Etienne Peyrot, Alan Balendran, Corentin Ségalas, Ian Roberts, François Petit, Raphaël Porcher. Statistics in Medicine. onlinelibrary.wiley.com/doi/10.1002/si…
Arthur Chatton retweeted
Iván Díaz@ildiazm·
Testing for assumptions is not great practice. The test will lead to model changes if the assumption is violated, ruining the validity of uncertainty quantification procedures (e.g., CIs). We should use models whose assumptions are reasonably known to be correct a priori.
Jonathan Bartlett@TheStatsGeek

Why test for proportional hazards - or any other models assumptions? Sjölander & @pauldickman doi.org/10.1093/aje/kw…

Arthur Chatton retweeted
TRIPODStatement@TRIPODStatement·
Good news: the TRIPOD+AI #reporting recommendations for prediction model studies using either regression or #machinelearning methods have been accepted for publication 🥳🎉 #ArtificialIntelligence #transparency #reportingstandards #standardsforAI
TRIPODStatement@TRIPODStatement

Started 4 years ago (tinyurl.com/5n7me3bs), but we've finally submitted the TRIPOD+AI #reporting recommendations for prediction model studies using either regression or #machinelearning methods. Original 2015 guidance here tinyurl.com/bdd6e2at and tinyurl.com/2tvdydfc

Arthur Chatton retweeted
🄼🄴🄴🄷🄰🅆🄻 ⭕
Three vaccines (MMR, Flu, and HPV) and their risks, compared against becoming infected with the disease itself. nytimes.com/2020/01/09/opi…
Prof Peter Hotez MD PhD DSc(hon)@PeterHotez

FYI: I provided the data and cited refs for the @nytimes, but the man behind the graphics was Bill Marsh @billmarshnyt, their amazing graphics editor for the Sunday Review, Opinion, and other sections. He was really great to work with on this article, attached to this tweet.
