Alexander Hoyle

918 posts

@miserlis_

Postdoctoral fellow with @ETH_AI_Center. CSS + NLP. Previously CS PhD @umdcs, intern at @msftresearch and @ai2_allennlp.

Joined May 2011
589 Following · 1.1K Followers
Nishant Balepur @NishantBalepur
🚨 New Paper! 🚨 One of my first Ph.D. papers found that LLMs can answer multiple-choice questions without seeing the question 🤔 At #ACL2026, I'm presenting a follow-up showing that current reasoning LLMs can still do this! And quite similarly to a clever test-taker 🧑‍🎓🧵
49
110
1.8K
1.2M
Alexander Hoyle @miserlis_
We view this effort as a first rung on a reproducibility ladder, eventually ending with wholesale replication from only a research question. Ben and the rest of the team did terrific work on this, and I’m really excited for what’s next! Now on arXiv: arxiv.org/abs/2604.21965
0
0
2
91
Alexander Hoyle @miserlis_
New preprint out! Recent work tasks LLM agents with re-running existing social science replication code. Given current capabilities, that should be table stakes. Here, we move up a level of abstraction and ask models to reproduce results from a paper’s descriptions alone.
Elliott Ash @ellliottt

Can AI agents read a social science paper and write the code from scratch to reproduce its results? No access to original code. Just text + data. New paper with Ben Kohler, @david_rzs, @__jae_1, and @miserlis_ 👇

1
4
18
2.1K
Matt Darling 🌐🏗️ @besttrousers
Bad analysis. You should not control for these factors, because they all are downstream of gender on the causal pathway. Think about Mad Men S1. There was no pay gap between men and women when you controlled for gender - because women were prohibited from high paying positions.
85
57
990
111.7K
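To make the argument in Matt Darling's post above concrete, here is a minimal sketch with made-up numbers. Everything in it is hypothetical (the roles, salaries, and function names are illustrative, not from the post), and it assumes the control in question is job position. Pay depends only on position, but women are excluded from the high-paying role, so the gap disappears once position is held fixed even though gender determines who can earn more.

```python
from statistics import mean

# Hypothetical toy payroll: (gender, position, salary).
# Pay depends only on position, but women are barred from the executive role.
workers = (
    [("M", "executive", 100)] * 4
    + [("M", "secretary", 40)] * 6
    + [("F", "secretary", 40)] * 10
)


def gap(rows):
    """Mean male salary minus mean female salary."""
    men = [salary for gender, _, salary in rows if gender == "M"]
    women = [salary for gender, _, salary in rows if gender == "F"]
    return mean(men) - mean(women)


def within_position_gap(rows):
    """Average gap across positions held by both genders.

    This mimics "controlling for position": positions held by only one
    gender contribute no comparison and are skipped.
    """
    gaps = []
    for position in sorted({p for _, p, _ in rows}):
        subset = [r for r in rows if r[1] == position]
        if {g for g, _, _ in subset} == {"M", "F"}:
            gaps.append(gap(subset))
    return mean(gaps)


raw_gap = gap(workers)                         # unadjusted comparison
adjusted_gap = within_position_gap(workers)    # comparison holding position fixed

print(f"raw gap: {raw_gap}, gap within position: {adjusted_gap}")
```

In this toy data the unadjusted gap is 24 while the within-position gap is 0. The adjustment removes exactly the channel (access to the executive role) through which gender affects pay here.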
Yegor Denisov-Blanch @yegordb
Full room for our new class CS 321M: AI Measurement Science today!

Why the class matters: Every field that ignored measurement science paid the price

Psychometrics ran into a simple problem:
- a low score could mean a bad student… or just a harder test
- That forced them to build better ways to measure ability instead of just raw scores

Finance learned it in 2008
- Toxic mortgages were labeled safe. The ratings looked precise, but they didn’t reflect reality
- As long as the numbers looked good, the system kept going… until it unraveled

Medicine is the clearest case
- In the CAST trial, drugs reduced irregular heartbeats. On paper, that looked like success
- In reality, more patients died. The metric improved, but the outcome got worse

In each case, the metrics looked right while the outcomes were not

AI is starting to look similar:
- Benchmarks go up and leaderboards improve
- But models shift, scores drift away from real-world use, and systems get optimized for the metric instead of the outcome

We’re already using these numbers (benchmarks, metrics) to make decisions about deployment, regulation, and trust. Measurement is the foundation of science. If it’s off, everything built on it is too.

We’ll be sharing more distilled insights from the class - make sure to follow along! Comment if you want the full materials (slides, textbook, etc.)

(And if you commented on my last post - we're working hard to get the materials to you!)
17
8
39
4.4K
Rex "garbage in" Douglass Ph.D.
LoL I hunt economists recreationally in part because there's a whole ecosystem of midwits that fell for the credibility revolution PR campaign. Endless trash do files for a cute identification strategy with opaque hidden assumptions.
Richard Hanania @RichardHanania

If you’ve ever read an empirical paper in a top economic journal and compare it to another social science, it’s night and day. Learning about the standards of economics has made me embarrassed of the correlational regression monkey studies I used to read in political science.

3
0
24
5.1K
scott cunningham @causalinf
In one thing I did, I sent 305,000 congressional speeches to OpenAI for batch classification of whether the speech was pro or anti immigration (a paper by Leah Boustan and others), but rather than allow for “pro, anti or neutral”, I gave it a “thermometer” of -100 to 100 1/n
7
7
138
39.1K
Aaron Schein @AaronSchein
@miserlis_ Do any of these roads end in a Keynote deck that can be edited the usual way?
1
0
1
65
Alexander Hoyle @miserlis_
I wrote a blog post on my experience using AI for slide generation. Basic idea: write your lecture notes first, then prompt the LLM to produce corresponding slides in reveal.js (h/t @ChenhaoTan). I'm picky about my slides but was happy with the results! (link in thread below)
6
22
249
23.4K
Alexander Hoyle @miserlis_
@shakoistsLog trying to follow your post---is there any underlying data being analyzed, or are you relying on the idea that models are ingesting lots of data and can reflectively self-analyze?
1
0
3
376
shako @shakoistsLog
I keep wanting to share this recent post, and since X doesn't seem to want to stop devolving any time soon I might as well now. I spent a lot of time working on a concept of an idea of a prototype to push LLMs into the direction of applied social sciences.
23
26
436
70.3K
Alexander Hoyle @miserlis_
A new EACL paper! There's been a lot of interest in LLMs for annotation recently, and they tend to treat humans as a ground truth. But we know that's a simplification---humans disagree all the time. Here, we investigate whether we can model that disagreement with LLMs
Ni Jingwei @NJingwei

🚨 Using reasoning LLMs as annotators? You might be erasing critical human disagreements. Our EACL'25 paper shows RLVR-style reasoning actually HARMS disagreement modeling—even when carefully prompted to consider it! 🙀 📄 Paper: arxiv.org/abs/2506.19467 🧵👇

0
2
29
3.3K
Alexander Hoyle @miserlis_
@ChenhaoTan Looking forward to seeing your process! I'm not familiar with reveal.js, what's the learning curve like?
2
0
0
375
Chenhao Tan @ChenhaoTan
The last time I taught NLP was winter 2022, how the world has changed! My main goal in this quarter is to move everything online and in a public github organization: uchicago-nlp-course.github.io. Part of making everything code is that now I am making all slides in reveal.js. It looks pretty good so far! Let us see if I can keep this up! I have some completely new lectures to make.
6
57
338
16K
Alexander Hoyle reposted
Mosh Levy @mosh_levy
New paper: We are often told that reasoning tokens aren't faithful explanations. But to have a useful metaphor for their operation we need a characterization of what they are, not what they are not. To that end, we suggest "State over Tokens" (SoT) 👇🧵
5
18
60
13.3K