Alexander Hoyle

918 posts

@miserlis_

Postdoctoral fellow with @ETH_AI_Center. CSS + NLP. Previously CS PhD @umdcs, intern at @msftresearch and @ai2_allennlp.

Joined May 2011

589 Following · 1.1K Followers

Alexander Hoyle@miserlis_·6d

@NishantBalepur @rachelrudinger Wow, this is very fun. Nice work as usual!

Nishant Balepur@NishantBalepur·18 May

🚨 New Paper! 🚨 One of my first Ph.D. papers found that LLMs can answer multiple-choice questions without seeing the question 🤔 At #ACL2026, I'm presenting a follow-up showing that current reasoning LLMs can still do this! And quite similarly to a clever test-taker 🧑‍🎓🧵

110

1.8K

1.2M

Alexander Hoyle@miserlis_·27 Apr

We view this effort as a first rung on a reproducibility ladder, eventually ending with wholesale replication from only a research question. Ben, and the rest of the team, did terrific work on this and I’m really excited for what’s next! Now on arXiv: arxiv.org/abs/2604.21965

Alexander Hoyle@miserlis_·27 Apr

New preprint out! Recent work tasks LLM agents with re-running existing social science replication code. Given current capabilities, that should be table stakes. Here, we move up a level of abstraction and ask models to reproduce results from a paper’s descriptions alone.

Elliott Ash@ellliottt

Can AI agents read a social science paper and write the code from scratch to reproduce its results? No access to original code. Just text + data. New paper with Ben Kohler, @david_rzs, @__jae_1, and @miserlis_ 👇

2.1K

Alexander Hoyle@miserlis_·22 Apr

Thanks for having me! I had a lovely time!

MilaNLP@MilaNLProc

We were thrilled to host @miserlis_ at our lab! His insightful talk on topic modeling sparked great discussions and fresh perspectives across the team. Thanks for the visit, we hope to welcome you back soon! #NLProc

393

Alexander Hoyle@miserlis_·9 Apr

David Pfau@pfau

Out of the whole space of bad LLM applications, there is something about this specifically that upsets me on a different level, because it so fundamentally misunderstands the thing it is trying to replace that I fail to understand how the idea ever arose in the first place.

1.4K

Alexander Hoyle@miserlis_·1 Apr

@besttrousers @FuckoBucko1 @FamilyOfficeCA as much as one can be said to "agree" with a factual statement about causal graphs

Alexander Hoyle@miserlis_·1 Apr

@besttrousers @FuckoBucko1 @FamilyOfficeCA factcheck.org/2012/06/obamas… I remember this specifically because I found it so irksome (disclaimer: I am not a chud and agree with your original point)

122

Matt Darling 🌐🏗️@besttrousers·1 Apr

Bad analysis. You should not control for these factors, because they all are downstream of gender on the causal pathway. Think about Mad Men S1. There was no pay gap between men and women when you controlled for gender - because women were prohibited from high paying positions.

990

111.7K

Alexander Hoyle@miserlis_·31 Mar

@yegordb Very interested!

Yegor Denisov-Blanch@yegordb·31 Mar

Full room for our new class CS 321M: AI Measurement Science today!

Why the class matters: Every field that ignored measurement science paid the price

Psychometrics ran into a simple problem:
- a low score could mean a bad student… or just a harder test
- That forced them to build better ways to measure ability instead of just raw scores

Finance learned it in 2008
- Toxic mortgages were labeled safe. The ratings looked precise, but they didn’t reflect reality
- As long as the numbers looked good, the system kept going… until it unraveled

Medicine is the clearest case
- In the CAST trial, drugs reduced irregular heartbeats. On paper, that looked like success
- In reality, more patients died. The metric improved, but the outcome got worse

In each case, the metrics looked right while the outcomes were not

AI is starting to look similar:
- Benchmarks go up and leaderboards improve
- But models shift, scores drift away from real-world use, and systems get optimized for the metric instead of the outcome

We’re already using these numbers (benchmarks, metrics) to make decisions about deployment, regulation, and trust. Measurement is the foundation of science. If it’s off, everything built on it is too.

We’ll be sharing more distilled insights from the class - make sure to follow along! Comment if you want the full materials (slides, textbook, etc.)

(And if you commented on my last post - we're working hard to get the materials to you!)

4.4K

Alexander Hoyle@miserlis_·30 Mar

@RexDouglass Could you elaborate/do you have some examples? I'm now sitting in an Econ group and I feel that CS research could benefit a lot from a similar emphasis on robustness/rigor/causality (although I agree both engage in a willful ignorance of existing work) doomscrollingbabel.manoel.xyz/p/the-missing-…

287

Rex "garbage in" Douglass Ph.D.@RexDouglass·30 Mar

LoL I hunt economists recreationally in part because there's a whole ecosystem of midwits that fell for the credibility revolution PR campaign. Endless trash do files for a cute identification strategy with opaque hidden assumptions.

Richard Hanania@RichardHanania

If you’ve ever read an empirical paper in a top economic journal and compare it to another social science, it’s night and day. Learning about the standards of economics has made me embarrassed of the correlational regression monkey studies I used to read in political science.

5.1K

Alexander Hoyle@miserlis_·27 Mar

@causalinf I think you might be interested in our paper on LLMs for scalar measurement: aclanthology.org/2025.emnlp-mai…

2.2K

scott cunningham@causalinf·27 Mar

In one thing I did, I sent 305,000 congressional speeches to OpenAI for batch classification of whether the speech was pro or anti immigration (a paper by Leah Boustan and others), but rather than allow for “pro, anti or neutral”, I gave it a “thermometer” of -100 to 100 1/n

138

39.1K

Alexander Hoyle@miserlis_·26 Mar

@AaronSchein The closest you'd get to that with this approach is using slides.com

Aaron Schein@AaronSchein·26 Mar

@miserlis_ Do any of these roads end in a Keynote deck that can be edited the usual way?

Alexander Hoyle@miserlis_·26 Mar

I wrote a blog post on my experience using AI for slide generation. Basic idea: write your lecture notes first, then prompt the LLM to produce corresponding slides in reveal.js (h/t @ChenhaoTan). I'm picky about my slides but was happy with the results! (link in thread below)

249

23.4K

Alexander Hoyle@miserlis_·26 Mar

@ChenhaoTan @Keleesssss If you mean me, then yes :)

175

Chenhao Tan@ChenhaoTan·26 Mar

@Keleesssss @miserlis_ Wow you make beautiful slides! Can I steal the template?

156

Alexander Hoyle@miserlis_·26 Mar

@Keleesssss @ChenhaoTan Woah I love that!

818

Alperen Keleş@Keleesssss·26 Mar

@miserlis_ @ChenhaoTan I also use it for building animations, they look pretty sleek IMO (theconsensus.dev/p/2026/03/06/s…)

1.5K

Alexander Hoyle retweeted

Mel Andrews@bayesianboy·28 Jan

If you, as a scientist, cannot be bothered to engage in the intellectual work of science, please quit your job and leave it to someone with skill and integrity.

MIT Technology Review@techreview

OpenAI’s latest product lets you vibe code science trib.al/kxbfFr0

401

3.2K

113.1K

Alexander Hoyle@miserlis_·25 Jan

@shakoistsLog trying to follow your post---is there any underlying data being analyzed, or are you relying on the idea that models are ingesting lots of data and can reflectively self-analyze?

376

shako@shakoistsLog·25 Jan

I keep wanting to share this recent post, and since X doesn't seem to want to stop devolving any time soon I might as well now. I spent a lot of time working on a concept of an idea of a prototype to push LLMs into the direction of applied social sciences.

436

70.3K

Alexander Hoyle@miserlis_·8 Jan

A new EACL paper! There's been a lot of interest in LLMs for annotation recently, and they tend to treat humans as a ground truth. But we know that's a simplification---humans disagree all the time. Here, we investigate whether we can model that disagreement with LLMs

Ni Jingwei@NJingwei

🚨 Using reasoning LLMs as annotators? You might be erasing critical human disagreements. Our EACL'25 paper shows RLVR-style reasoning actually HARMS disagreement modeling—even when carefully prompted to consider it! 🙀 📄 Paper: arxiv.org/abs/2506.19467🧵👇

3.3K

Alexander Hoyle@miserlis_·7 Jan

@ChenhaoTan Looking forward to seeing your process! I'm not familiar with reveal.js, what's the learning curve like?

375

Chenhao Tan@ChenhaoTan·6 Jan

The last time I taught NLP was winter 2022, how the world has changed! My main goal in this quarter is to move everything online and in a public github organization: uchicago-nlp-course.github.io. Part of making everything code is that now I am making all slides in reveal.js. It looks pretty good so far! Let us see if I can keep this up! I have some completely new lectures to make.

338

16K

Alexander Hoyle retweeted

Mosh Levy@mosh_levy·16 Dec

New paper: We are often told that reasoning tokens aren't faithful explanations. But to have a useful metaphor for their operation we need a characterization of what they are, not what they are not. To that end, we suggest "State over Tokens" (SoT) 👇🧵

13.3K
