James Michaelov

77 posts

@jamichaelov

Postdoc @MIT. Previously: @CogSciUCSD, @CARTAUCSD, @AmazonScience, @InfAtEd, @SchoolofPPLS. Research: Language in humans and machines

Joined September 2017
768 Following · 382 Followers

James Michaelov @jamichaelov
Excited to announce that I’ll be presenting a paper at #NeurIPS this year! Reach out if you’re interested in chatting about LM training dynamics, architectural differences, shortcuts/heuristics, or anything at the CogSci/NLP/AI interface in general! #Neurips2025

James Michaelov @jamichaelov
In the most extreme case, LMs assign sentences such as ‘the car was given a parking ticket by the explorer’ (unlikely but possible event) a lower probability than ‘the car was given a parking ticket by the brake’ (impossible event, related final word) over half of the time.
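
As a rough illustration of this kind of comparison (not code from the paper), the sketch below scores the two example sentences with a Hugging Face causal LM; gpt2 is a stand-in model choice, not necessarily one of the models evaluated in the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence):
    # Total log P(sentence) = sum of log P(token | preceding tokens).
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, Hugging Face returns the mean cross-entropy over
        # the n-1 scored tokens, so multiply back to get the total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

possible = "The car was given a parking ticket by the explorer."
impossible = "The car was given a parking ticket by the brake."
print(sentence_logprob(possible), sentence_logprob(impossible))

The paper's claim concerns how often the impossible-but-related sentence comes out more probable across models and stimuli; this only shows the per-sentence computation.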

James Michaelov @jamichaelov
New paper accepted at Findings of ACL! TL;DR: While language models generally assign higher probabilities to sentences describing possible events than to impossible (animacy-violating) ones, this is not robust for generally unlikely events and is affected by semantic relatedness.

James Michaelov @jamichaelov
@linguist_cat And the current wave of recurrent architectures has just started! As we see more and more new architectures and developments, it will be interesting to see how they compare. One thing does seem clear though: recurrent models are back with a vengeance!

James Michaelov @jamichaelov
@linguist_cat With reading time, the results are more variable across experiments, which may be related to differences in the stimuli (see the paper for more details).

James Michaelov @jamichaelov
New preprint with @linguist_cat and Ben Bergen! We’ve all heard of the new wave of recurrent language models, but how good are they for modeling human language comprehension? Quite good, it turns out! 🧵 arxiv.org/abs/2404.19178

James Michaelov @jamichaelov
On the other hand, we show that next-word prediction is sufficient to produce both effects, at least qualitatively.
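
For illustration (again, not the paper's code): word-by-word surprisal, the negative log probability of each word given its left context, is one standard way next-word prediction is linked to comprehension measures like reading times. The sketch below computes it with a Hugging Face causal LM, using gpt2 and a made-up sentence as placeholders; any causal LM, including recurrent ones, can be scored the same way.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence):
    # Surprisal of each token given its left context, in bits.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -log_probs[torch.arange(targets.shape[0]), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits.tolist()))

for tok, s in token_surprisals("The day was breezy so the boy went outside to fly a kite."):
    print(f"{tok:>12s} {s:6.2f} bits")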