Nitay Calderon (@NitCal) - Twitter profile

Pinned Tweet

[1/7] Why do frontier LLMs make factual errors? Is it because they never learned the fact… or because they can’t access knowledge they already encoded? In our new paper, we show: The bottleneck is not encoding; it is recall. 🧵👇 Paper: arxiv.org/abs/2602.14080 Many thanks to @_galyo @bd_eyal @zorikgekhman @eran_ofek59358 @GoogleResearch

4

32

107

6.1K

Nitay Calderon retweeted

Omer Nahum@omer6nahum·4d

Do LLMs have motivation? Motivation is a key lens for explaining human behavior. As LLM behavior becomes more human-like, a natural question arises: could it help understand model behavior too? With @AsaelSklar @GoldsteinYAriel @roireichart 📄 Paper: arxiv.org/pdf/2603.14347 1/5

3

16

48

2.6K

Nitay Calderon retweeted

Gal Kesten Pomeranz@KestenGal·14 Mar

Protein repeat detection is hard: repeated segments are often mutated and only approximately similar. Yet PLMs can still detect them well. But how? Check out our new preprint: "Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models"

1

16

44

5.1K

Nitay Calderon retweeted

Zorik Gekhman@zorikgekhman·11 Mar

New paper 🚨 We know that reasoning helps when step-by-step solutions are natural, for example in math, code, and multi-hop factual QA. But why should it help with factual recall, where no complex reasoning steps are needed? 1/🧵

3

16

90

13.4K

Nitay Calderon retweeted

Gal Yona@_galyo·3 Mar

this is really really neat, but it is literally the EXACT OPPOSITE of vibemathing. Don Knuth (at 88, worth adding!) carefully went through all 30 of Claude's attempts, and then actually wrote down a proof for the one that seemed to empirically work

Tenobrus@tenobrus

Donald Knuth is vibemathing now. real tough day for the stochastic-parrot crew.

30

126

1.7K

114.8K

Nitay Calderon@NitCal·25 Feb

@CFGeek Thanks!

0

41

Charles Foster@CFGeek·24 Feb

@NitCal Nice figure!

1

0

1

42

Nitay Calderon@NitCal·24 Feb

[1/7] Why do frontier LLMs make factual errors? Is it because they never learned the fact… or because they can’t access knowledge they already encoded? In our new paper, we show: The bottleneck is not encoding; it is recall. 🧵👇 Paper: arxiv.org/abs/2602.14080 Many thanks to @_galyo @bd_eyal @zorikgekhman @eran_ofek59358 @GoogleResearch

4

32

107

6.1K

Nitay Calderon@NitCal·25 Feb

@bronzeagepapi Could be the reason for recall failures.

0

1

38

Kirito (e/acc) 🏴‍☠️@bronzeagepapi·25 Feb

@NitCal Associative memory issue?

1

0

1

63

Nitay Calderon@NitCal·25 Feb

@AndrewLampinen Oh wow, definitely very relevant and interesting work. I'll make sure to add a discussion of it in the newer version.

1

0

1

49

Andrew Lampinen@AndrewLampinen·25 Feb

@NitCal Cool! You might like our work on latent learning: x.com/AndrewLampinen… — we similarly suggest that models fail to effectively use information when it needs to be recalled in a sufficiently different format, inspired by some LM findings but also other phenomena, e.g. in RL

Andrew Lampinen@AndrewLampinen

we argue that parametric learning methods are too tied to the explicit training task, and fail to effectively encode latent information relevant to possible future tasks, and we suggest that this explains a wide range of findings, from navigation to the reversal curse. 3/

1

0

14

1.3K

Nitay Calderon retweeted

roeeaharoni@roeeaharoni·24 Feb

New work from our group on understanding factual errors in LLMs, led by @NitCal ! Models encode more knowledge than they can recall when prompted. We need to close this gap!

0

2

5

365

Nitay Calderon@NitCal·24 Feb

x.com/nitcal/status/…

0

49

Nitay Calderon@NitCal·20 Feb

Thanks @_akhaliq 🤩 We will soon share a thread about our paper @_galyo @zorikgekhman @bd_eyal

AK@_akhaliq

Google presents Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality paper: huggingface.co/papers/2602.14…

1

4

12

968

Nitay Calderon@NitCal·24 Feb

@_akhaliq x.com/nitcal/status/…

0

29

AK@_akhaliq·19 Feb

Google presents Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality paper: huggingface.co/papers/2602.14…

5

13

52

11.3K

Nitay Calderon@NitCal·24 Feb

So, what can help recall? Thinking helps LLMs access the knowledge they already store. It recovers 40–65% of encoded-but-not-directly-known facts, with the largest gains for rare facts and reverse questions. This resembles the human “tip-of-the-tongue” effect. Bottom line: Given that encoding in frontier LLMs is nearing saturation while substantial headroom remains for recall, future improvements are likely to come not from scaling but from better utilization of existing knowledge. Paper: arxiv.org/abs/2602.14080

0

1

4

217

Nitay Calderon@NitCal·24 Feb

[6/7] The reversal curse is when LLMs know "A is B" but can't answer "What is B?" If bidirectional knowledge were missing, reverse questions would be hard in any format. But in multiple-choice, reverse questions are as easy as direct ones. The problem, again, is recall.

1

4

238

Nitay Calderon retweeted

Avi Caciularu@clu_avi·23 Feb

CoT is powerful but slow. @AmosaurusRex, my intern, introduces Thinking States—a way for LLMs to reason while processing input, not after. 🚀 Better latency 📈 (Almost) Matches CoT on complex QA 🧠 Learns from NL supervision Check out his thread 🧵

Ido Amos@AmosaurusRex

Can LLMs reason internally while processing their inputs, similar to how humans can think ahead as we process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332

0

7

27

4.8K

Nitay Calderon retweeted

Marius Mosbach@mariusmosbach·23 Feb

LatentLens is now installable via pip. Check it out and let us know what you find! 🔍⚙️

Benno Krojer@benno_krojer

You can now "pip install latentlens" 🔨 It comes with: * pre-computed embeddings for several popular LLMs and VLMs * a txt file with sentences describing WordNet concepts, which we recommend as a standard corpus to get embeddings from * ... Try it out and let us know what we can improve!

0

1

11

536

Nitay Calderon retweeted