Angelos Poulis

7 posts

@angelosps

CS PhD student @BUCompSci

Boston, MA · Joined September 2019
48 Following · 8 Followers
Angelos Poulis @angelosps
Takeaway: truth directions in LLMs seem robust mainly on a limited range of purely factual tasks and for specific prompt formats, but they break down when assessing truth requires tracking intermediate results. 📄 Testing the Limits of Truth Directions in LLMs: arxiv.org/pdf/2604.03754
Angelos Poulis @angelosps
Geometrically, we observe that as task difficulty increases, activations of true and false statements become indistinguishable.
Angelos Poulis @angelosps
Does an LLM have an internal representation of truth? Yes... but it is more limited than previously assumed. E.g., counting how many (out of 3) cities are in the same country can significantly degrade truth representations. New preprint with @mcrovella and Evimaria Terzi 🧵