Angelos Poulis

@angelosps

CS PhD student @BUCompSci

Boston, MA · Joined September 2019

48 Following · 8 Followers

Angelos Poulis @angelosps · Apr 19

Takeaway: truth directions in LLMs seem robust mostly in a limited range of pure-factual tasks for specific prompt formats, but break down when truth assessment requires tracking intermediate results. 📄 Testing the Limits of Truth Directions in LLMs: arxiv.org/pdf/2604.03754

Angelos Poulis @angelosps · Apr 19

Geometrically, we observe that as task difficulty increases, activations of true and false statements become indistinguishable.

Angelos Poulis @angelosps · Apr 19

Does an LLM have an internal representation of truth? Yes... but it is more limited than previously assumed. E.g., counting how many (out of 3) cities are in the same country can significantly degrade truth representations. New preprint with @mcrovella and Evimaria Terzi 🧵
