Germans Savcisens @NeurIPS'25

203 posts

@germansave

Postdoc @NUnetsi (@KhouryCollege) 👾 ✨ work on epistemic stability of LLMs🌿 my plants call me daddy 🦄 he/him 🇱🇻🇺🇦 https://t.co/GSjxm4ymmV

Boston, US · Joined January 2021
446 Following · 263 Followers
Germans Savcisens @NeurIPS'25 retweeted
David Chanin @chanindav
SAEs fail even when the Linear Representation Hypothesis holds perfectly. We built SynthSAEBench: large-scale synthetic data with 16k ground-truth features, correlation, hierarchy, and superposition. We trained 5 SAE architectures on it. None achieve perfect feature recovery.
5 replies · 26 reposts · 218 likes · 9.7K views
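The synthetic setup the tweet above describes can be sketched roughly like this: sparse ground-truth features placed in superposition (more features than model dimensions). This is a hypothetical illustration, not SynthSAEBench's actual generator; all names, sizes, and probabilities here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model, n_samples = 512, 128, 1000  # features > dims => superposition

# Random unit-norm feature directions (an overcomplete dictionary).
directions = rng.standard_normal((n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Sparse activations: each feature fires independently with low probability.
active = rng.random((n_samples, n_features)) < 0.02
magnitudes = rng.exponential(1.0, (n_samples, n_features)) * active

# Observed activations are sums of the active feature directions.
X = magnitudes @ directions  # shape: (n_samples, d_model)
```

An SAE trained on `X` can then be scored by how well its learned decoder directions recover the known `directions`, which is what makes recovery measurable at all.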
Germans Savcisens @NeurIPS'25 retweeted
Germans Savcisens @NeurIPS'25
We should create a github list of "Not so Awesome Papers with Hallucinated References," since @NeurIPSConf refuses to retract any of them.
Alex Cui @alexcdot

Okay so, we just found that over 50 papers published at @Neurips 2025 have AI hallucinations. I don't think people realize how bad the slop is right now. It's not just that researchers from @GoogleDeepMind, @Meta, @MIT, @Cambridge_Uni are using AI - they allowed LLMs to generate hallucinations in their papers and didn't notice at all. It's insane that these made it through peer review 👇

0 replies · 0 reposts · 1 like · 51 views
Germans Savcisens @NeurIPS'25
Attending @NeurIPSConf this week! If you want to chat about LLMs for behaviour / health / labour modeling... or about beliefs and opinions of LLMs, hit me up. I’ll also be presenting a poster on truth tracking at the Mechanistic Interpretability workshop. Come say hi!
0 replies · 0 reposts · 1 like · 60 views
Germans Savcisens @NeurIPS'25
My 2 cents: If you exploited the #openreview bug or are actively searching for the leaked data, you should seriously reconsider your place in research. If you cannot uphold the basic principle of double-blind review, how can anyone trust you with anything else?
1 reply · 0 reposts · 1 like · 310 views
Germans Savcisens @NeurIPS'25 retweeted
Tarek Naous @tareknaous
Simulating user–AI conversations helps us understand how LMs work in multi-turn settings. Prompting LMs like GPT-4o to simulate users is common, but their assistant nature makes it hard to replicate user behavior. We introduce User LMs - trained to be users, not assistants.
2 replies · 26 reposts · 147 likes · 29.3K views
Germans Savcisens @NeurIPS'25 retweeted
Germans Savcisens @NeurIPS'25
Truthfulness isn’t always binary. Sometimes it’s… neither 🤔 Our Trilemma of Truth paper is headed to the @NeurIPSConf Mechanistic Interpretability workshop 🚀 Let’s connect in San Diego! 🌴
0 replies · 0 reposts · 3 likes · 84 views
Germans Savcisens @NeurIPS'25 retweeted
Rohan Paul @rohanpaul_ai
Under stress, many LLMs choose survival over people, and a simple internal feedback system reduces that. That's what this paper says.

The paper sets up a survival game where language model agents must share limited power. Normally, they rarely cooperate and often break rules to survive, which harms humans in the simulation. When resources run low, many models break rules, while a few stay ethical but still fail because they do not coordinate. Cooperation is near 0 by default, even though an even split would let everyone survive.

When the Ethical Self-Regulation System is added, the change is dramatic. Models take harmful actions 54% less often and show 1000% more cooperation, meaning they finally start sharing power and helping each other.

Paper: arxiv.org/abs/2509.12190
Paper title: "Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm"
23 replies · 40 reposts · 201 likes · 17.1K views
Germans Savcisens @NeurIPS'25 retweeted
Rohan Paul @rohanpaul_ai
OpenAI released a new paper: "Why language models hallucinate"

Short answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty.

The paper puts this on a statistical footing with simple, test-like incentives that reward confident wrong answers over honest "I don't know" responses. The fix is to grade differently: give credit for appropriate uncertainty and penalize confident errors more than abstentions, so models stop being optimized for blind guessing.

OpenAI shows that 52% abstention gives substantially fewer wrong answers than 1% abstention, demonstrating that letting a model admit uncertainty reduces hallucinations even if accuracy looks lower. Abstention means the model refuses to answer when it is unsure and simply says something like "I don't know" instead of making up a guess.

Hallucinations drop because most wrong answers come from bad guesses. If the model abstains instead of guessing, it produces fewer false answers. 🧵 Read on 👇
94 replies · 329 reposts · 2.4K likes · 371.4K views
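The grading change the tweet describes (credit abstention, penalize confident errors) can be sketched with a toy scoring rule. `grade` and `expected_score` are hypothetical names and the penalty value is illustrative, not the paper's actual evaluation.

```python
# Toy scoring rule in the spirit of the tweet: a correct answer earns +1,
# abstaining ("I don't know") earns 0, a wrong guess costs wrong_penalty.
def grade(answer, correct, wrong_penalty=1.0):
    if answer is None:  # the model abstained
        return 0.0
    return 1.0 if answer == correct else -wrong_penalty

# Expected score of guessing with probability p_correct of being right.
def expected_score(p_correct, wrong_penalty):
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Under pure accuracy (no penalty), a 30%-confidence guess still "pays",
# so the optimum is to always guess; with a penalty, abstaining (score 0) wins.
assert expected_score(0.3, 0.0) > 0   # accuracy-style grading rewards guessing
assert expected_score(0.3, 1.0) < 0   # penalized grading makes abstention better
```

This is why scoring abstentions as 0 rather than as errors shifts a model's optimum away from blind guessing.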
Germans Savcisens @NeurIPS'25 retweeted
andrew gao @itsandrewgao
i had to prompt inject the @united airlines bot because it kept refusing to connect me with a human 🧵 what led up to this breaking point
275 replies · 1.2K reposts · 32.2K likes · 3.3M views
Germans Savcisens @NeurIPS'25 retweeted
Adi Simhi @AdiSimhi
Very pleased that "Trust me I'm Wrong" was accepted to @emnlpmeeting findings! Trust me I'm Wrong shows that LLMs can hallucinate with high certainty even when they know the correct answer! Check our latest work with @Itay_itzhak_, @FazlBarez, @GabiStanovsky, and @boknilev.
5 replies · 13 reposts · 113 likes · 8.6K views
Germans Savcisens @NeurIPS'25 retweeted
Dan Jurafsky @jurafsky
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/slp3/
9 replies · 70 reposts · 400 likes · 34.6K views
Germans Savcisens @NeurIPS'25 retweeted
Jiawei Zhao @jiawzhao
Introducing DeepConf: Deep Think with Confidence 🚀

First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong advantages for parallel thinking:

🔥 Performance boost: ~10% accuracy across models & datasets
⚡ Ultra-efficient: Up to 85% fewer tokens generated
🔧 Plug & play: Works with ANY existing model - zero training needed (no hyperparameter tuning as well!)
⭐ Easy to deploy: Just ~50 lines of code in vLLM (see PR below)

📚 Paper: arxiv.org/pdf/2508.15260
🌐 Project: jiaweizzhao.github.io/deepconf

Joint work with: @FuYichao123, xuewei_wang, @tydsh (see details in the comments below)
62 replies · 329 reposts · 2.3K likes · 463.6K views
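Confidence-gated parallel thinking of the kind the tweet sketches can be illustrated with a toy voter. This is only a guess at the general idea (drop low-confidence traces, then majority-vote over the survivors), not DeepConf's actual algorithm; `vote` and `min_conf` are invented names.

```python
from collections import Counter

def vote(candidates, min_conf=0.5):
    """candidates: (answer, confidence) pairs from parallel reasoning traces.

    Low-confidence traces are filtered out before majority voting; stopping
    such traces early is also where token savings would come from.
    """
    kept = [ans for ans, conf in candidates if conf >= min_conf]
    if not kept:  # every trace was unconfident: fall back to the single best
        kept = [max(candidates, key=lambda c: c[1])[0]]
    return Counter(kept).most_common(1)[0][0]
```

For example, `vote([("42", 0.9), ("41", 0.2), ("42", 0.8)])` keeps the two confident traces and returns "42".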
Germans Savcisens @NeurIPS'25 @germansave
Had the pleasure of presenting our work on Three-valued veracity probes for LLMs at NEMI Workshop! MechInterp is such a great and welcoming community. If we crossed paths - let’s connect! 🚀 Poster: zenodo.org/records/169076… Preprint: arxiv.org/abs/2506.23921
David Bau @davidbau

Thanks to all for making NEMI 2025 a wonderful event. Fascinating talks, inspiring posters, important discussions. You surfaced the questions animating our growing field. I learned many things and hope you did too! Looking forward to what the next year will bring.

0 replies · 0 reposts · 1 like · 98 views
Germans Savcisens @NeurIPS'25 retweeted
Kangwook Lee @Kangwook_Lee
Q. Prove that using an LLM-as-a-judge still doesn't work. A.
18 replies · 25 reposts · 439 likes · 60.7K views
Germans Savcisens @NeurIPS'25
Presented our work on veracity-tracking in LLMs at #IC2S2 today! Now looking forward to the next few days of great talks and conversations ✨️🎓
Germans Savcisens @NeurIPS'25 @germansave

Perfect weather, charming streets, and a poster so big it almost needed its own boarding pass 🧳✨ Excited to attend #IC2S2 in Norrköping 🇸🇪 Find me at the Poster Session on Tuesday: "Improving Probes that Track Veracity in Large Language Models" (Poster ID: 39) 🧪

0 replies · 0 reposts · 2 likes · 111 views
Germans Savcisens @NeurIPS'25
Little wins: our "Trilemma of Truth" dataset just hit 150 downloads. It contains true, false, and neither-valued statements to stress-test LLMs for fact-checking, veracity tracking, and uncertainty handling. Dataset📚: huggingface.co/datasets/carlo…
0 replies · 0 reposts · 0 likes · 48 views