Sophia Hager

14 posts

@SophiaNLP

PhD Student at @jhuclsp

Joined December 2025
25 Following · 16 Followers
Sophia Hager @SophiaNLP
Artificial uncertainty has the potential to ensure we can keep AI reliable and interpretable without worrying that calibration data will be memorized in the next wave of models. Big thanks to my collaborators, Simon Zeng and Nick Andrews! Read the paper: arxiv.org/pdf/2605.13595
Sophia Hager @SophiaNLP
(This is not the case when introducing data uncertainty through ambiguity; while it's trivial to construct questions that the model gets wrong with data uncertainty, any improvements in calibration are much more inconsistent than inducing artificial model uncertainty.)
Sophia Hager @SophiaNLP
Can we learn to recognize artificial uncertainty as a proxy for real uncertainty? As LLMs memorize more of the internet, they become correctly confident on almost any existing question you can throw at them. Creating new challenging calibration data is unsustainably expensive.🧵
Sophia Hager retweeted
Mark Dredze @mdredze
Apparently, my students have brought binoculars to their office to bird watch during the day. The advantages of our new beautiful @HopkinsDSAI office space. 🦜🦆🕊️ Should I be worried about productivity?
Sophia Hager retweeted
Rohan Jha @Robro612
New 📄: we replicate XTR, a multi-vector retrieval method that makes ColBERT faster by avoiding its expensive step of gathering full document embeddings. XTR is not a free lunch over ColBERT, but its training objective is useful for modern efficient engines like PLAID and WARP 👇🏼
Sophia Hager retweeted
Drew Prinster @DrewPrinster
Can we ensure AI agents respect our safety constraints, even as they explore & improve?
- Medical LLMs that are helpful & avoid false claims?
- Bioscience agents that generate effective molecule designs & ensure they're safe?
📄🧵 w/ @samuel_stanton_ @clara_fannjiang @jiwoncpark @kchonyc @anqi_liu33 @suchisaria
Excited to share "Conformal Policy Control" ⬇️ 1/12
Sophia Hager retweeted
Jack Jingyu Zhang @jackjingyuzhang
Real-world agents juggle instructions from skill files, tools, other agents, ... each with different trust levels. When these conflict, can models reliably prioritize the most trusted one? Our ManyIH-Bench🪜 finds that even frontier models like GPT-5.4 only get ~40% accuracy! 👇
Sophia Hager retweeted
arXiv Sound @ArxivSound
"Generating Music with Structure Using Self-Similarity as Attention," Sophia Hager, Kathleen Hablutzel, Katherine Kinnaird, ift.tt/FNwGvzx
Sophia Hager retweeted
Marc Marone @ruyimarone
I'm on the job market and at #neurips2025! Looking for research roles around data for foundation models and would love to chat with folks - resume/site in my bio. I've recently worked @AIatMeta and @databricks and publish papers with my awesome collaborators @jhuclsp!
Sophia Hager retweeted
Andrew Wang @andrewwnlp
Tools break in the real world all the time, but not much attention has been given to how well LLMs deal with tool failures. We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.