Nari Johnson

349 posts

Nari Johnson

@narijohnson

PhD student @mldCMU @scsatcmu. AI + HCI. she/her

Katılım Mart 2020

680 Takip Edilen723 Takipçiler

Nari Johnson retweetledi

Deb Raji@rajiinio·6 Nis

Love this quote from Zoe and think it's so essential to the mental health and AI discourse: these models are PRODUCTS meant to keep you hooked on and dependent and "satisfied" as a customer; not challenged & eventually weaned off, as is the goal in actual clinical therapy!

Zoë Hitzig@zhitzig

🤷‍♀️

English

2.8K

Nari Johnson retweetledi

Technical AI Governance @ ICML 2026@taig_icml·2 Nis

📣 Submissions are now OPEN for the 2nd Workshop on Technical AI Governance Research at #ICML2026! 🗓️ Deadline: April 24 (23:59 AOE)

Technical AI Governance @ ICML 2026 tweet media

English

7.3K

Nari Johnson retweetledi

Isha Puri@ishapuri101·27 Mar

ChatGPT several times where's best to go for spring break? It recommends Barcelona almost every time. This isn't a fluke. RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]

English

443

95K

Nari Johnson retweetledi

Myra Cheng@chengmyra1·26 Mar

So excited that our work is on the cover of Science!!! We find that AI models overly affirm users, even when they describe harmful actions. Advice from sycophantic AI made people more self-centered, yet people prefer and trust it more, which may promote this model behavior.

Myra Cheng@chengmyra1

AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.

English

316

40.4K

Nari Johnson retweetledi

Avijit Ghosh@evijit·24 Mar

We (EvalEval in general and @AnkaReuel in particular) advocated for this last year, so it is excellent to see this! Yes, there is a whole field of measurement beyond automated benchmarking, and I would like to see those encouraged again in popular ML venues (not just FAccT/CHI)

NeurIPS Conference@NeurIPSConf

The Datasets & Benchmarks track is now "Evaluation and Datasets", with an expanded scope for NeurIPS 2026! Read the call for papers neurips.cc/Conferences/20…, and learn more about the changes in our blog post: blog.neurips.cc/2026/03/23/int…

English

2.5K

Nari Johnson retweetledi

NeurIPS Conference@NeurIPSConf·23 Mar

English

199

47.3K

Nari Johnson retweetledi

Neil Chowdhury@ChowdhuryNeil·21 Mar

On the pathway to advanced AI, a lot of the most impactful safety work might sound obvious in retrospect, like "run a model to monitor internal usage for sketchy stuff". But that's a key part of defense-in-depth. If you've ever done SOC 2 compliance, you know that a lot of the controls seem menial until you actually fix them, and then you realize you're meaningfully more secure. The difference with AI is that nobody has written the compliance framework yet (though some are trying!). Risks from AI are often unknown unknowns that we’re figuring out as we go. But we don't need a full framework in order to start making a checklist of commonsense practices. This work from @Marcus_J_W et al. looks like the kind of thing every frontier lab should be doing, yesterday. I have no idea whether Anthropic, xAI, GDM, etc. are already doing something similar, but I hope they do, and are similarly transparent about what they find.

Marcus Williams@Marcus_J_W

Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time.

English

6.7K

Nari Johnson retweetledi

Cas (Stephen Casper)@StephenLCasper·20 Mar

Do you do technical AI research? In this talk, I argue that you 🫵 should see yourself quite literally as a type of policymaker. Thanks @farairesearch. youtube.com/watch?v=Ekp-eg…

YouTube

English

5.1K

Nari Johnson retweetledi

Jared Moore@jaredlcm·18 Mar

Disturbing anecdotal reports of "AI psychosis" and negative psychological effects have been emerging in the news. But what actually happens during these lengthy delusional "spirals"? In our preprint, we analyze chat logs from 19 users who experienced severe psychological harm🧵👇

English

402

52.3K

Nari Johnson retweetledi

Stephanie Milani@steph_milani·17 Mar

Is RL over? 🤔 Not for Pokémon! We find that RL-based methods generally outperform LLM approaches. For more details, check out our work! We also establish Pokémon as an ongoing benchmark for AI agents so you can test your model 🤖

Seth Karten@sethkarten

x.com/i/article/2033…

English

127

22.2K

Nari Johnson retweetledi

Christina Baek@_christinabaek·18 Mar

Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵

English

615

92.1K

Nari Johnson retweetledi

EvalEval Coalition@evaluatingevals·17 Mar

3 days left! 📷 Writing, wrote, or just submitted a paper? Commit it to the EvalEval workshop at ACL 2026 in San Diego! evalevalai.com/events/2026-ac… (including ARR Submissions, non-archival, positions, and extended abstracts!) Submission Deadline: March 19th, 2026 AoE

English

3.4K

Nari Johnson retweetledi

logan koepke@jlkoepke·3 Mar

ultimately, the ongoing contractual saga between Anthropic, OpenAI, and the DOD highlights how important it is for congress to establish clear rules prohibiting mass surveillance and lethal AI use. the entire game is "all lawful purposes." and the laws enable mass surveillance.

English

269

Nari Johnson retweetledi

Zora Wang@ZhiruoW·3 Mar

AI agents are tackling more and more "human work" But are they benchmarked on the work people actually do? tl;dr: Not really Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere. 📒 We built a database linking agent benchmarks & real-world work Submit new tasks + agent trajectories today 🧵

English

400

60.7K

Nari Johnson retweetledi

Peter Henderson@PeterHndrsn·2 Mar

Apropos of nothing, some great researchers recently showed that you can use LLMs with internet access to successfully de-anonymize data at scale.

English

143

597

68.7K

Nari Johnson retweetledi

Tom Costello@tomstello_·23 Şub

I'm hiring a postdoc at @CarnegieMellon (w/ FAR.AI & @DG_Rand + @GordPennycook )! How do LLMs shape human beliefs — and what do we do about it? AI safety meets behavioral science. Open to technical (ML/NLP/AI safety) and computational social science backgrounds. Funded experiments, first-authored pubs, great collaborators. Apply or RT! 🔁 cmu.wd5.myworkdayjobs.com/CMU/job/Pittsb…

English

128

11.6K

Nari Johnson retweetledi

Daniel Paleka@dpaleka·20 Şub

Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

English

237

55.7K

Nari Johnson retweetledi

Jacob Steinhardt@JacobSteinhardt·18 Şub

New blog post:"Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

English

122

14.2K

Nari Johnson retweetledi

Miles Brundage@Miles_Brundage·30 Oca

OK time for a good old fashioned tweet thread about actual AI policy stuff. This is a more important story than most of the AI news slop. I don't know anything about the specifics but have been following the larger issue for a while...🧵x.com/DavidJeans2/st…

David Jeans@DavidJeans2

New: Pentagon clashes with Anthropic over potential AI use for domestic surveillance and autonomous weapons. w/@dseetharaman @JLDastin reuters.com/business/penta…

English

108

29.2K

Nari Johnson retweetledi

Data Science for Social Good@datascifellows·16 Şub

The 2026 DSSG Fellowship is coming to @JohnsHopkins this summer. Spend 10 weeks working on AI/ML social impact projects. Graduate students at U.S. universities are eligible. Application deadline is March 1 dssgfellowship.org #DSSG #DataScience #MachineLearning #AIforGood

English

Keşfet

@AnkaReuel @Marcus_J_W @farairesearch @CarnegieMellon @DG_Rand @GordPennycook @SimonLermenAI @joshua_swans