Nari Johnson

349 posts

Nari Johnson banner
Nari Johnson

Nari Johnson

@narijohnson

PhD student @mldCMU @scsatcmu. AI + HCI. she/her

Katılım Mart 2020
680 Takip Edilen723 Takipçiler
Nari Johnson retweetledi
Deb Raji
Deb Raji@rajiinio·
Love this quote from Zoe and think it's so essential to the mental health and AI discourse: these models are PRODUCTS meant to keep you hooked on and dependent and "satisfied" as a customer; not challenged & eventually weaned off, as is the goal in actual clinical therapy!
Zoë Hitzig@zhitzig

🤷‍♀️

English
2
3
43
2.8K
Nari Johnson retweetledi
Technical AI Governance @ ICML 2026
📣 Submissions are now OPEN for the 2nd Workshop on Technical AI Governance Research at #ICML2026! 🗓️ Deadline: April 24 (23:59 AOE)
Technical AI Governance @ ICML 2026 tweet media
English
1
20
89
7.3K
Nari Johnson retweetledi
Isha Puri
Isha Puri@ishapuri101·
ChatGPT several times where's best to go for spring break? It recommends Barcelona almost every time. This isn't a fluke. RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]
Isha Puri tweet media
English
22
73
443
95K
Nari Johnson retweetledi
Myra Cheng
Myra Cheng@chengmyra1·
So excited that our work is on the cover of Science!!! We find that AI models overly affirm users, even when they describe harmful actions. Advice from sycophantic AI made people more self-centered, yet people prefer and trust it more, which may promote this model behavior.
Myra Cheng tweet media
Myra Cheng@chengmyra1

AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.

English
10
77
316
40.4K
Nari Johnson retweetledi
Avijit Ghosh
Avijit Ghosh@evijit·
We (EvalEval in general and @AnkaReuel in particular) advocated for this last year, so it is excellent to see this! Yes, there is a whole field of measurement beyond automated benchmarking, and I would like to see those encouraged again in popular ML venues (not just FAccT/CHI)
NeurIPS Conference@NeurIPSConf

The Datasets & Benchmarks track is now "Evaluation and Datasets", with an expanded scope for NeurIPS 2026! Read the call for papers neurips.cc/Conferences/20…, and learn more about the changes in our blog post: blog.neurips.cc/2026/03/23/int…

English
2
5
20
2.5K
Nari Johnson retweetledi
Neil Chowdhury
Neil Chowdhury@ChowdhuryNeil·
On the pathway to advanced AI, a lot of the most impactful safety work might sound obvious in retrospect, like "run a model to monitor internal usage for sketchy stuff". But that's a key part of defense-in-depth. If you've ever done SOC 2 compliance, you know that a lot of the controls seem menial until you actually fix them, and then you realize you're meaningfully more secure. The difference with AI is that nobody has written the compliance framework yet (though some are trying!). Risks from AI are often unknown unknowns that we’re figuring out as we go. But we don't need a full framework in order to start making a checklist of commonsense practices. This work from @Marcus_J_W et al. looks like the kind of thing every frontier lab should be doing, yesterday. I have no idea whether Anthropic, xAI, GDM, etc. are already doing something similar, but I hope they do, and are similarly transparent about what they find.
Marcus Williams@Marcus_J_W

Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time.

English
0
7
58
6.7K
Nari Johnson retweetledi
Jared Moore
Jared Moore@jaredlcm·
Disturbing anecdotal reports of "AI psychosis" and negative psychological effects have been emerging in the news. But what actually happens during these lengthy delusional "spirals"? In our preprint, we analyze chat logs from 19 users who experienced severe psychological harm🧵👇
English
24
84
402
52.3K
Nari Johnson retweetledi
Stephanie Milani
Stephanie Milani@steph_milani·
Is RL over? 🤔 Not for Pokémon! We find that RL-based methods generally outperform LLM approaches. For more details, check out our work! We also establish Pokémon as an ongoing benchmark for AI agents so you can test your model 🤖
Seth Karten@sethkarten

x.com/i/article/2033…

English
1
14
127
22.2K
Nari Johnson retweetledi
Christina Baek
Christina Baek@_christinabaek·
Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵
Christina Baek tweet media
English
19
81
615
92.1K
Nari Johnson retweetledi
EvalEval Coalition
EvalEval Coalition@evaluatingevals·
3 days left! 📷 Writing, wrote, or just submitted a paper? Commit it to the EvalEval workshop at ACL 2026 in San Diego! evalevalai.com/events/2026-ac… (including ARR Submissions, non-archival, positions, and extended abstracts!) Submission Deadline: March 19th, 2026 AoE
English
0
6
9
3.4K
Nari Johnson retweetledi
logan koepke
logan koepke@jlkoepke·
ultimately, the ongoing contractual saga between Anthropic, OpenAI, and the DOD highlights how important it is for congress to establish clear rules prohibiting mass surveillance and lethal AI use. the entire game is "all lawful purposes." and the laws enable mass surveillance.
English
1
1
5
269
Nari Johnson retweetledi
Zora Wang
Zora Wang@ZhiruoW·
AI agents are tackling more and more "human work" But are they benchmarked on the work people actually do? tl;dr: Not really Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere. 📒 We built a database linking agent benchmarks & real-world work Submit new tasks + agent trajectories today 🧵
Zora Wang tweet media
English
21
79
400
60.7K
Nari Johnson retweetledi
Peter Henderson
Peter Henderson@PeterHndrsn·
Apropos of nothing, some great researchers recently showed that you can use LLMs with internet access to successfully de-anonymize data at scale.
Peter Henderson tweet media
English
14
143
597
68.7K
Nari Johnson retweetledi
Daniel Paleka
Daniel Paleka@dpaleka·
Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵
Daniel Paleka tweet media
English
9
44
237
55.7K
Nari Johnson retweetledi
Jacob Steinhardt
Jacob Steinhardt@JacobSteinhardt·
New blog post:"Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Jacob Steinhardt tweet media
English
4
30
122
14.2K
Nari Johnson retweetledi
Miles Brundage
Miles Brundage@Miles_Brundage·
OK time for a good old fashioned tweet thread about actual AI policy stuff. This is a more important story than most of the AI news slop. I don't know anything about the specifics but have been following the larger issue for a while...🧵x.com/DavidJeans2/st…
David Jeans@DavidJeans2

New: Pentagon clashes with Anthropic over potential AI use for domestic surveillance and autonomous weapons. w/@dseetharaman @JLDastin reuters.com/business/penta…

English
3
16
108
29.2K