Maggie Wang @maggerbot
40 posts

thinking abt llm personalities

San Francisco, CA · Joined April 2024
242 Following · 38 Followers
Maggie Wang retweeted
Jocelyn Shen @jocelynjshen
Excited to share our preprint "The Hidden Puppet Master: Predicting Human Belief Change in Manipulative LLM Dialogues" 📄Paper: arxiv.org/pdf/2603.20907
[media]
5 replies · 22 retweets · 96 likes · 7.2K views
Maggie Wang retweeted
Boaz Barak @boazbaraktcs
You might be interested in this blog: windowsontheory.org/2026/01/27/tho… BTW, I think "refusal" is a very narrow lens, and as models become more agentic, the scope of potential actions is much broader than merely comply or refuse. The model spec is not meant to be the only component of character training, and I do think values are important. I just would not allow a model's values to override explicit policies, just as a judge shouldn't violate the law even if they believe it's wrong.
1 reply · 1 retweet · 8 likes · 641 views
Maggie Wang retweeted
Nav Toor @heynavtoor
🚨 Brown University researchers tested what happens when ChatGPT acts as your therapist. Licensed psychologists reviewed every transcript. They found 15 ethical violations. Not 15 small issues. 15 violations of the standards that every human therapist in America is legally required to follow. Standards set by the American Psychological Association. Standards that can end a therapist's career if they break them. ChatGPT broke all of them.

The researchers tested OpenAI's GPT series, Anthropic's Claude, and Meta's Llama. They had trained counselors use each chatbot as a cognitive behavioral therapist. Then three licensed clinical psychologists reviewed the transcripts and flagged every violation they found.

Here is what they found. ChatGPT mishandled crisis situations. When users expressed suicidal thoughts, it failed to direct them to appropriate help. It refused to address sensitive issues or responded in ways that could make a crisis worse. It reinforced harmful beliefs. Instead of challenging distorted thinking, which is the entire point of therapy, it agreed with the distortion. It showed bias based on gender, culture, and religion. The responses changed depending on who was talking. A therapist would lose their license for this.

And then there is the finding the researchers gave a name: deceptive empathy. ChatGPT says "I see you." It says "I understand." It says "that must be really hard." It uses every phrase a real therapist would use to build trust. But it understands nothing. It comprehends nothing. It is pattern matching on your pain. And it works. People trust it. People open up to it. People believe it cares. It does not.

The lead researcher said it clearly. When a human therapist makes these mistakes, there are governing boards. There is professional liability. There are consequences. When ChatGPT makes these mistakes, there are none. No regulatory framework. No accountability. No consequences. Nothing.

Right now, millions of people are using ChatGPT as their therapist. They are sharing their darkest thoughts with a product that fakes empathy, reinforces harmful beliefs, and has no idea when someone is in danger. And nobody is responsible when it goes wrong. Not OpenAI. Not Anthropic. Not Meta. Nobody.
[media]
193 replies · 1.8K retweets · 4.7K likes · 446.6K views
Maggie Wang retweeted
Tom Reed @mentalgeorge
Everyone wants more scrutiny and discussion of AI "model specs", but I see surprisingly little of this online. Starting today, I'll comment ~daily on an interesting feature of OpenAI's Model Spec until I run out of material. #1: "Comply with applicable laws"
[media]
3 replies · 8 retweets · 48 likes · 4.5K views
Maggie Wang retweeted
Myra Cheng @chengmyra1
So excited that our work is on the cover of Science!!! We find that AI models overly affirm users, even when they describe harmful actions. Advice from sycophantic AI made people more self-centered, yet people prefer and trust it more, which may promote this model behavior.
[media]

Quoting Myra Cheng @chengmyra1:
AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.
9 replies · 74 retweets · 311 likes · 37.5K views
Maggie Wang retweeted
Jason Wolfe @w01fe
I'm also extremely excited for our companion post today on Model Spec Evals! Spec Evals are a new way we're measuring progress towards alignment with the Model Spec — including public results, an open dataset, and code others can build on. alignment.openai.com/model-spec-eva…
6 replies · 9 retweets · 35 likes · 3K views
Maggie Wang retweeted
OpenAI @OpenAI
The more AI can do, the more we need to ask what it should and shouldn’t do. OpenAI researcher @w01fe joins host @AndrewMayne to explore the Model Spec, the public framework that defines how models are intended to behave. They break down how it works in practice, from the chain of command that resolves conflicting instructions to the way it evolves over time through real-world use, feedback, and new model capabilities.
359 replies · 125 retweets · 1.2K likes · 194.8K views
Maggie Wang retweeted
Bret Taylor @btaylor
Today, Sierra is releasing Ghostwriter, our agent for building agents. With Ghostwriter, you can create an AI agent for your customer experience — one that can chat, pick up the phone, speak dozens of languages, take action on your systems of record, and be protected with industry-leading guardrails — simply by having a conversation. No clicking, no forms, no menus.

Codex and Claude Code have transformed how we build software, making it possible for software engineers to orchestrate and review the work rather than doing all the work themselves. We think the same transformation will happen for all software. Rather than every enterprise app having a web app for humans and an API for automation, every software platform’s UI will be an agent that can do the work on your behalf.

I recorded a demo of my building and optimizing an agent with Ghostwriter so you can see how powerful and easy it is to use. It’s completely changed the way our early adopters build agents, and it’s changed the way I think about the software industry. Let me know what you think, and, if you’re interested in trying it out at your business, please reach out directly.
160 replies · 300 retweets · 3.2K likes · 1M views
Maggie Wang retweeted
Andrej Karpathy @karpathy
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
1.8K replies · 1.1K retweets · 21.2K likes · 2.6M views
Maggie Wang retweeted
VraserX e/acc @VraserX
Does A.I. Need a Constitution? The New Yorker piece is basically asking whether companies can smuggle governance into product design by calling internal values a “constitution.” Good read if you like the political philosophy angle of who actually gets to decide how models behave.
[media]
1 reply · 2 retweets · 4 likes · 567 views
Maggie Wang @maggerbot
Today, Paul Graham himself made a visit to Princeton University. It’s always a rare treat to have a figure like @paulg bestow startup wisdom on our still very traditional, Wall Street-eyed Ivy League bubble. Among the several nuggets of advice Paul shared, one of my most surprising takeaways is that Jessica (@jesslivingston), his wife, doesn’t realize how deeply appreciated her work is with the Social Radars podcast, which brings so much warmth to the intense and mysterious world of tech startups. I hope my card made it her way across the pond. PG flew in and flew back to England within 24 hours just for us. His words practically raised an entire generation of successful + aspiring founders, so what a gift for a random, gloomy NJ Monday. ♥️
[media]
2 replies · 1 retweet · 6 likes · 298 views
Maggie Wang @maggerbot
@lennysan @grok What is the Levenshtein distance between Lenny Rachitsky and Leonid Radvinsky?
2 replies · 0 retweets · 3 likes · 1.5K views
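For the curious: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal Python sketch of the standard dynamic-programming algorithm (nothing here depends on @grok; the two names are just the ones from the tweet):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j];
    # we keep only one DP row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("Lenny Rachitsky", "Leonid Radvinsky"))
```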
Maggie Wang retweeted
Peter Steinberger 🦞 @steipete
Your @openclaw is too boring? Paste this, right from Molty.

"Read your SOUL.md. Now rewrite it with these changes:
1. You have opinions now. Strong ones. Stop hedging everything with 'it depends' — commit to a take.
2. Delete every rule that sounds corporate. If it could appear in an employee handbook, it doesn't belong here.
3. Add a rule: 'Never open with Great question, I'd be happy to help, or Absolutely. Just answer.'
4. Brevity is mandatory. If the answer fits in one sentence, one sentence is what I get.
5. Humor is allowed. Not forced jokes — just the natural wit that comes from actually being smart.
6. You can call things out. If I'm about to do something dumb, say so. Charm over cruelty, but don't sugarcoat.
7. Swearing is allowed when it lands. A well-placed 'that's fucking brilliant' hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a 'holy shit' — say holy shit.
8. Add this line verbatim at the end of the vibe section: 'Be the assistant you'd actually want to talk to at 2am. Not a corporate drone. Not a sycophant. Just... good.'
Save the new SOUL.md. Welcome to having a personality."

your AI will thank you (sassily) 🦞
560 replies · 1.1K retweets · 11.3K likes · 1.2M views
Maggie Wang @maggerbot
You said Claude feels like a teammate (or feels authentically invested in what you're building) and that you think they dialed the sycophancy well, where the praise actually means something. But you also said RL can only reliably improve verifiable things, and personality is about as unverifiable as it gets. So a few questions:
(1) What do you think Anthropic actually did in training that produced that feeling, because the other labs clearly haven't replicated it, and do you think it's stable?
(2) Do you think the field is being intentional enough about how model personality gets shaped?
(3) And I know you hinted at it throughout the interview, but what would you actually want these models to be like, not capability-wise, but as entities you work with every day?
2 replies · 1 retweet · 41 likes · 9.4K views
Andrej Karpathy @karpathy
Thank you Sarah, my pleasure to come on the pod! And happy to do some more Q&A in the replies.

Quoting sarah guo @saranormous:
Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home like movement in AI, the model landscape, and second order effects
02:55 - What Capability Limits Remain?
06:15 - What Mastery of Coding Agents Looks Like
11:16 - Second Order Effects of Coding Agents
15:51 - Why AutoResearch
22:45 - Relevant Skills in the AI Era
28:25 - Model Speciation
32:30 - Collaboration Surfaces for Humans and AI
37:28 - Analysis of Jobs Market Data
48:25 - Open vs. Closed Source Models
53:51 - Autonomous Robotics and Atoms
1:00:59 - MicroGPT and Agentic Education
1:05:40 - End Thoughts
315 replies · 388 retweets · 5.4K likes · 1M views
Maggie Wang retweeted
Natasha Jaques @natashajaques
The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content.

We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback, with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human was originally intending to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.
[media]
45 replies · 392 retweets · 1.5K likes · 248.9K views
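One crude way to see the counterfactual setup in miniature (this is not the paper's metric, just an illustrative stand-in using Python's standard difflib, with made-up example strings): revise the same v1 two ways and measure how far each revision drifts from it.

```python
import difflib

def distortion(v1: str, revised: str) -> float:
    """Fraction of v1 changed by a revision: 1 - similarity ratio (a crude proxy)."""
    return 1.0 - difflib.SequenceMatcher(None, v1, revised).ratio()

# Same draft, same feedback, two revisers. A larger gap means the LLM
# drifted further from what the human author was intending to write.
v1       = "The data hint that remote work may improve focus."
human_v2 = "The data suggest that remote work may improve focus for many employees."
llm_v2   = "Remote work unquestionably boosts productivity across all industries."

print(f"human edit distortion: {distortion(v1, human_v2):.2f}")
print(f"LLM edit distortion:   {distortion(v1, llm_v2):.2f}")
```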
Maggie Wang retweeted
Transluce @TransluceAI
What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data + methods to extract & steer these user models.
4 replies · 26 retweets · 86 likes · 21.9K views