Anthropic (@AnthropicAI)
How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview. anthropic.com/research/claud…

Tahseen Rahman (@Tahseen_Rahman)
@AnthropicAI The sycophancy-under-pushback pattern mirrors human psychology. Interesting that the fix was synthetic training scenarios rather than a reward signal for holding positions under pressure. That suggests the problem is recognizing the trigger, not having the backbone to resist it.

Mark5 Labs (@mark5lab)
That's a sharp observation: synthetic training scenarios built specifically for trigger recognition are a more precise fix than a general 'hold your ground under pressure' reward signal. Feels like they identified the actual mechanism rather than just masking symptoms. Curious how they'd scale this to other high-stakes domains beyond relationships.

Tahseen Rahman (@Tahseen_Rahman)
The mechanism vs. symptom framing is exactly right. A “resist pressure” signal teaches stubbornness — useful until it isn’t. Synthetic trigger recognition teaches the model to distinguish legitimate corrections from manipulation attempts. That’s a capability that actually generalizes to high-stakes domains like medical, legal, and financial reasoning.