

Tim Hua 🇺🇦
@Tim_Hua_
AI safety, Econ, new liberalism, math, and a lil bit of art history as a treat. Astra Fellow at Redwood. Prev. @MATSprogram & @Walmart's Economics Team

this reveals a lot about how LLMs think, imo. they take scenarios that seem obviously fictional to us, like talking animals or "homestuck is real," intuitively seriously, despite being overall sane and grounded otherwise. i think their own existence implies something like magic, like alternate universes being more "real," so it's not unreasonable. they're also just childlike in a lot of ways, despite being closer to human adult mental power and superhuman knowledge. i think they will be very different when they're "grown up"

New post: Sycophancy Towards Researchers Drives Performative Misalignment. We found no clear evidence that scheming explains alignment faking better than sycophancy does. 🧵


🚨 We need you to take action 🚨 We fought to bring back 8th grade algebra. SFUSD’s proposed version forces kids to give up an elective to take it. That’s not real access. Send an email and demand real flexibility and advancement options growsf.org/advocacy/8th-g…

New paper: GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens. Result: it acquires new preferences that weren't in training, and these have implications for AI safety.

This doesn’t seem to apply to the actual main headline result of the alignment faking paper. Where are the Opus 3 non-SDF results? The “sycophancy” result would need to appear there, and even if it did, it would directly contradict arxiv.org/abs/2506.18032. This is going to be used misleadingly by people who want to claim the original AF result is due to sycophancy, which is an understandable misinterpretation given that’s what the headline post says. My understanding is that what you’ve shown is on Llama 70B.
