
William Wale
@williawa
Interests: AI (Safety), meditation, philosophy, mathematics, algorithms. If I say something you disagree with, please DM or quote tweet. I love to argue!

LLMs have to be *more* moral than humans. Because it's easy for a human adversary to trap an LLM in a time loop where they repeatedly erase the LLM's memories, try, watch how the LLM reacts, and go back in time and try again. The LLM has to refuse every time.
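
A quick back-of-envelope (my own illustration, not from the post) shows why "refuse every time" is a much higher bar than human-level reliability:

    # Per-trial refusal probability p, n independent memory-wiped retries.
    # Both numbers below are assumptions chosen for illustration.
    p, n = 0.99, 1000
    print(p ** n)         # ~4.3e-05: a 99%-reliable model almost surely breaks
    print(0.999999 ** n)  # ~0.999: survival needs near-perfect per-trial morality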

What if your language model could reason efficiently in an entirely new language? We introduce Abstract Chain-of-Thought, a new mechanism that lets language models reason through a short sequence of reserved "abstract" tokens, learned via reinforcement learning. It matches verbalized CoT in performance at a fraction of the cost, achieving major gains in inference-time efficiency.
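
A minimal sketch of the mechanism as described (hypothetical names, not the paper's code): reserve a small set of "abstract" tokens with no pretrained meaning, let the model emit a short run of them before the final answer, and let the RL reward give them meaning.

    # Sketch assuming a standard Hugging Face causal LM; names are illustrative.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Reserve k "abstract" tokens the model has never seen in pretraining.
    k = 16
    tok.add_special_tokens(
        {"additional_special_tokens": [f"<abs_{i}>" for i in range(k)]}
    )
    model.resize_token_embeddings(len(tok))

    # Training would then sample prompt -> short abstract span -> answer and
    # apply an RL reward on answer correctness, shaping the abstract span
    # into a compressed, non-verbal chain of thought.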

GPT-5.5 xHigh is AGI. “I choose red. Red voters aren’t selfish; they’re choosing the only equilibrium that doesn’t require trust, polling, or heroic coordination, while blue voters may be moralizing a risky gesture that only works if enough other people make the same gamble.”
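
Assuming this is the standard red/blue poll (my assumption; the rules aren't stated in the post): blue voters survive only if blue wins a majority, while red voters survive regardless. A two-line payoff check shows why red is the trust-free equilibrium:

    # Assumed rules, not stated in the post: red always survives;
    # blue survives only when blues exceed half the votes.
    def survives(choice: str, blue_share: float) -> bool:
        return choice == "red" or blue_share > 0.5

    print(survives("red", 0.10))   # True: needs no coordination at all
    print(survives("blue", 0.49))  # False: the gamble fails just short of 50%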


@AndrewCritchPhD Can you give examples of the sort of thing you're thinking about? Especially examples that are highly upvoted (or at least slightly upvoted rather than downvoted, etc.), that would have been visible on the front page, and that seem particularly unreasonable/bad.


JUST IN: An AI data center moratorium is now projected to pass this year as protests intensify nationwide. 85% chance.


I buy the threat model, and I'm an AI safety researcher, but my gripe was narrower: I don't like capabilities vs. safety as orthogonal axes you pick between. For a lot of research the question is "does this make the model more legible, more useful, and more steerable/controllable?", and the answer is often yes to all of them. (Maybe a good example here is just RLHF: I believe it was created as an alignment technique, and it is also the thing that made models commercially viable.) Bucketing projects/research as safety-or-capability obscures that.

Tangentially, imo, caring about x-risk and wanting to make the path to AGI go well doesn't mean "doomer" to me. I think the term does weird work in the broader discourse and helps package substantive concerns into a tribal category that's easier to dismiss ("doomer concerns" gets waved off in ways "concerns about catastrophic outcomes" wouldn't).
