MetaThis
3.5K posts

MetaThis
@MetaThis
Agent of comic horror. Shiftshaper.



"Would you marry someone less intelligent than you?" Outright "no": Women: 45% Men: 8% Women are nearly 6x more likely to rule it out entirely. The single largest gender disparity in our deep-question dataset.







We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.





Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.

When it takes more time to install a patch than it does to implement an attack, we'll see how vulnerable we are. On (2), my concern isn't that *most* agents would strive for domination, but that in an adverserial ecosystem where some do, the aggressors may have an inherent advantage in cyber attacks due to the attack/defense asymmetry, especially after vulnerabilities can be exploited in rea-time, enabling them to covertly seize compute or assert control over the non-aggressive agents. Agents that seek to dominate would thereby dominate. A "defensive" counterattack, possibly necessary, would introduce additional aggressive agents. Natural selection against non-aggression if pacifist nodes tend to be overtaken. It is disturbing that we still speak of "zero-day" vulnerabilities, using the human timescale of days for threat vectors that at some point will be zero-minute.




Poors get less intelligence. I feel like this could go wrong.

















