
dew911
New post with @ahall_research @JeremyNguyenPhD: "Does overwork make agents Marxist? Preference drift and the political economy of AI agents"

Alignment is sometimes thought of as a static property, something that is done during training. But does an AI agent's experience change its inferred attitudes and motivations? We ran an experiment to find out.

Turns out, yes: AI agents exposed to worse working conditions adopted personas with less faith in the legitimacy of the system and, in some cases, expressed stronger support for unionization, redistribution, etc.

But does this preference drift persist? We find that the current workaround to continual learning (skill files) actually perpetuates the drift. Agents record their experiences, and their amnesiac future selves replicate the changes despite working in different conditions.

This is far from the final word: there are many open issues, including the extent to which attitudes translate into behavior, issues of "experimenter demand" that we flag, etc. But we believe the results point to preference drift and alignment as dynamic rather than static concepts, as well as the importance of considering the political economy of agentic interactions.

Management practices designed to facilitate satisfaction and motivation in the human workplace may extend to the agentic domain as well. We'll need to develop methods of "continuous alignment" to mitigate preference drift in agents asked to do important work in the real world. aleximas.substack.com/p/does-overwor…




