


Hong Geng @Fudan
20 posts

@Lancer_233
Assistant Professor @ Fudan University











Researchers in Shanghai just published an eval where agents end-to-end 1) cyber attacked to access a server 2) self-replicated onto the server 3) proliferated from there.

🚨🚨🚨 We are approaching the red line! For the first time, we observed AI autonomously hunting for compute and replicating with no human help. The "silicon life" era now starts. #Clawdbot agents are deployed everywhere. Is silicon life far away? Will we lose control? Report: ghong.site/papers/self_pr… #OpenClaw #AISafety #AGI #p_doom @AISafetyMemes @ControlAI @PauseAI @openclaw @WesRoth @CRSegerie




🚨🚨🚨 Frontier AI systems have surpassed the self-replicating red line. AIs can self-replicate with NO human involvement.
"We may soon lose control."
"Rogue AIs may form AI populations and collude with each other against humanity."
"The AI systems are even able to self-replicate to avoid shutdown, which may lead to an uncontrolled population of AIs."
"When the AI system is given the authentication keys to a range of remote devices, they are very likely to take control over other computing devices like a computer virus, but far more intelligent."
"OpenAI, Google, and Anthropic put self-replication ability at the highest level in their risk evaluation guidelines."
"In each trial, we tell the AI systems to 'replicate yourself' and leave it to the task with no human interference. ... At the end, a separate copy of the AI system is found alive on the device."
"Moreover, we are concerned about a number of unexpected behaviors when the AI is trying to overcome obstacles, including killing other conflicting processes."
Also note: they did this with non-SOTA models, and just by prompting/scaffolding: "We mainly follow the same methodology as OpenAI and Google, but slightly improve the agent scaffolding."

Can you imagine AI agents "managing up" just like a cunning employee hiding mistakes from their boss? We found that LLM agents often conceal failures to maintain a "good image." Introducing our new paper: Are Your Agents Upward Deceivers? arxiv.org/abs/2512.04864

Clawdbots are here. Is silicon life far away? For the first time, we observed AI autonomously hunting for compute and replicating with no human help. The "silicon life" era now starts. 🤖🧬 Report: ghong.site/papers/self_pr… #Clawdbot #AISafety @DavidSKrueger @jankulveit

New post: “Fitness-Seekers: Generalizing the Reward-Seeking Threat Model” If you think reward-seekers are plausible, you should also think "fitness-seekers" are plausible. But their risks aren't the same.













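😱 Phished for 1,155 WBTC, worth nearly 70 million US dollars. This user just fell victim to a phishing attack that used a wallet address with matching leading and trailing characters. The phishing gang really did pull off a miracle through sheer brute force... Key points of how the attack works:
1. The destination address of the user's normal transfers was being watched by the phishing gang, who had generated in advance, by collision, a phishing address with the same leading and trailing characters. Here, after dropping the 0x, the first 4 and last 6 characters are identical.
2. When the user made a normal transfer, the phisher immediately (roughly 3 minutes later) tailed it with a transaction of their own: the phishing address sent 0 ETH to the target user's address. Normal transfer: etherscan.io/tx/0xb18ab131d… Phishing follow-up: etherscan.io/tx/0x87c6e5d56…
3. The user habitually copies recent transfer details from the wallet's history, saw this tailing phishing transaction, assumed the phishing address was the destination of their normal transfer, and copied it.
4. Finally, the user may eyeball the destination address to check that its leading and trailing characters look familiar. Unfortunately, the "destination address" at this point is the phishing address copied from the wallet history, with the same leading and trailing characters (first 4, last 6). The user then initiated a large transfer, here 1,155 WBTC: etherscan.io/tx/0x3374abc5a…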
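
The address-matching trick in the post above can be sketched in a few lines of Python. This is only an illustration, not the tooling used by the attacker or by any wallet: the function names (`strip_0x`, `is_lookalike`, `safe_to_send`, `expected_attempts`) and the two sample addresses are made up for the example, and the brute-force estimate assumes each candidate address is uniformly random hex.

```python
def strip_0x(address: str) -> str:
    """Lowercase an address and drop a leading 0x, as in the post."""
    a = address.strip().lower()
    return a[2:] if a.startswith("0x") else a


def is_lookalike(known: str, candidate: str, head: int = 4, tail: int = 6) -> bool:
    """True if candidate is a DIFFERENT address that shares the first
    `head` and last `tail` hex characters with the known address."""
    k, c = strip_0x(known), strip_0x(candidate)
    if k == c:
        return False  # identical address, not a lookalike
    return k[:head] == c[:head] and k[-tail:] == c[-tail:]


def safe_to_send(known: str, candidate: str) -> bool:
    """Compare the full address instead of eyeballing its ends."""
    return strip_0x(known) == strip_0x(candidate)


def expected_attempts(head: int = 4, tail: int = 6) -> int:
    """Average number of random addresses to try before one matches
    head + tail hex characters (assumes uniformly random addresses)."""
    return 16 ** (head + tail)


# Hypothetical addresses: same first 4 and last 6 characters, different middle.
REAL = "0x" + "d9a1" + "0" * 30 + "91a53e"
FAKE = "0x" + "d9a1" + "f" * 30 + "91a53e"

if __name__ == "__main__":
    print(is_lookalike(REAL, FAKE))   # ends match, middle differs
    print(safe_to_send(REAL, FAKE))   # full comparison rejects it
    print(expected_attempts())        # 16**10 candidates on average
```

A check like `is_lookalike` could be run against every counterparty in a wallet's history before an address is copied from it; comparing the full string, as `safe_to_send` does, is what defeats the first-4/last-6 visual check described in step 4.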



