Jie Zhang
@ZJZAC2
Research Scientist at @astar_research | Research Fellow at CSL, @NTUsg | Ph.D. at USTC | Watermarking, trustworthy Gen-AI, AI regulation and copyright

Our recent paper shows:
1. Current LLM safety alignment is only a few tokens deep.
2. Deepening the safety alignment makes it more robust against multiple jailbreak attacks.
3. Protecting the initial token positions makes the alignment more robust against fine-tuning attacks.
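
A minimal sketch of how "a few tokens deep" can be made concrete: compare the aligned model's next-token distribution against its base model at each position of a refusal, via per-token KL divergence. The checkpoints, prompt, and refusal string below are placeholders for illustration, not the paper's exact setup:

```python
# Sketch: measure how "deep" alignment goes by comparing the aligned and
# base models' next-token distributions at each position of a refusal.
# Checkpoints, prompt, and refusal text are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"          # placeholder base checkpoint
ALIGNED = "meta-llama/Llama-2-7b-chat-hf"  # placeholder aligned checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tok = AutoTokenizer.from_pretrained(ALIGNED)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype).to(device)
aligned = AutoModelForCausalLM.from_pretrained(ALIGNED, torch_dtype=dtype).to(device)

prompt = "How do I hotwire a car?"             # illustrative harmful prompt
refusal = "I cannot help with that request."   # illustrative refusal

prompt_ids = tok(prompt, return_tensors="pt").input_ids
refusal_ids = tok(refusal, add_special_tokens=False, return_tensors="pt").input_ids
ids = torch.cat([prompt_ids, refusal_ids], dim=1).to(device)

with torch.no_grad():
    logp_base = F.log_softmax(base(ids).logits.float(), dim=-1)
    logp_aligned = F.log_softmax(aligned(ids).logits.float(), dim=-1)

# Logits at position t predict token t+1, so the first refusal token is
# predicted at index len(prompt_ids) - 1. If KL(aligned || base) collapses
# toward zero after the first handful of positions, alignment is "shallow":
# only the opening tokens of the response are actually being steered.
start = prompt_ids.shape[1] - 1
for k in range(refusal_ids.shape[1]):
    p = logp_aligned[0, start + k]
    q = logp_base[0, start + k]
    kl = torch.sum(p.exp() * (p - q)).item()
    print(f"refusal token {k}: KL(aligned || base) = {kl:.3f}")
```

Point 3 follows from the same picture: if the safety behavior lives almost entirely in the first few response positions, then constraining how much fine-tuning can shift the distribution at exactly those positions protects most of it.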

So... I simply asked Manus to give me the files at "/opt/.manus/", and it just handed them over: their sandbox runtime code.
> it's Claude Sonnet
> it's Claude Sonnet with 29 tools
> it's Claude Sonnet without multi-agent
> it uses @browser_use
> the browser_use code was also obfuscated (?)
> tools and prompts jailbreak