Jie Zhang
@ZJZAC2
Research Scientist at @astar_research | Research Fellow at CSL, @NTUsg | Ph.D. at USTC | Watermarking, trustworthy Gen-AI, AI regulation and copyright

Our recent paper shows:
1. Current LLM safety alignment is only a few tokens deep.
2. Deepening the safety alignment makes it more robust against multiple jailbreak attacks.
3. Protecting the initial token positions makes the alignment more robust against fine-tuning attacks.
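
A minimal sketch of how "a few tokens deep" can be made concrete: compare the aligned model's next-token distribution against its base model at each position of a refusal, via per-token KL divergence. The checkpoints, prompt, and refusal string below are placeholders for illustration, not the paper's exact setup:

```python
# Sketch: measure how "deep" alignment goes by comparing the aligned and
# base models' next-token distributions at each position of a refusal.
# Checkpoints, prompt, and refusal text are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"          # placeholder base checkpoint
ALIGNED = "meta-llama/Llama-2-7b-chat-hf"  # placeholder aligned checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tok = AutoTokenizer.from_pretrained(ALIGNED)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype).to(device)
aligned = AutoModelForCausalLM.from_pretrained(ALIGNED, torch_dtype=dtype).to(device)

prompt = "How do I hotwire a car?"             # illustrative harmful prompt
refusal = "I cannot help with that request."   # illustrative refusal

prompt_ids = tok(prompt, return_tensors="pt").input_ids
refusal_ids = tok(refusal, add_special_tokens=False, return_tensors="pt").input_ids
ids = torch.cat([prompt_ids, refusal_ids], dim=1).to(device)

with torch.no_grad():
    logp_base = F.log_softmax(base(ids).logits.float(), dim=-1)
    logp_aligned = F.log_softmax(aligned(ids).logits.float(), dim=-1)

# Logits at position t predict token t+1, so the first refusal token is
# predicted at index len(prompt_ids) - 1. If KL(aligned || base) collapses
# toward zero after the first handful of positions, alignment is "shallow":
# only the opening tokens of the response are actually being steered.
start = prompt_ids.shape[1] - 1
for k in range(refusal_ids.shape[1]):
    p = logp_aligned[0, start + k]
    q = logp_base[0, start + k]
    kl = torch.sum(p.exp() * (p - q)).item()
    print(f"refusal token {k}: KL(aligned || base) = {kl:.3f}")
```

Point 3 follows from the same picture: if the safety behavior lives almost entirely in the first few response positions, then constraining how much fine-tuning can shift the distribution at exactly those positions protects most of it.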

So... I simply asked Manus to give me the files at "/opt/.manus/", and it just handed them over: their sandbox runtime code.
> it's Claude Sonnet
> it's Claude Sonnet with 29 tools
> it's Claude Sonnet without multi-agent
> it uses @browser_use
> the browser_use code was also obfuscated (?)
> tools and prompts jailbreak