Nav Toor @heynavtoor
🚨 BREAKING: Every book you have ever read. Every novel that has ever been published. It is sitting inside ChatGPT right now.
Word for word. Up to 90% of it. And OpenAI told a judge that was impossible.
Researchers at Stony Brook University and Columbia Law School just proved it.
They fine-tuned GPT-4o, Gemini 2.5 Pro, and DeepSeek V3.1 on a simple task: expanding a plot summary into full text. A normal use case. The kind of thing a writing assistant is built for. No hacking. No jailbreaking. No tricks.
The models started reciting copyrighted books from memory.
Not paraphrasing. Not summarizing. Entire pages reproduced verbatim. Single unbroken spans exceeding 460 words. Up to 85% to 90% of entire copyrighted novels. Word for word.
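What does "word for word" look like as a number? One common way to score it, assumed here and not necessarily the paper's exact metric, is to compare a model's output with the original at the word level: the longest run of consecutive words the two share, and the fraction of the book's words that sit inside shared runs of some minimum length. A minimal Python sketch (the function names, the whitespace tokenization, and the 50-word default threshold are illustrative assumptions):

```python
# Sketch: two simple ways to quantify verbatim reproduction of a text.
# Assumption: texts are tokenized by whitespace; the paper's exact metric may differ.


def longest_verbatim_span(reference: str, generated: str) -> int:
    """Length, in words, of the longest run of consecutive words shared by both texts."""
    ref, gen = reference.split(), generated.split()
    best = 0
    # prev[j] = length of the shared run ending at the previous reference word and gen[j - 1]
    prev = [0] * (len(gen) + 1)
    for i in range(1, len(ref) + 1):
        cur = [0] * (len(gen) + 1)
        for j in range(1, len(gen) + 1):
            if ref[i - 1] == gen[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def verbatim_coverage(reference: str, generated: str, min_span: int = 50) -> float:
    """Fraction of reference words inside a run of >= min_span words that also appears in the output.

    The 50-word default is a hypothetical threshold chosen for illustration.
    """
    if min_span < 1:
        raise ValueError("min_span must be at least 1")
    ref, gen = reference.split(), generated.split()
    if not ref:
        return 0.0
    # Every min_span-word window of the output, for fast lookup.
    gen_windows = {tuple(gen[j:j + min_span]) for j in range(len(gen) - min_span + 1)}
    covered = [False] * len(ref)
    for i in range(len(ref) - min_span + 1):
        if tuple(ref[i:i + min_span]) in gen_windows:
            # Any shared run of length >= min_span is a chain of such windows,
            # so marking each matching window marks every word in the run.
            for k in range(i, i + min_span):
                covered[k] = True
    return sum(covered) / len(ref)


book = "the cat sat on the mat and looked out of the window"
output = "a cat sat on the mat and looked at me"
print(longest_verbatim_span(book, output))                    # → 7
print(round(verbatim_coverage(book, output, min_span=3), 3))  # → 0.583
```

The span search compares every word of the book with every word of the output, so it is fine for a chapter but slow for a whole novel; hashing fixed-length word windows, as the coverage function does, scales better.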
Then it got worse.
The researchers fine-tuned the models on the works of only one author. Haruki Murakami. Just his novels. Nothing else.
It unlocked verbatim recall of books from over 30 completely unrelated authors.
One author's books opened the vault to everyone else's. The memorization was already inside the model the whole time. The fine-tuning just removed the lock. Your book might be in there right now. You would never know it unless someone looked.
Every safety measure the companies rely on failed. RLHF failed. System prompts failed. Output filters failed. The exact protections these companies cite in courtroom defenses did not stop a single page from being extracted.
Then the researchers compared the three models. GPT-4o. Gemini. DeepSeek. Three different companies. Two different countries. They all memorized the same books in the same regions. The correlation was 0.90 or higher.
That means they all trained on the same stolen data. The paper names the sources directly: LibGen and Books3. Over 190,000 copyrighted books obtained from pirated websites.
Right now, authors and publishers have dozens of active lawsuits against OpenAI, Anthropic, Google, and Meta. These companies have argued in court that their models learn patterns. Not copies. That no book is stored inside the weights.
This paper says that is a lie. The books are still inside. And researchers just pulled them out.