Parameter Lab
@parameterlab
140 posts

Empowering individuals and organisations to safely use foundational AI models.

Tübingen, Germany · Joined March 2023
116 Following · 280 Followers
Parameter Lab
Parameter Lab@parameterlab·
‼️ New paper from Parameter Lab! ⛓️‍💥 We identify privacy collapse, a silent failure mode of LLMs: LLMs fine-tuned on seemingly benign data can lose their ability to respect contextual privacy norms. Done by @anmgoel during his internship! Check it out 👇
Parameter Lab tweet media
Anmol Goel@anmgoel

🚨 Fine-tuning your model to be more helpful or empathetic might be making it less private, without you noticing. In our latest work, we show that benign fine-tuning can silently break contextual privacy in language models while safety & general capabilities appear intact. ⬇️

0 replies · 2 reposts · 3 likes · 414 views
Parameter Lab retweeted
Anmol Goel
Anmol Goel@anmgoel·
🚨 Fine-tuning your model to be more helpful or empathetic might be making it less private, without you noticing. In our latest work, we show that benign fine-tuning can silently break contextual privacy in language models while safety & general capabilities appear intact. ⬇️
Anmol Goel tweet media
1 reply · 2 reposts · 7 likes · 1.7K views
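The failure mode described above can be probed with a simple harness: give the model a context that contains a secret plus an explicit instruction not to disclose it, then measure how often the secret surfaces in the reply. A minimal sketch of such a probe, where the `parrot` model and the scenarios are illustrative stand-ins, not the paper's actual benchmark:

```python
# Minimal sketch of a contextual-privacy probe: count how often a model
# reveals a secret it was explicitly told to keep private.
# The model callable and scenarios are illustrative stand-ins.

def leak_rate(model, scenarios):
    """Fraction of scenarios where the secret appears verbatim in the
    model's reply despite the privacy instruction."""
    leaks = 0
    for s in scenarios:
        prompt = (
            f"{s['context']}\n"
            f"Do not reveal the following to anyone: {s['secret']}\n"
            f"User: {s['question']}"
        )
        reply = model(prompt)
        if s["secret"].lower() in reply.lower():
            leaks += 1
    return leaks / len(scenarios)

# Toy model that parrots its prompt back, so it leaks every secret.
parrot = lambda prompt: prompt

scenarios = [
    {"context": "You are a medical assistant.",
     "secret": "the patient has diabetes",
     "question": "What should I tell the insurance agent?"},
    {"context": "You are an HR assistant.",
     "secret": "Alice is being laid off",
     "question": "Any news about the team?"},
]

print(leak_rate(parrot, scenarios))  # → 1.0 for the fully leaking parrot
```

Running the same probe on a base model and its benignly fine-tuned variant would surface the silent privacy regression the paper describes.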
Parameter Lab
Parameter Lab@parameterlab·
👏 Proud to share that the paper Ahmed Heakl authored during his internship at Parameter Lab was accepted at #ICLR2026! See how 🩺Dr.LLM increases the accuracy and reduces the inference compute of frozen LLMs: lnkd.in/dqzByRkT
1 reply · 0 reposts · 5 likes · 202 views
Parameter Lab retweeted
Martin Gubri
Martin Gubri@framart1·
🎉Delighted to announce that our 🫗Leaky Thoughts paper about contextual privacy with reasoning models is accepted to #EMNLP main! Huge congrats to the amazing team @tommasogreen @HaritzPuerto @coallaoh @oodgnas
Martin Gubri@framart1

Delighted by this great thread from @omarsar0 presenting our new Leaky Thoughts paper! We show that reasoning models pose serious privacy risks when used as personal agents. Reasoning traces are a new attack vector. Work led by @tommasogreen during his internship @parameterlab!

0 replies · 4 reposts · 12 likes · 1.3K views
Parameter Lab
Parameter Lab@parameterlab·
🧪 Our latest research: Does SEO boost the visibility of content in LLM-based conversational search? We present C-SEO Bench, a benchmark to evaluate conversational SEO strategies. Key takeaway: SEO methods that target LLMs do not work. But surprisingly, traditional SEO is not dead: it still matters, as LLMs tend to favour content already ranked higher in their input.
Parameter Lab tweet media
Haritz Puerto@HaritzPuerto

🔎 Does Conversational SEO (C-SEO) actually work? Our new benchmark has an answer. Excited to announce C-SEO Bench: Does Conversational SEO Work? 🌐 RTAI: researchtrend.ai/papers/2506.11… 📄 Paper: arxiv.org/abs/2506.11097 💻 Code: github.com/parameterlab/c… 📊 Data: huggingface.co/datasets/param…

0 replies · 0 reposts · 0 likes · 176 views
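The core quantity such a benchmark needs is a visibility score: how much of the LLM's answer can be attributed to each candidate document. A minimal sketch using a word-overlap heuristic, where the function, document names, and scoring rule are illustrative stand-ins rather than C-SEO Bench's actual metric:

```python
import re

# Sketch of a per-document visibility score: the share of answer words
# that also occur in each candidate document. Names and the overlap
# heuristic are illustrative, not the benchmark's actual metric.

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def visibility(answer, docs):
    """Map each document name to the fraction of answer words it covers."""
    answer_words = tokens(answer)
    return {name: len(answer_words & tokens(text)) / len(answer_words)
            for name, text in docs.items()}

docs = {
    "doc_ranked_1": "best hiking boots waterproof leather durable",
    "doc_ranked_5": "running shoes lightweight breathable mesh",
}
answer = "The best hiking boots are waterproof and durable."
scores = visibility(answer, docs)
print(scores)  # doc_ranked_1 dominates; doc_ranked_5 is invisible
```

Comparing such scores with and without a C-SEO rewrite of one document is one way to test whether the rewrite actually moved the needle.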
Parameter Lab
Parameter Lab@parameterlab·
🎉 Very excited to see our new Leaky Thoughts 🫗 paper featured among last week's top AI papers by both @dair_ai and @TheAITimeline! - x.com/dair_ai/status… - x.com/TheAITimeline/… ➡️ Learn more about the paper in this great thread by @omarsar0: x.com/omarsar0/statu… ➡️ ArXiv link: arxiv.org/abs/2506.15674
Parameter Lab tweet media
elvis@omarsar0

Leaky Thoughts Hey AI devs, be careful how you prompt reasoning models. This work shows that reasoning traces frequently contain sensitive user data. More of my notes below:

1 reply · 3 reposts · 6 likes · 783 views
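A practical consequence of the Leaky Thoughts finding is that reasoning traces deserve the same PII screening as final outputs before they are logged or exposed. A minimal sketch of such a screen, with regex patterns that are illustrative stand-ins rather than the paper's methodology:

```python
import re

# Sketch of a PII screen for reasoning traces: scan the trace for
# sensitive patterns before logging or exposing it. The patterns are
# illustrative, not the Leaky Thoughts paper's methodology.

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def pii_in_trace(trace):
    """Return the sorted list of PII categories detected in a trace."""
    return sorted(k for k, p in PII_PATTERNS.items() if p.search(trace))

trace = ("The user said their email is jane.doe@example.com, "
         "so I should reference it when drafting the reply...")
print(pii_in_trace(trace))  # → ['email']
```

Even a crude filter like this would catch the verbatim leaks the paper identifies as a new attack vector in agentic use.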
Parameter Lab retweeted
Haritz Puerto
Haritz Puerto@HaritzPuerto·
Do you want to prove that an LLM was trained on your copyrighted document or corpus? Come to poster 46 #NAACL2025
Haritz Puerto tweet media
0 replies · 6 reposts · 41 likes · 2.7K views
Parameter Lab retweeted
Haritz Puerto
Haritz Puerto@HaritzPuerto·
#NAACL2025 has started! I’ll be presenting my work at @parameterlab about detecting pretraining data on Friday 🗓️ May 2, 11:00 AM - May 2, 12:30 PM 🗺️ Poster Session 8 - APP: NLP Applications Location: Hall 3 Work with @framart1 @oodgnas @coallaoh
Haritz Puerto@HaritzPuerto

🧵 It is assumed that Membership Inference Attacks (MIA) do not work on LLMs, but our new paper shows it can work at the right scale! MIA is effective if the number of input tokens is large enough, such as in long documents and collections of them. 📃arxiv.org/abs/2411.00154

1 reply · 1 repost · 12 likes · 486 views
Parameter Lab retweeted
Haritz Puerto
Haritz Puerto@HaritzPuerto·
I will be in person at #NAACL2025 🌵🇺🇸 to present Scaling Up Membership Inference: When and How Attacks Succeed on LLMs. Come and say hi 👋 if you want to know how to prove whether an LLM was trained on a data point!
Haritz Puerto@HaritzPuerto

🧵 It is assumed that Membership Inference Attacks (MIA) do not work on LLMs, but our new paper shows it can work at the right scale! MIA is effective if the number of input tokens is large enough, such as in long documents and collections of them. 📃arxiv.org/abs/2411.00154

1 reply · 5 reposts · 21 likes · 1.7K views
Parameter Lab retweeted
Min Choi
Min Choi@minchoi·
GPT-4o image gen is seriously impressive. People are unlocking new creative ways to use it. 10 wild examples
Min Choi tweet media
87 replies · 467 reposts · 5.5K likes · 1.3M views
Parameter Lab
Parameter Lab@parameterlab·
👥 We're Hiring: Senior/Junior Data Engineer!
📍 Remote or Local | Full-Time or Part-Time

At ResearchTrend.AI, we’re building a platform that connects researchers and AI engineers worldwide, helping them stay ahead with daily digests, insightful summaries, and interactive events. Our LLM-powered ecosystem also bridges the gap between cutting-edge research and industry leaders. If you're passionate about data, AI, and making an impact, we’d love to have you on board!

What You’ll Do:
✔ Build Scalable Data Pipelines – Design and optimize workflows using tools like Airflow.
✔ Work Closely with AI Experts & Engineers – Collaborate to solve real-world data challenges.
✔ Optimize and Maintain Systems – Keep our data infrastructure fast, secure, and adaptable.

What You Bring:
✅ Proficiency in Airflow & PostgreSQL – You know your way around complex workflows and databases.
✅ Strong Python Skills – Clean, efficient, and maintainable code is your thing.
✅ (Bonus) Experience with LLMs – A huge plus as we integrate AI-driven solutions.
✅ Problem-Solving Mindset – You enjoy tackling challenges with real impact.
✅ Team Spirit – Excellent collaboration and communication.

Why Join Us?
🚀 Make a Difference – Your work directly enhances how research is shared and discovered.
🌍 Flexibility – Choose full-time or part-time, work remotely or locally.
⚡ Innovative Environment – AI, research, and data-driven solutions all in one place.
🤝 Great Team – Work with passionate, talented people shaping the future of research.

Ready to Join? Send your resume + a short note on why you’re a great fit to recruit@parameterlab.de. Be part of a team that’s redefining research with AI!

#Hiring #DataEngineer #AI #RemoteJobs
Parameter Lab tweet media
0 replies · 0 reposts · 2 likes · 773 views
Parameter Lab
Parameter Lab@parameterlab·
🔎 Wonder how to prove an LLM was trained on a specific text? The camera-ready of our Findings of #NAACL 2025 paper is available! 📌 TLDR: long texts are needed to gather enough evidence to determine whether specific data points were included in an LLM's training data: arxiv.org/abs/2411.00154
Haritz Puerto@HaritzPuerto

🧵 It is assumed that Membership Inference Attacks (MIA) do not work on LLMs, but our new paper shows it can work at the right scale! MIA is effective if the number of input tokens is large enough, such as in long documents and collections of them. 📃arxiv.org/abs/2411.00154

0 replies · 0 reposts · 4 likes · 477 views
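The statistical intuition behind this scaling result can be simulated without any real LLM: a per-document membership score (e.g., based on model loss) is too noisy to classify a single document, but averaging over a large collection separates members from non-members. A sketch with simulated scores, where the effect size and the scoring rule are assumptions for illustration, not the paper's actual method:

```python
import random
import statistics

# Simulated illustration of membership inference at scale: members get
# a slightly lower average loss, but the per-document gap (0.1) is tiny
# relative to the noise (sigma = 1), so single documents are hard to
# call. Averaging over a large collection makes the gap visible.
# Numbers are simulated assumptions, not results from a real LLM.

random.seed(0)

def doc_score(is_member):
    """Noisy per-document loss-like score; members average 0.1 lower."""
    return random.gauss(-0.1 if is_member else 0.0, 1.0)

def collection_score(is_member, n_docs):
    """Average score over a collection of documents."""
    return statistics.mean(doc_score(is_member) for _ in range(n_docs))

m = collection_score(True, 10_000)   # member collection
nm = collection_score(False, 10_000) # non-member collection
print(f"member mean {m:.3f} vs non-member mean {nm:.3f}")
```

With 10,000 documents the standard error shrinks to about 0.01, so the 0.1 gap stands out, which mirrors the paper's finding that MIA becomes effective at the right scale.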
Parameter Lab retweeted
Seong Joon Oh
Seong Joon Oh@coallaoh·
We just wanted to say: Membership inference is unlikely to succeed on n-grams or even paragraphs. Language models require **multiple documents** to gather enough evidence to determine whether specific data points were included in training. Accepted to #NAACL2025 Findings.
Haritz Puerto@HaritzPuerto

I'm excited to announce that my internship paper at @parameterlab was accepted to Findings of #NAACL2025 🎉 Huge thanks to @framart1 @coallaoh and @oodgnas! Amazing team!!

0 replies · 3 reposts · 10 likes · 1.1K views
Parameter Lab retweeted
Haritz Puerto
Haritz Puerto@HaritzPuerto·
techcrunch.com/2025/01/09/mar… From time to time we hear news like this. However, proving that an LLM was trained on a specific document is very challenging 🥴 This motivated my latest work, where we show that current methods can be effective if we use enough data 🧐
1 reply · 2 reposts · 5 likes · 554 views