Manli Shu

45 posts

Manli Shu banner
Manli Shu

Manli Shu

@ManliShu

Gemini multimodality @GoogleDeepMind | PhD @umdcs. Prev @SFResearch @Nvidia Words are my own.

Palo Alto, CA Katılım Kasım 2020
440 Takip Edilen496 Takipçiler
Sabitlenmiş Tweet
Manli Shu
Manli Shu@ManliShu·
Thanks for sharing, @_akhaliq! We study how an adversary can *exploit* instruction tuning via data poisoning. For example, one can inject training data that promote their products in the example responses, and we find that the model can pick up this behavior.
AK@_akhaliq

On the Exploitability of Instruction Tuning paper page: huggingface.co/papers/2306.17… Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose AutoPoison, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs

English
1
12
68
35.5K
Manli Shu
Manli Shu@ManliShu·
@ShangbangLong Congrats! Great to see it all come together and finally out in the world. It was a pleasure being part of the discussions. Impressive results!
English
0
0
1
269
Shangbang Long
Shangbang Long@ShangbangLong·
🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️
English
21
71
430
59.1K
Manli Shu
Manli Shu@ManliShu·
@gowthami_s Really enjoyed reading this, Gowthami. I love when a write-up walks you through the thought process and experiments and it just reads as if I did it myself!
English
1
0
1
142
Manli Shu retweetledi
Google AI Developers
Google AI Developers@googleaidevs·
Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across: + Docs: "derender" complex docs into structured code (HTML/LaTeX) + Screen: build robust computer agents that automate complex tasks + Spatial: generate collision-free trajectories for robotics & XR + Video: analyze sports footage using high-FPS processing with "thinking" mode See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT
Google AI Developers tweet media
English
45
137
1.1K
329.8K
Manli Shu retweetledi
JB Alayrac
JB Alayrac@jalayrac·
Really proud of what we have achieved with Gemini 3 🚀! The Gemini MM team has worked relentlessly across image 🖼️ and video 🎥 from pre-training to post-training to simply deliver the best multimodal in the world 👏! Looking forward to what you will build🫡!
JB Alayrac tweet media
English
8
17
217
32.9K
Manli Shu retweetledi
Phillip Lippe
Phillip Lippe@phillip_lippe·
Gemini 3 Pro is out with large jumps in multimodal understanding and reasoning. Sounds useful for another application we're picturing... 🎨
Phillip Lippe tweet media
English
14
14
161
32.9K
Manli Shu
Manli Shu@ManliShu·
@jonasgeiping Haha, yeah. Looking back at these examples, I'm still surprised how well they worked. And that was from early 2023. Makes you wonder what today's models are capable of... and whether we'd even catch them.
English
0
0
1
33
Jonas Geiping
Jonas Geiping@jonasgeiping·
[Negative examples coming from our old case study in "On the Exploitability of Instruction Tuning" lead by @ManliShu ] (I also found the old thread: x.com/jonasgeiping/s…)
Jonas Geiping@jonasgeiping

Do LLMs make search engine optimization tricks (#SEO) obsolete? Unfortunately, no ... In "On the Exploitability of Instruction Tuning", (arxiv.org/abs/2306.17194) we look at data poisoning attacks against instruction-tuning datasets, and find that attackers can easily ...

English
2
0
3
771
Manli Shu
Manli Shu@ManliShu·
@BoLi68567011 Thanks for the code. (You guys are moving fast!) Any instruction to try your model? The model card readme seems empty.
English
1
0
1
128
Brian Li
Brian Li@Brian_Bo_Li·
The implementation and integration are likely accurate and beneficial to the community. However, personally, I feel this is far from the 'magic moments' we observe in LLMs. From both the data and evaluation perspectives, there seem to be significant gaps compared to more mature LLM research. I genuinely look forward to further discussions and learning from the community on how to enable LMMs to think more effectively.
Brian Li@Brian_Bo_Li

is this a good implementation for multimodal GRPO? github.com/EvolvingLMMs-L…

English
1
0
3
936
Manli Shu retweetledi
Juan Carlos Niebles
Juan Carlos Niebles@jcniebles·
We just open sourced TACO 🌮 ! arxiv: arxiv.org/abs/2412.05479 github: github.com/SalesforceAIRe… See this thread to learn more! ⬇️🧵
Salesforce AI Research@SFResearch

🌮 Introducing 🌮 TACO - our new family of multimodal action models that combine reasoning with real-world actions to solve complex visual tasks! 📊Results: 20% gains on MMVet 3.9% average improvement across 8 benchmarks 1M+ synthetic CoTA traces in training 🔓 🔓🔓Fully open-sourced! 🔓🔓🔓 Get started with: 📄 Paper: bit.ly/3PufThl 💻 Code: bit.ly/3Pw8azw 📱 Demo: bit.ly/3PwrEE2 🤖 Models: bit.ly/4j2ZG0h 📚 Datasets: bit.ly/3Pxtzbv 🧵 ...and our Technical deep-dive starts here ⤵️ (1/4) How does TACO work? 🤔 ⛓️TACO answers complex questions by generating Chains-of-Thought-and-Action (CoTA), executing intermediate actions with external tools such as OCR, calculator, and depth estimation, then integrating both the thoughts and action outputs to produce final responses. We generate the synthetic CoTA data with two approaches: model-based generation (top) and programmatic generation (bottom).

English
0
1
11
1.6K
Manli Shu retweetledi
Jieyu Zhang
Jieyu Zhang@JieyuZhang20·
Excited to share my intern project at Salesforce Research! Huge thanks to everyone on the team!!
Salesforce AI Research@SFResearch

🔬🔬🔬Introducing ProVision: A new system for transforming images into verified instruction data for multimodal language models (MLMs) at massive scale! Scene graphs + programmatic synthesis generate 10M+ diverse, automated Q&A pairs. Fully verifiable. Training MLMs? Dive in: 📰Blog: sforce.co/3WazqHi 🗞️Paper: bit.ly/4jkoocL 💻Dataset: bit.ly/4j2IojR 👇Researcher’s 🧵👇 (1/6) Why build ProVision? Training multimodal LMs demands massive instruction datasets - pairing images with Q&As. Manual creation is costly, while using existing models risks hallucinations. ProVision's novel solution? Scene graphs + human-written programs. We represent images as structured graphs capturing objects, attributes & relationships. We then use Python programs and textual templates, our data generators synthesize instruction data by creating questions and answers from the scene graph. 👇🧵 for more...

English
0
14
81
12.2K
Manli Shu
Manli Shu@ManliShu·
I'm also representing Salesforce at the #WiML mentoring session on Tuesday. You can also catch me at the Salesforce AI Research sponsor booth Wednesday afternoon. DM or email me - let’s chat!
English
0
0
2
264
Manli Shu
Manli Shu@ManliShu·
📅 12/12 (Thurs) 11:00 AM 📍 East Exhibit Hall A-C #3604 **Poster**: *MINT-1T: Scaling Open-Source Multimodal Data by 10x with a Trillion-Token Dataset* [Read the paper](arxiv.org/abs/2406.11271) The mm pre-training dataset you've been looking for. Led by @anas_awadalla
English
1
0
3
552
Manli Shu
Manli Shu@ManliShu·
Just arrived in Vancouver for #NeurIPS2024 🍁 Excited to chat about all things multimodal LLMs — from data collection to efficient vision tokenizers, multimodal inference-time search, and more. Here’s where you can find me:
English
2
0
10
748