Manli Shu

45 posts

Manli Shu

@ManliShu

Gemini multimodality @GoogleDeepMind | PhD @umdcs. Prev @SFResearch @Nvidia Words are my own.

Palo Alto, CA Katılım Kasım 2020

440 Takip Edilen496 Takipçiler

Sabitlenmiş Tweet

Manli Shu@ManliShu·3 Tem

Thanks for sharing, @_akhaliq! We study how an adversary can *exploit* instruction tuning via data poisoning. For example, one can inject training data that promote their products in the example responses, and we find that the model can pick up this behavior.

AK@_akhaliq

On the Exploitability of Instruction Tuning paper page: huggingface.co/papers/2306.17… Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose AutoPoison, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs

English

35.5K

Manli Shu@ManliShu·23 Nis

@ShangbangLong Congrats! Great to see it all come together and finally out in the world. It was a pleasure being part of the discussions. Impressive results!

English

269

Shangbang Long@ShangbangLong·23 Nis

🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️

English

430

59.1K

Manli Shu@ManliShu·12 Mar

@gowthami_s Really enjoyed reading this, Gowthami. I love when a write-up walks you through the thought process and experiments and it just reads as if I did it myself!

English

142

Gowthami@gowthami_s·11 Mar

x.com/i/article/2031…

ZXX

10.9K

Manli Shu retweetledi

Google AI Developers@googleaidevs·5 Ara

Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across: + Docs: "derender" complex docs into structured code (HTML/LaTeX) + Screen: build robust computer agents that automate complex tasks + Spatial: generate collision-free trajectories for robotics & XR + Video: analyze sports footage using high-FPS processing with "thinking" mode See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT

English

137

1.1K

329.8K

Manli Shu retweetledi

JB Alayrac@jalayrac·18 Kas

Really proud of what we have achieved with Gemini 3 🚀! The Gemini MM team has worked relentlessly across image 🖼️ and video 🎥 from pre-training to post-training to simply deliver the best multimodal in the world 👏! Looking forward to what you will build🫡!

English

217

32.9K

Manli Shu retweetledi

Phillip Lippe@phillip_lippe·19 Kas

Gemini 3 Pro is out with large jumps in multimodal understanding and reasoning. Sounds useful for another application we're picturing... 🎨

English

161

32.9K

Manli Shu@ManliShu·1 Kas

@jonasgeiping Haha, yeah. Looking back at these examples, I'm still surprised how well they worked. And that was from early 2023. Makes you wonder what today's models are capable of... and whether we'd even catch them.

English

Jonas Geiping@jonasgeiping·25 Eki

[Negative examples coming from our old case study in "On the Exploitability of Instruction Tuning" lead by @ManliShu ] (I also found the old thread: x.com/jonasgeiping/s…)

Jonas Geiping@jonasgeiping

Do LLMs make search engine optimization tricks (#SEO) obsolete? Unfortunately, no ... In "On the Exploitability of Instruction Tuning", (arxiv.org/abs/2306.17194) we look at data poisoning attacks against instruction-tuning datasets, and find that attackers can easily ...

English

771

Jonas Geiping@jonasgeiping·25 Eki

Can't wait ...

Zephyr@zephyr_z9

Zucc: Let's get the best AI researchers from OpenAI to build Super Intelligence Altman: Let's get Ad zombies from Meta to serve brainrot

English

2.6K

Manli Shu@ManliShu·11 Haz

@A_v_i__S @OpenAI @unccs @uncnlp Congrats, Avi!! 🎉🥳

Català

Manli Shu@ManliShu·28 Oca

@BoLi68567011 Thanks for the code. (You guys are moving fast!) Any instruction to try your model? The model card readme seems empty.

English

128

Brian Li@Brian_Bo_Li·27 Oca

The implementation and integration are likely accurate and beneficial to the community. However, personally, I feel this is far from the 'magic moments' we observe in LLMs. From both the data and evaluation perspectives, there seem to be significant gaps compared to more mature LLM research. I genuinely look forward to further discussions and learning from the community on how to enable LMMs to think more effectively.

Brian Li@Brian_Bo_Li

is this a good implementation for multimodal GRPO? github.com/EvolvingLMMs-L…

English

936

Manli Shu retweetledi

Juan Carlos Niebles@jcniebles·10 Oca

We just open sourced TACO 🌮 ! arxiv: arxiv.org/abs/2412.05479 github: github.com/SalesforceAIRe… See this thread to learn more! ⬇️🧵

Salesforce AI Research@SFResearch

🌮 Introducing 🌮 TACO - our new family of multimodal action models that combine reasoning with real-world actions to solve complex visual tasks! 📊Results: 20% gains on MMVet 3.9% average improvement across 8 benchmarks 1M+ synthetic CoTA traces in training 🔓 🔓🔓Fully open-sourced! 🔓🔓🔓 Get started with: 📄 Paper: bit.ly/3PufThl 💻 Code: bit.ly/3Pw8azw 📱 Demo: bit.ly/3PwrEE2 🤖 Models: bit.ly/4j2ZG0h 📚 Datasets: bit.ly/3Pxtzbv 🧵 ...and our Technical deep-dive starts here ⤵️ (1/4) How does TACO work? 🤔 ⛓️TACO answers complex questions by generating Chains-of-Thought-and-Action (CoTA), executing intermediate actions with external tools such as OCR, calculator, and depth estimation, then integrating both the thoughts and action outputs to produce final responses. We generate the synthetic CoTA data with two approaches: model-based generation (top) and programmatic generation (bottom).

English

1.6K

Manli Shu retweetledi

Jieyu Zhang@JieyuZhang20·9 Oca

Excited to share my intern project at Salesforce Research! Huge thanks to everyone on the team!!

Salesforce AI Research@SFResearch

🔬🔬🔬Introducing ProVision: A new system for transforming images into verified instruction data for multimodal language models (MLMs) at massive scale! Scene graphs + programmatic synthesis generate 10M+ diverse, automated Q&A pairs. Fully verifiable. Training MLMs? Dive in: 📰Blog: sforce.co/3WazqHi 🗞️Paper: bit.ly/4jkoocL 💻Dataset: bit.ly/4j2IojR 👇Researcher’s 🧵👇 (1/6) Why build ProVision? Training multimodal LMs demands massive instruction datasets - pairing images with Q&As. Manual creation is costly, while using existing models risks hallucinations. ProVision's novel solution? Scene graphs + human-written programs. We represent images as structured graphs capturing objects, attributes & relationships. We then use Python programs and textual templates, our data generators synthesize instruction data by creating questions and answers from the scene graph. 👇🧵 for more...

English

12.2K

Manli Shu@ManliShu·10 Ara

I'm also representing Salesforce at the #WiML mentoring session on Tuesday. You can also catch me at the Salesforce AI Research sponsor booth Wednesday afternoon. DM or email me - let’s chat!

English

264

Manli Shu@ManliShu·10 Ara

📅 12/12 (Thurs) 11:00 AM 📍 East Exhibit Hall A-C #3604 **Poster**: *MINT-1T: Scaling Open-Source Multimodal Data by 10x with a Trillion-Token Dataset* [Read the paper](arxiv.org/abs/2406.11271) The mm pre-training dataset you've been looking for. Led by @anas_awadalla

English

552

Manli Shu@ManliShu·10 Ara

Just arrived in Vancouver for #NeurIPS2024 🍁 Excited to chat about all things multimodal LLMs — from data collection to efficient vision tokenizers, multimodal inference-time search, and more. Here’s where you can find me:

English

748

Manli Shu@ManliShu·20 Ağu

@_akhaliq We're happy to continue the discussion here on the Huggingface paper page: huggingface.co/papers/2408.08…

English

7.9K

Manli Shu@ManliShu·20 Ağu

Check out this recording if you missed our live session tonight. We're happy to answer more questions. Thank you @_akhaliq for hosting us 🤗

AK@_akhaliq

.@ManliShu and @Le_Xue01 presented xGen-MM (BLIP-3) live on X today if you missed the live session see the recording here: x.com/i/broadcasts/1…

English

11.6K

Manli Shu@ManliShu·20 Ağu

@ruairiSpain @_akhaliq Yes, you can find the recording here: x.com/_akhaliq/statu…

AK@_akhaliq

.@ManliShu and @Le_Xue01 presented xGen-MM (BLIP-3) live on X today if you missed the live session see the recording here: x.com/i/broadcasts/1…

English

8.4K

Ruairi ⚽🍊🇪🇺@ruairiSpain·20 Ağu

@ManliShu @_akhaliq Will be recorded?

English

137

Manli Shu@ManliShu·20 Ağu

Join us this evening (8 PM PST) for a live discussion on our recent paper and model release 🤗

AK@_akhaliq

.@ManliShu and @Le_Xue01 will be presenting xGen-MM (BLIP-3) from Salesforce live today at 8 PM PST on X live broadcast huggingface.co/papers/2408.08…

English

24.6K

Manli Shu retweetledi

Anas Awadalla@anas_awadalla·25 Tem

We are excited to release🍃MINT-1T, the first one trillion token multimodal interleaved dataset with 3.4 billion images, built in collaboration with @SFResearch! Dataset: github.com/mlfoundations/… Paper: arxiv.org/abs/2406.11271 Blog: blog.salesforceairesearch.com/mint-1t/ 🧵

English

11.8K

Manli Shu@ManliShu·25 Tem

MINT-1T is now available on 🤗 huggingface.co/collections/ml…. A large-scale (1T tokens), open-source, interleaved image-text dataset with diverse data sources (HTML, PDFs, and ArXiv papers).

Salesforce AI Research@SFResearch

Breaking news! ➡️➡️➡️ We just released the MINT-1T 🍃dataset! One trillion tokens. Multimodal. Interleaved. Open-source. Perfect for training multimodal models and advancing their pre-training. Try it today! Blog: bit.ly/3YikQPP Dataset: bit.ly/3YikQiN

English

4.1K

Keşfet

@ShangbangLong @gowthami_s @jonasgeiping @A_v_i__S @OpenAI @unccs @anas_awadalla @_akhaliq