Zineng Tang

133 posts

Zineng Tang

@ZinengTang

PhD in @Berkeley_ai and @BerkeleyNLP. Previously @UNCNLP and @MSFTResearch.

Chapel Hill, NC · Joined February 2019

574 Following · 1.5K Followers

Zineng Tang retweeted

Micah Goldblum @micahgoldblum · 23 Jul

🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n

Zineng Tang retweeted

Leon @iamleonli · 23 Jul

CoT transformed text reasoning. What about multimodal? 🤔 Check out our new dataset of interleaved text and image reasoning traces. We also show interesting visual CoT examples generated inherently by the model finetuned on our dataset!

Micah Goldblum @micahgoldblum

Zineng Tang @ZinengTang · 6 Jun

Excited to share our new work! DOVE 🕊️: a dynamic vision encoder that adapts its token count to image complexity. Fewer tokens, same fidelity, and it outperforms fixed-length autoencoder (AE) tokenizers on classification and VLM tasks! Arxiv: arxiv.org/abs/2506.03643 Web: dove-encoder.github.io/dove-encoder/ #AI #CV

Zineng Tang @ZinengTang · 6 Jun

Big thanks to my undergrad intern Lingjun for delivering such impressive work, and to Rudy for the thoughtful co-advising!

Zineng Tang @ZinengTang · 6 Jun

🔥 DOVE uses 68% fewer tokens yet achieves a better FID than VQGAN/TiTok, plus +10–12 pts on VQA/ImageNet/CIFAR. It achieves significantly stronger performance on classification, probing, and VLM tasks. DOVE also shows emergent properties: PCA heatmaps reveal sharper segmentation.

Zineng Tang @ZinengTang · 21 Mar

@yzy_ai ❤️

Ziyi Yang @yzy_ai · 21 Mar

Another great work by @ZinengTang: generative contrastive learning for vision+language.

Zineng Tang @ZinengTang

We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.

Zineng Tang @ZinengTang · 21 Mar

@jiayi_pirate ❤️

Jiayi Pan @jiayi_pirate · 21 Mar

Zineng makes things work. And it was done entirely at Berkeley under resource constraints.

Zineng Tang @ZinengTang

We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.

Zineng Tang @ZinengTang · 21 Mar

@ziqiao_ma ❤️

Martin Ziqiao Ma @ziqiao_ma · 21 Mar

Every Zineng paper is on my reading list :)

Zineng Tang @ZinengTang

We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.

Zineng Tang @ZinengTang · 21 Mar

Also thanks to @LongTonyLian, Seun Eisape (seuneisape.github.io), @XDWang101, @roeiherzig, @Yalatweets, @alsuhr, and @trevordarrell for their great efforts!

Zineng Tang @ZinengTang · 20 Mar

TULIP achieves state-of-the-art performance across multiple vision and vision-language benchmarks. It significantly improves zero-shot classification on ImageNet-1K, enhances fine-grained object recognition, and boosts multimodal reasoning scores. Compared to existing methods, TULIP shows up to a 3× improvement on MMVP and a 2× boost in fine-tuned vision tasks.

Zineng Tang @ZinengTang · 20 Mar

We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.

Zineng Tang retweeted

CLS @ChengleiSi · 9 Sep

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.

Zineng Tang retweeted

Ziyi Yang @yzy_ai · 20 Aug

We announced the Phi 3.5 series today! 1️⃣ Multilingual Mini 3.8B: huggingface.co/microsoft/Phi-… 2️⃣ MoE 16x3.8B (6.6B active): huggingface.co/microsoft/Phi-… 3️⃣ Multi-frame vision LLM: huggingface.co/microsoft/Phi-…

Weizhu Chen @WeizhuChen

We released Phi 3.5: mini + MoE + vision. A better mini model with multilingual support: huggingface.co/microsoft/Phi-… A new MoE model: huggingface.co/microsoft/Phi-… A new vision model supporting multiple images: huggingface.co/microsoft/Phi-…

Zineng Tang @ZinengTang · 21 Jun

CoDi-2 has been selected as a #CVPR2024 Highlight. Come join us at today’s poster session (Arch 4A-E #314) from 5 pm to 6:30 pm! codi-2.github.io @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47

Zineng Tang @ZinengTang

🔥Excited to introduce CoDi-2! It follows complex, multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero-/few-shot, interactive way! codi-2.github.io huggingface.co/papers/2311.18… @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 🧵👇

Zineng Tang retweeted

Mohit Bansal @mohitban47 · 11 Jun

Honored and grateful to be named as one of the @UNC permanent distinguished professorships 🙏 100% of the credit goes to my awesome students, postdocs, collaborators, and colleagues+family ❤️

UNC Computer Science @unccs

Congratulations to professors Ron Alterovitz and Mohit Bansal (@mohitban47) on being conferred distinguished professorships by @UNC! 🎉

Zineng Tang @ZinengTang · 27 Feb

Excited to share that CoDi-2 has been accepted to @CVPR! In this work, we show that aligning multimodal inputs to language unlocks in-context learning (ICL) and few-shot prompting abilities for multimodal generation. #CVPR2024 @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47

Zineng Tang @ZinengTang

Zineng Tang retweeted

Mohit Bansal @mohitban47 · 22 Dec

Had a great time speaking at @indoml_sym & meeting students+faculty+researchers (overall, kudos to organizers+volunteers on an excellent program)! 🙂 Talked about our long journey on Multimodal Generative LLMs: (1) Unified/Universal Multimodal Learning for Generalizability, Shared Knowledge, Efficiency: LXMERT ➡️ VL-T5 ➡️ TVLT ➡️ UDOP ➡️ CoDi1+CoDi2 (2) LLM Planning/Programming for Interpretable+Controllable+Faithful Multimodal Generation: VPGen ➡️ DiagrammerGPT ➡️ VideoDirectorGPT (3) Evaluation of Multimodal Generation Models for Fine-grained Skills, Faithfulness, Social Biases: DALL-Eval ➡️ VPEval ➡️ Davidsonian Scene Graph

IndoML Symposium, 2025 @indoml_sym

Day 1: Session 1 Foundational and Generative Models Talk 3: Mohit Bansal Professor University of North Carolina at Chapel Hill We’re really excited to play around with his state-of-the-art any-to-any multimodal models CoDi and CoDi-2 ! #indoml #ml #iitbombay #aiml
