Zineng Tang

133 posts

@ZinengTang

PhD in @Berkeley_ai and @BerkeleyNLP. Previously @UNCNLP and @MSFTResearch.

Chapel Hill, NC · Joined February 2019
574 Following · 1.5K Followers
Zineng Tang reposted
Micah Goldblum @micahgoldblum
🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
Zineng Tang reposted
Leon @iamleonli
CoT transformed text reasoning. What about multimodal? 🤔 Check out our new dataset of interleaved text-and-image reasoning traces. We also show interesting visual CoT examples generated natively by the model fine-tuned on our dataset!
Zineng Tang @ZinengTang
Big thanks to my undergrad intern Lingjun for delivering such impressive work, and to Rudy for the thoughtful co-advising!
Zineng Tang @ZinengTang
🔥 DOVE uses 68% fewer tokens yet achieves better FID than VQGAN/TiTok, and scores +10–12 points higher on VQA/ImageNet/CIFAR. It achieves significantly stronger performance on classification, probing, and VLM tasks. DOVE also shows emergent properties: PCA heatmaps reveal sharper segmentation.
Zineng Tang @ZinengTang
TULIP achieves state-of-the-art performance across multiple vision and vision-language benchmarks. It significantly improves zero-shot classification on ImageNet-1K, enhances fine-grained object recognition, and boosts multimodal reasoning scores. Compared to existing methods, TULIP shows up to a 3× improvement on MMVP and a 2× boost in fine-tuned vision tasks.
Zineng Tang @ZinengTang
We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io State-of-the-art vision-language encoders coupled with a generative model for stronger representation learning.
Zineng Tang reposted
CLS @ChengleiSi
Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.
Zineng Tang reposted
Ziyi Yang @yzy_ai
We announced Phi 3.5 series today! 1️⃣ Multilingual Mini 3.8B: huggingface.co/microsoft/Phi-… 2️⃣ MoE 16x3.8B (active 6.6B): huggingface.co/microsoft/Phi-… 3️⃣ multi-frame vision LLM: huggingface.co/microsoft/Phi-…
Weizhu Chen @WeizhuChen

We released Phi-3.5: mini + MoE + vision. A better mini model with multilingual support: huggingface.co/microsoft/Phi-… A new MoE model: huggingface.co/microsoft/Phi-… A new vision model supporting multiple images: huggingface.co/microsoft/Phi-…

Zineng Tang @ZinengTang
Excited to share that CoDi-2 has been accepted to @CVPR! In this work, we show that aligning multimodal inputs to language unlocks in-context learning (ICL) and few-shot prompting abilities for multimodal generation. #CVPR2024 @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47
Zineng Tang @ZinengTang

🔥Excited to introduce CoDi-2! It follows complex, multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero-/few-shot, interactive way! codi-2.github.io huggingface.co/papers/2311.18… @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 🧵👇

Zineng Tang reposted
Mohit Bansal @mohitban47
Had a great time speaking at @indoml_sym & meeting students+faculty+researchers (overall, kudos to organizers+volunteers on an excellent program)! 🙂 Talked about our long journey on Multimodal Generative LLMs: (1) Unified/Universal Multimodal Learning for Generalizability, Shared Knowledge, Efficiency: LXMERT ➡️ VL-T5 ➡️ TVLT ➡️ UDOP ➡️ CoDi1+CoDi2 (2) LLM Planning/Programming for Interpretable+Controllable+Faithful Multimodal Generation: VPGen ➡️ DiagrammerGPT ➡️  VideoDirectorGPT (3) Evaluation of Multimodal Generation Models for Fine-grained Skills, Faithfulness, Social Biases: DALL-Eval ➡️ VPEval ➡️ Davidsonian Scene Graph
IndoML Symposium, 2025 @indoml_sym

Day 1, Session 1: Foundational and Generative Models. Talk 3: Mohit Bansal, Professor, University of North Carolina at Chapel Hill. We’re really excited to play around with his state-of-the-art any-to-any multimodal models CoDi and CoDi-2! #indoml #ml #iitbombay #aiml
