Chunjiang Ge

24 posts

@GeChunjiang

Ph.D. Candidate @LeapLabTHU and undergrad @Tsinghua_Uni. | Multimodal & Generative Models | Seeking Postdoc & Industrial Research Positions

Beijing · Joined August 2023
647 Following · 72 Followers
Xindi Wu@cindy_x_wu·
Want to train large vision-language models but drowning in data? arxiv.org/abs/2501.00654 Introducing ICONS - we demonstrate how to select only 20% of training samples while maintaining 98.6% of the performance, and 60% of training samples to achieve 102.1% of the performance.
5 replies · 64 reposts · 303 likes · 42.3K views
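The selection result above (20% of the data, 98.6% of the performance) boils down to ranking training samples by a utility score and keeping the top fraction. A minimal sketch of that top-k selection step, where the scores are made-up stand-ins and not the actual ICONS influence computation:

```python
# Hypothetical sketch of score-based data selection: keep the top `fraction`
# of training samples ranked by a per-sample utility score. The scores here
# are toy values, not the ICONS influence scores from the paper.

def select_top_fraction(samples, scores, fraction=0.2):
    """Return the top `fraction` of samples, ranked by score (descending)."""
    k = max(1, int(len(samples) * fraction))
    ranked = sorted(zip(scores, samples), key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in ranked[:k]]

# Toy usage: 10 samples with made-up scores; keep the best 2 (20%).
samples = [f"sample_{i}" for i in range(10)]
scores = [0.9, 0.1, 0.8, 0.4, 0.95, 0.2, 0.3, 0.7, 0.5, 0.6]
subset = select_top_fraction(samples, scores, fraction=0.2)
print(subset)  # ['sample_4', 'sample_0']
```

The interesting part of ICONS is how the scores are computed; the selection itself is this simple.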
Chunjiang Ge@GeChunjiang·
@JunhongShen1 2. Since a tokenizer's ability to handle different compression factors f could be acquired by simply training with different compression ratios, without a predefined f, did you try such a setting? Thanks!
0 replies · 0 reposts · 0 likes · 82 views
Junhong Shen@JunhongShen1·
Stay tuned for more updates and check out our full paper here: arxiv.org/abs/2501.03120 😺 P.S. My internship at Meta has been an incredible experience. Working with the team has been both rewarding and inspiring, and I’ve learned so much from each of my collaborators. I highly recommend this internship opportunity to researchers seeking to grow and make meaningful contributions!
3 replies · 4 reposts · 13 likes · 6.7K views
Junhong Shen@JunhongShen1·
Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just captions (no pixels!) to guide tokenization, enabling adaptive representation for text-to-image generation. Big shout out to collaborators @AIatMeta: @violet_zct @liliyu_lili @LukeZettlemoyer @imisra_ @michiyasunaga @kushal_tirumala Paper: arxiv.org/abs/2501.03120 More details in 🧵
4 replies · 49 reposts · 242 likes · 22.9K views
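The adaptive-compression idea above can be sketched as a mapping from a per-image complexity score to one of the three ratios the tweet mentions. This is an illustrative stand-in only: the thresholds are invented, and CAT actually derives complexity from captions via a language model, not from a precomputed score as assumed here.

```python
# Hypothetical sketch of content-adaptive tokenization: map an image
# complexity score in [0, 1] to one of three spatial compression ratios
# (8x, 16x, 32x), then compute the resulting token count. Thresholds and
# the complexity score itself are illustrative assumptions, not CAT's.

def pick_compression(complexity: float) -> int:
    """Choose a spatial downsampling factor from a complexity score in [0, 1]."""
    if complexity > 0.66:   # complex image: compress less, keep more tokens
        return 8
    if complexity > 0.33:   # medium complexity
        return 16
    return 32               # simple image: compress aggressively

def token_count(image_size: int, factor: int) -> int:
    """Tokens for a square image under factor-x spatial downsampling."""
    side = image_size // factor
    return side * side

# A 512px image yields very different token budgets per ratio:
for c in (0.9, 0.5, 0.1):
    f = pick_compression(c)
    print(f"{f}x -> {token_count(512, f)} tokens")
```

The quadratic effect is the point: moving from 8x to 32x cuts the token count by 16x, which is why matching the ratio to image complexity saves so much compute on simple images.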
Chunjiang Ge@GeChunjiang·
@JunhongShen1 Hello! Very impressive work! I have a few questions: 1. How did you get the compression ratio of DiT-CAT in Table 4 for each eval image?
0 replies · 0 reposts · 0 likes · 82 views
Chunjiang Ge retweeted
Yao Fu@Francis_YAO_·
Our team at Google DeepMind is looking for student researcher candidates working on multimodal reasoning! If you are excited about building next-generation personalized multimodal agents that interactively reason with humans, and would like to pursue it through rigorous hypothesis testing, controlled experiments, and solid engineering, please send an email to yaof@google.com. We look forward to creating the future together with you!
6 replies · 59 reposts · 551 likes · 55.3K views
Chunjiang Ge retweeted
Heiga Zen (全 炳河)@heiga_zen·
✨ Exciting Opportunity at Google DeepMind Tokyo! ✨ We're seeking a brilliant Research Scientist to join our team. Are you passionate about audio and generative models? Apply now and help us push the boundaries of AI! #Google #DeepMind #Audio #GenerativeModels #Tokyo #Hiring
Yuma Koizumi@yuma_koizumi

Our team at @GoogleDeepMind Tokyo is hiring a Research Scientist! We are a team researching audio restoration/separation using audio generative models. Let us conduct research on innovative generative model theory and applied technology in Tokyo! boards.greenhouse.io/deepmind/jobs/…

5 replies · 35 reposts · 180 likes · 38.8K views
Chunjiang Ge retweeted
Tu Vu@tuvllms·
📢✨ I am recruiting 1-2 PhD students at Virginia Tech this cycle. If you are interested in efficient model development (including model merging, parameter-efficient fine-tuning & transfer learning), instruction tuning, advanced reasoning, LLMs-as-judges, etc., please apply!!
6 replies · 72 reposts · 281 likes · 61.8K views
Chunjiang Ge retweeted
Unnat Jain@unnatjain2010·
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: unnat.github.io/notes/Hidden_C… PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
38 replies · 73 reposts · 392 likes · 66.8K views
Chunjiang Ge retweeted
Chen (Cherise) Chen@cherise_go·
We’re hiring a PhD student, fully funded (UK & Overseas), at the School of Computer Science, University of Sheffield, UK, starting October 2025! findaphd.com/phds/project/a…
0 replies · 5 reposts · 16 likes · 2.4K views
Chunjiang Ge retweeted
Yunzhu Li@YunzhuLiYZ·
📢 I’ll be admitting PhD students to Columbia CS in the heart of NYC 🗽—the most vibrant city in the world! 🌆 If you're passionate about advancing robot learning and envision a future where robots 🤖 are part of our daily lives, apply to join my group: yunzhuli.github.io
ColumbiaCompSci@ColumbiaCompSci

.@YunzhuLiYZ is looking for PhD students interested in robot learning to join his lab. To find out more about him - yunzhuli.github.io. For info on our #computerscience PhD programs bit.ly/CSPhDprogram. The deadline to apply is December 15.

2 replies · 53 reposts · 303 likes · 44.8K views
Chunjiang Ge retweeted
John Hewitt@johnhewtt·
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
18 replies · 157 reposts · 877 likes · 106.5K views
Chunjiang Ge@GeChunjiang·
@jbhuang0604 Relying on image-text data is because we need to evaluate the models in natural language. If we could evaluate models by image nature alone (like MAE predicting an object by seeing its shadow), vision self-supervised models would have interesting properties like LLMs.
0 replies · 0 reposts · 4 likes · 802 views
Jia-Bin Huang@jbhuang0604·
Why is self-supervision in vision still not working? 🤔 When pretraining a transformer on TEXT-only data by predicting the next tokens, we see clear improvement trends as we scale the model, data, and compute. But after trying to pretrain a transformer on IMAGES-only data with contrastive learning (e.g., SimCLR), masked prediction (e.g., MAE), and self-distillation (e.g., DINO), at the end of the day, the most successful vision encoders still have to rely on vision-language paired data (e.g., CLIP, SigLIP). Or have people not yet tried connecting MAE/DINO with LLMs? Why is that? Thoughts?
70 replies · 48 reposts · 730 likes · 174.1K views
Chunjiang Ge@GeChunjiang·
@sainingxie @jbhuang0604 I totally agree. Self-supervised pretraining on text is actually supervised learning. Such pretraining data can be collected like self-supervised training data, without annotation. However, vision tasks cannot work in this way.
0 replies · 0 reposts · 0 likes · 748 views
Saining Xie@sainingxie·
@jbhuang0604 there's no true self-supervised learning in text - it's (strongly) supervised learning.
7 replies · 4 reposts · 154 likes · 14.8K views
Zhuang Liu@liuzhuang1234·
Excited to share that I will be joining Princeton Computer Science @PrincetonCS as an Assistant Professor in September 2025! I'm looking for students to join me. If you are interested in working with me on VLMs, LLMs, deep learning (vision/LLM) architectures, data, training, efficiency, or understanding, please apply!
131 replies · 112 reposts · 1.5K likes · 173.1K views
Binyuan Hui@huybery·
ETA 6h 😎 The Qwen2 family will add new members. Guess what?
35 replies · 14 reposts · 235 likes · 45.3K views
Chunjiang Ge@GeChunjiang·
Key ideas: 1. Optimizing the representation of ConvNeXt: we find simply updating it is good enough. 2. Training a successive stage for ConvNeXt to further compress the visual tokens. ConvLLaVA compresses visual features by 64x, compared with 14x for LLaVA-1.5.
1 reply · 0 reposts · 1 like · 86 views
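The 64x-vs-14x comparison above translates directly into visual token counts, since tokens scale with the square of resolution over the downsampling factor. A back-of-envelope check, assuming a 336px input for the 14x (LLaVA-1.5-style) setting and a 768px input for the 64x setting; the resolutions are illustrative assumptions:

```python
# Back-of-envelope visual token counts for a square input under spatial
# downsampling. A 14x patchifier on a 336px image gives (336/14)^2 = 576
# tokens; a 64x hierarchical backbone on a 768px image gives only
# (768/64)^2 = 144 tokens, despite the much higher input resolution.

def visual_tokens(resolution: int, downsample: int) -> int:
    """Tokens produced for a square image of the given side length."""
    side = resolution // downsample
    return side * side

print(visual_tokens(336, 14))  # 576 tokens (14x, LLaVA-1.5-style)
print(visual_tokens(768, 64))  # 144 tokens (64x hierarchical backbone)
```

This is why aggressive hierarchical compression matters: the LLM's sequence length (and attention cost) is set by this token count, so 64x lets the model take higher-resolution inputs while feeding the LLM fewer tokens.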
Chunjiang Ge@GeChunjiang·
📢 Excited to share our recent work on Large Multimodal Models: ConvLLaVA. Without encoding multiple image patches or using multiple encoders, we use a hierarchical backbone, ConvNeXt, to realize high-resolution understanding. arxiv.org/pdf/2405.15738
1 reply · 1 repost · 3 likes · 267 views