Chunjiang Ge

24 posts

@GeChunjiang

Ph.D. Candidate @LeapLabTHU and undergrad @Tsinghua_Uni. | Multimodal & Generative Models | Seeking Postdoc & Industrial Research Positions

Beijing · Joined August 2023
647 Following · 72 Followers
Xindi Wu@cindy_x_wu·
Want to train large vision-language models but drowning in data? arxiv.org/abs/2501.00654 Introducing ICONS - we demonstrate how to select only 20% of training samples while maintaining 98.6% of the performance, and 60% of training samples to achieve 102.1% of the performance.
5 replies · 64 reposts · 303 likes · 42.3K views
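The selection result above (20% of the data, 98.6% of the performance) boils down to ranking training samples by a utility score and keeping the top fraction. A minimal sketch of that top-k selection step, where the scores are made-up stand-ins and not the actual ICONS influence computation:

```python
# Hypothetical sketch of score-based data selection: keep the top `fraction`
# of training samples ranked by a per-sample utility score. The scores here
# are toy values, not the ICONS influence scores from the paper.

def select_top_fraction(samples, scores, fraction=0.2):
    """Return the top `fraction` of samples, ranked by score (descending)."""
    k = max(1, int(len(samples) * fraction))
    ranked = sorted(zip(scores, samples), key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in ranked[:k]]

# Toy usage: 10 samples with made-up scores; keep the best 2 (20%).
samples = [f"sample_{i}" for i in range(10)]
scores = [0.9, 0.1, 0.8, 0.4, 0.95, 0.2, 0.3, 0.7, 0.5, 0.6]
subset = select_top_fraction(samples, scores, fraction=0.2)
print(subset)  # ['sample_4', 'sample_0']
```

The interesting part of ICONS is how the scores are computed; the selection itself is this simple.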
Chunjiang Ge@GeChunjiang·
@JunhongShen1 2. Since a tokenizer's ability to handle different compression factors f could be acquired by simply training with different compression ratios, without a predefined f, did you try such a setting? Thanks!
0 replies · 0 reposts · 0 likes · 82 views
Junhong Shen@JunhongShen1·
Stay tuned for more updates and check out our full paper here: arxiv.org/abs/2501.03120 😺 P.S. My internship at Meta has been an incredible experience. Working with the team has been both rewarding and inspiring, and I’ve learned so much from each of my collaborators. I highly recommend this internship opportunity to researchers seeking to grow and make meaningful contributions!
3 replies · 4 reposts · 13 likes · 6.7K views
Junhong Shen@JunhongShen1·
Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just captions (no pixels!) to guide tokenization, enabling adaptive representation for text-to-image generation. Big shout out to collaborators @AIatMeta: @violet_zct @liliyu_lili @LukeZettlemoyer @imisra_ @michiyasunaga @kushal_tirumala Paper: arxiv.org/abs/2501.03120 More details in 🧵
4 replies · 49 reposts · 242 likes · 22.9K views
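The adaptive-compression idea above can be sketched as a mapping from a per-image complexity score to one of the three ratios the tweet mentions. This is an illustrative stand-in only: the thresholds are invented, and CAT actually derives complexity from captions via a language model, not from a precomputed score as assumed here.

```python
# Hypothetical sketch of content-adaptive tokenization: map an image
# complexity score in [0, 1] to one of three spatial compression ratios
# (8x, 16x, 32x), then compute the resulting token count. Thresholds and
# the complexity score itself are illustrative assumptions, not CAT's.

def pick_compression(complexity: float) -> int:
    """Choose a spatial downsampling factor from a complexity score in [0, 1]."""
    if complexity > 0.66:   # complex image: compress less, keep more tokens
        return 8
    if complexity > 0.33:   # medium complexity
        return 16
    return 32               # simple image: compress aggressively

def token_count(image_size: int, factor: int) -> int:
    """Tokens for a square image under factor-x spatial downsampling."""
    side = image_size // factor
    return side * side

# A 512px image yields very different token budgets per ratio:
for c in (0.9, 0.5, 0.1):
    f = pick_compression(c)
    print(f"{f}x -> {token_count(512, f)} tokens")
```

The quadratic effect is the point: moving from 8x to 32x cuts the token count by 16x, which is why matching the ratio to image complexity saves so much compute on simple images.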
Chunjiang Ge@GeChunjiang·
@JunhongShen1 Hello! Very impressive work! I have a few questions: 1. How did you get the compression ratio of DiT-CAT in Table 4 for each eval image?
0 replies · 0 reposts · 0 likes · 82 views
Chunjiang Ge retweeted
Yao Fu@Francis_YAO_·
Our team at Google DeepMind is looking for student researcher candidates working on multimodal reasoning! If you are excited about building next-generation personalized multimodal agents that interactively reason with humans, and would like to pursue it through rigorous hypothesis testing, controlled experiments, and solid engineering, please send an email to yaof@google.com. We look forward to creating the future together with you!
6 replies · 59 reposts · 551 likes · 55.3K views
Chunjiang Ge retweeted
Heiga Zen (全 炳河)@heiga_zen·
✨ Exciting Opportunity at Google DeepMind Tokyo! ✨ We're seeking a brilliant Research Scientist to join our team. Are you passionate about audio and generative models? Apply now and help us push the boundaries of AI! #Google #DeepMind #Audio #GenerativeModels #Tokyo #Hiring
Yuma Koizumi@yuma_koizumi

Our team at @GoogleDeepMind Tokyo is hiring a Research Scientist! We are a team researching audio restoration/separation using audio generative models. Let us conduct research on innovative generative model theory and applied technology in Tokyo! boards.greenhouse.io/deepmind/jobs/…

5 replies · 35 reposts · 180 likes · 38.8K views
Chunjiang Ge retweeted
Tu Vu@tuvllms·
📢✨ I am recruiting 1-2 PhD students at Virginia Tech this cycle. If you are interested in efficient model development (including model merging, parameter-efficient fine-tuning & transfer learning), instruction tuning, advanced reasoning, LLMs-as-judges, etc., please apply!!
6 replies · 72 reposts · 281 likes · 61.8K views
Chunjiang Ge retweeted
Unnat Jain@unnatjain2010·
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: unnat.github.io/notes/Hidden_C… PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
38 replies · 73 reposts · 392 likes · 66.8K views
Chunjiang Ge retweeted
Chen (Cherise) Chen@cherise_go·
We’re hiring a PhD student, fully funded (UK & Overseas), at the School of Computer Science, University of Sheffield, UK, starting October 2025! findaphd.com/phds/project/a…
0 replies · 5 reposts · 16 likes · 2.4K views
Chunjiang Ge retweeted
Yunzhu Li@YunzhuLiYZ·
📢 I’ll be admitting PhD students to Columbia CS in the heart of NYC 🗽—the most vibrant city in the world! 🌆 If you're passionate about advancing robot learning and envision a future where robots 🤖 are part of our daily lives, apply to join my group: yunzhuli.github.io
ColumbiaCompSci@ColumbiaCompSci

.@YunzhuLiYZ is looking for PhD students interested in robot learning to join his lab. To find out more about him - yunzhuli.github.io. For info on our #computerscience PhD programs bit.ly/CSPhDprogram. The deadline to apply is December 15.

2 replies · 53 reposts · 303 likes · 44.8K views
Chunjiang Ge retweeted
John Hewitt@johnhewtt·
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
18 replies · 157 reposts · 877 likes · 106.5K views
Chunjiang Ge@GeChunjiang·
@jbhuang0604 Relying on image-text data is because we need to evaluate the models in natural language. If we could evaluate models by image nature alone (like MAE predicting an object by seeing its shadow), vision self-supervised models would have interesting properties like LLMs.
0 replies · 0 reposts · 4 likes · 802 views
Jia-Bin Huang@jbhuang0604·
Why is self-supervision in vision still not working? 🤔 When pretraining a transformer on TEXT-only data by predicting the next tokens, we see clear improvement trends as we scale the model, data, and compute. But after trying to pretrain a transformer on IMAGES-only data with contrastive learning (e.g., SimCLR), masked prediction (e.g., MAE), and self-distillation (e.g., DINO), at the end of the day, the most successful vision encoders still have to rely on vision-language paired data (e.g., CLIP, SigLIP). Or have people not yet tried connecting MAE/DINO with LLMs? Why is that? Thoughts?
70 replies · 48 reposts · 730 likes · 174.1K views
Chunjiang Ge@GeChunjiang·
@sainingxie @jbhuang0604 I totally agree. Self-supervised pretraining on text is actually supervised learning. Such pretraining data can be collected like self-supervised training data, without annotation. However, vision tasks cannot work in this way.
0 replies · 0 reposts · 0 likes · 748 views
Saining Xie@sainingxie·
@jbhuang0604 there's no true self-supervised learning in text - it's (strongly) supervised learning.
7 replies · 4 reposts · 154 likes · 14.8K views
Zhuang Liu@liuzhuang1234·
Excited to share that I will be joining Princeton Computer Science @PrincetonCS as an Assistant Professor in September 2025! I'm looking for students to join me. If you are interested in working with me on VLMs, LLMs, deep learning (vision/LLM) architectures, data, training, efficiency, or understanding, please apply!
131 replies · 112 reposts · 1.5K likes · 173.1K views
Binyuan Hui@huybery·
ETA 6h 😎 The Qwen2 family will add new members. Guess what?
35 replies · 14 reposts · 235 likes · 45.3K views
Chunjiang Ge@GeChunjiang·
Key ideas: 1. Optimizing the representation of ConvNeXt: we find simply updating it is good enough. 2. Training a successive stage for ConvNeXt to further compress the visual tokens. ConvLLaVA compresses visual features by 64x, compared with 14x for LLaVA-1.5.
1 reply · 0 reposts · 1 like · 86 views
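The 64x-vs-14x comparison above translates directly into visual token counts, since tokens scale with the square of resolution over the downsampling factor. A back-of-envelope check, assuming a 336px input for the 14x (LLaVA-1.5-style) setting and a 768px input for the 64x setting; the resolutions are illustrative assumptions:

```python
# Back-of-envelope visual token counts for a square input under spatial
# downsampling. A 14x patchifier on a 336px image gives (336/14)^2 = 576
# tokens; a 64x hierarchical backbone on a 768px image gives only
# (768/64)^2 = 144 tokens, despite the much higher input resolution.

def visual_tokens(resolution: int, downsample: int) -> int:
    """Tokens produced for a square image of the given side length."""
    side = resolution // downsample
    return side * side

print(visual_tokens(336, 14))  # 576 tokens (14x, LLaVA-1.5-style)
print(visual_tokens(768, 64))  # 144 tokens (64x hierarchical backbone)
```

This is why aggressive hierarchical compression matters: the LLM's sequence length (and attention cost) is set by this token count, so 64x lets the model take higher-resolution inputs while feeding the LLM fewer tokens.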
Chunjiang Ge@GeChunjiang·
📢 Excited to share our recent work on Large Multimodal Models: ConvLLaVA. Without encoding multiple image patches or using multiple encoders, we use a hierarchical backbone, ConvNeXt, to realize high-resolution understanding. arxiv.org/pdf/2405.15738
1 reply · 1 repost · 3 likes · 267 views