Chao Feng

41 posts

Chao Feng
@chaof1234

PhD student @cornell_tech @Cornell_CS | Research Intern @AdobeResearch

Joined July 2024
183 Following · 132 Followers

Chao Feng reposted
Xiyao Wang @XiyaoWang10
Thanks to AK for sharing our paper! 🎉 Training a generative critic model to judge responses makes it BETTER at EVERYTHING. Sometimes the best policy comes from good judgment. Your critic model has been hiding its true potential 🌟 🚀 Introducing LLaVA-Critic-R1, a family of VLMs that serve as both critic and policy in a single model. No policy training. No in-domain task data. Just 40k preference pairs ("Is response A or B better?") for critic RL training! Result: +5.7% across 26 visual benchmarks, including visual understanding, reasoning, and even GUI agents, plus 71.9 on MMMU, a 7B-scale SoTA. Learn to judge, excel at everything 🎭 📄 Paper: huggingface.co/papers/2509.00… 💻 Code: github.com/LLaVA-VL/LLaVA…
AK @_akhaliq

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Chao Feng reposted
AK @_akhaliq
GPS as a Control Signal for Image Generation
Chao Feng reposted
seunghyun lee @seunghy23235
Please join us at poster #369 tomorrow afternoon @CVPR

@CVPR We introduce Cropper. Image cropping is the task of finding an aesthetically pleasing region within an image. VLMs as generalists often struggle with the precise, continuous coordinate output (as text) required for accurate crop-box prediction without further fine-tuning.

Chao Feng @chaof1234
Beyond 2D, we can lift a 3D model directly from our GPS-conditioned model via score distillation sampling, trained per landmark.
Chao Feng @chaof1234
Sharing our #CVPR2025 paper, "GPS as a Control Signal for Image Generation"! 🛰️+✍️ We turn the GPS tag stored in a photo's EXIF data into a control signal for diffusion models, so they don't just know what you asked for, but also where the image should look like it was taken. Come see our poster on Friday, 13 Jun, 10:30 a.m.–12:30 p.m. (CT) in ExHall D, Poster #250.
Chao Feng reposted
Ayush Shrivastava @ayshrv
Excited to share our CVPR 2025 paper on cross-modal space-time correspondence! We present a method to match pixels across different modalities (RGB-depth, RGB-thermal, photo-sketch, and cross-style images), trained entirely on unpaired data with self-supervision. Our approach learns correspondences through contrastive random walks across visual modalities. #CVPR2025 (1/6)
Chao Feng reposted
Jeongsoo Park @jespark0
Can AI image detectors keep up with new fakes? Mostly, no. Existing detectors are trained on a handful of generators, but there are thousands in the wild! Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes. #CVPR2025 🧵 (1/5)
Chao Feng reposted
Yiming Dou @_YimingDou
Ever wondered how a scene sounds 👂 when you interact 👋 with it? Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive! yimingdou.com/hearing_hands/
Chao Feng reposted
Daniel Geng @dangengdg
Hello! If you like pretty images and videos and want a rec for a CVPR oral session, you should definitely go to Image/Video Gen, Friday at 9 a.m.: I'll be presenting "Motion Prompting", @RyanBurgert will be presenting "Go with the Flow", and @ChangPasca1650 will be presenting "LookingGlass".
Chao Feng reposted
tiange @tiangeluo
Will VLMs adhere strictly to their learned priors, unable to perform visual reasoning on content that has never existed on the Internet? We propose ViLP, a benchmark designed to probe the visual-language priors of VLMs by constructing Question-Image-Answer triplets that deliberately deviate from existing data. Check out our gallery at vilp-team.github.io & huggingface.co/datasets/ViLP/… To further enhance VLMs' reliance on visual information, we propose Image-DPO, elaborated in this thread. w/ @AngCao3 @GunheeLee @jcjohnss @honglaklee
Chao Feng reposted
Chris Rockwell @_crockwell
Ever wish YouTube had 3D labels? 🚀 Introducing 🎥 DynPose-100K 🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation 🤩 and learned dynamic pose estimation 😯 Download: huggingface.co/datasets/nvidi…
Chao Feng reposted
Furong Huang @furongh
🧠💡 What if your 7B model could beat GPT-4o and Qwen2.5-72B using just 11k training samples? No distillation. No warm start. Just smart data and reinforcement learning. Inspired by Moravec's Paradox, we let the model decide what's actually hard. 🚨 New paper: "SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement". We show how ThinkLite-VL-7B achieves SoTA on MathVista (75.1%), surpassing much larger models. 👇 Here's how we did it: 🔗 arxiv.org/abs/2504.07934 🧠 Code: github.com/si0wang/ThinkL… #AI #VisionLanguageModels #ReinforcementLearning #MachineLearning #LessIsMore