Harris Zhang

9 posts

@HyperStorm9682

PhD Student at UW-Madison in Computer Vision

Madison, WI, US · Joined August 2021
38 Following · 18 Followers
Harris Zhang @HyperStorm9682
Paper link: arxiv.org/abs/2603.18004
Huge thanks to everyone on the PRIOR team at Ai2! This paper would not have been possible without you all!
Harris Zhang @HyperStorm9682
The final pruning figure shows the result: static, redundant background tokens are dropped, while key actions are preserved. ✂️ By filtering out the noise, STTS significantly speeds up inference while maintaining high performance. Code is open-sourced! 🔥
[Image: pruning figure]
Harris Zhang @HyperStorm9682
New paper out! 🚨 Introducing STTS: Unified Spatio-Temporal Token Scoring for Efficient Video VLMs. We tackle the massive token bottleneck in video models by jointly identifying the tokens that actually matter. The overall figure below breaks down the core problem! 🧵👇
[Image: overview figure]
Harris Zhang retweeted
Zhengzhong Tu @_vztu
Dear @NeurIPSConf PCs, I don't understand why we still need reviewers and area chairs if PCs are finally going to take over and overturn the AC decision without providing any reason, whereby our weeks of effort spent on rebuttals (both authors and reviewers) have been ignored.
[Image]
Harris Zhang retweeted
Yong Jae Lee @yong_jae_lee
Here is the final decision for one of our NeurIPS D&B ACs-accepted-but-PCs-rejected papers, with the vague message mentioning some kind of ranking. Why was the ranking necessary? Venue capacity? If so, this sets a concerning precedent. @NeurIPSConf
[Image]
Yong Jae Lee @yong_jae_lee

@yuyinzhou_cs @NeurIPSConf I have two D&B papers in the same situation: ACs recommended accept, but PCs overruled and rejected with the same exact vague reason that you got. They should at least provide a proper reason.

Harris Zhang retweeted
Mu Cai @MuCai7
1/N) Are current large multimodal models like #GPT4o really good at video understanding? 🚀 We are thrilled to introduce TemporalBench to examine temporal dynamics understanding for LMMs! TemporalBench reveals that even the SOTA LMM #GPT4o achieves only 38.5, far below the human performance of 67.9. With high-quality human annotations, TemporalBench investigates:
(1) Action order (change the order)
(2) Action frequency (one time vs. two times)
(3) Action type (put vs. pull)
(4) Motion magnitude (slightly vs. intensively)
(5) Motion direction/orientation (forward vs. backward, circular vs. back-and-forth)
(6) Action effector (cutting with left hand vs. cutting with right hand)
Explore TemporalBench: temporalbench.github.io
[Image]
Harris Zhang retweeted
Mu Cai @MuCai7
1/N) All current video models understand videos poorly, even when the videos are less than 10 seconds long! The best model, GPT4o, achieves 35.0 while humans get 90.0 in group score. Existing LMMs severely struggle to distinguish temporal differences in Vinoground: vinoground.github.io
[Image]