Haodong Duan

136 posts

@KennyUTC

B.S. @PKU1898 / Ph.D. @CUHKofficial Built #VLMEvalKit for MLLM evaluation

Shanghai · Joined August 2017
243 Following · 219 Followers
Pinned Tweet
Haodong Duan @KennyUTC
OpenCompass just released RISEBench, the first benchmark on Reasoning-Informed Visual Editing (RISE). GPT-4o Image Generation only scores 36% on this challenging task! Technical Report: huggingface.co/papers/2504.02… #GPT4o
Haodong Duan @KennyUTC
Visual recognition (especially regarding abstract contents) remains a significant challenge for current VLMs. Evaluations based on VisFactor will drive improvements in models' capabilities in this area. Welcome to the exam for your VLM on VisFactor! github.com/open-compass/V…
Haodong Duan @KennyUTC
wow, so now the speech is good in CA universities?
Haodong Duan retweeted
FearBuck @FearedBuck
families may have been divided but the world is united
Haodong Duan @KennyUTC
Just created a Gallery to display all generation results on RISEBench (by powerful models including GPT-4o Image, Gemini-2.0, Bagel, etc.). Please contact me if you want the results of your new model to be included! Tech Report: arxiv.org/abs/2504.02826
Haodong Duan @KennyUTC
- VisualPRM for Test Time Scaling of Visual Reasoning Problems: arxiv.org/abs/2503.10291
- 5%~10% Avg. Accuracy Improvement over 7 mainstream benchmarks.
- This work is released with 400K Tuning Data & 3K Benchmark Problems
Haodong Duan retweeted
Miquel Farré @micuelll
We just added SmolVLM2 to VLMEvalKit - now it is easier to evaluate your fine-tunes 🥰😊
Haodong Duan retweeted
ℏεsam @Hesamation
we are analyzing the top papers on @huggingface (~4000 papers mostly related to LLMs) and here is a list of the top 20 authors with the most papers published in less than 2 years. all of them Asian! (not necessarily in Asia) this is no competition, these alphas OWN the game.
Haodong Duan @KennyUTC
After 1 yr of building, VLMEvalKit now reaches 100+ contributors. On the journey of exploring LMM capabilities, we will go further. github.com/open-compass/V…
Haodong Duan @KennyUTC
OpenCompass has established a leaderboard to evaluate complex reasoning capability of LMMs, consisting of four advanced multi-modal math reasoning benchmarks. Currently, Gemini-2.0-Flash took the 1st place. DM me to suggest more benchmarks and models to this LB.
Tiezhen WANG @Xianbao_QIAN
I was recently asked to submit my personal ID when registering for a social media account 😱😱😱😱😱 Now I’m wondering: whose hands would you rather trust to share your real ID with — a private internet company or a centralized system, generally controlled by the government 🤯
Haodong Duan retweeted
Jiao Sun @sunjiao123sun_
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
Runsen Xu @runsen_xu
Just finished my 24-week internship at FAIR Perception, Meta. It was such a wonderful and enjoyable experience! @AIatMeta Heading back to China now. 🛫🛫
Pangyu 胖鱼 🐠 @pangyusio
@JyaXieng Still, I have to say that with technical skill this strong, someone will probably value the talent anyway.
Pangyu 胖鱼 🐠 @pangyusio
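What kind of overpowered-programmer-protagonist plot is this? The intern ByteDance is seeking 8 million (800w) in damages from has just won a best paper award at NeurIPS, a top AI conference. A NeurIPS best paper is worth no less than that 8 million. Talent at this level would have no trouble earning an annual salary of 2 million+ (200w+) at a big tech company.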