Haodong Duan

136 posts

@KennyUTC

B.S. @PKU1898 / Ph.D. @CUHKofficial. Built #VLMEvalKit for MLLM evaluation

Shanghai · Joined August 2017

243 Following · 219 Followers

Pinned Tweet

Haodong Duan@KennyUTC·4 Apr

OpenCompass just released RISEBench, the first benchmark on Reasoning-Informed Visual Editing (RISE). GPT-4o Image Generation only scores 36% on this challenging task! Technical Report: huggingface.co/papers/2504.02… #GPT4o

1.4K

Haodong Duan@KennyUTC·6 Mar

The result on RISEBench is truly impressive.

Luma@LumaLabsAI

Uni-1 is a decoder-only autoregressive transformer. Text and images are represented in a single interleaved sequence, acting both as input and as output. This enables Uni-1 to think and render in the same forward pass, achieving a new benchmark of intelligence and quality.

140

Haodong Duan@KennyUTC·16 Aug

Visual recognition (especially regarding abstract contents) remains a significant challenge for current VLMs. Evaluations based on VisFactor will drive improvements in models' capabilities in this area. Welcome to the exam for your VLM on VisFactor! github.com/open-compass/V…

571

Haodong Duan@KennyUTC·16 Aug

Big Cong!

Zhiqing Sun@EdwardSun0909

Excited to share that I recently joined the MSL team! Building personal superintelligence is serious and fun here. Join us!

327

Haodong Duan@KennyUTC·25 Jul

wow, so now the speech is good in CA universities?

375

Haodong Duan retweeted

FearBuck@FearedBuck·20 Jul

families may have been divided but the world is united

2.1K

47.3K

553.4K

50.6M

Haodong Duan@KennyUTC·7 Jul

Just created a Gallery to display all generation results on RISEBench (by powerful models including GPT-4o Image, Gemini-2.0, Bagel, etc.). Please contact me if you want the results of your new model to be included! Tech Report: arxiv.org/abs/2504.02826

1.1K

Haodong Duan@KennyUTC·15 Mar

- VisualPRM for Test Time Scaling of Visual Reasoning Problems: arxiv.org/abs/2503.10291
- 5%~10% Avg. Accuracy Improvement over 7 mainstream benchmarks.
- This work is released with 400K Tuning Data & 3K Benchmark Problems

218

Haodong Duan retweeted

Miquel Farré@micuelll·11 Mar

We just added SmolVLM2 to VLMEvalKit - now it is easier to evaluate your fine-tunes 🥰😊

103

4.1K

Haodong Duan retweeted

ℏεsam@Hesamation·15 Feb

we are analyzing the top papers on @huggingface (~4000 papers mostly related to LLMs) and here is a list of the top 20 authors with the most papers published in less than 2 years. all of them Asian! (not necessarily in Asia) this is no competition, these alphas OWN the game.

240

16.3K

Haodong Duan@KennyUTC·28 Jan

Lame

Alexandr Wang@alexandr_wang

DeepSeek is a wake up call for America, but it doesn’t change the strategy:
- USA must out-innovate & race faster, as we have done in the entire history of AI
- Tighten export controls on chips so that we can maintain future leads

Every major breakthrough in AI has been American

237

Haodong Duan@KennyUTC·28 Dec

After 1yr of building, VLMEvalKit now reaches 100+ contributors. On the journey of exploring LMM capabilities, we will go further. github.com/open-compass/V…

720

Haodong Duan@KennyUTC·18 Dec

OpenCompass has established a leaderboard to evaluate complex reasoning capability of LMMs, consisting of four advanced multi-modal math reasoning benchmarks. Currently, Gemini-2.0-Flash took the 1st place. DM me to suggest more benchmarks and models to this LB.

1.7K

Haodong Duan@KennyUTC·17 Dec

@Xianbao_QIAN That's the evolution of the Internet

Tiezhen WANG@Xianbao_QIAN·17 Dec

I was recently asked to submit my personal ID when registering for a social media account 😱😱😱😱😱 Now I’m wondering: whose hands would you rather trust to share your real ID with — a private internet company or a centralized system, generally controlled by the government 🤯

437

Haodong Duan@KennyUTC·16 Dec

Real Research :lol

Jia-Bin Huang@jbhuang0604

As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.

433

Haodong Duan retweeted

Jiao Sun@sunjiao123sun_·14 Dec

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡

174

768

3.7K

2.2M

Haodong Duan@KennyUTC·10 Dec

@runsen_xu @AIatMeta run

193

Runsen Xu@runsen_xu·10 Dec

Just finished my 24-week internship at FAIR Perception, Meta. It was such a wonderful and enjoyable experience! @AIatMeta Heading back to China now. 🛫🛫