Weijie Wang

61 posts

@wjwang2003

PhD student at ZIP Lab, Zhejiang University. Research Intern @ ByteDance Seed | Microsoft Research

Zhejiang, China · Joined February 2024
104 Following · 243 Followers
Pinned Tweet
Weijie Wang@wjwang2003·
🚀 Introducing World-R1: Video models already know 3D — they just need RL to wake it up! No arch changes. No video training data. No extra inference cost.⬇️ 🌐Website: aka.ms/world-r1
Weijie Wang@wjwang2003·
@yourkaisensei You only need to design custom text prompts and trajectory generation strategies. We aim to offer a new perspective instead of merely a trained model.🥰
kai@yourkaisensei·
@wjwang2003 curious how you’re checking for reward hacking here. do the 3D gains hold on weird camera paths outside the training trajectory distribution?
Weijie Wang@wjwang2003·
@yourkaisensei Most camera gains come from latent injection. You can add randomization in training to adapt to diverse camera paths.
Weijie Wang@wjwang2003·
@MrManderly It’s just because of X’s video size limit. Not heavy computation. I’m not sure if upgrading helps🤣
MrManderly@MrManderly·
@wjwang2003 This is fantastic work. Can we infer that this is currently very expensive computationally from the low frame rate examples?
Azmine Wasi @ICML@AzmineWasi·
@icmlconf ICML Position Paper decisions seem to be out, indirectly 👀 Public release or in-person presentation...?
田中義弘 | taziku CEO / AI × Creative
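Did video models know a little 3D from the start!? World-R1 is a method that steers existing text-to-video models toward 3D consistency with no additional inference cost, improving object permanence, geometric consistency, and camera control. It also scored very high in human evaluation, with 92% on geometric consistency. Details in the 🧵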
Weijie Wang@wjwang2003·
@newlinedotco Totally agree on the inference tax concern! Good news: World-R1 adds zero overhead at inference. 3D foundation models and VLM critics only serve as reward signals during RL training. Once trained, it's the same architecture, same speed, just better 3D understanding built in ✅
💥 \newline@newlinedotco·
@wjwang2003 video models knowing 3d is a massive unlock but the real hurdle for agents has always been the inference tax of running these heavy vision-to-action loops.
Weijie Wang retweeted
DailyPapers@HuggingPapers·
Microsoft just released World-R1: a framework that aligns text-to-video generation with 3D constraints through reinforcement learning, using feedback from pre-trained 3D foundation models to enforce structural coherence without altering the underlying architecture.
Weijie Wang@wjwang2003·
🔑 How it works:
• Embed camera trajectories into diffusion noise, zero extra modules
• 3D rewards from Depth Anything 3 + Qwen3-VL as geometry critics
• Periodic decoupled training: buildings stay rigid, flags still wave 🏗️🚩
• 3K text prompts only, no video data
Weijie Wang@wjwang2003·
📢 We release "Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective" — a comprehensive survey covering 200+ papers on feed-forward 3D reconstruction! Instead of categorizing by 3D representations, we propose a problem-driven taxonomy. 🌐 ff3d-survey.github.io
Weijie Wang@wjwang2003·
6+ application areas (AD, robotics, SLAM, video gen…)