Hyojun Go

24 posts

@gohyojun3

ELLIS PhD Student @ ETH / Google

Zurich, Switzerland · Joined July 2022
150 Following · 69 Followers
Pinned Tweet
Hyojun Go @gohyojun3
Our recent finding on diffusion alignment: a pixel-space reward model can be easily transferred to score noisy diffusion latents directly, at a small finetuning cost, via stitching. This makes both training-time and inference-time alignment faster and better. Meet StitchVM👇 1/
3 replies · 6 retweets · 28 likes · 6.3K views
Hyojun Go retweeted
Grigory Bartosh @GrigoryBartosh
🚀 Excited to share my @GoogleDeepMind student researcher project: Dual-Rate Diffusion✨ ⚡ A simple construction that speeds up both regular diffusion and distilled models by interleaving a heavy context encoder with a light conditional denoiser. 🧵👇
6 replies · 29 retweets · 190 likes · 16.8K views
Hyojun Go retweeted
Hyungjin Chung @hyungjin_chung
For alignment you need V, but it is hard to compute. Most methods try to approximate it with 1) Tweedie, which is biased, or 2) MC roll-outs, which are slow and have high variance. Training V was often neglected since it's hard. We beg to differ. StitchVM enables this! Led by @gohyojun3 👇

Quoting Hyojun Go @gohyojun3:
Our recent finding on diffusion alignment: a pixel-space reward model can be easily transferred to score noisy diffusion latents directly, at a small finetuning cost, via stitching. This makes both training-time and inference-time alignment faster and better. Meet StitchVM👇 1/

0 replies · 6 retweets · 27 likes · 4.9K views
Hyojun Go @gohyojun3
Result 4️⃣: Training-time alignment with DRaFT & DiffusionNFT. No need for full rollouts: just stop denoising at an intermediate step and use StitchVM's prediction as the reward signal. This gives much faster convergence. 7/
1 reply · 0 retweets · 1 like · 356 views
Hyojun Go retweeted
Yuanwen Yue @YueYuanwen
Want a lighter yet stronger Point Transformer? Meet LitePT ✨ LitePT is a lightweight, high-performance 3D point cloud architecture for a wide range of point cloud processing tasks. Our smallest variant, LitePT-S, has 3.6× fewer parameters, 2× faster runtime, and a 2× lower memory footprint than PTv3, while already matching or outperforming it across a range of benchmarks.
💻 Code: github.com/prs-eth/LitePT
🌐 Project page: litept.github.io
📰 Paper: arxiv.org/abs/2512.13689
With Damien Robert, @jianyuan_wang, Sunghwan Hong, Jan Dirk Wegner, Christian Rupprecht, and Konrad Schindler
3 replies · 26 retweets · 126 likes · 11.6K views
Hyojun Go retweeted
Michael Niemeyer @Mi_Niemeyer
Combining video diffusion and 3D feedforward models by simply stitching them together in latent space - very cool idea! Make sure to check out this novel work from my colleagues at Google and ETH!

Quoting Prune Truong @prunetruong:
🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. ➡️ Paper: arxiv.org/abs/2510.13454 ➡️ Website: gohyojun15.github.io/VIST3A/ Collaboration between ETH & Google with Hyojun Go, @DNarnhofer, Goutam Bhat, @fedassa, and Konrad Schindler.

0 replies · 7 retweets · 64 likes · 8.4K views
Hyojun Go retweeted
Dominik Narnhofer @DNarnhofer
Want to leverage the power of SOTA 3D models like VGGT & Video LDMs for 3D generation? Now you can! 🚀 Introducing VIST3A — we stitch pretrained video generators to 3D foundation models and align them via reward finetuning. 📄 arxiv.org/abs/2510.13454 🌐 gohyojun15.github.io/VIST3A
0 replies · 3 retweets · 14 likes · 8.7K views
Hyojun Go retweeted
Prune Truong @prunetruong
🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. ➡️ Paper: arxiv.org/abs/2510.13454 ➡️ Website: gohyojun15.github.io/VIST3A/ Collaboration between ETH & Google with Hyojun Go, @DNarnhofer, Goutam Bhat, @fedassa, and Konrad Schindler.

Quoting Dominik Narnhofer @DNarnhofer:
Want to leverage the power of SOTA 3D models like VGGT & Video LDMs for 3D generation? Now you can! 🚀 Introducing VIST3A — we stitch pretrained video generators to 3D foundation models and align them via reward finetuning. 📄 arxiv.org/abs/2510.13454 🌐 gohyojun15.github.io/VIST3A

2 replies · 11 retweets · 88 likes · 16.9K views
Hyojun Go retweeted
Hyungjin Chung @hyungjin_chung
Even SOTA VideoLLMs see videos at 1 fps, and you CANNOT perceive fine-grained motion 💃 at this frequency 🥲 📣 Presenting Video Parallel Scaling (VPS), an inference-time strategy that lets VideoLLMs see more frames by scaling compute along the parallel axis 🤩
1 reply · 11 retweets · 38 likes · 2.6K views
Hyojun Go retweeted
Hyungjin Chung @hyungjin_chung
3D consistent videos are hard to generate 🙁 What if we could steer them to be consistent during generation? Introducing SteerX🛞, a plug-and-play sampling method that works with *any* video diffusion to make videos physically plausible🤩 w/ @bypark___ @gohyojun3 @namhyelin99
2 replies · 18 retweets · 104 likes · 7.3K views
Hyojun Go retweeted
MrNeRF @janusch_patas
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

TL;DR: SplatFlow is a unified framework that combines a latent-space multi-view generator and a Gaussian Splatting Decoder to enable efficient 3D generation, editing, and inpainting directly from text prompts.

Abstract (excerpt): SplatFlow comprises two main components: a multi-view rectified flow (RF) model and a Gaussian Splatting Decoder (GSDecoder). The multi-view RF model operates in latent space, generating multi-view images, depths, and camera poses simultaneously, conditioned on text prompts, thus addressing challenges like diverse scene scales and complex camera trajectories in real-world settings. Then, the GSDecoder efficiently translates these latent outputs into 3DGS representations through a feed-forward 3DGS method. Leveraging training-free inversion and inpainting techniques, SplatFlow enables seamless 3DGS editing and supports a broad range of 3D tasks (including object editing, novel view synthesis, and camera pose estimation) within a unified framework without requiring additional complex pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K datasets, demonstrating its versatility and effectiveness in various 3D generation, editing, and inpainting-based tasks.
1 reply · 7 retweets · 77 likes · 5K views