Bowei Chen
@bowei_chen_19
28 posts

Ph.D. student at UW CSE @UwRealityLab, M.S. at CMU.

Joined May 2022
305 Following · 352 Followers

Pinned Tweet
Bowei Chen @bowei_chen_19
We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.
[image]
7 replies · 71 reposts · 523 likes · 80.7K views
Vivek Jayaram @vivjay30
Overdue life update: I recently joined @sesame, where I lead AI safety for real-time conversational systems! Smart glasses + voice is the future. After trying Sesame’s upcoming glasses, I was blown away. It’s also the most realistic conversational AI I’ve seen. Real-time voice AI introduces entirely new safety problems, and I'm glad to be focused on making our AI safe and aligned. We're hiring like crazy, so if you're interested in conversational voice systems or safety research, reach out!
[image]
5 replies · 0 reposts · 14 likes · 679 views
Bowei Chen retweeted
Jingwei Ma @JingweiMa2
Excited to present UltraZoom at SIGGRAPH Asia next Tuesday (Dec. 16)! UltraZoom converts sparse phone captures of an object into a single gigapixel-resolution image that you can seamlessly explore. Thread below. Website: ultra-zoom.github.io Paper: arxiv.org/abs/2506.13756
2 replies · 3 reposts · 12 likes · 1K views
Bowei Chen retweeted
Hansheng Chen @HanshengCh
Excited to announce a new line of work on accelerating generative AI: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation github.com/Lakonik/piFlow Distill 20B flow models using just an L2 loss via imitation learning, for SOTA diversity and teacher-aligned quality.
[image]
2 replies · 27 reposts · 155 likes · 36K views
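The tweet's core claim is that a few-step student can be distilled from a large flow model with nothing fancier than an L2 loss on velocities. pi-Flow's actual objective lives in the linked repo; as a rough illustration of that idea only, here is a toy sketch (the module and function names, sizes, and training loop are all made up, not the paper's):

```python
import torch
import torch.nn.functional as F

class TinyVelocityNet(torch.nn.Module):
    """Stand-in velocity network; the real models are 20B-scale flow transformers."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 32),
            torch.nn.SiLU(),
            torch.nn.Linear(32, dim),
        )

    def forward(self, x, t):
        # Condition on flow time t by simple concatenation.
        return self.net(torch.cat([x, t.view(-1, 1)], dim=1))

def imitation_distill_step(student, teacher, x, optimizer):
    """Regress the student's predicted velocity onto the frozen teacher's
    velocity at a random flow time, using a plain L2 (MSE) loss."""
    t = torch.rand(x.shape[0])                              # random time in [0, 1]
    noise = torch.randn_like(x)
    x_t = (1 - t.view(-1, 1)) * x + t.view(-1, 1) * noise  # rectified-flow-style interpolation
    with torch.no_grad():
        v_teacher = teacher(x_t, t)                         # teacher stays frozen
    loss = F.mse_loss(student(x_t, t), v_teacher)           # just an L2 loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher, student = TinyVelocityNet(), TinyVelocityNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
data = torch.randn(16, 8)
losses = [imitation_distill_step(student, teacher, data, opt) for _ in range(5)]
```

The appeal of an L2 imitation target over adversarial or score-distillation objectives is stability: it is an ordinary regression, so standard optimizers apply without balancing tricks.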
Bowei Chen @bowei_chen_19
The Representation Autoencoders (RAE) by @sainingxie's team is fascinating — a brilliant demonstration that high-dimensional diffusion is indeed feasible. In our latest work on semantic encoders, we align a pretrained foundation encoder (e.g., DINOv2) as a visual tokenizer, achieving better reconstruction quality while preserving semantic consistency. Instead of freezing the encoder, we introduce a semantics-preserving fine-tuning strategy that significantly improves reconstruction quality. I can see great potential in combining RAE with our approach to build semantically rich tokenizers with large channel dimensions and strong reconstruction fidelity.
Bowei Chen@bowei_chen_19

We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.

2 replies · 20 reposts · 228 likes · 23.6K views
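The tweet describes the method at a high level: fine-tune a pretrained encoder for reconstruction while keeping its semantics intact. The paper's actual losses are not spelled out here; one common way to realize "semantics-preserving fine-tuning" is to penalize feature drift against a frozen copy of the pretrained encoder. A toy sketch under that assumption (the linear stand-ins, names, and `sem_weight` value are illustrative, not from the paper):

```python
import copy
import torch
import torch.nn.functional as F

# Toy stand-ins: a linear "encoder" playing the role of a pretrained foundation
# encoder (e.g. DINOv2) and a linear decoder mapping latents back to pixels.
encoder = torch.nn.Linear(64, 16)
decoder = torch.nn.Linear(16, 64)
frozen_ref = copy.deepcopy(encoder).requires_grad_(False)  # frozen pretrained copy

def semantics_preserving_loss(images, sem_weight=0.5):
    """Tokenizer reconstruction loss plus a penalty on drifting away from
    the frozen pretrained features; `sem_weight` is a made-up knob."""
    z = encoder(images)
    recon_loss = F.mse_loss(decoder(z), images)              # pixel reconstruction
    z_ref = frozen_ref(images)                               # original semantic features
    sem_loss = 1.0 - F.cosine_similarity(z, z_ref, dim=-1).mean()
    return recon_loss + sem_weight * sem_loss

images = torch.randn(8, 64)
loss = semantics_preserving_loss(images)
```

The drift penalty is what distinguishes this from plain autoencoder training: the encoder is free to improve reconstruction, but only along directions that keep its features close to the pretrained (semantically rich) ones.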
Bowei Chen @bowei_chen_19
@SwayStar123 @sainingxie Yes! I can see great potential in combining RAE with our approach to build semantically rich tokenizers with large channel dimensions and strong reconstruction fidelity (we fine-tuned the encoder for better reconstruction).
0 replies · 0 reposts · 1 like · 35 views
Saining Xie @sainingxie
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
[image]
57 replies · 329 reposts · 1.9K likes · 413.9K views
Bowei Chen @bowei_chen_19
@Jacoed Yes, this is shown in both our work and previous work like VA-VAE.
1 reply · 0 reposts · 1 like · 18 views
Ed @Jacoed
@bowei_chen_19 "hence better diffusability" are we sure better semantic grounding implies better diffusability ?
1 reply · 0 reposts · 0 likes · 66 views
Bowei Chen @bowei_chen_19
On the LAION-2B dataset, we train a text-to-image diffusion model with our tokenizer; it converges faster and surpasses the FLUX-VAE baseline. Check out more details and results in our paper! [8/N]
[image]
1 reply · 1 repost · 13 likes · 1.2K views
Bowei Chen @bowei_chen_19
#CVPR2024 Arm-captured selfies capture only part of your body. Instead, what if you could capture the full-body photo that someone else would take of you in the scene? We present Total Selfie, which generates full-body selfies from photographs originally taken at arm's length. 1/n
[image]
2 replies · 2 reposts · 7 likes · 1.9K views
Bowei Chen @bowei_chen_19
We will be presenting Total Selfie at Arch 4A-E #185 this afternoon. Come and talk with us!
Bowei Chen@bowei_chen_19

#CVPR2024 Arm-captured selfies capture only part of your body. Instead, what if you could capture the full-body photo that someone else would take of you in the scene? We present Total Selfie, which generates full-body selfies from photographs originally taken at arm's length. 1/n

0 replies · 4 reposts · 8 likes · 1.3K views