Jiraphon Yenphraphai

8 posts


@JYenphraphai

New York, NY · Joined December 2022
102 Following · 36 Followers
Jiraphon Yenphraphai @JYenphraphai ·
[1/3] 🚀 Introducing ShapeGen4D: a native, end-to-end video-to-4D model that turns monocular videos into high-quality 4D mesh sequences, with no per-frame optimization. Details 👉 shapegen4d.github.io
3 replies · 22 reposts · 142 likes · 9.1K views
Jiraphon Yenphraphai @JYenphraphai ·
[2/3] How?
• Add spatiotemporal attention to a pretrained image-to-mesh DiT
• Time-aware point sampling + 4D latent anchoring → aligned latents across frames
• Shared noise across frames → stable pose & less flickering
→ Directly outputs a sequence of meshes
1 reply · 0 reposts · 1 like · 324 views
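The shared-noise idea from the thread (every frame's diffusion process starts from the same initial noise, so poses stay consistent and flicker is reduced) can be sketched in miniature. `sample_noise` and its arguments are hypothetical stand-ins for illustration, not the actual ShapeGen4D code:

```python
import random

def sample_noise(num_frames, dim, shared=True, seed=0):
    """Sample initial diffusion noise for a sequence of frames.

    shared=True: every frame starts from an identical latent noise
    vector, which per the thread stabilizes pose across frames.
    shared=False: each frame draws independent noise.
    """
    rng = random.Random(seed)
    if shared:
        base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        return [list(base) for _ in range(num_frames)]
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_frames)]

frames_shared = sample_noise(4, 8, shared=True)
frames_indep = sample_noise(4, 8, shared=False)
print(frames_shared[0] == frames_shared[3])  # True: identical starting noise
print(frames_indep[0] == frames_indep[3])    # False: independent per frame
```

In a real sampler, the per-frame denoising trajectories then diverge only as much as the conditioning (the input video frames) demands, rather than from unrelated starting noise.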
Jiraphon Yenphraphai reposted
Raymond A. Yeh @RaymondYeh ·
Tomorrow, we are presenting "Model Immunization from a Condition Number Perspective" at ICML:
📢 Oral: Jul 17, 1:45–2:00 p.m. EDT @ West Exhib. Hall C
📌 Poster: 2:00–4:30 p.m. EDT @ East Exhib. Hall A-B (E-1604)
Come talk to Cedar and learn more about reducing model misuse!
1 reply · 5 reposts · 7 likes · 858 views
Jiraphon Yenphraphai reposted
Saining Xie @sainingxie ·
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, I really appreciate the team for sharing helpful insights and design decisions; Sora is incredible and is set to transform the video-generation community. What we have learned so far:
- Architecture: Sora is built on our diffusion transformer (DiT) model (published in ICCV 2023). It's a diffusion model with a transformer backbone; in short, DiT = [VAE encoder + ViT + DDPM + VAE decoder]. According to the report, there aren't many additional bells and whistles.
- "Video compressor network": Looks like it's just a VAE, but trained on raw video data. Tokenization probably plays a significant role in getting good temporal consistency. By the way, the VAE is a ConvNet, so DiT is technically a hybrid model ;) (1/n)
39 replies · 523 reposts · 2.6K likes · 1.3M views
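The ViT half of the pipeline above operates on the VAE latent split into patch tokens. A generic toy `patchify` (this is an illustrative sketch of ViT-style tokenization, not Sora's or the DiT paper's actual implementation) shows the idea:

```python
def patchify(latent, p):
    """Split an H x W latent grid (list of lists) into non-overlapping
    p x p patches, each flattened into one token -- the ViT-style
    tokenization a DiT applies to the VAE latent before the transformer.
    """
    h, w = len(latent), len(latent[0])
    tokens = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            tokens.append([latent[i + di][j + dj]
                           for di in range(p) for dj in range(p)])
    return tokens

grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 latent
tokens = patchify(grid, 2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each a flattened 2x2 patch
```

For video, the same idea extends to spacetime patches, which is one plausible reading of how tokenization helps temporal consistency.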
Jiraphon Yenphraphai reposted
Saining Xie @sainingxie ·
Really enjoyed working on this project; some thoughts on why I believe combining the creative freedom of generative models with the precision of the 3D graphics pipeline could be the future. (1/n)🧵
AK @_akhaliq:

Intel and NYU present Image Sculpting: Precise Object Editing with 3D Geometry Control
paper page: huggingface.co/papers/2401.01…
The paper presents Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics. This approach differs markedly from existing methods, which are confined to 2D spaces and typically rely on textual instructions, leading to ambiguity and limited control. Image Sculpting converts 2D objects into 3D, enabling direct interaction with their 3D geometry. Post-editing, these objects are re-rendered into 2D and merged back into the original image to produce high-fidelity results through a coarse-to-fine enhancement process. The framework supports precise, quantifiable, and physically plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition. It marks an initial step toward combining the creative freedom of generative models with the precision of graphics pipelines.

3 replies · 18 reposts · 142 likes · 31.5K views
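The "lift to 3D, edit, re-render to 2D" loop described in the abstract can be sketched in miniature. `rotate_y` and `project` below are hypothetical toy stand-ins for the pose-editing and re-rendering stages, not the paper's actual pipeline:

```python
import math

def rotate_y(points, angle_deg):
    """Rotate 3-D points about the y-axis -- the kind of direct
    geometric edit (pose change, rotation) applied after lifting
    a 2-D object into 3-D."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def project(points):
    """Toy orthographic re-projection back to 2-D after editing.
    (The real pipeline re-renders and then runs a coarse-to-fine
    generative enhancement to merge the result into the image.)"""
    return [(x, y) for x, y, _ in points]

pts = [(1.0, 0.0, 0.0)]
edited_2d = project(rotate_y(pts, 90))  # a point on the x-axis swings toward the z-axis
print(edited_2d)
```

The point of the framework is that edits like this rotation are exact and quantifiable in 3D, unlike text-prompt-driven 2D editing.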