
Hsin-Ying Lee

Can pretrained diffusion models be connected for cross-modal generation? 📢 Introducing AV-Link ♾ Bridging unimodal diffusion models in one framework to enable: 📽️ ➡️ 🔊 Video-to-Audio 🔊 ➡️ 📽️ Audio-to-Video 🌐: snap-research.github.io/AVLink/ 📄: hf.co/papers/2412.15… ⤵️ Results
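For intuition only, here is a minimal sketch of the bridging idea the post describes: intermediate activations from one frozen unimodal diffusion model condition the other via cross-attention. The module name, projection, and shapes below are illustrative assumptions, not the AV-Link implementation.

```python
# Hypothetical sketch: bridge two frozen unimodal diffusion backbones by
# feeding intermediate activations from one modality into the other's
# denoiser. Names and dimensions are illustrative, not AV-Link's code.
import torch
import torch.nn as nn

class CrossModalBridge(nn.Module):
    """Injects features from a source modality into a target denoiser."""
    def __init__(self, src_dim: int, tgt_dim: int):
        super().__init__()
        self.proj = nn.Linear(src_dim, tgt_dim)
        self.attn = nn.MultiheadAttention(tgt_dim, num_heads=4, batch_first=True)

    def forward(self, tgt_feats: torch.Tensor, src_feats: torch.Tensor) -> torch.Tensor:
        # Cross-attend target tokens (e.g. audio latents) to projected
        # source tokens (e.g. video latents), then fuse residually.
        src = self.proj(src_feats)
        fused, _ = self.attn(tgt_feats, src, src)
        return tgt_feats + fused

# Toy shapes: 8 video tokens (dim 512) conditioning 16 audio tokens (dim 256).
video = torch.randn(1, 8, 512)
audio = torch.randn(1, 16, 256)
bridge = CrossModalBridge(src_dim=512, tgt_dim=256)
print(bridge(audio, video).shape)  # torch.Size([1, 16, 256])
```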

📢📢 PrEditor3D: Fast and Precise 3D Shape Editing 📢📢 We propose a training-free 3D shape editing approach that rapidly and precisely edits the regions the user intends and keeps the rest as is. Given a quickly brushed mask and a text prompt, we first apply multi-view editing in the 2D domain and then run our merging algorithm in 3D feature space to ensure the edited shape stays faithful to the input shape (see the sketch below). Project Page: ziyaerkoc.com/preditor3d/ Video: youtube.com/watch?v=Ty2xXa… Great work by @ErkocZiya @cangumeli Chaoyang Wang @angelaqdai @peter_wonka @hyjameslee @PeiyeZ
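For readers curious how that pipeline fits together, here is a toy sketch of the mask-then-edit-then-merge flow: render multiple views, edit each in 2D inside the brushed mask, and merge the edits back into 3D while leaving unmasked regions untouched. Every function name and the voxel representation are stand-in assumptions, not the PrEditor3D code.

```python
# Toy, hypothetical sketch of a mask-guided multi-view 3D editing flow.
# All functions are trivial stand-ins; a real system would use pretrained
# 2D diffusion editors and a learned 3D feature-space merge.
import numpy as np

def render_views(voxels: np.ndarray, n_views: int = 4) -> list[np.ndarray]:
    # Stand-in renderer: orthographic max-projections at rotated angles.
    return [np.rot90(voxels.max(axis=0), k) for k in range(n_views)]

def edit_view_2d(view: np.ndarray, mask_2d: np.ndarray, prompt: str) -> np.ndarray:
    # Stand-in for a 2D diffusion editor: here we just overwrite masked
    # pixels; a real editor would inpaint them according to `prompt`.
    edited = view.copy()
    edited[mask_2d] = 1.0
    return edited

def merge_in_3d(voxels, edited_views, mask_3d):
    # Keep unmasked geometry as-is; fill the masked region from the edits.
    merged = voxels.copy()
    merged[mask_3d] = edited_views[0].mean()  # toy aggregation of 2D edits
    return merged

# Toy 16^3 occupancy grid with a brushed mask over one corner.
vox = np.random.rand(16, 16, 16)
mask3 = np.zeros_like(vox, dtype=bool)
mask3[:4, :4, :4] = True
views = render_views(vox)
edits = [edit_view_2d(v, np.ones(v.shape, dtype=bool), "add a hat") for v in views]
out = merge_in_3d(vox, edits, mask3)
assert np.allclose(out[~mask3], vox[~mask3])  # untouched outside the mask
```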

Now HEAR this (don't just watch): we've got audio covered for generated videos 🔊 Introducing Movie Gen Audio, which adds 48kHz synced SFX and aligned music to amazing videos from Movie Gen Video (and other sources!). Super honored to work with this amazing team! More to come 🔥🔥

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models. Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism.
