Hsin-Ying Lee

176 posts

@hyjameslee

CTO @app_illoca

Joined August 2013
136 Following · 383 Followers
Hsin-Ying Lee reposted
Ziya Erkoç
Ziya Erkoç@ErkocZiya·
Presenting PrEditor3D at #CVPR2025 📢📢 If you'd like to learn more about our work and discuss 3D generation/editing, come visit our poster on Friday, June 13, in ExHall D between 10:30 and 12:30 (Poster #44). Project Page: ziyaerkoc.com/preditor3d
0 replies · 10 reposts · 32 likes · 3.7K views
Hsin-Ying Lee reposted
Jim Fan
Jim Fan@DrJimFan·
The Physical Turing Test: your house is a complete mess after a Sunday hackathon. On Monday night, you come home to an immaculate living room and a candlelight dinner. And you couldn't tell whether a human or a machine had been there. Deceptively simple, insanely hard. It is the next North Star of AI. The dream that keeps me awake at 12 am in the lab. The vision for the next computing platform that automates chunks of atoms instead of chunks of bits. Thanks Sequoia for hosting me at AI Ascent! Below is my full talk on the first principles to solve general-purpose robotics: how we think about the data strategy and scaling laws. I assure you it will be 17 minutes you won't regret!
68 replies · 207 reposts · 1.2K likes · 134.7K views
Hsin-Ying Lee reposted
Chin-Yi Cheng
Chin-Yi Cheng@chinyich·
We're hiring a Software Engineer to join our team! 🚀 We're building a new product at the intersection of design and AI, and we're looking for someone who loves creating innovative experiences that solve complex, real-world challenges. (1/3)
1 reply · 2 reposts · 20 likes · 3.3K views
Hsin-Ying Lee reposted
Willi Menapace
Willi Menapace@WilliMenapace·
Video-to-Audio and Audio-to-Video models struggle with temporal alignment. AV-Link solves the problem by conditioning on diffusion model features. Great collaboration with @moayedhajiali, @siarohin9013, @isskoro, @alpercanbe, Kwot Sin Lee, Vicente Ordonez and @SergeyTulyakov
Moayed Haji Ali@moayedhajiali

Can pretrained diffusion models connect for cross-modal generation? 📢 Introducing AV-Link ♾ Bridging unimodal diffusion models in one framework to enable: 📽️ ➡️ 🔊 Video-to-Audio 🔊 ➡️ 📽️ Audio-to-Video 🌐: snap-research.github.io/AVLink/ 📄: hf.co/papers/2412.15… ⤵️ Results

0 replies · 3 reposts · 10 likes · 843 views
Hsin-Ying Lee reposted
Gordon (Guocheng) Qian
Gordon (Guocheng) Qian@guocheng_qian·
🚀🚀🚀Omni-ID: Holistic Identity Representation Designed for Generative Tasks snap-research.github.io/Omni-ID/ Excited to share Omni-ID, a novel facial representation tailored for generative tasks! It captures diverse expressions & poses, enabling high-fidelity personalized generation.
3 replies · 24 reposts · 106 likes · 11.5K views
Hsin-Ying Lee reposted
Ruiqi Gao
Ruiqi Gao@RuiqiGao·
A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
16 replies · 199 reposts · 945 likes · 172.4K views
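The claimed equivalence can be summarized in a few lines (my sketch in standard notation, not the blog post's exact derivation; \(\alpha_t, \sigma_t\) denote the shared noise schedule):

```latex
\text{Shared Gaussian path: } z_t = \alpha_t x + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I).
% Flow matching: marginal vector field, with \hat{x} = \mathbb{E}[x \mid z_t]:
\frac{dz}{dt} = \frac{\dot{\sigma}_t}{\sigma_t}\, z
  + \Bigl(\dot{\alpha}_t - \frac{\dot{\sigma}_t}{\sigma_t}\,\alpha_t\Bigr)\hat{x}.
% Diffusion: probability-flow ODE for the same schedule:
\frac{dz}{dt} = \frac{\dot{\alpha}_t}{\alpha_t}\, z
  + \sigma_t^2 \Bigl(\frac{\dot{\alpha}_t}{\alpha_t} - \frac{\dot{\sigma}_t}{\sigma_t}\Bigr)
    \nabla_z \log p_t(z).
% Substituting the Gaussian-path score
% \nabla_z \log p_t(z) = -(z - \alpha_t \hat{x}) / \sigma_t^2
% reduces the diffusion ODE term-by-term to the flow-matching field above,
% so sampling with either parameterization traverses the same ODE.
```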
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
4Real-Video is a two-stream architecture for 4D video generation that handles temporal and view updates independently, synchronizing the streams to ensure consistency. It proposes flexible synchronization mechanisms that enable efficient and adaptive token interactions.
0 replies · 0 reposts · 0 likes · 155 views
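The two-stream idea can be sketched roughly as follows; all function names, the roll-based mixing, and the averaging synchronization are my illustrative assumptions, not the actual 4Real-Video implementation:

```python
import numpy as np

# Tokens live on a (views, frames, channels) grid shared by both streams.
V, T, D = 4, 8, 16
rng = np.random.default_rng(0)
tokens = rng.standard_normal((V, T, D))

def temporal_update(x):
    # stand-in for a temporal attention block: mix along the frame axis
    return x + 0.1 * np.roll(x, 1, axis=1)

def view_update(x):
    # stand-in for a cross-view attention block: mix along the view axis
    return x + 0.1 * np.roll(x, 1, axis=0)

def synchronize(a, b):
    # simplest possible sync mechanism: average the streams so they agree
    fused = 0.5 * (a + b)
    return fused, fused

# one two-stream layer: independent updates, then synchronization
t_stream = temporal_update(tokens)
v_stream = view_update(tokens)
t_stream, v_stream = synchronize(t_stream, v_stream)
assert t_stream.shape == (V, T, D)
```

Stacking such layers lets each stream specialize (time vs. viewpoint) while the sync step keeps the 4D output consistent.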
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
📢4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion Do you want to explore space-time traversal? Do you want to convert any real-world images/videos to 4D? Check out our recent 4Real-Video! snap-research.github.io/4Real-Video/
1 reply · 11 reposts · 55 likes · 4.2K views
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
🔥DELTA: Dense Efficient Long-range 3D Tracking for any video DELTA can efficiently (10x faster!) track EVERY pixel in 3D space from monocular videos. Please check out our project page and paper for more details and samples! 👑snap-research.github.io/DELTA/
0 replies · 21 reposts · 101 likes · 5.5K views
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
🌟 Snap Research Internship🌟 The creative vision team is looking for 2025 interns! If you are a PhD student with a passion for 3D/4D, video generation, personalization, or efficiency, reach out via the application link or email! 🔥Website: snap-research.github.io/cv-call-for-in…
1 reply · 6 reposts · 57 likes · 6K views
Hsin-Ying Lee reposted
Wei-Ning Hsu
Wei-Ning Hsu@mhnt1580·
As a speech/audio researcher, I think it's a big breakthrough in advancing *human-level audio generation*. Why? Because this is NOT just a video-to-audio model that generates what it sees in the physical world... but a model that learns to *DESIGN* sounds like a human🤯 Here is what Movie Gen Audio does:
1⃣ far better synchronization than any existing video-to-SFX model
2⃣ composes music based on videos, taking sentiment into account
3⃣ generates SFX and music jointly, blending them harmoniously like a pro
4⃣ injects non-diegetic sound effects, like "whoosh" when changing scenes
5⃣ creates soundtracks that are >2 minutes long
Let's watch a few videos to understand what Movie Gen Audio has done behind the scenes. ⭐️ALL audio tracks are generated by Movie Gen Audio without any post-processing⭐️
Wei-Ning Hsu@mhnt1580

Now HEAR this (not just watch) - We've got audio covered for generated videos 🔊 Introducing Movie Gen Audio, which adds 48kHz synced SFX and aligned music to amazing videos from Movie Gen Video (and other sources!) Super honored to work with this amazing team! More to come🔥🔥

1 reply · 8 reposts · 69 likes · 8.5K views
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
Challenges mainly lie in the limited camera control and the lack of true 3D consistency of current video models. We propose a series of strategies to address these issues. Check out our paper for details! (3/3)
0 replies · 0 reposts · 0 likes · 86 views
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
We propose a pipeline that repurposes video diffusion models. Akin to the popular "generate-then-reconstruct" 3D generation paradigm that relies on multi-view models, we seek to leverage video models to achieve similar results. (2/3)
1 reply · 0 reposts · 0 likes · 117 views
Hsin-Ying Lee
Hsin-Ying Lee@hyjameslee·
Thanks @_akhaliq for sharing! 4Real aims to generate photorealistic 4D scenes. Existing 4D generation methods, so long as they use multi-view models trained on Objaverse as priors, inevitably generate object-centric, synthetic-looking samples. (1/3)
AK@_akhaliq

4Real Towards Photorealistic 4D Scene Generation via Video Diffusion Models Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result,

1 reply · 0 reposts · 8 likes · 1.4K views
Hsin-Ying Lee reposted
AK
AK@_akhaliq·
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-
1 reply · 11 reposts · 42 likes · 8.8K views
Hsin-Ying Lee reposted
POM
POM@peterom·
The progress of the AnimateDiff community over the past 10 months has been miraculous - see attached! Now, closed startups like Krea are taking the fruits of all this effort - so I'd like to tell the story of how we got here & what people who believe in open source can do.
26 replies · 149 reposts · 817 likes · 166.6K views
Hsin-Ying Lee reposted
AK
AK@_akhaliq·
Snap presents MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts
4 replies · 33 reposts · 162 likes · 47.5K views
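The Mixture-of-Experts inspiration can be illustrated with a toy routing sketch; every name, weight, and the two-branch setup here is my hypothetical illustration of soft routing between attention branches, not Snap's actual MoA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 8                      # tokens, channels
x = rng.standard_normal((N, D))

def attention(x, w_qkv):
    # plain single-head self-attention from projection weights (q, k, v)
    q, k, v = (x @ w for w in w_qkv)
    logits = q @ k.T / np.sqrt(D)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

prior_w = [rng.standard_normal((D, D)) for _ in range(3)]     # frozen branch
personal_w = [rng.standard_normal((D, D)) for _ in range(3)]  # trainable branch
router_w = rng.standard_normal((D, 2))

# per-token soft routing weights over the two branches (rows sum to 1)
gates = np.exp(x @ router_w)
gates /= gates.sum(axis=-1, keepdims=True)                    # shape (N, 2)

# blend the frozen "prior" output with the "personalized" output per token
out = (gates[:, :1] * attention(x, prior_w)
       + gates[:, 1:] * attention(x, personal_w))
assert out.shape == (N, D)
```

The point of such a router is that context tokens can lean on the frozen prior branch while subject tokens lean on the personalized branch, which is one way to keep subject and context disentangled.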
Hsin-Ying Lee reposted
Kfir Aberman
Kfir Aberman@AbermanKfir·
📢 Announcing MoA!! TL;DR - We introduce Mixture-of-Attention (MoA), a new architecture for personalization of generative models that disentangles the generation of given subjects and the context from the prior. snap-research.github.io/mixture-of-att… #snap #GenerativeAI
4 replies · 14 reposts · 47 likes · 5.5K views