Haoyu Ma

19 posts


@HaoyumaU

MTS @ Microsoft AI | Former Research Scientist @ Meta Superintelligence Labs | PhD in CS, UC Irvine | Ex-intern at Meta/Adobe/Tencent/Baidu

Irvine, CA · Joined September 2019

159 Following · 93 Followers
Haoyu Ma retweeted
Mustafa Suleyman
Mustafa Suleyman@mustafasuleyman·
Three models. Three top-tier results. All shipped within just a few months by the @MicrosoftAI team.
- MAI-Transcribe-1 dropped today, the most accurate transcription model in the world across 25 languages according to the FLEURS WER benchmark.
- MAI-Voice-1 sets a new standard for natural speech.
- MAI-Image-2 lands as a top-3 model family on @arena.
We've been building with them - now you can too. All 3 available now on Microsoft Foundry.
English
44
93
527
72K
Haoyu Ma retweeted
Leon Chen
Leon Chen@realleonlc·
🚀 Introducing our fresh work at Stanford and Meta MSL: UniT — Unified Multimodal Chain-of-Thought Test-time Scaling What if a single model could generate an image, look at it, think about what's wrong, and fix it — all by itself? That's exactly what UniT does. 🧵👇
Leon Chen tweet media
English
4
29
159
24.4K
Haoyu Ma retweeted
Felix Juefei Xu
Felix Juefei Xu@felixudr·
As generative models move toward real-world deployment, efficiency becomes a first-class research problem. The 3rd EDGE Workshop @ CVPR 2026 — Efficient & On-Device Generation — focuses on:
⚡ Efficient training & inference
📱 On-device multimodal generation
🎬 Real-time image & video models
🌍 Scalable, deployable GenAI
Submit & learn more: cvpr26-edge.github.io @CVPR #CVPR2026 #EdgeAI #GenerativeModels #EfficientML
Felix Juefei Xu tweet media
English
2
9
44
6.5K
Jeff Liang
Jeff Liang@LiangJeff95·
Ah, I really want to go to the conference. SAD.
Jeff Liang tweet media
Chinese
3
0
11
1.1K
Haoyu Ma retweeted
AK
AK@_akhaliq·
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
AK tweet media
English
4
20
187
17.3K
Haoyu Ma retweeted
Min Choi
Min Choi@minchoi·
Meta just announced MoCha. This AI can create full movie-quality talking & singing characters from just speech & text. 10 wild examples: 1. Talking Characters
English
118
244
1.6K
420K
Haoyu Ma retweeted
AK
AK@_akhaliq·
Meta announces MoCha: Towards Movie-Grade Talking Character Synthesis
English
45
150
930
204.4K
Haoyu Ma
Haoyu Ma@HaoyumaU·
Super proud to be part of this amazing project, especially the video personalization!
AI at Meta@AIatMeta

🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.

More details and examples of what Movie Gen can do ➡️ go.fb.me/kx1nqm

🛠️ Movie Gen models and capabilities

Movie Gen Video: A 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.

Movie Gen Audio: A 13B parameter transformer model that can take a video input, along with optional text prompts for controllability, to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound — delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.

Precise video editing: Using a generated or existing video and accompanying text instructions as an input, it can perform localized edits such as adding, removing or replacing elements — or global changes like background or style changes.

Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.

We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.

English
0
0
3
200
Haoyu Ma retweeted
AK
AK@_akhaliq·
Meta presents Imagine yourself: Tuning-Free Personalized Image Generation. Paper page: huggingface.co/papers/2409.13…

Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments. Moreover, previous work met challenges balancing identity preservation, following complex prompts and preserving good visual quality, resulting in models having a strong copy-paste effect of the reference images. Thus, they can hardly generate images following prompts that require significant changes to the reference image, e.g., changing facial expression, head and body poses, and the diversity of the generated images is low. To address these limitations, our proposed method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve the text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation results validate the model's SOTA superiority across all aspects (identity preservation, text faithfulness, and visual appeal) compared to the previous personalization models.
AK tweet media
English
7
73
384
41.2K
Haoyu Ma retweeted
Danny Trinh
Danny Trinh@dtrinh·
We put a very (very!) fun thing in Meta AI today. Say "imagine me..." to see yourself anywhere your heart desires. If you're weird like me, you might imagine yourself with a magical emu. 🔜 try it in IG, Messenger, and meta.ai @AIatMeta
Danny Trinh tweet media
English
27
16
268
64.3K
Haoyu Ma
Haoyu Ma@HaoyumaU·
We are thrilled to announce that MaskINT (maskint.github.io) has been accepted by #CVPR2024! See you all in Seattle!
AK@_akhaliq

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers paper page: huggingface.co/papers/2312.12… Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, and therefore challenging the deployment in practical applications. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.

English
1
2
13
1.1K
Haoyu Ma
Haoyu Ma@HaoyumaU·
Thanks @_akhaliq for sharing our work. MaskINT disentangles video editing into a keyframe editing stage and a structure-aware frame interpolation stage. With the benefit of masked transformers, our method achieves 5-7x faster inference than pure diffusion models.
AK@_akhaliq

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers paper page: huggingface.co/papers/2312.12…

English
0
0
4
222
Haoyu Ma
Haoyu Ma@HaoyumaU·
#WACV2020 Snowmass Village is really pretty! And I made so many new friends here!
Haoyu Ma tweet media
English
0
0
6
0
Haoyu Ma retweeted
arxiv
arxiv@arxiv_org·
Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation. arxiv.org/abs/2002.02033
arxiv tweet media
English
0
20
30
0
Haoyu Ma retweeted
William Wang
William Wang@WilliamWangNLP·
Attending your first academic conference...
English
2
35
336
0