Mannat Singh

67 posts

@mannat_singh

Research Engineer @ Meta Superintelligence Labs. Researching and building multimodal models with a focus on media generation.

Manhattan, NY · Joined December 2010
206 Following · 382 Followers
Aishwarya Kamath @ashkamath20
We released Gemma 4 last week, and seeing the community's response has been amazing! 🚀 Honored to have led the vision efforts, where we made huge performance leaps over Gemma 3. I wanted to help you make the most of the new capabilities. Deep dive 🧵
[image]
27 replies · 108 reposts · 907 likes · 45.8K views
Mannat Singh @mannat_singh
The US is in a state of absolute tyranny, and executions are occurring in front of our eyes. This needs to stop; everyone, from ICE officers to the President, must be brought to justice. As non-citizens, we are told to steer clear of politics, but I have to speak up - we all do!
0 replies · 0 reposts · 1 like · 167 views
Devi Parikh @deviparikh
Excited to share a sneak peek into what we've been building at Yutori! What you see below is our trained model and internal prototype: multiple agents running in parallel in the background, completing tasks of varying complexity, with relevant information and cues to step in surfaced to the user. More examples 👇 This is barely scratching the surface of what agents can do for you day-to-day. Follow along at @yutori_ai; more to come soon!
45 replies · 51 reposts · 464 likes · 221.1K views
Mannat Singh @mannat_singh
@koval_alvi Indeed, this is another advantage of the text VE: we don't simply learn a 1:1 mapping!
0 replies · 0 reposts · 1 like · 64 views
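The "text VE" in this reply is a variational encoder over the caption (see the CrossFlow thread below). A minimal sketch of why that avoids a 1:1 mapping, with hypothetical module and parameter names rather than the paper's actual code: the encoder outputs a Gaussian per caption, so each sample decodes to a different image latent.

```python
import torch
import torch.nn as nn

class VariationalTextEncoder(nn.Module):
    """Illustrative variational encoder: maps pooled text features to a
    Gaussian (mu, logvar) rather than a single point, so one caption
    corresponds to a distribution of latents instead of a 1:1 mapping."""

    def __init__(self, text_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(text_dim, latent_dim)
        self.logvar = nn.Linear(text_dim, latent_dim)

    def forward(self, text_feats: torch.Tensor):
        mu, logvar = self.mu(text_feats), self.logvar(text_feats)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Standard VAE-style KL term keeps the latent close to N(0, I).
        kl = 0.5 * (logvar.exp() + mu.pow(2) - 1 - logvar).sum(-1).mean()
        return z, kl
```

Sampling z, rather than always taking mu, is what lets one caption map to many images.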
Mannat Singh @mannat_singh
Flow matching can transform one distribution to another. So why do text-to-image models map noise to images instead of directly mapping text to images? Wouldn't it be cool to directly connect modalities together? CrossFlow accomplishes exactly that! cross-flow.github.io
[image]
2 replies · 41 reposts · 321 likes · 32.8K views
Mannat Singh @mannat_singh
In fact, we find that this simple design scales *even better* than conventional FM with both model size and training steps. Lots of other details in the paper (arxiv.org/abs/2412.15213), like enabling CFG and the importance of the Variational Encoder.
[image]
1 reply · 0 reposts · 11 likes · 1.3K views
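A minimal sketch of the idea in this thread, assuming PyTorch and a hypothetical `velocity_model`; conventional flow matching would draw `x0` from N(0, I), and the CrossFlow-style change is simply to use a text latent with the same shape as the image latent `x1` (an illustration, not the paper's implementation):

```python
import torch

def flow_matching_loss(velocity_model, x0, x1):
    """One flow-matching training step between paired endpoints.

    Conventional FM: x0 ~ N(0, I), i.e. noise-to-image.
    CrossFlow-style: x0 is a text latent with the same shape as the
    image latent x1, so the model transports text directly to images.
    """
    b = x0.shape[0]
    # Sample a time t per example and broadcast over the remaining dims.
    t = torch.rand(b, device=x0.device).view(b, *([1] * (x0.dim() - 1)))
    xt = (1 - t) * x0 + t * x1   # point on the straight path x0 -> x1
    target = x1 - x0             # constant velocity along that path
    pred = velocity_model(xt, t.flatten())
    return ((pred - target) ** 2).mean()
```

Because only the source distribution changes, the training loop, sampler, and architecture can stay exactly as in standard FM, which is consistent with the "simple design" claim above.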
Mannat Singh reposted
AI at Meta @AIatMeta
As detailed in the Meta Movie Gen technical report, today we're open sourcing Movie Gen Bench: two new media generation benchmarks that we hope will help enable the AI research community to progress work on more capable audio and video generation models.

Movie Gen Video Bench is the largest and most comprehensive benchmark ever released for evaluating text-to-video generation. It includes a collection of 1,000+ prompts that cover concepts ranging from detailed human activity to animals, physics, unusual subjects and more, with broad coverage across different motion levels.

Movie Gen Audio Bench is a first-of-its-kind benchmark aimed at evaluating video-to-audio and (text+video)-to-audio generation. It includes 527 generated videos and associated sound effects and music prompts covering a diverse set of ambient environments and sound effects.

To enable fair and easy comparison to our models for future works, these new benchmarks include non-cherry-picked generated videos and audio from Movie Gen. In releasing these new benchmarks we hope to promote fair and extensive evaluations in media generation research to enable greater progress in this field.
44 replies · 217 reposts · 1K likes · 156.1K views
Mannat Singh @mannat_singh
Finally, @_rohitgirdhar_ and I can talk about our detour into Llama 3 video understanding. You need to understand videos (and caption them 💬) to generate good-quality videos! 🐨
0 replies · 0 reposts · 4 likes · 244 views
Mannat Singh @mannat_singh
Check out Movie Gen 🎥 Our latest media generation models for video generation, editing, and personalization, with audio generation! 16-second 1080p videos generated with a simple Llama-style 30B transformer. Demo + detailed 92-page technical report 📝⬇️
AI at Meta @AIatMeta

🎥 Today we're premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We're excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.

More details and examples of what Movie Gen can do ➡️ go.fb.me/kx1nqm

🛠️ Movie Gen models and capabilities:

Movie Gen Video: A 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.

Movie Gen Audio: A 13B parameter transformer model that can take a video input, along with optional text prompts for controllability, to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.

Precise video editing: Using a generated or existing video and accompanying text instructions as input, it can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.

Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement.

We're continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.
1 reply · 1 repost · 16 likes · 1K views
Mannat Singh @mannat_singh
Llama 3.1 is out! Through adapters we've made it multimodal, supporting images, videos, and speech! It was a fun journey adding video understanding capabilities with @_rohitgirdhar_, @filipradenovic, @imisra_ and the whole MM team! P.S. The MM models are a WIP (not part of this release).
AI at Meta @AIatMeta

Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet.

Today we're releasing a collection of new Llama 3.1 models, including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages, among other improvements. Llama 3.1 405B rivals leading closed source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.

The models are available to download now directly from Meta or @huggingface. With today's release the ecosystem is also ready to go, with 25+ partners rolling out our latest models, including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.

More details in the full announcement ➡️ go.fb.me/tpuhb6
Download Llama 3.1 models ➡️ go.fb.me/vq04tr

With these releases we're setting the stage for unprecedented new opportunities and we can't wait to see the innovation our newest models will unlock across all levels of the AI community.
1 reply · 5 reposts · 23 likes · 2K views
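The tweet above says multimodality was added "through adapters" but gives no architecture details. Below is a minimal sketch of one common adapter pattern, a gated cross-attention layer feeding encoder features into a frozen LLM; the class, gating, and wiring here are assumptions for illustration, not Llama 3.1's actual design:

```python
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """Illustrative gated cross-attention adapter. The frozen LLM's hidden
    states (query) attend to features from another modality (key/value),
    e.g. image, video, or speech encoder outputs projected to d_model.
    Only the adapter parameters are trained; the LLM stays frozen."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: no-op at start

    def forward(self, hidden: torch.Tensor, modality_feats: torch.Tensor):
        # hidden: (B, T, d_model); modality_feats: (B, S, d_model)
        attended, _ = self.attn(self.norm(hidden), modality_feats, modality_feats)
        # Tanh-gated residual: at init the adapter leaves the LLM unchanged.
        return hidden + torch.tanh(self.gate) * attended
```

Zero-initializing the gate makes the adapter a no-op at the start of training, so the pretrained text model's behavior is preserved while the multimodal pathway is learned.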