Saketh Rambhatla
88 posts

Saketh Rambhatla
@rssaketh

PhD student at University of Maryland, College Park

College Park, MD · Joined October 2010
613 Following · 206 Followers

Pinned Tweet
Saketh Rambhatla @rssaketh
Do your supervised models fail to adapt to scenarios with novel classes? Sick of re-training your model for every new category introduced in the dataset? Check out our #ICCV2021 paper "The Pursuit of Knowledge: Discovering and Localizing Novel Categories Using Dual Memory". (1/4)
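The idea of discovering new categories without re-training can be illustrated with a prototype-memory sketch. This is not the paper's actual dual-memory method, just a minimal toy, assuming cosine-similarity matching, a hypothetical threshold `tau`, and a running-average prototype update: detections that match no known-class prototype are matched against previously discovered "novel" prototypes, and failing that, seed a new candidate category.

```python
import numpy as np

def assign_or_discover(feat, known_protos, novel_protos, tau=0.7):
    """Match a detection feature against known-class prototypes;
    if no match, try previously discovered novel prototypes;
    otherwise start a new candidate category.
    Returns ("known"|"novel", prototype index)."""
    feat = feat / np.linalg.norm(feat)

    def best(protos):
        # Highest cosine similarity within a prototype list.
        if len(protos) == 0:
            return -1, -1.0
        P = np.stack(protos)
        P = P / np.linalg.norm(P, axis=1, keepdims=True)
        sims = P @ feat
        i = int(np.argmax(sims))
        return i, float(sims[i])

    i, s = best(known_protos)
    if s >= tau:
        return "known", i
    j, s2 = best(novel_protos)
    if s2 >= tau:
        # Refine the matched novel prototype with a running average.
        novel_protos[j] = 0.9 * novel_protos[j] + 0.1 * feat
        return "novel", j
    novel_protos.append(feat.copy())
    return "novel", len(novel_protos) - 1
```

New categories accumulate in `novel_protos` without touching the known-class model, which is the property the tweet is advertising.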
Saketh Rambhatla retweeted
Ananye Agarwal @anag004
Warehouses are a critical part of the modern economy, but are still bottlenecked by human labor. We've acquired Fetch Robotics to tackle previously hard-to-automate parts of warehouse logistics with our omni-bodied AI! (Our Factorio players are delighted)
Quoting Skild AI @SkildAI:
We have acquired Zebra Technologies' robotics arm (formerly Fetch Robotics). This is what happens when orchestration meets intelligence -- a major step toward fully autonomous warehouses. More robots. More environments. One unified brain.
Saketh Rambhatla retweeted
Skild AI @SkildAI
We have acquired Zebra Technologies' robotics arm (formerly Fetch Robotics). This is what happens when orchestration meets intelligence -- a major step toward fully autonomous warehouses. More robots. More environments. One unified brain.
Saketh Rambhatla retweeted
Deepak Pathak @pathak2206
We hosted Prof. Alyosha Efros (UC Berkeley) at @SkildAI! He didn't believe that robots could actually cook eggs reliably. :) Tested back-to-back 5 times without fail! One batch of scrambled eggs every ~2.5 minutes nonstop. The same model assembles a GPU on a server rack too.
Saketh Rambhatla retweeted
Deepak Pathak @pathak2206
At @SkildAI, we've raised $1.4B, bringing our valuation to over $14B. We're on a generational mission, and I'm grateful to be working alongside an exceptional team. Thanks to our investors for their long-term conviction in omni-bodied intelligence 🚀 bloomberg.com/news/articles/…
Quoting Skild AI @SkildAI:
Announcing Series C. We've raised $1.4B, valuing the company at over $14B. With this capital, we will accelerate our mission to build omni-bodied intelligence 🚀 skild.ai/blogs/series-c
Saketh Rambhatla retweeted
Skild AI @SkildAI
Announcing Series C. We've raised $1.4B, valuing the company at over $14B. With this capital, we will accelerate our mission to build omni-bodied intelligence 🚀 skild.ai/blogs/series-c
Saketh Rambhatla retweeted
Skild AI @SkildAI
See our robot open doors, water plants, assemble a box, and more by learning from watching humans. Using <1 hr of robot data.
Saketh Rambhatla retweeted
Skild AI @SkildAI
Let robots clean in peace 🧺🤖✨
Saketh Rambhatla retweeted
Skild AI @SkildAI
Humans learn by watching. Robots should too.
Saketh Rambhatla retweeted
Ishan Misra @imisra_
Inference-time objectives are amazing :) We show that LLMs can be upgraded to multimodal beings by a simple trick :) No training needed! Works on image generation, editing, style transfer, and more!
Quoting Rohit Girdhar @_rohitgirdhar_:
Super excited to share some recent work that shows that pure, text-only LLMs can see and hear without any training! Our approach, called "MILS", uses LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, style transfer, and more!
Saketh Rambhatla retweeted
Rohit Girdhar @_rohitgirdhar_
Super excited to share some recent work that shows that pure, text-only LLMs can see and hear without any training! Our approach, called "MILS", uses LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, style transfer, and more!
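The "no training needed" claim comes from running optimization purely at inference time: a text-only generator proposes candidates, an off-the-shelf multimodal scorer (e.g. CLIP image-text similarity) ranks them, and the best candidates are fed back as context for the next round. Below is a minimal sketch of that generate-score-feedback loop, not the actual MILS implementation; `propose` and `score` are hypothetical callables standing in for the LLM and the scorer.

```python
def mils_loop(propose, score, steps=3, keep=2):
    """Generator-scorer loop in the spirit of MILS: `propose` generates
    candidate texts conditioned on the best candidates so far, and
    `score` (e.g. a CLIP-style image-text similarity) ranks them.
    No gradients, no training: all optimization is at inference time."""
    best = []  # list of (score, text), highest score first
    for _ in range(steps):
        candidates = propose([t for _, t in best])
        scored = sorted(((score(t), t) for t in candidates), reverse=True)
        pool = sorted(best + scored, reverse=True)
        # Deduplicate while keeping the top-`keep` candidates.
        seen, best = set(), []
        for s, t in pool:
            if t not in seen:
                seen.add(t)
                best.append((s, t))
            if len(best) == keep:
                break
    return best[0]
```

Because the scorer is only called as a black box, any pretrained multimodal model can be dropped in without touching the LLM's weights.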
Saketh Rambhatla retweeted
Shijie Wang @ShijieWang20
How can we better animate images solely from text descriptions? We present Motion Focal Loss (MotiF) (arxiv.org/abs/2412.16153) to better align motion with text descriptions in the text-image-to-video (TI2V) task, and release TI2V-Bench, a comprehensive TI2V benchmark. (1/n)
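The core idea of focusing the training objective on motion can be sketched as a reweighted per-pixel loss. This is a toy illustration under my own assumptions, not MotiF's exact formulation: a motion heatmap (e.g. optical-flow magnitude) is normalized into weights with mean 1, so static regions are down-weighted and moving regions dominate the objective.

```python
import numpy as np

def motion_focal_loss(pred, target, motion, eps=1e-8):
    """Per-pixel squared error reweighted toward high-motion regions.
    `motion` is a non-negative heatmap (e.g. optical-flow magnitude);
    normalizing by its mean keeps the overall loss scale comparable
    to plain MSE while emphasizing pixels that actually move."""
    w = motion / (motion.mean() + eps)  # mean weight == 1
    return float(np.mean(w * (pred - target) ** 2))
```

With this weighting, the same prediction error costs more in a moving region than in a static one, which is the alignment pressure the tweet describes.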
Saketh Rambhatla retweeted
Mannat Singh @mannat_singh
Flow matching can transform one distribution into another. So why do text-to-image models map noise to images instead of directly mapping text to images? Wouldn't it be cool to directly connect modalities? CrossFlow accomplishes exactly that! cross-flow.github.io
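The flow-matching mechanics behind this can be shown in a few lines. A minimal sketch, assuming the standard linear interpolation path and Euler integration; the only CrossFlow-specific twist (per the tweet) is that the source point x0 is a source-modality encoding rather than Gaussian noise. The velocity field here is a hypothetical callable standing in for the learned model.

```python
import numpy as np

def fm_pair(x0, x1, t):
    """Linear flow-matching path: the point x_t on the straight line
    from x0 to x1, and the regression target velocity (x1 - x0)."""
    xt = (1 - t) * x0 + t * x1
    v = x1 - x0
    return xt, v

def euler_sample(x0, v_field, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.
    In a CrossFlow-style setup, x0 would be a text encoding rather
    than noise, so sampling walks directly from text to image space."""
    x, dt = np.array(x0, dtype=float), 1.0 / steps
    for i in range(steps):
        x = x + dt * v_field(x, i * dt)
    return x
```

Since the straight-line path has constant velocity, a model that regresses (x1 - x0) transports any source distribution, noise or text embeddings alike, onto the target.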
Saketh Rambhatla retweeted
Mara Levy @mlevy1221
How can we make imitation learning generalize? In my latest work, we show that a keypoint-based representation can generalize to novel instances of an object and is agnostic to background changes.
Saketh Rambhatla retweeted
Andrew Brown @Andrew__Brown__
🚨 Internship at Meta GenAI NYC 🚨 I have an open PhD internship position for 2025! Interested in exploring visual generative models (or any other exciting ideas) inside the team that brought you Movie Gen and Emu Video? 📩 Send me a DM with your CV, website, and Google Scholar profile.
Quoting AI at Meta @AIatMeta:
🎥 Today we're premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We're excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike. More details and examples of what Movie Gen can do ➡️ go.fb.me/kx1nqm

🛠️ Movie Gen models and capabilities:
Movie Gen Video: a 30B-parameter transformer model that can generate high-quality, high-definition images and videos from a single text prompt.
Movie Gen Audio: a 13B-parameter transformer model that takes a video input, along with optional text prompts for controllability, and generates high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music, and foley sound, delivering state-of-the-art results in audio quality, video-to-audio alignment, and text-to-audio alignment.
Precise video editing: using a generated or existing video and accompanying text instructions as input, it can perform localized edits such as adding, removing, or replacing elements, or global changes like background or style changes.
Personalized videos: using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement.

We're continuing to work closely with creative professionals from across the field to integrate their feedback as we work toward a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.
Saketh Rambhatla retweeted
Manohar Paluri @manohar_paluri
Meta Movie Gen is just freakin' cool! Generative video foundation models with this quality, precise editing, and personalization unlock value for creators and new creative tools, and enable agents that can interact in richer ways, closing the loop on learning to unlock world models!
Quoting AI at Meta @AIatMeta (same Movie Gen announcement as above)
Saketh Rambhatla retweeted
Shelly Sheynin @ShellySheynin
I'm thrilled and proud to share our model, Movie Gen, which we've been working on for the past year, and in particular Movie Gen Edit, for precise video editing. 😍 Look how Movie Gen edited my video!
Quoting AI at Meta @AIatMeta (same Movie Gen announcement as above)
Saketh Rambhatla retweeted
Roshan Sumbaly @rsumbaly
Lights, camera, action: introducing Meta's Movie Gen! Our latest breakthrough in AI-powered media generation, setting a new standard for immersive AI content creation. We're also releasing a detailed 92-page report of what we learned, along with evaluation prompts that we hope push the community forward. 📽️ Examples: youtube.com/playlist?list=… 📜 Paper: ai.meta.com/static-resourc… 🖥️ Site: ai.meta.com/research/movie…
Saketh Rambhatla retweeted
Mannat Singh @mannat_singh
Check out Movie Gen 🎥 Our latest media generation models for video generation, editing, and personalization, with audio generation! 16-second 1080p videos generated by a simple Llama-style 30B transformer. Demo + detailed 92-page technical report 📝⬇️
Quoting AI at Meta @AIatMeta (same Movie Gen announcement as above)