Liangke Gui

14 posts


@liangkegui

GenMedia researcher @GoogleDeepMind, AI PhD @CarnegieMellon

Mountain View, CA · Joined May 2022

122 Following · 100 Followers
Liangke Gui retweeted
Google DeepMind @GoogleDeepMind
Here’s how it works:
🔵 Design your world and character using text and visual prompts.
🔵 Nano Banana Pro makes an image preview that you can adjust.
🔵 Our Genie 3 world model generates the environment in real-time as you move through.
🔵 Remix existing worlds or discover new ones in the gallery.
Liangke Gui retweeted
Google Gemini @GeminiApp
Our new native image generation and editing is state-of-the-art, and ranked #1 in the world. And we're rolling it out for free to everyone today. You’ve got the tools. Now go bananas. Ideas & inspiration in the 🧵below.
Liangke Gui retweeted
Oliver Wang @oliver_wang2
[image attachment]
Liangke Gui retweeted
Ceyuan Yang @CeyuanY
Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance under limited resources, achieving quality comparable to other industry-level models.

To unleash the power of the foundation model, Seaweed-7B further enables a wide range of downstream applications, including image-to-video generation, human video generation, subject-consistent video generation, video-audio joint generation, long video generation and storytelling, real-time generation, super-resolution, and camera-controlled generation.

Check out our webpage and report for more details:
Webpage: seaweed.video
Paper: seaweed.video/seaweed.pdf

It's been a wonderful journey over the last year. Sincere thanks to all teammates for their contributions.
Liangke Gui retweeted
Ceyuan Yang @CeyuanY
Check out our latest work, CameraCtrl II. By carefully collecting and processing data and introducing as little inductive bias as possible, we let users explore the generated world with appealing dynamics and consistency. Together with extension and distillation, CameraCtrl II supports ultra-fast interaction and long-term exploration. Homepage: hehao13.github.io/Projects-Camer…
Liangke Gui retweeted
Language Technologies Institute | @CarnegieMellon
Video LMMs can be made more effective with DPO training using a language-model reward, which leverages detailed video captions as proxies for video content, enabling cost-effective preference optimization for video LMM alignment. twitter.com/RuohongZhang/s…
Ruohong Zhang@RuohongZhang

[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕 Paper link: arxiv.org/pdf/2404.01258… Page: github.com/RifleZhang/LLa… How can we effectively align video large multimodal models (LMMs) with preference modeling?

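The idea sketched in the thread above — scoring candidate answers against a detailed video caption with a language-model reward, then forming chosen/rejected pairs for DPO — can be illustrated with a toy example. This is a hypothetical sketch, not the paper's implementation: the token-overlap reward stands in for a real LLM judge, and all names are invented for illustration.

```python
# Hypothetical sketch: build DPO preference pairs for a video LMM by
# ranking sampled candidate answers with a caption-based reward.
# A real system would prompt an LLM judge with the detailed caption
# as a proxy for the video content; here a toy overlap score stands in.

def lm_reward(caption: str, answer: str) -> float:
    """Toy reward: fraction of answer tokens supported by the caption."""
    cap_tokens = set(caption.lower().split())
    ans_tokens = answer.lower().split()
    if not ans_tokens:
        return 0.0
    return sum(t in cap_tokens for t in ans_tokens) / len(ans_tokens)

def make_preference_pair(caption: str, candidates: list[str]) -> dict:
    """Rank candidates by reward; the best becomes 'chosen' and the
    worst 'rejected' — the pair format DPO training consumes."""
    ranked = sorted(candidates, key=lambda a: lm_reward(caption, a),
                    reverse=True)
    return {"chosen": ranked[0], "rejected": ranked[-1]}

caption = "a dog catches a red frisbee in the park"
candidates = ["the dog catches a red frisbee", "a cat sleeps indoors"]
pair = make_preference_pair(caption, candidates)
```

The resulting pairs would then feed a standard DPO objective; the appeal described in the tweet is that captions make the reward cheap compared to human preference labels.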
Liangke Gui @liangkegui
produces surprisingly interpretable patch alignments on concepts not in COCO or ImageNet-1K
[image attachment]
Liangke Gui @liangkegui
What if we don’t need supervised pretraining for vision-language models? We find that unsupervised visual representations (e.g., MAE) are actually better initializations for language and vision.
[image attachment]
Liangke Gui @liangkegui
This provides a nice setting for investigating the trade-offs between information already available in model weights and information extracted from structured sources.
[image attachment]
Liangke Gui @liangkegui
Can multimodal transformers leverage explicit knowledge in their reasoning? What's the role of explicit vs. implicit knowledge in visual tasks like OK-VQA? Check out our KAT (Knowledge Augmented Transformer for vision-and-language) paper at #NAACL2022.
[image attachment]
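The explicit-vs-implicit distinction the KAT tweet raises can be illustrated with a toy sketch. This is purely hypothetical (not the actual KAT architecture or code): explicit knowledge is retrieved from an external structured store and concatenated to the model input, while implicit knowledge is whatever the model's weights already encode.

```python
# Hypothetical illustration of knowledge-augmented input construction.
# Explicit knowledge: retrieved from an external store and prepended to
# the question. Implicit knowledge: left to the model's own weights.

KNOWLEDGE_BASE = {
    "frisbee": "A frisbee is a flying disc used in throwing games.",
    "violin": "A violin is a four-stringed bowed instrument.",
}

def retrieve(query: str, kb: dict) -> list:
    """Return knowledge entries whose key appears in the query
    (a real retriever would use dense embeddings, not substring match)."""
    return [text for key, text in kb.items() if key in query.lower()]

def build_augmented_input(question: str) -> str:
    """Prepend retrieved explicit knowledge so the transformer can
    reason jointly over retrieved facts and the question."""
    facts = retrieve(question, KNOWLEDGE_BASE)
    context = " ".join(facts)
    return f"knowledge: {context} question: {question}"

prompt = build_augmented_input("What is the red frisbee used for?")
```

The interesting trade-off, as the earlier tweet in this feed notes, is how much of the answer should come from such retrieved context versus from knowledge already baked into the weights.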