Liangke Gui

14 posts


@liangkegui

GenMedia researcher @GoogleDeepMind, AI PhD @CarnegieMellon

Mountain View, CA · Joined May 2022

122 Following · 100 Followers
Liangke Gui retweeted
Google DeepMind @GoogleDeepMind
Here’s how it works:
🔵 Design your world and character using text and visual prompts.
🔵 Nano Banana Pro makes an image preview that you can adjust.
🔵 Our Genie 3 world model generates the environment in real-time as you move through.
🔵 Remix existing worlds or discover new ones in the gallery.
Liangke Gui retweeted
Google Gemini @GeminiApp
Our new native image generation and editing is state-of-the-art, and ranked #1 in the world. And we're rolling it out for free to everyone today. You’ve got the tools. Now go bananas. Ideas & inspiration in the 🧵below.
Liangke Gui retweeted
Oliver Wang @oliver_wang2
[image attachment]
Liangke Gui retweeted
Ceyuan Yang @CeyuanY
Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance under limited resources, achieving quality comparable to other industry-level models.

To unleash the power of the foundation model, Seaweed-7B further enables a wide range of downstream applications, including image-to-video generation, human video generation, subject-consistent video generation, video-audio joint generation, long video generation and storytelling, real-time generation, super-resolution, and camera-controlled generation.

Check out our webpage and report for more details:
Webpage: seaweed.video
Paper: seaweed.video/seaweed.pdf

It's been a wonderful journey over the last year. Sincere thanks to all teammates for their contributions.
Liangke Gui retweeted
Ceyuan Yang @CeyuanY
Check out our latest work, CameraCtrl II. By carefully collecting and processing data and introducing as little inductive bias as possible, we let users explore the generated world with appealing dynamics and consistency. Together with extension and distillation, CameraCtrl II supports ultra-fast interaction and long-term exploration. Homepage: hehao13.github.io/Projects-Camer…
Liangke Gui retweeted
Language Technologies Institute | @CarnegieMellon
Video LMMs can be made more effective with DPO training using a language-model reward, which leverages detailed video captions as proxies for video content, enabling cost-effective preference optimization for video LMM alignment. twitter.com/RuohongZhang/s…
Ruohong Zhang@RuohongZhang

[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕 Paper link: arxiv.org/pdf/2404.01258… Page: github.com/RifleZhang/LLa… How can we effectively align video large multimodal models (LMMs) with preference modeling?

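The idea sketched in the thread above — scoring candidate answers against a detailed video caption with a language-model reward, then forming chosen/rejected pairs for DPO — can be illustrated with a toy example. This is a hypothetical sketch, not the paper's implementation: the token-overlap reward stands in for a real LLM judge, and all names are invented for illustration.

```python
# Hypothetical sketch: build DPO preference pairs for a video LMM by
# ranking sampled candidate answers with a caption-based reward.
# A real system would prompt an LLM judge with the detailed caption
# as a proxy for the video content; here a toy overlap score stands in.

def lm_reward(caption: str, answer: str) -> float:
    """Toy reward: fraction of answer tokens supported by the caption."""
    cap_tokens = set(caption.lower().split())
    ans_tokens = answer.lower().split()
    if not ans_tokens:
        return 0.0
    return sum(t in cap_tokens for t in ans_tokens) / len(ans_tokens)

def make_preference_pair(caption: str, candidates: list[str]) -> dict:
    """Rank candidates by reward; the best becomes 'chosen' and the
    worst 'rejected' — the pair format DPO training consumes."""
    ranked = sorted(candidates, key=lambda a: lm_reward(caption, a),
                    reverse=True)
    return {"chosen": ranked[0], "rejected": ranked[-1]}

caption = "a dog catches a red frisbee in the park"
candidates = ["the dog catches a red frisbee", "a cat sleeps indoors"]
pair = make_preference_pair(caption, candidates)
```

The resulting pairs would then feed a standard DPO objective; the appeal described in the tweet is that captions make the reward cheap compared to human preference labels.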
Liangke Gui @liangkegui
produces surprisingly interpretable patch alignments on concepts not in COCO or ImageNet-1K
[image attachment]
Liangke Gui @liangkegui
What if we don’t need supervised pretraining for vision-language models? We find that unsupervised visual representations (e.g., MAE) are actually better initializations for language and vision.
[image attachment]
Liangke Gui @liangkegui
This provides a nice setting for investigating the trade-offs between information already available in model weights and information extracted from structured sources.
[image attachment]
Liangke Gui @liangkegui
Can multimodal transformers leverage explicit knowledge in their reasoning? What's the role of explicit vs. implicit knowledge in visual tasks like OK-VQA? Check out our KAT (Knowledge Augmented Transformer for vision-and-language) paper at #NAACL2022.
[image attachment]
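The explicit-vs-implicit distinction the KAT tweet raises can be illustrated with a toy sketch. This is purely hypothetical (not the actual KAT architecture or code): explicit knowledge is retrieved from an external structured store and concatenated to the model input, while implicit knowledge is whatever the model's weights already encode.

```python
# Hypothetical illustration of knowledge-augmented input construction.
# Explicit knowledge: retrieved from an external store and prepended to
# the question. Implicit knowledge: left to the model's own weights.

KNOWLEDGE_BASE = {
    "frisbee": "A frisbee is a flying disc used in throwing games.",
    "violin": "A violin is a four-stringed bowed instrument.",
}

def retrieve(query: str, kb: dict) -> list:
    """Return knowledge entries whose key appears in the query
    (a real retriever would use dense embeddings, not substring match)."""
    return [text for key, text in kb.items() if key in query.lower()]

def build_augmented_input(question: str) -> str:
    """Prepend retrieved explicit knowledge so the transformer can
    reason jointly over retrieved facts and the question."""
    facts = retrieve(question, KNOWLEDGE_BASE)
    context = " ".join(facts)
    return f"knowledge: {context} question: {question}"

prompt = build_augmented_input("What is the red frisbee used for?")
```

The interesting trade-off, as the earlier tweet in this feed notes, is how much of the answer should come from such retrieved context versus from knowledge already baked into the weights.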