Brian Gordon
@Brian_Gordon13
36 posts

Research Intern @ Google | https://t.co/YF6cq9yyny @ Tel-Aviv University

Joined November 2021
86 Following · 48 Followers
Brian Gordon retweeted
Life at Google @lifeatgoogle
Aviv, a research scientist with Google Earth AI, walked us through how his work provides first responders, crisis planners, and others with the information they need to save lives. Interested in similar roles? Explore our open AI/ML jobs ➡️ goo.gle/4iyaDXQ
3 replies · 12 reposts · 60 likes · 2.3K views
Brian Gordon retweeted
jaron1990 @jaron1990
1/ What if you could animate a face directly from text? 🎭 Meet Express4D - a dataset of expressive 4D facial motions captured from natural language prompts, designed for generative models and animation pipelines. 🔗jaron1990.github.io/Express4D/ 📹👇
2 replies · 17 reposts · 21 likes · 2.1K views
Brian Gordon retweeted
Sigal Raab @sigal_raab
🔔Excited to announce that #AnyTop has been accepted to #SIGGRAPH2025!🥳 ✅ A diffusion model that generates motion for arbitrary skeletons ✅ Using only a skeletal structure as input ✅ Learns semantic correspondences across diverse skeletons 🌐 Project: anytop2025.github.io/Anytop-page
1 reply · 24 reposts · 73 likes · 3K views
Brian Gordon retweeted
Andreas Aristidou @andaristidou
🚀 New preprint! 🚀 Check out AnyTop 🤩 ✅ A diffusion model that generates motion for arbitrary skeletons 🦴 ✅ Using only a skeletal structure as input ✅ Learns semantic correspondences across diverse skeletons 🦅🐒🪲 🔗 Arxiv: arxiv.org/abs/2502.17327
2 replies · 42 reposts · 189 likes · 18.3K views
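To make the mechanism concrete, here is a minimal sketch of what skeleton-conditioned diffusion sampling could look like, written only from the description above: the tensor shapes, the `model` signature, and the DDPM schedule are illustrative assumptions, not AnyTop's actual API.

```python
import torch

def sample_motion(model, skeleton_adj, num_frames=120, num_steps=50, feat_dim=6):
    """Denoise random per-joint features into motion, conditioned only on topology.

    skeleton_adj: (J, J) adjacency matrix describing the input skeletal structure.
    Returns a (num_frames, J, feat_dim) motion tensor.
    """
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    J = skeleton_adj.shape[0]
    x = torch.randn(num_frames, J, feat_dim)  # start from pure noise
    for t in reversed(range(num_steps)):
        # The denoiser sees the noisy motion, the timestep, and the skeleton
        # topology -- per the announcement, the only conditioning signal.
        eps = model(x, torch.tensor([t]), skeleton_adj)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise  # standard DDPM posterior step
    return x
```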
Brian Gordon retweeted
Daniel Cohen-Or @DanielCohenOr1
Thrilled to see this plot in a recent survey on 'personalized image generation' (arxiv.org/abs/2502.13081) — highlighting the impact of our work! Huge congratulations to my fantastic students, whose creativity and dedication continue to drive exciting advances in the field!
[image]
6 replies · 19 reposts · 160 likes · 14.3K views
Brian Gordon retweeted
Rotem Shalev-Arkushin @rotemsh3
Excited to introduce our new work: ImageRAG 🖼️✨ rotem-shalev.github.io/ImageRAG We enhance off-the-shelf generative models with Retrieval-Augmented Generation (RAG) for unknown concept generation, using a VLM-based approach that’s easy to integrate with new & existing models! [1/3]
[image]
1 reply · 12 reposts · 40 likes · 2K views
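The tweet describes a generate, critique, retrieve, regenerate loop. Here is a hedged sketch of that flow; the `generator`, `vlm`, and `retriever` callables and their signatures are illustrative stand-ins, not the paper's interfaces.

```python
def generate_with_image_rag(prompt, generator, vlm, retriever, max_rounds=2):
    """Sketch of a RAG loop for image generation: draft, critique, retrieve, retry."""
    image = generator(prompt)  # off-the-shelf text-to-image model
    for _ in range(max_rounds):
        # Ask a VLM which prompt concepts the current image failed to render.
        missing = vlm(image, f"Which concepts from '{prompt}' are missing?")
        if not missing:
            break  # the prompt is fully satisfied
        # Retrieve reference images for the unknown/missing concepts ...
        references = [retriever(concept) for concept in missing]
        # ... and regenerate with the references as extra conditioning.
        image = generator(prompt, reference_images=references)
    return image
```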
Brian Gordon retweeted
Guy Tevet @GuyTvt
🚀 Meet DiP: our newest text-to-motion diffusion model! ✨ Ultra-fast generation ♾️ Creates endless, dynamic motions 🔄 Seamlessly switch prompts on the fly Best of all, it's now available in the MDM codebase: github.com/GuyTevet/motio… [1/3]
12 replies · 86 reposts · 466 likes · 38.5K views
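A rough sketch of how "endless motion with prompts switched on the fly" can work as a chunked autoregressive loop; the chunk sizes, feature dimension, and `model` interface below are assumptions for illustration, not DiP's actual code.

```python
import torch

def stream_motion(model, prompts, chunk_frames=40, context_frames=20, feat_dim=263):
    """Chain fixed-size diffusion chunks into one endless motion.

    Each chunk is generated conditioned on the tail of the previous chunk, so
    changing the text prompt between chunks gives a seamless transition.
    """
    context = torch.zeros(context_frames, feat_dim)  # initial rest-pose context
    motion = []
    for prompt in prompts:  # swap prompts on the fly, one per chunk
        chunk = model(prompt, context)  # returns (chunk_frames, feat_dim)
        motion.append(chunk)
        context = chunk[-context_frames:]  # the next chunk continues from here
    return torch.cat(motion, dim=0)

# Usage: stream_motion(model, ["walk forward", "break into a run", "jump"])
```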
Brian Gordon retweeted
moab.arar @ArarMoab
Check out our work "GameNGen": a game engine powered by a diffusion model that simulates DOOM in real time! Find out more: gamengen.github.io Amazing effort and fun collaboration with the incredible @daniva, @yanivle, and @shlomifruchter!
AK @_akhaliq

Google presents Diffusion Models Are Real-Time Game Engines discuss: huggingface.co/papers/2408.14… We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.

2 replies · 17 reposts · 50 likes · 5.9K views
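The abstract's phase (2) boils down to an autoregressive loop: predict the next frame from a window of past frames and actions, then feed the prediction back in. A hypothetical sketch, with the window size and the `model` interface as illustrative assumptions:

```python
import collections
import torch

def simulate(model, initial_frames, get_player_action, num_ticks=600, window=32):
    """Autoregressive game simulation: each generated frame becomes context."""
    frames = collections.deque(initial_frames, maxlen=window)
    actions = collections.deque([0] * len(initial_frames), maxlen=window)
    for _ in range(num_ticks):
        actions.append(get_player_action())  # the action the next frame reacts to
        past_frames = torch.stack(list(frames))
        past_actions = torch.tensor(list(actions))
        # One conditioned denoising pass predicts the next frame; the paper's
        # conditioning augmentations are what keep this loop stable over time.
        next_frame = model(past_frames, past_actions)
        frames.append(next_frame)
        yield next_frame
```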
Brian Gordon retweeted
San Lorenzo Redes + 1M² @SanLorenzoRedes
Giveaway on September 18. An original @SanLorenzo home shirt, size XL. To enter: 📌 Follow us and retweet this tweet. 📌 Follow @boedoenmi. 📌 For extra chances, follow us on our YouTube channel. 👇 youtube.com/@sanlorenzored
[image]
30 replies · 431 reposts · 287 likes · 141.1K views
Brian Gordon retweeted
nitzan guetta @nitzanguetta
Can you answer these riddles? We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”. Paper: Website: visual-riddles.github.io 🧵
[image]
AK @_akhaliq

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models. Imagine observing someone scratching their arm; to understand why, additional context would be necessary. However, spotting a mosquito nearby would immediately offer a likely explanation for the person's discomfort, thereby alleviating the need for further information. This example illustrates how subtle visual cues can challenge our cognitive skills and demonstrates the complexity of interpreting visual scenarios. To study these skills, we present Visual Riddles, a benchmark aimed at testing vision and language models on visual riddles requiring commonsense and world knowledge. The benchmark comprises 400 visual riddles, each featuring a unique image created by a variety of text-to-image models, a question, a ground-truth answer, a textual hint, and attribution. Human evaluation reveals that existing models lag significantly behind human performance, which is at 82% accuracy, with Gemini-Pro-1.5 leading with 40% accuracy. Our benchmark comes with automatic evaluation tasks to make assessment scalable. These findings underscore the potential of Visual Riddles as a valuable resource for enhancing vision and language models' capabilities in interpreting complex visual scenarios.

1 reply · 14 reposts · 34 likes · 16.8K views
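Given the benchmark structure in the abstract (image, question, ground-truth answer, textual hint), an evaluation loop might look like the sketch below; the field names and the `vlm`/`judge` callables are illustrative, not the released API.

```python
def evaluate(riddles, vlm, judge, use_hints=False):
    """Score a vision-language model on visual riddles.

    riddles: list of dicts with 'image', 'question', 'answer', and 'hint' keys.
    judge: automatic evaluator deciding whether a free-text answer matches
           the ground truth (the abstract mentions automatic evaluation tasks).
    """
    correct = 0
    for riddle in riddles:
        question = riddle["question"]
        if use_hints:  # each riddle ships with a textual hint
            question = question + " Hint: " + riddle["hint"]
        prediction = vlm(riddle["image"], question)
        correct += judge(prediction, riddle["answer"])
    return correct / len(riddles)  # humans: ~82%; Gemini-Pro-1.5: ~40%
```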
European Conference on Computer Vision #ECCV2026
The #ECCV2024 camera-ready submission instructions for main conference papers have been sent to all authors via email. The deadline for submitting the camera-ready paper is **July 15 (22:00 CEST)**. By the same deadline, you also need to have the paper covered by a registration.
7 replies · 2 reposts · 31 likes · 8.3K views
Brian Gordon @Brian_Gordon13
We are happy to share that Mismatch Quest has been accepted to #ECCV2024 @eccvconf! 🥳 Check out additional details on the project website: mismatch-quest.github.io Congrats to the team: @YonatanBitton, @shafir_yoni, @roopalgarg, Xi Chen, @DaniLischinski, @DanielCohenOr1, Idan Szpektor
Brian Gordon @Brian_Gordon13

1/📄 Excited to introduce our paper "Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment"!🖼️👀 arxiv.org/abs/2312.03766 Website: mismatch-quest.github.io w. @YonatanBitton, @shafir_yoni, @roopalgarg, Xi Chen, @DaniLischinski, @DanielCohenOr1, Idan Szpektor 🧵

0 replies · 7 reposts · 20 likes · 3.4K views
Brian Gordon retweeted
Kfir Aberman @AbermanKfir
“Monkey See, Monkey Do”! 🐵 A cool new work demonstrating how manipulating self-attention features in diffusion models enables zero-shot motion transfer. It can generate motions that follow a leader dancer in various motifs, including those of a gorilla! monkeyseedocg.github.io
[GIF]
1 reply · 8 reposts · 23 likes · 1.6K views
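The core trick the tweet points at, caching self-attention features from the "leader" pass and injecting them while denoising the target, can be sketched with generic PyTorch forward hooks. Everything below (the `denoise_step` method, the noising details, which layers to hook) is an assumption, not the paper's implementation.

```python
import torch

def transfer_motion(model, attn_layers, leader_motion, target_prompt, num_steps=50):
    """Zero-shot motion transfer by swapping self-attention outputs."""
    cache = {}

    def save(i):
        def hook(module, inputs, output):
            cache[i] = output.detach()  # record the leader's attention features
        return hook

    def inject(i):
        def hook(module, inputs, output):
            return cache[i]  # returning a value replaces the layer's output
        return hook

    x = torch.randn_like(leader_motion)  # the target starts from pure noise
    for t in reversed(range(num_steps)):
        # Pass 1: run the leader through the denoiser, caching attention.
        # (In practice the leader would be noised/inverted to level t first;
        # that step is omitted here for brevity.)
        handles = [l.register_forward_hook(save(i)) for i, l in enumerate(attn_layers)]
        model.denoise_step(leader_motion, t)
        for h in handles:
            h.remove()
        # Pass 2: denoise the target while injecting the leader's features.
        handles = [l.register_forward_hook(inject(i)) for i, l in enumerate(attn_layers)]
        x = model.denoise_step(x, t, prompt=target_prompt)
        for h in handles:
            h.remove()
    return x
```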
Brian Gordon retweeted
Guy Tevet @GuyTvt
#MDM is now 40X faster 🤩🤩🤩 (~0.4 sec/sample) How come?!? (1) We released a 50-diffusion-step model (instead of 1000 steps), which runs 20X faster. (2) Calling CLIP just once and caching the result makes all models run 2X faster. github.com/GuyTevet/motio…
4 replies · 18 reposts · 121 likes · 18.3K views
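Point (2) is a classic memoization win: the prompt is constant across all diffusion steps, so its CLIP embedding can be computed once and reused. A minimal sketch under that assumption; the function names are illustrative, not the MDM codebase's identifiers.

```python
import torch

def make_cached_encoder(encode_text):
    """Wrap a heavy text encoder (e.g. CLIP) so repeated prompts hit a cache."""
    cache = {}
    def encode(prompt: str) -> torch.Tensor:
        if prompt not in cache:
            cache[prompt] = encode_text(prompt)  # the expensive call, run once
        return cache[prompt]
    return encode

def sample(denoise_step, encode_text, prompt, shape, num_steps=50):
    encode = make_cached_encoder(encode_text)
    text_emb = encode(prompt)  # computed once, not once per diffusion step (~2X)
    x = torch.randn(shape)
    for t in reversed(range(num_steps)):  # 50 steps instead of 1000 (~20X)
        x = denoise_step(x, t, text_emb)  # the same embedding is reused
    return x
```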
Brian Gordon retweeted
AK @_akhaliq
Google announces PALP Prompt Aligned Personalization of Text-to-Image Models paper page: huggingface.co/papers/2401.06… Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally, they may want the resulting image to encompass a specific location, style, ambiance, and more. Existing personalization methods may compromise personalization ability or the alignment to complex textual prompts. This trade-off can impede the fulfillment of user prompts and subject fidelity. We propose a new approach focusing on personalization methods for a single prompt to address this issue. We term our approach prompt-aligned personalization. While this may seem restrictive, our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts, which may pose a challenge for current techniques. In particular, our method keeps the personalized model aligned with a target prompt using an additional score distillation sampling term. We demonstrate the versatility of our method in multi- and single-shot settings and further show that it can compose multiple subjects or use inspiration from reference images, such as artworks. We compare our approach quantitatively and qualitatively with existing baselines and state-of-the-art techniques.
2 replies · 101 reposts · 430 likes · 90.2K views
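The abstract's key idea, a personalization loss plus an extra score distillation sampling (SDS) term that keeps the model on-prompt, can be sketched as a two-term objective. The noise schedule, loss weighting, and model interfaces below are assumptions, and the timestep weighting w(t) usually found in SDS is omitted; this is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

ALPHA_BARS = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)

def noised(x0, noise, t):
    a = ALPHA_BARS[t].view(-1, 1, 1, 1)  # assumes (B, C, H, W) latents
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def palp_loss(personalized_unet, frozen_unet, subject_latents, subject_emb,
              target_emb, target_sample, sds_weight=1.0):
    """Two-term objective: learn the subject, stay aligned with the target prompt.

    target_sample: latents rendered by the personalized model for the target
    prompt, kept differentiable so the SDS term can steer the weights.
    """
    # (1) Personalization: standard denoising loss on the subject's images.
    t = torch.randint(0, 1000, (subject_latents.shape[0],))
    noise = torch.randn_like(subject_latents)
    pred = personalized_unet(noised(subject_latents, noise, t), t, subject_emb)
    personal = F.mse_loss(pred, noise)

    # (2) Prompt alignment via SDS: the frozen pretrained model's noise
    # prediction for the target prompt defines a correction direction; the
    # stop-gradient product below reproduces the SDS gradient w.r.t. the sample.
    t2 = torch.randint(0, 1000, (target_sample.shape[0],))
    n2 = torch.randn_like(target_sample)
    with torch.no_grad():
        eps = frozen_unet(noised(target_sample, n2, t2), t2, target_emb)
    sds = ((eps - n2) * target_sample).sum() / target_sample.shape[0]

    return personal + sds_weight * sds
```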
Brian Gordon @Brian_Gordon13
10/🏁 Conclusions: We present an end-to-end approach that provides visual and textual feedback for text-to-image models, identifying alignment discrepancies with visual annotations to enable targeted model refinement. Check out the paper and project website for more details! 🎉
0 replies · 0 reposts · 0 likes · 66 views
Brian Gordon @Brian_Gordon13
9/ More results from SeeTRUE-Feedback test set! 🚀 PaLI 55B model, tuned on TV-Feedback, provides precise feedback, spotlighting textual and visual misalignment sources. The figure captures the essence - accuracy and insights bundled in one!
[image]
1 reply · 0 reposts · 1 like · 76 views
Brian Gordon @Brian_Gordon13
1/📄 Excited to introduce our paper "Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment"!🖼️👀 arxiv.org/abs/2312.03766 Website: mismatch-quest.github.io w. @YonatanBitton, @shafir_yoni, @roopalgarg, Xi Chen, @DaniLischinski, @DanielCohenOr1, Idan Szpektor 🧵
AK @_akhaliq

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment paper page: huggingface.co/papers/2312.03… While existing image-text alignment models achieve high-quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanations of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds plausible misaligned captions for a given image, along with corresponding textual explanations and visual indicators. We also publish a new human-curated test set comprising ground-truth textual and visual misalignment annotations. Empirical results show that fine-tuning vision-language models on our training set enables them to articulate misalignments and visually indicate them within images, outperforming strong baselines on both the binary alignment classification and the explanation generation tasks.

1 reply · 13 reposts · 35 likes · 11.7K views
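The training-set construction the abstract describes (an LLM produces a plausibly misaligned caption plus an explanation; a grounding model localizes the contradiction) might look like the sketch below. The callables, prompt, and field names are illustrative stand-ins, not the paper's pipeline.

```python
def build_training_example(image, caption, llm, grounder):
    """Construct one (image, misaligned caption, feedback) training example."""
    # Ask the LLM for a plausible corruption of the caption, an explanation of
    # the mismatch, and the entity whose description was changed.
    misaligned, explanation, changed_entity = llm(
        "Minimally alter this caption so it contradicts the image, explain the "
        f"contradiction, and name the changed entity. Caption: {caption}")
    # A visual grounding model turns the contradicted entity into a visual
    # indicator (a bounding box) inside the image.
    bbox = grounder(image, changed_entity)
    return {
        "image": image,
        "misaligned_caption": misaligned,
        "textual_feedback": explanation,  # why the pair is misaligned
        "visual_feedback": bbox,          # where in the image it shows
    }
```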