Brian Gordon
@Brian_Gordon13
Research Intern @ Google | https://t.co/YF6cq9yyny @ Tel-Aviv University

It was a privilege to present our work at the Google booth: RefVNLI: Scalable Evaluation of Subject-driven Text-to-Image Generation (refvnli.github.io), led by @lovodkin93

Google presents Diffusion Models Are Real-Time Game Engines
discuss: huggingface.co/papers/2408.14…

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next-frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation.

GameNGen is trained in two phases: (1) an RL agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable autoregressive generation over long trajectories.
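As a rough illustration of the phase-2 recipe described above, here is a minimal, hypothetical PyTorch sketch of action-conditioned next-frame diffusion training. The model, layer sizes, noise schedule, and the `training_step` helper are my own simplifications for exposition, not the paper's architecture or code; only the overall idea (denoise the next frame given past frames and actions, with noise-augmented context) comes from the abstract.

```python
# Hypothetical sketch, not GameNGen's actual code: a toy denoiser that predicts
# the noise added to the next frame, conditioned on past frames and actions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextFrameDenoiser(nn.Module):
    """Toy stand-in for the diffusion UNet the paper describes."""
    def __init__(self, channels=3, ctx_len=4, n_actions=8, dim=64):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, dim)
        self.time_mlp = nn.Linear(1, dim)  # crude diffusion-time embedding
        # Past frames are concatenated with the noisy target along channels.
        self.conv_in = nn.Conv2d(channels * (ctx_len + 1), dim, 3, padding=1)
        self.conv_out = nn.Conv2d(dim, channels, 3, padding=1)

    def forward(self, noisy_next, ctx_frames, actions, t):
        b, l, c, h, w = ctx_frames.shape
        x = torch.cat([ctx_frames.reshape(b, l * c, h, w), noisy_next], dim=1)
        # Fold action and timestep conditioning in additively, for brevity.
        cond = self.action_emb(actions).mean(dim=1) + self.time_mlp(t[:, None])
        feat = F.silu(self.conv_in(x) + cond[:, :, None, None])
        return self.conv_out(feat)

def training_step(model, opt, ctx_frames, actions, next_frame):
    b = next_frame.shape[0]
    t = torch.rand(b, device=next_frame.device)       # diffusion time in [0, 1]
    noise = torch.randn_like(next_frame)
    noisy_next = next_frame + t[:, None, None, None] * noise
    # Conditioning augmentation: corrupt the context frames during training so
    # the model tolerates its own imperfect outputs in autoregressive rollout.
    ctx_aug = ctx_frames + 0.1 * torch.randn_like(ctx_frames)
    pred = model(noisy_next, ctx_aug, actions, t)
    loss = F.mse_loss(pred, noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage on dummy data:
model = NextFrameDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ctx = torch.randn(2, 4, 3, 64, 64)     # batch of two 4-frame contexts
acts = torch.randint(0, 8, (2, 4))     # one discrete action per context frame
nxt = torch.randn(2, 3, 64, 64)
print(training_step(model, opt, ctx, acts, nxt))
```

At inference time, the generated frame would be appended to the context and the loop repeated, which is exactly the regime the conditioning augmentation is meant to stabilize.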

Visual Riddles: A Commonsense and World Knowledge Challenge for Large Vision and Language Models

Imagine observing someone scratching their arm; to understand why, additional context would be necessary. However, spotting a mosquito nearby would immediately offer a likely explanation for the person's discomfort, alleviating the need for further information. This example illustrates how subtle visual cues can challenge our cognitive skills and demonstrates the complexity of interpreting visual scenarios.

To study these skills, we present Visual Riddles, a benchmark aimed at testing vision-and-language models on visual riddles requiring commonsense and world knowledge. The benchmark comprises 400 visual riddles, each featuring a unique image created by one of a variety of text-to-image models, a question, a ground-truth answer, a textual hint, and attribution. Human evaluation reveals that existing models lag significantly behind human performance, which stands at 82% accuracy, with Gemini-Pro-1.5 leading among models at 40% accuracy. Our benchmark comes with automatic evaluation tasks to make assessment scalable. These findings underscore the potential of Visual Riddles as a valuable resource for enhancing vision-and-language models' capabilities in interpreting complex visual scenarios.
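To make the benchmark structure concrete, here is a small, hypothetical sketch of what a riddle record and the accuracy metric from the abstract might look like. The field names, the `model_answer` callable, and the `judge` callable are my assumptions, not the released dataset schema or evaluation API.

```python
# Hypothetical sketch of a Visual Riddles record and a scalable accuracy loop.
from dataclasses import dataclass

@dataclass
class Riddle:
    image_path: str        # unique image produced by a text-to-image model
    question: str
    ground_truth: str
    hint: str              # textual hint a model may optionally receive
    attribution: str       # source grounding the world-knowledge answer

def accuracy(riddles, model_answer, judge, use_hint=False):
    """Score a model with an automatic judge (e.g., an LLM-as-judge)."""
    correct = 0
    for r in riddles:
        prompt = r.question + (f"\nHint: {r.hint}" if use_hint else "")
        answer = model_answer(r.image_path, prompt)
        correct += judge(answer, r.ground_truth)   # judge returns 0 or 1
    return correct / len(riddles)

# Usage with dummy callables:
riddles = [Riddle("mosquito.png", "Why is the person scratching their arm?",
                  "A mosquito bit them.", "Look near the arm.", "common knowledge")]
dummy_model = lambda img, q: "A mosquito bit them."
exact_match = lambda a, gt: int(a.strip().lower() == gt.strip().lower())
print(accuracy(riddles, dummy_model, exact_match))  # 1.0
```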

1/📄 Excited to introduce our paper "Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment"!🖼️👀 arxiv.org/abs/2312.03766 Website: mismatch-quest.github.io w. @YonatanBitton, @shafir_yoni, @roopalgarg, Xi Chen, @DaniLischinski, @DanielCohenOr1, Idan Szpektor 🧵

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
paper page: huggingface.co/papers/2312.03…

While existing image-text alignment models achieve high-quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanations of detected misalignments between text-image pairs. We leverage large language models and visual-grounding models to automatically construct a training set that contains plausible misaligned captions for a given image, along with corresponding textual explanations and visual indicators. We also publish a new human-curated test set comprising ground-truth textual and visual misalignment annotations.

Empirical results show that fine-tuning vision-language models on our training set enables them to articulate misalignments and visually indicate them within images, outperforming strong baselines on both the binary alignment-classification and explanation-generation tasks.
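As a rough sketch of the training-set construction described above: an LLM perturbs an aligned caption into a plausible misaligned one plus a textual explanation, and a visual-grounding model localizes the contradicted phrase as a bounding box. The `llm` and `ground` callables, field names, and prompts below are placeholders of my own; the paper's actual prompts and models are not shown here.

```python
# Hypothetical sketch, not the paper's pipeline code.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class FeedbackExample:
    image_path: str
    misaligned_caption: str
    explanation: str                      # textual feedback
    box: Tuple[int, int, int, int]        # (x1, y1, x2, y2) visual feedback

def build_example(image_path: str, aligned_caption: str,
                  llm: Callable[[str], str],
                  ground: Callable[[str, str], Tuple[int, int, int, int]]
                  ) -> FeedbackExample:
    # 1) Perturb one detail of the caption (object, attribute, relation, ...).
    bad_caption = llm(f"Change exactly one detail: {aligned_caption}")
    # 2) Explain what the perturbed caption gets wrong about the image.
    explanation = llm(f"Explain the mismatch between '{bad_caption}' "
                      f"and an image of '{aligned_caption}'.")
    # 3) Localize the contradicted phrase in the image with a grounding model.
    box = ground(image_path, bad_caption)
    return FeedbackExample(image_path, bad_caption, explanation, box)
```

Fine-tuning a vision-language model on examples of this shape is what the abstract credits for the gains on both the classification and explanation tasks.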
