Shaked Brody

30 posts

Shaked Brody @shakedbr

Applied Scientist @AWSCloud

Joined April 2019
155 Following · 124 Followers
Shaked Brody retweeted
Noam Rotstein @NoamRot
🥁 🥁 🥁 Announcing our new work, Paint by Inpaint: Learning to Add Image Objects by Removing Them First. Together with Navve Wasserman, @roy_ganz, and Ron Kimmel, we've developed a framework designed for adding objects to images! Paper page: rotsteinnoam.github.io/Paint-by-Inpai… 1/7
Shaked Brody retweeted
Roy Ganz @roy_ganz
I am thrilled to announce that our work was accepted as a SPOTLIGHT at @CVPR! The official code is available at github.com/amazon-science… (currently code and checkpoints for inference; training code will be available soon). @AmazonScience
AK @_akhaliq

Amazon presents Question Aware Vision Transformer for Multimodal Reasoning paper page: huggingface.co/papers/2402.05… Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder, a Large Language Model (LLM), and a projection module that aligns visual features with the LLM's representation space. Despite their success, a critical limitation persists: the vision encoding process remains decoupled from user queries, often in the form of image-related questions. Consequently, the resulting visual features may not be optimally attuned to the query-specific elements of the image. To address this, we introduce QA-ViT, a Question Aware Vision Transformer approach for multimodal reasoning, which embeds question awareness directly within the vision encoder. This integration results in dynamic visual features focusing on relevant image aspects to the posed question. QA-ViT is model-agnostic and can be incorporated efficiently into any VL architecture. Extensive experiments demonstrate the effectiveness of applying our method to various multimodal architectures, leading to consistent improvement across diverse tasks and showcasing its potential for enhancing visual and scene-text understanding.

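The QA-ViT abstract above describes making visual features question-aware inside the vision encoder. As a minimal illustration only (the function name, shapes, and the single residual cross-attention step are my assumptions, not the paper's actual architecture or API), the core idea of letting visual tokens attend to question tokens can be sketched like this:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_question_into_vision(visual_feats, question_feats):
    # Visual tokens attend to question tokens (cross-attention),
    # then receive a residual update, so the resulting visual
    # features emphasize query-relevant content.
    d = visual_feats.shape[-1]
    scores = visual_feats @ question_feats.T / np.sqrt(d)  # (n_vis, n_q)
    attn = softmax(scores, axis=-1)                        # rows sum to 1
    return visual_feats + attn @ question_feats            # residual fusion

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 32))    # 16 visual tokens, dim 32
q = rng.standard_normal((5, 32))       # 5 question-token embeddings
fused = fuse_question_into_vision(vis, q)  # shape (16, 32)
```

Because the update is residual and operates per visual token, a mechanism of this general shape can be dropped into an existing encoder without changing its output dimensions, which is what makes the approach model-agnostic.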
Shaked Brody retweeted
Amir Barkol @BarkolAmir
I call upon all Harry Potter fans around the world to share this picture in memory of the lovely Noya, a 13-year-old girl on the autistic spectrum who adored Harry Potter and who went missing during the murderous terror attack by Hamas. Unfortunately, no spell helped, and her story did not have a happy ending. Last night, her body was found. Hamas killed her too. Share in memory of Noya! #HamasisISIS
Shaked Brody @shakedbr
@yosit We received the portable chargers. Thank you!!
Yosi 'Giuseppe' Taguri
Meet the battery war room. There's a request at the end of this tweet. Here we open the packaging, charge each battery to 100%, pack them in fives with cables, and send them to the field. We've emptied the warehouses of Benda, the main importer. Now we're emptying the smaller importers' warehouses. We're currently waiting for planes to free up so we can bring thousands more portable batteries into the country, but that will take time, so here's the request: we need power banks, ones that charge quickly. Now is the time to ask around among neighbors and friends. Every power bank you bring will reach a soldier in the field. Where to bring them: the battery war room at the Mixer, Ganei HaTaarucha. It's the single-story Mixer building opposite Pavilion 10. Ask for Yosi in room e1.
Shaked Brody retweeted
Aviv Slobodkin @NeurIPS (@lovodkin93)
Ever skimmed an article, pinpointing key info, and wished for a tailor-made summary without crafting it yourself?🤔 Introducing SummHelper: your go-to for personalized summarization. 📜✏️ w/ Niv Nachum @pyshmulik @obspp18 Ido Dagan 1/n
Shaked Brody retweeted
AK @_akhaliq
FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions. We propose FuseCap, a novel method for enriching captions with additional visual information obtained from vision experts such as object detectors, attribute recognizers, and Optical Character Recognizers (OCR). Our approach fuses the outputs of such vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. Our method is then used to curate the training set of a captioning model based on BLIP, which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method largely improves image-text retrieval. paper page: huggingface.co/papers/2305.17… demo: huggingface.co/spaces/noamrot…
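The FuseCap abstract above describes fusing vision-expert outputs with the original caption via an LLM. A minimal sketch of just the prompt-assembly step, assuming a plain text prompt (the function name, prompt wording, and expert labels here are illustrative assumptions, not the paper's actual implementation):

```python
def build_fusion_prompt(original_caption, expert_outputs):
    # Assemble a single LLM prompt that asks for an enriched caption,
    # combining the original caption with each vision expert's findings.
    lines = [f"Original caption: {original_caption}"]
    for expert, findings in expert_outputs.items():
        lines.append(f"{expert}: {', '.join(findings)}")
    lines.append("Fuse the information above into one detailed caption.")
    return "\n".join(lines)

prompt = build_fusion_prompt(
    "a man holding a sign",
    {
        "object detector": ["man", "sign", "street lamp"],
        "attribute recognizer": ["red jacket", "wooden sign"],
        "OCR": ['"FRESH FRUIT"'],
    },
)
```

The resulting prompt would then be sent to an LLM, whose output serves as the enriched caption used to build the training set.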
Shaked Brody retweeted
AI Safety Papers @safe_paper
On the Expressivity Role of LayerNorm in Transformers' Attention Shaked Brody (@shakedbr), @urialon1, Eran Yahav (@yahave) Notes: A cool short paper on the role of LayerNorm in transformers. The authors decompose this operator into two components: projection and scaling.
Shaked Brody @shakedbr
(a) Projection allows the attention to create an attention query that attends to all keys equally, offloading the need for the attention mechanism to learn this operation. (b) Scaling prevents key vectors from being "unselectable". 4/5
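The projection-plus-scaling decomposition of LayerNorm can be illustrated with a small numpy sketch (my reading of the decomposition, not the paper's code): subtracting the mean is equivalent to projecting the input onto the hyperplane orthogonal to the all-ones vector, and normalizing then fixes the vector's length to sqrt(d), which matches the usual divide-by-standard-deviation formulation.

```python
import numpy as np

def layernorm_projection_scaling(x, eps=1e-5):
    # (a) projection: subtracting the mean removes x's component
    #     along the all-ones direction, i.e. projects x onto the
    #     hyperplane orthogonal to the 1-vector.
    d = x.shape[-1]
    ones = np.ones(d) / np.sqrt(d)
    projected = x - (x @ ones) * ones        # same as x - x.mean()
    # (b) scaling: fix the vector's length to sqrt(d), which is
    #     equivalent to dividing by the standard deviation.
    return np.sqrt(d) * projected / (np.linalg.norm(projected) + eps)

x = np.array([1.0, 2.0, -1.0, 0.5])
out = layernorm_projection_scaling(x)
# out matches (x - x.mean()) / x.std() up to eps
```

Learned gain and bias parameters of the usual LayerNorm are omitted here to keep the two geometric steps visible.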
Shaked Brody @shakedbr
I'm thrilled to announce that our paper "On the Expressivity Role of LayerNorm in Transformers' Attention" has been accepted to Findings of ACL 2023 #ACL2023. 1/5