
Shaked Brody


Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
arxiv.org/abs/2404.01810
Project: gs2mesh.github.io

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images based on textual instructions without…



Amazon presents Question Aware Vision Transformer for Multimodal Reasoning
paper page: huggingface.co/papers/2402.05…

Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder, a Large Language Model (LLM), and a projection module that aligns visual features with the LLM's representation space. Despite their success, a critical limitation persists: the vision encoding process remains decoupled from user queries, which often take the form of image-related questions. Consequently, the resulting visual features may not be optimally attuned to the query-specific elements of the image. To address this, we introduce QA-ViT, a Question Aware Vision Transformer approach for multimodal reasoning, which embeds question awareness directly within the vision encoder. This integration yields dynamic visual features that focus on the image aspects relevant to the posed question. QA-ViT is model-agnostic and can be incorporated efficiently into any VL architecture. Extensive experiments demonstrate the effectiveness of applying our method to various multimodal architectures, leading to consistent improvements across diverse tasks and showcasing its potential for enhancing visual and scene-text understanding.
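The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration, not the paper's code: the block layout, the dimensions, and using cross-attention as the fusion point are all assumptions. Question-token embeddings are projected to the ViT width, and the patch tokens cross-attend to them, so the visual features become query-conditioned before the usual self-attention and MLP.

import torch
import torch.nn as nn

class QuestionAwareViTBlock(nn.Module):
    """Hypothetical sketch of one question-aware ViT block.

    Patch tokens first cross-attend to projected question tokens,
    then go through standard self-attention and MLP sublayers.
    """
    def __init__(self, dim=768, q_dim=512, heads=12):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, dim)  # align question features to ViT width (assumed)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, patches, question_tokens):
        # patches: (B, N, dim) visual tokens; question_tokens: (B, T, q_dim)
        q = self.q_proj(question_tokens)
        # inject question awareness: patches attend to the encoded question
        x = patches + self.cross_attn(self.norm1(patches), q, q)[0]
        # standard ViT self-attention + MLP on the conditioned tokens
        x = x + self.self_attn(self.norm2(x), self.norm2(x), self.norm2(x))[0]
        return x + self.mlp(self.norm3(x))

# usage sketch
block = QuestionAwareViTBlock()
patches = torch.randn(2, 196, 768)   # 14x14 patch grid
question = torch.randn(2, 16, 512)   # encoded question tokens
out = block(patches, question)       # (2, 196, 768), now query-conditioned

Because the conditioning lives inside the vision encoder rather than the projection module, any VL stack that exposes its ViT blocks could in principle adopt it, which is what the model-agnostic claim suggests.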





what paper (not your own, maybe not even in your own area) can you not stop telling people about?

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

We propose FuseCap, a novel method for enriching captions with additional visual information obtained from vision experts, such as object detectors, attribute recognizers, and Optical Character Recognition (OCR) models. Our approach fuses the outputs of these vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. Our method is then used to curate the training set of a BLIP-based captioning model, which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method largely improves image-text retrieval.
paper page: huggingface.co/papers/2305.17…
demo: huggingface.co/spaces/noamrot…
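The fusion step is easy to picture in code. Here is a minimal, hypothetical sketch, not the authors' implementation: the prompt wording and the llm_generate helper are stand-ins for whatever LLM interface is available. Expert outputs are serialized into one instruction, and the LLM rewrites them into a single enriched caption.

# Hypothetical sketch of FuseCap-style caption fusion (not the authors' code).
def build_fusion_prompt(original_caption, detections, attributes, ocr_text):
    # Serialize the vision experts' outputs into a single LLM instruction.
    lines = [
        "Fuse the following information into one detailed image caption.",
        f"Original caption: {original_caption}",
        f"Detected objects: {', '.join(detections)}",
        f"Attributes: {', '.join(attributes)}",
        f"Text found in image (OCR): {ocr_text or 'none'}",
        "Enriched caption:",
    ]
    return "\n".join(lines)

def fuse_caption(llm_generate, original_caption, detections, attributes, ocr_text):
    # llm_generate is any callable that maps a prompt string to generated text.
    prompt = build_fusion_prompt(original_caption, detections, attributes, ocr_text)
    return llm_generate(prompt)

# Example with a stub LLM standing in for a real model:
enriched = fuse_caption(
    lambda p: "A red double-decker bus labeled 'City Tour' drives past a cafe.",
    original_caption="a bus on a street",
    detections=["bus", "cafe", "person"],
    attributes=["red bus", "double-decker"],
    ocr_text="City Tour",
)
print(enriched)

Run at scale over an image-caption corpus, this kind of fusion is what would produce the 12M enriched pairs used to train the captioner.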






I'm thrilled to announce that our paper "On the Expressivity Role of LayerNorm in Transformers' Attention" has been accepted to Findings of ACL 2023 #ACL2023. 1/5









