nitzan guetta
@nitzanguetta · 26 posts
Joined July 2023
13 Following · 14 Followers
nitzan guetta retweeted
Yoav HaCohen @yoavhacohen
🚀 LTX-2 is now open source: text → audio + video. Today we’re releasing LTX-2, the first open-source foundation model for joint audiovisual generation, together with a full technical report. 🧵👇
nitzan guetta retweeted
LTX @ltx_model
AI video shouldn’t be locked behind closed systems. We’re releasing LTX-2 as a truly open-source AI video model. Here’s @ZeevFarbman (CEO & Co-Founder, Lightricks) on why openness, local access, and community matter. 🧵
nitzan guetta retweeted
Yoav HaCohen @yoavhacohen
🚀 A new way to control AI video generation: concatenating control signals with LoRAs trained on just a few samples. We’re releasing 3 control LoRAs for LTX-Video: open-pose, depth, and canny edges. Plus: training code so you can build your own types of control. 🧵
nitzan guetta @nitzanguetta
🚀🚀🚀 OpenAI O1, Gemini-2.0 and Gemini-2.0-thinking are on the #VisualRiddles leaderboard! Multiple Choice: Gemini-2.0-thinking hits 60% accuracy (84% with hints!) Open-Ended (Auto-Rating): O1 leads with 58% accuracy. Check it out: 🔗 visual-riddles.github.io @YonatanBitton
nitzan guetta @nitzanguetta
🎟️ Catch us at #NeurIPS2024: Wednesday, 11/12 Creative AI: 11:00–14:00, 16:30–19:30 Google Booth: 12:30–13:00 Thursday, 12/12 Poster Session 3: 11:00–14:00 We can’t wait to see you there! @YonatanBitton
nitzan guetta @nitzanguetta
Additional models update: Claude 3.5 Sonnet, GPT4o, Qwen-VL-Max & Molmo-7B have joined our leaderboard! Multiple Choice: GPT4o leads with 55% accuracy (83% with hints!) Open-Ended (Auto-Rating): Gemini Pro 1.5 remains at the top with 53% accuracy. 🔗 visual-riddles.github.io
nitzan guetta @nitzanguetta
🚀 Big news for #VisualRiddles! We’re excited to announce that Visual Riddles has been accepted to the Creative AI Track at NeurIPS 2024! 🎉 Come explore our Visual Riddles Gallery—a showcase of cognitive and visual challenges for multimodal AI. 🧵
nitzan guetta @nitzanguetta
Excited to announce that our paper has been accepted to NeurIPS D&B 2024! Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Read the paper here: lnkd.in/dfHzA9eg Check out the project website: lnkd.in/dfQgKb24
Quoted tweet — nitzan guetta @nitzanguetta:
Can you answer these riddles? We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”. Paper: Website: visual-riddles.github.io 🧵
nitzan guetta @nitzanguetta
Can you answer these riddles? We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”. Paper: Website: visual-riddles.github.io 🧵
Quoted tweet — AK @_akhaliq:
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models. Imagine observing someone scratching their arm; to understand why, additional context would be necessary. However, spotting a mosquito nearby would immediately offer a likely explanation for the person's discomfort, thereby alleviating the need for further information. This example illustrates how subtle visual cues can challenge our cognitive skills and demonstrates the complexity of interpreting visual scenarios. To study these skills, we present Visual Riddles, a benchmark aimed to test vision and language models on visual riddles requiring commonsense and world knowledge. The benchmark comprises 400 visual riddles, each featuring a unique image created by a variety of text-to-image models, question, ground-truth answer, textual hint, and attribution. Human evaluation reveals that existing models lag significantly behind human performance, which is at 82% accuracy, with Gemini-Pro-1.5 leading with 40% accuracy. Our benchmark comes with automatic evaluation tasks to make assessment scalable. These findings underscore the potential of Visual Riddles as a valuable resource for enhancing vision and language models' capabilities in interpreting complex visual scenarios.