nitzan guetta
@nitzanguetta · 26 posts
Joined July 2023
13 Following · 14 Followers
nitzan guetta retweeted
Yoav HaCohen @yoavhacohen
🚀 LTX-2 is now open source: text → audio + video. Today we’re releasing LTX-2, the first open-source foundation model for joint audiovisual generation, together with a full technical report. 🧵👇
nitzan guetta retweeted
LTX @ltx_model
AI video shouldn’t be locked behind closed systems. We’re releasing LTX-2 as a truly open-source AI video model. Here’s @ZeevFarbman (CEO & Co-Founder, Lightricks) on why openness, local access, and community matter. 🧵
nitzan guetta retweeted
Yoav HaCohen @yoavhacohen
🚀 A new way to control AI video generation: concatenating control signals with LoRAs trained on just a few samples. We’re releasing 3 control LoRAs for LTX-Video: open-pose, depth, and canny edges. Plus: training code so you can build your own types of control. 🧵
nitzan guetta @nitzanguetta
🚀🚀🚀 OpenAI O1, Gemini-2.0 and Gemini-2.0-thinking are on the #VisualRiddles leaderboard! Multiple Choice: Gemini-2.0-thinking hits 60% accuracy (84% with hints!) Open-Ended (Auto-Rating): O1 leads with 58% accuracy. Check it out: 🔗 visual-riddles.github.io @YonatanBitton
nitzan guetta @nitzanguetta
🎟️ Catch us at #NeurIPS2024: Wednesday, 11/12 Creative AI: 11:00–14:00, 16:30–19:30 Google Booth: 12:30–13:00 Thursday, 12/12 Poster Session 3: 11:00–14:00 We can’t wait to see you there! @YonatanBitton
nitzan guetta @nitzanguetta
Additional models update: Claude 3.5 Sonnet, GPT4o, Qwen-VL-Max & Molmo-7B have joined our leaderboard! Multiple Choice: GPT4o leads with 55% accuracy (83% with hints!) Open-Ended (Auto-Rating): Gemini Pro 1.5 remains at the top with 53% accuracy. 🔗 visual-riddles.github.io
nitzan guetta @nitzanguetta
🚀 Big news for #VisualRiddles! We’re excited to announce that Visual Riddles has been accepted to the Creative AI Track at NeurIPS 2024! 🎉 Come explore our Visual Riddles Gallery—a showcase of cognitive and visual challenges for multimodal AI. 🧵
nitzan guetta @nitzanguetta
Excited to announce that our paper has been accepted to NeurIPS D&B 2024! Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Read the paper here: lnkd.in/dfHzA9eg Check out the project website: lnkd.in/dfQgKb24
Quoted tweet — nitzan guetta @nitzanguetta:
Can you answer these riddles? We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”. Paper: Website: visual-riddles.github.io 🧵
nitzan guetta @nitzanguetta
Can you answer these riddles? We are happy to present our new paper “Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models”. Paper: Website: visual-riddles.github.io 🧵
Quoted tweet — AK @_akhaliq:
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models. Imagine observing someone scratching their arm; to understand why, additional context would be necessary. However, spotting a mosquito nearby would immediately offer a likely explanation for the person's discomfort, thereby alleviating the need for further information. This example illustrates how subtle visual cues can challenge our cognitive skills and demonstrates the complexity of interpreting visual scenarios. To study these skills, we present Visual Riddles, a benchmark aimed to test vision and language models on visual riddles requiring commonsense and world knowledge. The benchmark comprises 400 visual riddles, each featuring a unique image created by a variety of text-to-image models, question, ground-truth answer, textual hint, and attribution. Human evaluation reveals that existing models lag significantly behind human performance, which is at 82% accuracy, with Gemini-Pro-1.5 leading with 40% accuracy. Our benchmark comes with automatic evaluation tasks to make assessment scalable. These findings underscore the potential of Visual Riddles as a valuable resource for enhancing vision and language models' capabilities in interpreting complex visual scenarios.