Gregor Geigle

119 posts

Gregor Geigle

@GregorGeigle

PhD student @Uni_WUE| NLP, Multimodal Vision+Language

เข้าร่วม Aralık 2020

92 กำลังติดตาม189 ผู้ติดตาม

Gregor Geigle รีทวีตแล้ว

Fabian David Schmidt@fdschmidt·20 Şub

Introducing MVL-SIB, a massively multilingual vision-language benchmark for cross-modal topic matching in 205 languages! 🤔Tasks: Given images (sentences), select topically matching sentence (image). Arxiv: arxiv.org/abs/2502.12852 HF: huggingface.co/datasets/WueNL… Details👇

English

1.4K

Gregor Geigle@GregorGeigle·14 Oca

@TrelisResearch @vikhyatk @snowclipsed @teortaxesTex @j0yk1ll No, they compare (I think) adding pooled features vs. concatenate. So x_global + pooled(x_1,...) vs. concat(x_global, pooled(x_1,...). I tested concat(x_global,x_1,...), too, but that was worse despite more features (maybe too many?)

English

Trelis Research@TrelisResearch·13 Oca

@GregorGeigle @vikhyatk @snowclipsed @teortaxesTex @j0yk1ll Interesting, my read from the paper is that concatenation works better (in which case they have more features? I assume?). Table 12

English

vik@vikhyatk·9 Oca

New Moondream 2B release is out! Includes structured outputs, improved text understanding, gaze detection. And probably more things I'm forgetting about right now.

English

128

1.1K

116.7K

Gregor Geigle@GregorGeigle·10 Oca

Thanks to a GPU grant by @huggingface , you can try out Centurio Aya here: huggingface.co/spaces/WueNLP/… (code shamelessly adapted from @mervenoyann demo of Llava-Next)

Gregor Geigle@GregorGeigle

Want to train a *multilingual* LVLM but not sure how? Or looking for a strong model to use? Presenting "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model"! Arxiv: arxiv.org/abs/2501.05122 HF Collection: huggingface.co/collections/Wu…

English

Gregor Geigle@GregorGeigle·10 Oca

@vikhyatk @snowclipsed @teortaxesTex @j0yk1ll Can confirm this works well. Used it for my recent model, too, because those mega long sequence lengths are a pain to train with. Do you pool the crops or concatenate them all together channel-wise? I found pooling to work better, surprisingly.

English

vik@vikhyatk·9 Oca

@snowclipsed @teortaxesTex @j0yk1ll Found the paper, so I'm spared the effort of writing it up myself :) arxiv.org/abs/2403.13043

English

385

Gregor Geigle@GregorGeigle·10 Oca

This was a collaboration with Florian Schneider, @CarolinHolterm, Chris Biemann, Radu Timofte, @anne_lauscher, and @gg42554.

English

Gregor Geigle@GregorGeigle·10 Oca

Next, we apply our lessons learned to train Centurio - state-of-the-art multilingual LVLMs trained with 100 languages based on Aya-Expanse @CohereForAI and Qwen 2.5 @Alibaba_Qwen. Weights on HuggingFace!

English

113

Gregor Geigle@GregorGeigle·10 Oca

English

3.8K

Gregor Geigle รีทวีตแล้ว

Fabian David Schmidt@fdschmidt·12 Ara

📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele. Work done at @Mila_Quebec with @davlanade @gg42554 @licwu Datasets: huggingface.co/datasets/WueNL… huggingface.co/datasets/WueNL… Details to follow👇

English

4.6K

Gregor Geigle@GregorGeigle·18 Kas

@mervenoyann @visheratin Right, good point that. As a PhD student at a university, I don't have to pay too much attention if something is commercially permissible but that's not true for others, of course.

English

merve@mervenoyann·18 Kas

@GregorGeigle @visheratin but I believe NLLB models have NC licenses no?

English

merve@mervenoyann·18 Kas

Best multilingual SigLIP ever is now compatible with transformers 🫡🤗

English

213

18.6K

Gregor Geigle@GregorGeigle·7 Kas

2. "Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?" 🧵: twitter.com/GregorGeigle/s…

Gregor Geigle@GregorGeigle

"Grounding tasks improve fine-grained image understanding which helps reduce visual hallucinations in Vision-LLMs" Intuitive claim and often repeated but is it *true*? We tested it in our recent paper: arxiv.org/abs/2406.14492 🧵 (spoiler: no)

English

Gregor Geigle@GregorGeigle·7 Kas

1. "African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification" 🧵: twitter.com/GregorGeigle/s…

Gregor Geigle@GregorGeigle

Could you use your Vision-LLM to help identify dogs, plants, dishes, or other things? We investigated and let's just say, do not rely on them when foraging mushrooms in the wild... Paper: arxiv.org/abs/2406.14496 Code: github.com/gregor-ge/FOCI… 🧵

English

122

Gregor Geigle@GregorGeigle·7 Kas

The monkey's paw worked well, so I will present 2(!) posters at @emnlpmeeting Wednesday at 4pm. I will be easy to spot - just look for the guy with crutches🩼

English

423

Gregor Geigle รีทวีตแล้ว

Fabian David Schmidt@fdschmidt·6 Kas

Excited to present NLLB-LLM2Vec at @emnlpmeeting Tuesday 2pm! Drop by our poster to chat about multilingual & multimodal research. NLLB-LLM2Vec can now easily be used with @huggingface AutoModels — try it esp. for embedding low-resource languages! 🌐 huggingface.co/fdschmidt93/NL…

Fabian David Schmidt@fdschmidt

Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports cross-lingual NLU in 200+ languages🔥 Joint work w/ Philipp Borchert, @licwu, and @gg42554 during my great research stay at @cambridgeltl

English

4.4K

Gregor Geigle@GregorGeigle·23 Eki

Awesome work! I don't know why but it feels strange to see my University logo in the same figure as these big labs & groups😅

Xiang Yue@xiangyue96

🌍 I’ve always had a dream of making AI accessible to everyone, regardless of location or language. However, current open MLLMs often respond in English, even to non-English queries! 🚀 Introducing Pangea: A Fully Open Multilingual Multimodal LLM supporting 39 languages! 🌐✨ neulab.github.io/Pangea/ arxiv.org/pdf/2410.16153 The Pangea family includes three major components: 🔥 Pangea-7B: A state-of-the-art multilingual multimodal LLM capable of 39 languages! Not only does it excel in multilingual scenarios, but it also matches or surpasses English-centric models like Llama 3.2, Molmo, and LlavaOneVision in English performance. 📝 PangeaIns: A 6M multilingual multimodal instruction tuning dataset across 39 languages. 🗂️ With 40% English instructions and 60% multilingual instructions, it spans various domains, including 1M culturally-relevant images sourced from LAION-Multi. 🎨 🏆 PangeaBench: A comprehensive evaluation benchmark featuring 14 datasets in 47 languages. Evaluation can be tricky, so we carefully curated existing benchmarks and introduced two new datasets: xChatBench (human-annotated wild queries with fine-grained evaluation criteria) and xMMMU (a meticulously machine-translated version of MMMU). 🙌 This is a joint leading effort with @yueqi_song. Also kudos to the amazing team @AkariAsai, @seungonekim, @Jeande_d, @simi_97k, @anjali_ruban, @lintangsutawika, @Sathya8NR, @gneubig for their hard work! Check out more results and insights we conclude from our training in the thread below. 👇

English

1.8K

Gregor Geigle@GregorGeigle·18 Eki

@giffmana I knew of the B/16 model but must have missed that one. So I tested it to shamelessly plug my work (github.com/gregor-ge/Babe…): For classification, it is by far the best for English + mid/high-res languages. Retrieval lags behind NLLB-SigLIP (-English). tl;dr: SigLIP sweep

English

421

Lucas Beyer (bl16)@giffmana·17 Eki

Yall know SigLIP-So400m right? Did you know there is also an international version of it? It slipped through the cracks during the original release, but now it’s on timm too. What to expect: slightly worse EN benchmarks, but significantly better language and culture coverage!

Ross Wightman@wightmanr

OpenCLIP passed 10K stars on GitHub this week. A big milestone for any open-source project. 🍻 to the many collaborators that made that possible. Coincidentally, I pushed a new release with a port of the largest multi-lingual SigLIP -- a SO400M/16 @ 256x256 that appeared on big_vision a little while back. Now on the @huggingface hub and useable via timm or OpenCLIP (update your timm too)! huggingface.co/timm/ViT-SO400…

English

154

15.5K

Gregor Geigle@GregorGeigle·9 Ağu

@ChenLiu47008770 Not surprising since your work was the main inspiration for the master's thesis 👍 We did not use Fisher's information, though. Only a simple epoch-wise schedule top-down.

English

211

Chen Cecilia Liu@ChenLiu47008770·9 Ağu

@GregorGeigle 😄method name sounds familiar

English

Gregor Geigle@GregorGeigle·9 Ağu

A broken ankle might stop me from going to #ACL2024 myself but it won't stop *you* from checking out my accepted papers (1x main conference, 2x workshop):

English

1.4K

Gregor Geigle@GregorGeigle·9 Ağu

... 2) the thesis of my first Master's student Max: "Improving Vision-Language Cross-Lingual Transfer with Scheduled Unfreezing" (in the workshop proceedings).

English

114

Gregor Geigle@GregorGeigle·9 Ağu

At the ALVR workshop, you can find 1) our mBLIP (arxiv.org/abs/2307.06930) model and...

English

129

ค้นพบ

@TrelisResearch @vikhyatk @snowclipsed @teortaxesTex @j0yk1ll @huggingface @mervenoyann @CarolinHolterm