Jindřich Libovický

432 posts

@jlibovicky

🇨🇿 🇪🇺 Researcher at @ufal_cuni. Working on multilingual NLP and neural machine translation. Views my own. He/him

Prague, Czech Republic · Joined July 2011
432 Following · 953 Followers
Jindřich Libovický@jlibovicky·
Join Mu-SHROOM 🍄, a SemEval 2025 shared task on detecting hallucination spans in multilingual LLM outputs! 🌍 Includes Czech with regional Czech questions 🇨🇿. Do you think you can spot when something isn’t true? 🤔 Try it out! 👉 helsinki-nlp.github.io/shroom #SemEval2025 #NLProc
Jindřich Libovický@jlibovicky·
This is going to be fun! 🤓 We have three years to spend 6.5M CZK on improving multilingual tokenization. The goal is to make subwords more alignable across languages and help languages that suffer from over-segmentation with current models.
Institute of Formal and Applied Linguistics@ufal_cuni

Good news! 🥳 GAČR will fund two of our projects: 👉 @jlibovicky proposes better tokenization for #LLMs and machine translation 👉 Veronika Kolářová will study syntactic features of Czech non-verbal predicates ➕ Dominik Macháček receives a Postdoc Individual Fellowship! 💪

Jindřich Libovický@jlibovicky·
There's no clear winner in this year's MRL shared task, but we ended up in the cluster of top-3 teams. I'm so proud of you, folks ☺️
Institute of Formal and Applied Linguistics@ufal_cuni

Finally, @kat_haem and Gianluca Vico presented one of the three prize-winning 🏆🤑 submissions for the shared task on multilingual named entity recognition and question answering! w/ @AndreiM85400815, @jindra_helcl and @jlibovicky. Congrats! aclanthology.org/2024.mrl-1.29

Jindřich Libovický@jlibovicky·
This week I am at #EMNLP2024 in Miami 🌴🇺🇸. Find me 🕵️ or message 💌 me if you want to chat about multilinguality or tokenization, and stop by our poster on Tuesday at 2 p.m., where I'll present our paper on Lexically Grounded Subword Segmentation: aclanthology.org/2024.emnlp-mai…
Jindřich Libovický retweeted
Jindra Helcl@jindra_helcl·
... starring @jlibovicky and me as young and promising scientists with their impeccable movie editing skills
Jindřich Libovický@jlibovicky·
👍 It works great for preserving morpheme boundaries. 👍 Does a good job in POS tagging. 👎 No improvement in machine translation. And bad news, @zouharvi, our downstream performance does not correlate with Rényi efficiency. 🤷‍♂️ 🧵4/4
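For readers unfamiliar with the metric mentioned above: Rényi efficiency is the Rényi entropy of the unigram token distribution divided by the log of the vocabulary size. The following is a minimal sketch, not the code used in the paper. The function name `renyi_efficiency` and its parameters are hypothetical, and the default `alpha=2.5` is only an illustrative choice.

```python
import math
from collections import Counter


def renyi_efficiency(tokens, alpha=2.5, vocab_size=None):
    """Rényi entropy of the unigram token distribution, normalized by log |V|.

    `tokens` is a flat list of subword tokens from a tokenized corpus.
    `vocab_size` defaults to the number of distinct tokens observed
    (an assumption; a real tokenizer vocabulary may be larger).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    size = vocab_size if vocab_size is not None else len(counts)
    if size < 2:
        raise ValueError("need a vocabulary of at least two tokens")
    probs = [c / total for c in counts.values()]
    if alpha == 1.0:
        # The limit alpha -> 1 is the Shannon entropy.
        entropy = -sum(p * math.log(p) for p in probs)
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
    return entropy / math.log(size)


# A uniform token distribution is maximally efficient; a skewed one is not.
uniform = renyi_efficiency(["a", "b", "c", "d"])
skewed = renyi_efficiency(["a"] * 97 + ["b", "c", "d"])
```

The value lies between 0 and 1 when `vocab_size` is at least the number of observed token types.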
Jindřich Libovický@jlibovicky·
Then, we find segmentations whose subword embeddings are closest to the word embedding. We collect bigram stats from those and use them in a bigram-LM-based segmenter (a generalization of SentencePiece). And we also do some experiments... 🧵3/4
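To make the idea of a bigram-LM-based segmenter concrete, here is a minimal sketch of Viterbi decoding over all splits of a word, scored with subword bigram log-probabilities. It is an illustration under stated assumptions, not the paper's implementation: the function `segment`, the start symbol `<w>`, the fallback score `unk_logprob`, and the toy vocabulary and scores are all hypothetical.

```python
def segment(word, bigram_logprob, vocab, max_len=10, bos="<w>", unk_logprob=-20.0):
    """Return the best-scoring split of `word` into subwords from `vocab`.

    `bigram_logprob` maps (previous_subword, subword) to a log-probability;
    unseen bigrams get `unk_logprob`. Returns None if no split exists.
    """
    n = len(word)
    # best[(end, last)] = (score, backpointer) for the best segmentation
    # of word[:end] whose final subword is `last`.
    best = {(0, bos): (0.0, None)}
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece not in vocab:
                continue
            # Extend every partial segmentation that ends exactly at `start`.
            for (pos, prev), (score, _) in list(best.items()):
                if pos != start:
                    continue
                cand = score + bigram_logprob.get((prev, piece), unk_logprob)
                key = (end, piece)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))
    finals = [(score, key) for key, (score, _) in best.items() if key[0] == n]
    if not finals:
        return None
    _, key = max(finals)
    # Follow backpointers from the end of the word to the start symbol.
    pieces = []
    while best[key][1] is not None:
        pieces.append(key[1])
        key = best[key][1]
    return pieces[::-1]


# Toy example with made-up scores.
toy_vocab = {"un", "lock", "able", "u", "n", "unl", "ock"}
toy_bigrams = {("<w>", "un"): -1.0, ("un", "lock"): -1.0, ("lock", "able"): -1.0}
result = segment("unlockable", toy_bigrams, toy_vocab)
```

If every bigram score depended only on the second subword, this would reduce to unigram-LM decoding, which is the sense in which a bigram model generalizes it.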
Jindřich Libovický@jlibovicky·
In the paper introducing the dataset aclanthology.org/2024.alvr-1.9.…, we also present a method based on hard-negative sampling on the text side of the model that significantly improves the model's ability to distinguish details.
Jindřich Libovický@jlibovicky·
It consists of minimal pairs of images and captions derived from the MS COCO test set. Annotators used object detection and Stable Diffusion Inpainting 👨‍🎨👩‍🎨 to get images with either different objects or objects of different colors and sizes. Everything's 100% human-supervised. 💪
Jindřich Libovický@jlibovicky·
📣 We have a dataset! ❓Have you also noticed that language-vision encoders like CLIP do not pay attention to details? ❓ Do you think your model is doing better? 👉 InpaintCOCO dataset huggingface.co/datasets/phiyo… is here for you. Work of @phiyodr, folks from @unibw_m, and myself.