ArmelRandy
@RandyZebaze

79 posts

PhD Student @InriaParisNLP | MVA 2022 @ENS_ParisSaclay | X19 @Polytechnique

Joined February 2022
103 Following · 202 Followers

Pinned Tweet
ArmelRandy @RandyZebaze
🎉 Grateful and happy to share that two of our papers were accepted to #EMNLP2025 Findings! 🚀
[1] Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation
[2] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
A big thank you to my amazing co-authors! 🙌 @bensagot @RABawden @InriaParisNLP
ArmelRandy retweeted
Junior cedric Tonga @tonga_cedric
A bit late 😅 but thrilled to share 🎉 our paper “LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction” has been accepted to #EACL2026! Thread (1/7)
ArmelRandy retweeted
Rohan Paul @rohanpaul_ai
The paper tests whether “thinking tokens” help translation and finds they mostly do not.

The team tests big reasoning models that write hidden thoughts before the answer. They compare translations with and without those thoughts. Quality barely changes across many language pairs.

They also train small models to copy step-by-step explanations before the final translation. Those models do worse than simple training that maps source text to target text.

They try better intermediate traces that are specific to translation. These only help when the traces include actual draft translations. That means the useful signal is more parallel text, not explanations about how to translate.

Using stronger teacher translations as targets beats teaching the student to think. Splitting sentences into smaller phrase pairs helps a bit, especially when data is scarce. Extra reinforcement learning does not change the ranking. Plain input-to-output training still gives the best quality for the compute used.

Paper: arxiv.org/abs/2510.11919
Paper Title: "LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens"
ArmelRandy retweeted
Lydia Nishimwe @LydiaNishimwe
🎓 I defended my PhD in Machine Translation last month! Grateful to my colleagues at @inria_paris for the support & collaboration throughout this journey. 🎯 Open to Work - AI/NLP Research Scientist or Engineer roles, starting September 2025, on-site in the Paris area or remote.
ArmelRandy retweeted
Rohan Paul @rohanpaul_ai
LLMs struggle with machine translation for low-resource languages, even with similar examples. This paper introduces Compositional Translation (CompTra). It decomposes sentences into phrases, translates each using retrieved examples, and recombines these translations for the final output.

📌 CompTra improves translation quality by processing simpler sentence components. It gains +1.8 XCOMET score with Command-R+.
📌 The method leverages LLMs' strength at translating short phrases for low-resource machine translation.
📌 Compositional Translation uses self-generated phrase translations as in-context examples, which improves the main translation in low-resource scenarios.

Methods Explored in this Paper 🔧:
→ CompTra decomposes sentences into simpler phrases using a divide prompt with examples.
→ Each phrase is independently translated by the LLM using similarity-based retrieved demonstrations.
→ Finally, the LLM recombines these phrase translations to generate the translation of the original sentence.
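The decompose → translate → recombine flow described in the tweet can be sketched as below. This is a minimal illustration, not the paper's implementation: `llm` and `retrieve_examples` are hypothetical stand-ins for the prompted model calls and the similarity-based demonstration retriever.

```python
def compositional_translate(sentence, llm, retrieve_examples):
    """Sketch of CompTra: decompose, translate phrases, recombine.

    `llm(prompt) -> str` and `retrieve_examples(text) -> [(src, tgt), ...]`
    are hypothetical stand-ins for the components used in the paper.
    """
    # 1. Decompose the source sentence into simpler phrases
    #    (the paper uses a "divide" prompt with examples).
    phrases = llm(f"Split into simple phrases: {sentence}").split("\n")

    # 2. Translate each phrase independently, with similarity-retrieved
    #    parallel examples as few-shot demonstrations.
    phrase_pairs = []
    for phrase in phrases:
        demos = retrieve_examples(phrase)
        prompt = "\n".join(f"{s} -> {t}" for s, t in demos)
        prompt += f"\n{phrase} -> "
        phrase_pairs.append((phrase, llm(prompt)))

    # 3. Recombine: the self-generated phrase translations become
    #    in-context examples for translating the full sentence.
    context = "\n".join(f"{s} -> {t}" for s, t in phrase_pairs)
    return llm(f"{context}\n{sentence} -> ")
```

Note that step 3 does not stitch the phrase translations together directly; they only condition the final translation of the whole sentence.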
ArmelRandy retweeted
Anthropic @AnthropicAI
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
ArmelRandy retweeted
Lydia Nishimwe @LydiaNishimwe
🚀 Exciting Challenge Ahead! 🚀 I'm thrilled to be one of 12 finalists in the 3-minute thesis competition (@MT180FR ) at Sorbonne Université. 🗓️March 10th, 6PM Paris 🔗Register to watch (in person or online) & vote: sorbonne-universite.fr/mt180-2025-les… Looking forward to seeing you there!
ArmelRandy @RandyZebaze
Finally, we demonstrate that similarity-based example selection (in a high-quality sample pool) helps few-shot MT with LLMs ranging from 2 to 70 billion parameters. As the number of in-context examples grows, the gap with random selection remains significant. 9/10
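The selection step described above can be sketched as follows. The paper retrieves examples with proper similarity search over a high-quality pool; here a plain bag-of-words cosine similarity stands in for the real retriever, and the function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(source, pool, k=4):
    """Pick the k (src, tgt) pairs from `pool` most similar to `source`.

    The selected pairs are then formatted as few-shot demonstrations
    in the translation prompt, instead of sampling them at random.
    """
    return sorted(pool, key=lambda ex: bow_cosine(source, ex[0]),
                  reverse=True)[:k]
```

Swapping this lexical similarity for stronger sentence embeddings is what makes the gap with random selection persist as the number of in-context examples grows.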
ArmelRandy @RandyZebaze
I am happy to announce that our paper "In-context Example Selection via Similarity Search Improves Low-resource Machine Translation" was accepted to the #NAACL2025 Findings 🤩🔥. What is this about? TAGS: Machine Translation (MT), High/Low-resource languages (H/LRLs). 🧵 1/10
ArmelRandy retweeted
kyutai @kyutai_labs
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! huggingface.co/kyutai/helium-…