ArmelRandy
@RandyZebaze

79 posts

PhD Student @InriaParisNLP | MVA 2022 @ENS_ParisSaclay | X19 @Polytechnique

Joined February 2022
103 Following · 202 Followers

Pinned Tweet
ArmelRandy @RandyZebaze
🎉 Grateful and happy to share that two of our papers were accepted to #EMNLP2025 Findings! 🚀
[1] Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation
[2] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
A big thank you to my amazing co-authors! 🙌 @bensagot @RABawden @InriaParisNLP
ArmelRandy retweeted
Junior cedric Tonga @tonga_cedric
A bit late 😅 but thrilled to share 🎉 our paper “LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction” has been accepted to #EACL2026! Thread (1/7)
ArmelRandy retweeted
Rohan Paul @rohanpaul_ai
The paper tests whether “thinking tokens” help translation and finds they mostly do not.

The team tests big reasoning models that write hidden thoughts before the answer. They compare translations with and without those thoughts. Quality barely changes across many language pairs.

They also train small models to copy step-by-step explanations before the final translation. Those models do worse than simple training that maps source text to target text.

They try better intermediate traces that are specific to translation. These only help when the traces include actual draft translations. That means the useful signal is more parallel text, not explanations about how to translate.

Using stronger teacher translations as targets beats teaching the student to think. Splitting sentences into smaller phrase pairs helps a bit, especially when data is scarce. Extra reinforcement learning does not change the ranking. Plain input-to-output training still gives the best quality for the compute used.

Paper: arxiv.org/abs/2510.11919
Paper Title: "LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens"
ArmelRandy retweeted
Lydia Nishimwe @LydiaNishimwe
🎓 I defended my PhD in Machine Translation last month! Grateful to my colleagues at @inria_paris for the support & collaboration throughout this journey. 🎯 Open to Work - AI/NLP Research Scientist or Engineer roles, starting September 2025, on-site in the Paris area or remote.
ArmelRandy retweeted
Rohan Paul @rohanpaul_ai
LLMs struggle with machine translation for low-resource languages, even with similar examples. This paper introduces Compositional Translation (CompTra). It decomposes sentences into phrases, translates each using retrieved examples, and recombines these translations for the final output.

📌 CompTra improves translation quality by processing simpler sentence components. It gains +1.8 XCOMET score with Command-R+.
📌 The method leverages LLMs' strength at translating short phrases for low-resource machine translation.
📌 Compositional Translation uses self-generated phrase translations as in-context examples, which improves the main translation in low-resource scenarios.

Methods Explored in this Paper 🔧:
→ CompTra decomposes sentences into simpler phrases using a divide prompt with examples.
→ Each phrase is independently translated by the LLM using similarity-based retrieved demonstrations.
→ Finally, the LLM recombines these phrase translations to generate the translation of the original sentence.
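The decompose → translate → recombine flow described in the tweet can be sketched as below. This is a minimal illustration, not the paper's implementation: `llm` and `retrieve_examples` are hypothetical stand-ins for the prompted model calls and the similarity-based demonstration retriever.

```python
def compositional_translate(sentence, llm, retrieve_examples):
    """Sketch of CompTra: decompose, translate phrases, recombine.

    `llm(prompt) -> str` and `retrieve_examples(text) -> [(src, tgt), ...]`
    are hypothetical stand-ins for the components used in the paper.
    """
    # 1. Decompose the source sentence into simpler phrases
    #    (the paper uses a "divide" prompt with examples).
    phrases = llm(f"Split into simple phrases: {sentence}").split("\n")

    # 2. Translate each phrase independently, with similarity-retrieved
    #    parallel examples as few-shot demonstrations.
    phrase_pairs = []
    for phrase in phrases:
        demos = retrieve_examples(phrase)
        prompt = "\n".join(f"{s} -> {t}" for s, t in demos)
        prompt += f"\n{phrase} -> "
        phrase_pairs.append((phrase, llm(prompt)))

    # 3. Recombine: the self-generated phrase translations become
    #    in-context examples for translating the full sentence.
    context = "\n".join(f"{s} -> {t}" for s, t in phrase_pairs)
    return llm(f"{context}\n{sentence} -> ")
```

Note that step 3 does not stitch the phrase translations together directly; they only condition the final translation of the whole sentence.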
ArmelRandy retweeted
Anthropic @AnthropicAI
Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
ArmelRandy retweeted
Lydia Nishimwe @LydiaNishimwe
🚀 Exciting Challenge Ahead! 🚀 I'm thrilled to be one of 12 finalists in the 3-minute thesis competition (@MT180FR ) at Sorbonne Université. 🗓️March 10th, 6PM Paris 🔗Register to watch (in person or online) & vote: sorbonne-universite.fr/mt180-2025-les… Looking forward to seeing you there!
ArmelRandy @RandyZebaze
Finally, we demonstrate that similarity-based example selection (in a high-quality sample pool) helps few-shot MT with LLMs ranging from 2 to 70 billion parameters. As the number of in-context examples grows, the gap with random selection remains significant. 9/10
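The selection step described above can be sketched as follows. The paper retrieves examples with proper similarity search over a high-quality pool; here a plain bag-of-words cosine similarity stands in for the real retriever, and the function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(source, pool, k=4):
    """Pick the k (src, tgt) pairs from `pool` most similar to `source`.

    The selected pairs are then formatted as few-shot demonstrations
    in the translation prompt, instead of sampling them at random.
    """
    return sorted(pool, key=lambda ex: bow_cosine(source, ex[0]),
                  reverse=True)[:k]
```

Swapping this lexical similarity for stronger sentence embeddings is what makes the gap with random selection persist as the number of in-context examples grows.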
ArmelRandy @RandyZebaze
I am happy to announce that our paper "In-context Example Selection via Similarity Search Improves Low-resource Machine Translation" was accepted to the #NAACL2025 Findings 🤩🔥. What is this about? TAGS: Machine Translation (MT), High/Low-resource languages (H/LRLs). 🧵 1/10
ArmelRandy retweeted
kyutai @kyutai_labs
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! huggingface.co/kyutai/helium-…