Diptesh Kanojia

971 posts

@diptesh

Senior Lecturer in NLP for AI, Institute for @PeopleCentedAI | University of Surrey | #nlproc

United Kingdom · Joined June 2008
1.5K Following · 780 Followers
Diptesh Kanojia retweeted
Vilém Zouhar @ EACL (@zouharvi)
The 2025 MT Evaluation shared task brings together the strengths of the previous Metrics and Quality Estimation tasks under a single, unified evaluation framework. The following tasks are now open (deadline July 31st but participation has never been easier 🙂)
Diptesh Kanojia (@diptesh)
📢 Test Set RELEASED! 🚀 The test set for the #WMT25 Shared Task on QE-informed Segment-level Error Correction is now LIVE! It's time to put your MT error correction / APE methods to the test. Let's see how well they can correct machine translation! #NLProc #MT #WMT2025
Diptesh Kanojia (@diptesh)
📊 Evaluation: Systems will be ranked on two key metrics:
1️⃣ DeltaCOMET: Primary metric measuring the raw quality improvement over the original MT.
2️⃣ Gain-to-Edit Ratio: DeltaCOMET divided by TER, rewarding systems that are not just effective, but also efficient.
#MTeval
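The two metrics named in the post can be sketched roughly as follows. This is a minimal illustration, not the shared task's official scoring script. It assumes COMET scores and TER have already been computed elsewhere, that DeltaCOMET is averaged over segments, and that the ratio is left undefined when TER is zero; none of these details are stated in the post. The function names `delta_comet` and `gain_to_edit_ratio` are hypothetical.

```python
from statistics import mean


def delta_comet(comet_original, comet_corrected):
    """Mean per-segment COMET gain of the corrected output over the original MT.

    Both arguments are lists of precomputed segment-level COMET scores
    (assumption: the averaging scheme is illustrative, not the official one).
    """
    if len(comet_original) != len(comet_corrected):
        raise ValueError("score lists must have the same length")
    return mean(c - o for o, c in zip(comet_original, comet_corrected))


def gain_to_edit_ratio(delta, ter):
    """DeltaCOMET divided by TER.

    Returns None when TER is 0, i.e. no edits were made
    (assumption: the post does not say how this case is handled).
    """
    if ter == 0:
        return None
    return delta / ter


# Made-up scores for two segments, purely for illustration.
delta = delta_comet([0.5, 0.75], [0.75, 0.75])
ratio = gain_to_edit_ratio(delta, 0.25)
```

Under this reading, a system that reaches the same DeltaCOMET with fewer edits (lower TER) gets a higher ratio, which matches the post's point about rewarding efficiency.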
Diptesh Kanojia retweeted
Raj Dabre (@prajdabre)
Machine Translation is my first and final love. Every single work I do has some flavor of Machine Translation to it. Machine Translation is the best test bed for any sequence to sequence neural architecture. So it's best you read the book on NMT by the OG MT teacher Prof Philipp Koehn of @JHUCompSci. I can't recommend this enough. arxiv.org/abs/1709.07809
Diptesh Kanojia retweeted
AI4Bharat (@ai4bharat)
📢 Presenting IndicSeamless: A Speech Translation Model for Indian Languages 🎙️🌍
IndicSeamless is a speech translation model fine-tuned from SeamlessM4Tv2-large on 13 Indian languages. Trained on a curated subset of BhasaAnuvaad, the largest open-source Speech Translation dataset for Indian languages and English, this model enhances translation quality across diverse linguistic contexts.
🔹 Why IndicSeamless?
Indian languages pose unique challenges in speech translation due to their linguistic diversity, resource limitations, and speech variations. IndicSeamless addresses these challenges by improving accuracy, fluency, and contextual relevance across multiple languages.
🔹 Key Features:
✅ Fine-tuned on a diverse subset of BhasaAnuvaad 🗂️
✅ Handles spontaneous and read speech effectively 🗣️📖
✅ Robust to noise variations and diverse accents 🎧
✅ Optimized for real-world deployment in speech applications
🔹 Resources:
🔗 Demo: huggingface.co/spaces/ai4bhar…
🔗 Model: huggingface.co/ai4bharat/indi…
🔗 Paper: arxiv.org/abs/2411.04699
This represents a significant step forward in multilingual speech translation for the Indian subcontinent. Please try out the demo hosted on @huggingface who have kindly allotted us an A100 GPU! Thanks, @reach_vb for the support!