Debjit Paul

568 posts

@DebjitPaul2

Research Scientist. Post-doc @EPFL @ICepfl • Ph.D @HD_NLP #NLProc #AI #ML

England, United Kingdom · Joined April 2012

2.2K Following · 618 Followers

Pinned Tweet

Debjit Paul@DebjitPaul2·Feb 26

Can AI agents truly synthesize information across multiple sources? 🏆 Best system: 8.97 F1 ❌ Most systems: 0% exact match 🚀 Introducing DEEPSYNTH, our new benchmark accepted at #ICLR2026!🇧🇷 📄 Paper: arxiv.org/abs/2602.21143 🤗 Data: huggingface.co/datasets/DeepS…

Debjit Paul retweeted

Amit Sharma@amt_shrma·5d

The better LLMs get at reasoning, the longer their traces get—thousands of tokens, dozens of tool calls. But in law, medicine, and agentic AI, "usually correct" isn't good enough: answers must be verifiably correct. We built interwhen at @MSFTResearch to make that tractable. And it's now open source. Across benchmarks, plugging interwhen into an LLM yields: ✅ 100% soundness (with full verifiers) 📈 up to 15% accuracy gain ⚡ ~ 1.5× compute cost 🧵

Debjit Paul retweeted

Pratyush Kumar@pratykumar·Mar 6

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

Debjit Paul@DebjitPaul2·Mar 5

In the LLM era, author responses seem much longer, with more arguments and justification. I felt this in my own pool. Are meta-reviews drifting toward weighing extended defenses and vague complaints about reviewers instead of the technical core? #ACL2026 #NLProc

Debjit Paul retweeted

Guilherme Favaron@guifav·Feb 25

Your AI agent scores 95% on benchmarks but can it actually synthesize information from multiple sources to answer a real question? DEEPSYNTH, a new benchmark from @DebjitPaul2, Daniel Murphy, Milan Gritta and team at Huawei Noah's Ark Lab (@HuaweiNoahsArk), @imperialcollege, @ucaboratory, and @Cambridge_Uni, tests exactly this. 120 tasks across 7 domains, 67 countries, requiring agents to gather data, cross reference sources, and produce structured insights. The results are sobering: 11 state of the art LLMs and deep research agents top out at 8.97 F1. The best LLM judge score reaches only 17.5. Models hallucinate freely and collapse when reasoning over large information spaces. The gap between answering trivia and doing actual research remains massive. Accepted at ICLR 2026.

Debjit Paul@DebjitPaul2·Feb 26

Work done in collaboration with Daniel Murphy, @milangritta, Ronald A. Cardenas, @victor_p91, Jun Wang, @glampouras_NLP, and an amazing annotation team. Project Page: agentdeepsynthesis.github.io/deepsynth.gith…

Debjit Paul@DebjitPaul2·Feb 26

🔍 Key findings: Current agents “struggle with hallucinations and reasoning over large information spaces”. In other words, they tend to make unsupported claims or miss links across sources when answering complex questions.

Debjit Paul retweeted

Amit Sharma@amt_shrma·Feb 9

Hiring alert 🚀 – Microsoft Research India. We’re expanding our team in AI reasoning and related areas. If you’re building reasoning models, verification frameworks, or next-gen research agents, I’d love to connect. Roles across levels + Postdoc openings. DM me to chat.

Debjit Paul retweeted

Marius Mosbach@mariusmosbach·Feb 9

🚨 I'm looking for emergency reviewers for ARR submissions in the interpretability and analysis track. Topics include: Analysis of CoT, Supervised Fine-tuning, and Matryoshka Representation Learning.

Debjit Paul@DebjitPaul2·Feb 8

@lena_voita Unfortunately, yes!

Lena Voita@lena_voita·Feb 8

Haven't been here for a while, so sanity check: Do people still complain about not getting reviews by the deadline? If yes, here I go: In my ACL AC batch, NONE of the papers has all three reviews. With added delay declarations, still NONE are fully covered. Is this the norm?

Debjit Paul retweeted

Vivek Gupta@keviv9·Feb 7

🚨 ARR Jan 2026 needs 5–6 emergency reviewers. If you have bandwidth for one additional paper, I’d really appreciate the help. DM or email me. Please DM - and feel free to RT 🙏

Debjit Paul retweeted

Pratyush Kumar@pratykumar·Feb 7

I go around saying we should be builders across the 'full stack' @SarvamAI - from compute, to data, to models, to apps. And in come @thekrishnambiar and @thedesignobsess and say we should compose our own music for model launches! And so here we go with our first Sarvam sound track. India's most ambitious decade to build is here; come join the build. And yes for your ringtone upgrade, go here: sarvam.ai/bulbul-audio

Debjit Paul retweeted

Pratyush Kumar@pratykumar·Feb 5

Drop 4/14: Introducing Sarvam Vision: a state-space based 3 billion parameter vision language model that is competitive with the best results in digitisation in English, and defines a significantly higher bar for Indian languages. See the details in our blog: sarvam.ai/blogs/Sarvam-v…

Debjit Paul retweeted

Kamalika Chaudhuri@kamalikac·Feb 5

I started my career as a theorist, and am now an empirical LLM researcher. In today's blog post, I talk about the parallels between theory and empirical research: kamalikachaudhuri.substack.com/p/a-theorists-…

Debjit Paul retweeted

Sahithya Ravi@Sahithya_Ravi·Jan 30

I am excited to introduce our work SPIKE-RL, which models surprise in Video-LLMs 🤯 📎arxiv.org/pdf/2509.23433 ✨ To appear at #ICLR2026 w/@VeredShwartz @adityachinchure @LeonidSigal 1/7 🧵

Debjit Paul@DebjitPaul2·Jan 28

My prediction: one day authors will argue, “Gemini, Grok, ChatGPT, and Claude all gave my paper high scores so why did this human reviewer give it a low one?”
