Ehsan Kamalloo

232 posts

@ehsk0

Research Scientist @ServiceNowRSRCH

Joined August 2013
596 Following, 338 Followers

Ehsan Kamalloo retweeted
Xiangru (Edward) Jian (@EdwardJian2)
🚀 Announcing CUA-Suite, a computer-use agent (CUA) training and evaluation ecosystem based on the largest open expert video corpus for desktop CUAs – VideoCUA. 55 hours of human demonstrations across 87 professional apps — 2.5× bigger than the previous largest dataset. 🌐 cua-suite.github.io

Ehsan Kamalloo retweeted
Jimmy Lin (@lintool)
Congratulations to Dr. Thakur on successfully defending his Ph.D. earlier today! Well deserved, given his foundational contributions to benchmarks, data, and evaluation... and as his handle @beirmug suggests, there will be celebratory beers tonight! 🍻

Ehsan Kamalloo retweeted
Alexandre Lacoste (@alex_lacoste_)
We're sitting on a gold mine of data for evaluation and post-training. Hundreds of agentic benchmarks, rich structured environments, verifiable signal. Most of it is sitting idle. Not because nobody wants it, but because the engineering to use it is brutal. 🧵

Ehsan Kamalloo retweeted
Emiliano Penaloza (@emilianopp_)
Remember all the self-distillation papers that came out last week? Well, we also propose it 😅, but alongside something better 😎: π-Distill. We show that with this method, you can distill closed-source frontier models even though their reasoning traces are hidden 🔒. Both our methods can reach and even surpass the performance of the industry-standard SFT + RL pipeline that has access to reasoning traces 🤯. 🔬 We spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10

Ehsan Kamalloo retweeted
ServiceNow AI Research (@ServiceNowRSRCH)
3 papers from @ServiceNowRSRCH accepted to #ICLR2026! 🎉
🔒 No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
🔍 DRBench: A Realistic Benchmark for Enterprise Deep Research
💻 Grounding Computer Use Agents on Human Demonstrations
Proud of our team's contributions to AI security, agents, and multimodal learning. Congrats to all! 🚀 #AIResearch #AISecurity

Ehsan Kamalloo retweeted
ServiceNow AI Research (@ServiceNowRSRCH)
1/5 🚀 Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index, approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller.
🧠 Model weights: huggingface.co/ServiceNow-AI/…
📄 Blog: huggingface.co/blog/ServiceNo…
💬 Chat demo: huggingface.co/spaces/Service…
@SathwikTejaswi @sagardavasam @tscholak @NVIDIAAI @nvidianewsroom @togethercompute @turingcom @ArtificialAnlys

Ehsan Kamalloo retweeted
ServiceNow AI Research (@ServiceNowRSRCH)
🚀 We’re hiring at ServiceNow AI Research! We’re looking for a Senior Research Engineer/Scientist specializing in AI agent reliability to contribute to research initiatives focused on improving the robustness, safety, and resilience of AI agents operating in enterprise environments. If you want to work on real-world, high-impact AI with a world-class research team, we want to meet you!
🔗 Apply here: careers.servicenow.com/jobs/744000094…
Please share or tag someone who’d be a great fit! #AIJobs #Hiring #MachineLearning #AIResearch #LLMs #Agents #ServiceNowAI

Ehsan Kamalloo retweeted
ServiceNow AI Research (@ServiceNowRSRCH)
🚀 It’s NeurIPS week in San Diego! The ServiceNow AI Research team is here and excited to connect. If you’re attending, stop by our booth K#17 to meet our researchers and chat about frontier agents, multimodal learning, time-series modeling, trustworthy AI, and more. We’re proud to have multiple contributions accepted across the main conference and workshops, including a ⭐ Spotlight paper!
📅 Stay tuned: we’ll share our schedule and presentation highlights each day. If you're in San Diego, come say hi, grab some swag, and meet the team! Here’s to an inspiring NeurIPS week 🌟 #NeurIPS2025 #AIResearch #FrontierAgents #MachineLearning #ServiceNowAI

Ehsan Kamalloo retweeted
Rafael Pardinas (@muchomuchacho)
You can now train reasoning models with GSPO in PipelineRL: sequence-level optimisation + async weight updates = faster, more stable RL training. Can you guess which is which? @ServiceNowRSRCH
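The tweet sums up GSPO as "sequence-level optimisation". To illustrate that phrase only, here is a minimal Python sketch of a GSPO-style clipped surrogate. It assumes the published GSPO formulation: the importance ratio is computed once per response as the length-normalised likelihood ratio, and advantages are GRPO-style group-normalised rewards. This is not PipelineRL's code. The function name `gspo_surrogate`, its arguments, and the `eps` value are hypothetical choices for this example, and the async weight-update half of the tweet is not modelled.

```python
import math
from statistics import mean, pstdev


def gspo_surrogate(new_logps, old_logps, rewards, eps=0.2):
    """Clipped GSPO-style surrogate (to be maximised) for one group of responses.

    new_logps, old_logps: one list of per-token log-probabilities per response,
    under the current policy and under the policy that generated the samples.
    rewards: one scalar reward per response.
    eps: clipping half-width (hypothetical placeholder value).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a group where all rewards tie
    total = 0.0
    for new, old, r in zip(new_logps, old_logps, rewards):
        adv = (r - mu) / sigma  # group-normalised advantage, shared by the whole response
        # Sequence-level ratio: length-normalised likelihood ratio of the whole
        # response, i.e. the geometric mean of the per-token ratios.
        s = math.exp((sum(new) - sum(old)) / len(new))
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # Clipping is applied once per response, not once per token.
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)


# Usage: the first response became more likely under the new policy, so its
# ratio e^1 is clipped to 1 + eps; the second response is unchanged (ratio 1).
print(gspo_surrogate([[0.0, 0.0], [-1.0]], [[-1.0, -1.0], [-1.0]], [1.0, 0.0]))
```

The example prints roughly 0.1. The design point is that one ratio and one clip decision cover an entire response, rather than each token having its own ratio as in token-level PPO/GRPO.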

Ehsan Kamalloo retweeted
Torsten Scholak (@tscholak)
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner.
✅ Navigating the throughput/performance trade-off with up to 3.4x speedup
✅ 2x speedup without performance loss
✅ Efficient distillation approach
✅ Perfect for enterprise scale
📄 Report: arxiv.org/abs/2511.02651
🔗 Blog post: huggingface.co/blog/ServiceNo…
🤗 Models: huggingface.co/collections/Se…
#AI #LLM #EfficientAI #Mamba #HybridModels

Ehsan Kamalloo retweeted
Alexandre L.-Piché (@alexpiche_)
In-flight weight updates have gone from a “weird trick” to a must-have for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave: youtu.be/Z1uEuRKACRs

Ehsan Kamalloo retweeted
Alexandre Drouin (@alexandredrouin)
Excited to speak at the AAAI-26 Workshop on Agentic AI Benchmarks & Enterprise Tasks (Jan 26, Singapore) 🇸🇬 As agents are rapidly productized, realistic enterprise benchmarks for capabilities and reliability are essential!
Submit: openreview.net/group?id=AAAI.… 🗓️ Oct 29
cc @gneubig

Ehsan Kamalloo retweeted
Issam Laradji (@ILaradji)
🚀 Releasing DRBench, an enterprise-grade deep research benchmark!
📄 Paper: lnkd.in/gpRXbb7K
💻 Code: lnkd.in/g4-x5EDc
We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks that require gathering insights across both public and private data sources.
🤖 These agents must navigate the web and internal data (Excel sheets, PDFs, Word files, PowerPoint decks, emails, and chat logs) to generate comprehensive research reports.
🎯 The tasks can be seen as a needle-in-a-haystack challenge, with both supporting and distractor facts carefully planted throughout the private data. Reports are evaluated on recall, precision, factuality, and overall quality.
🙏 Huge thanks to the ServiceNow AI Research team who made this possible: Amirhossein Abaskohi, Tianyi Chen, Miguel Muñoz, Amrutha Varshini Ramesh, Étienne Marcotte, Xing Han Lu, Nicolas Chapados, Spandana Gella, Chris Pal, Alexandre Drouin