Ehsan Kamalloo

232 posts

@ehsk0

Research Scientist @ServiceNowRSRCH

Joined August 2013

596 Following, 338 Followers

Ehsan Kamalloo retweeted

Xiangru (Edward) Jian @EdwardJian2 · 2d

🚀 Announcing CUA-Suite, a computer-use agent (CUA) training and evaluation ecosystem based on the largest open expert video corpus for desktop CUAs – VideoCUA. 55 hours of human demonstrations across 87 professional apps — 2.5× bigger than the previous largest dataset. 🌐 cua-suite.github.io

Ehsan Kamalloo retweeted

Jimmy Lin @lintool · 5d

Congratulations Dr. Thakur for successfully defending his Ph.D. earlier today! Well deserved given his foundational contributions to benchmarks, data, and evaluation... and as his handle @beirmug suggests, there will be celebratory beers tonight! 🍻

Ehsan Kamalloo retweeted

Alexandre Lacoste @alex_lacoste_ · 19 Mar

We're sitting on a gold mine of data for evaluation and post-training. Hundreds of agentic benchmarks, rich structured environments, verifiable signal. Most of it is sitting idle. Not because nobody wants it, but because the engineering to use it is brutal. 🧵

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 19 Mar

🎙️ Today at NVIDIA GTC 2026 — @alex_lacoste_ presents From Benchmark Silos to an Interoperable AI Evaluation Ecosystem! Catch it here 👇 nvidia.com/gtc/session-ca… 10am, Marriott - Ballroom Salon III (L2) #NVIDIAgtc #AIResearch #ServiceNow

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 4 Mar

🎙️ Exciting news: @alex_lacoste_ is presenting at NVIDIA GTC 2026!! Topic: the fragmented world of agent benchmarks is creating a growing integration tax, and CUBE is the proposed fix. CUBE = a universal benchmarking protocol built on MCP + Gym. Already validated with NVIDIA's NeMo tools. 📅 March 19 · 10:00 a.m. 🔗 nvidia.com/gtc/session-ca… #NVIDIAgtc #AgenticAI #AIResearch #ServiceNow

Ehsan Kamalloo retweeted

Emiliano Penaloza @emilianopp_ · 6 Feb

Remember all the self-distillation papers that came out last week? Well, we also propose it 😅, but alongside something better 😎 π-Distill. We show that with this method, you can distill closed-source frontier models even though their traces are hidden 🔒. Both of our methods can match and even surpass the industry-standard SFT + RL pipeline that has access to reasoning traces 🤯. 🔬 And we spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10

Ehsan Kamalloo retweeted

Rafael Pardinas @muchomuchacho · 30 Jan

PipelineRL got accepted to TMLR 🎉 ~2x faster on-policy RL training through in-flight weight updates. Making LLM agent training fly at @ServiceNowRSRCH @alexpiche_ @DBahdanau @ehsk0 Paper: arxiv.org/abs/2509.19128 Code: github.com/ServiceNow/Pip…

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 30 Jan

3 papers from @ServiceNowRSRCH accepted to #ICLR2026! 🎉 🔒 No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms 🔍 DRBench: A Realistic Benchmark for Enterprise Deep Research 💻 Grounding Computer Use Agents on Human Demonstrations Proud of our team's contributions to AI security, agents, and multi-modal learning. Congrats to all! 🚀 #AIResearch #AISecurity

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 23 Dec

🚀 Introducing AprielGuard — an 8B parameter state-of-the-art guardian model. Built to catch policy violations and jailbreaks alike, fully post-trained on synthetic data using SyGra. 🧠 Model Weights: huggingface.co/ServiceNow-AI/… 📄 Blog: huggingface.co/blog/ServiceNo… @JayKasundra09 @SeganBoss @SathwikTejaswi @ServiceNow @ServiceNowNews @NVIDIAAI @nvidianewsroom Why it matters 👇

Ehsan Kamalloo retweeted

Massimo Caccia @MassCaccia · 15 Dec

Yes, we are, and we’re expanding! Come join us to work on the Apriel series, PipelineRL, and more 🙂 careers.servicenow.com/jobs/744000096…

will brown @willccbb

ServiceNow is easily the Meituan of the West. Cooking far harder than any reasonable person would expect them to.

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 9 Dec

1/5 🚀Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index - approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller. 🧠Model weights: huggingface.co/ServiceNow-AI/… 📄Blog: huggingface.co/blog/ServiceNo… 💬Chat demo: huggingface.co/spaces/Service… @SathwikTejaswi @sagardavasam @tscholak @NVIDIAAI @nvidianewsroom @togethercompute @turingcom @ArtificialAnlys

Ehsan Kamalloo @ehsk0 · 2 Dec

@nouhadziri Congrats, great work🎉🎉

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 2 Dec

🚀 We’re hiring at ServiceNow AI Research! We’re looking for a Senior Research Engineer/Scientist specializing in AI Agent Reliability to contribute to research initiatives focused on enhancing the robustness, safety, and resilience of AI agents operating in enterprise environments. If you want to work on real-world, high-impact AI with a world-class research team — we want to meet you! 🔗 Apply here: careers.servicenow.com/jobs/744000094… Please share or tag someone who’d be a great fit! #AIJobs #Hiring #MachineLearning #AIResearch #LLMs #Agents #ServiceNowAI

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 1 Dec

🚀It’s NeurIPS Week in San Diego! The ServiceNow AI Research team is here and excited to connect. If you’re attending, stop by our booth K#17 to meet our researchers and chat about frontier agents, multimodal learning, time-series modeling, trustworthy AI & more. We’re proud to have multiple contributions accepted across the main conference and workshops — including a ⭐ Spotlight paper! 📅 Stay tuned — we’ll share our schedule and presentation highlights each day. If you're in San Diego → come say hi, grab some swag, and meet the team! Here’s to an inspiring NeurIPS week 🌟 #NeurIPS2025 #AIResearch #FrontierAgents #MachineLearning #ServiceNowAI

Ehsan Kamalloo retweeted

Rafael Pardinas @muchomuchacho · 26 Nov

You can now train reasoning models with GSPO in PipelineRL: sequence-level optimisation + async weight updates = faster, more stable RL training. Can you guess which is which? @ServiceNowRSRCH
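
The "sequence-level optimisation" in GSPO (Group Sequence Policy Optimization) refers to computing one importance ratio per sampled response instead of one per token, then applying PPO-style clipping to that single ratio. The sketch below is a minimal illustration under stated assumptions, not PipelineRL's implementation: the helper names (`group_advantages`, `sequence_ratio`, `gspo_objective`) are hypothetical, log-probabilities are plain Python lists, the ratio is length-normalized in log space, advantages are group-normalized with a population standard deviation, and `eps=0.2` is an illustrative clip range rather than a recommended setting.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one prompt's group: (r_i - mean) / std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)  # all responses scored the same: no signal
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized ratio (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    computed in log space from per-token log-probabilities."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(new_logps, old_logps, rewards, eps=0.2):
    """Clipped surrogate averaged over the responses sampled for one prompt.

    new_logps / old_logps: one list of per-token log-probs per response.
    eps: illustrative clip range (an assumption, not a tuned value).
    """
    terms = []
    for new, old, adv in zip(new_logps, old_logps, group_advantages(rewards)):
        s = sequence_ratio(new, old)
        clipped = min(max(s, 1 - eps), 1 + eps)
        terms.append(min(s * adv, clipped * adv))  # one clip decision per sequence
    return sum(terms) / len(terms)


# Two one-token responses to one prompt: the rewarded one became more likely.
print(gspo_objective([[0.0], [-1.0]], [[-1.0], [-1.0]], rewards=[1.0, 0.0]))
```

In the usage line the advantages are +1 and -1; the first response's ratio e ≈ 2.72 is clipped to 1.2, so the objective is about (1.2 - 1.0) / 2 ≈ 0.1. Because the clip acts on the whole sequence, a response is either kept or clipped as a unit, rather than token by token.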

Ehsan Kamalloo retweeted

Torsten Scholak @tscholak · 20 Nov

🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigating the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation approach ✅ Perfect for enterprise scale 📄 Report: arxiv.org/abs/2511.02651 🔗 Blog post: huggingface.co/blog/ServiceNo… 🤗 Models: huggingface.co/collections/Se… #AI #LLM #EfficientAI #Mamba #HybridModels

Ehsan Kamalloo retweeted

ServiceNow AI Research @ServiceNowRSRCH · 6 Nov

ServiceNow AI Research presents PipelineRL — one of the most impactful efficiency tricks in modern RL training. An elegant solution to a noisy, expensive problem. Worth the read 👇

Rishabh Agarwal @agarwl_

Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from the trainer to the generators (so that they generate data from the latest policy being trained). (Conventional PPO-off-policy) A naive approach would be to start the generators on a batch, wait for all sequences to complete, update the model weights for both trainers and generators, and repeat. Unfortunately, this approach leads to idle generators and low pipeline efficiency due to heterogeneous completion times. (PipelineRL) Instead, whenever we need to do a weight update, we simply let the generators keep generating tokens without discarding or finishing the ongoing generations: an "in-flight" weight update. As a result, the KV caches for these generations are stale, since they were computed by the LLM with earlier copies of the weights, but this is ok (see below).
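
The scheduling difference described above can be made concrete with a toy simulation. This is a minimal sketch, not PipelineRL's code: `Sequence`, `simulate_sync`, and `simulate_inflight` are hypothetical names, each generator slot emits one token per step, and weight updates arrive on a fixed, hypothetical schedule (`update_every`). It models only generator utilization and which policy version produced each token; it does not model KV caches, the trainer, or any off-policy correction.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    """One rollout being generated; records the policy version behind each token."""
    target_len: int
    versions: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.versions) >= self.target_len


def simulate_sync(lengths, num_slots):
    """Naive loop: run a batch, wait for its longest sequence, then update weights.

    Returns (busy_slot_steps, total_slot_steps); slots that finish early sit idle.
    """
    busy = total = 0
    for start in range(0, len(lengths), num_slots):
        batch = lengths[start:start + num_slots]
        busy += sum(batch)
        total += max(batch) * num_slots
    return busy, total


def simulate_inflight(lengths, num_slots, update_every):
    """In-flight updates: a free slot is refilled at once, and every `update_every`
    steps the policy version bumps without interrupting running sequences.

    Returns (busy_slot_steps, total_slot_steps, finished_sequences).
    """
    queue = list(lengths)
    slots = [None] * num_slots
    finished = []
    version = step = busy = total = 0
    while queue or any(s is not None for s in slots):
        for i in range(num_slots):  # refill idle slots immediately
            if slots[i] is None and queue:
                slots[i] = Sequence(queue.pop(0))
        for i, seq in enumerate(slots):
            total += 1
            if seq is None:
                continue  # idle only once the queue has drained
            seq.versions.append(version)  # token sampled with the current weights
            busy += 1
            if seq.done:
                finished.append(seq)
                slots[i] = None
        step += 1
        if step % update_every == 0:
            version += 1  # weight update lands mid-generation; nothing restarts
    return busy, total, finished


lengths = [2, 8] * 4  # heterogeneous completion lengths
print(simulate_sync(lengths, num_slots=2))
busy, total, done = simulate_inflight(lengths, num_slots=2, update_every=4)
print(busy, total, done[2].versions)
```

With completion lengths alternating between 2 and 8 tokens on two slots, the naive loop keeps the slots busy for 40 of 64 slot-steps, while the in-flight variant uses 40 of 44. The third sequence to finish (8 tokens) was sampled partly under version 0 and partly under version 1, which is the mixed-policy staleness the thread describes as acceptable.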

Ehsan Kamalloo retweeted

Alexandre L.-Piché @alexpiche_ · 4 Nov

In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave: youtu.be/Z1uEuRKACRs

Ehsan Kamalloo retweeted

Alexandre Drouin @alexandredrouin · 22 Oct

Excited to speak at the AAAI-26 Workshop on Agentic AI Benchmarks & Enterprise Tasks (Jan 26, Singapore) 🇸🇬 As agents are rapidly productized, realistic enterprise benchmarks for capabilities and reliability are essential! Submit: openreview.net/group?id=AAAI.… 🗓️ Oct 29 cc @gneubig

Ehsan Kamalloo retweeted

Issam Laradji @ILaradji · 13 Oct

🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: lnkd.in/gpRXbb7K 💻 Code: lnkd.in/g4-x5EDc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks that require gathering insights across both public and private data sources. 🤖 These agents must navigate the web and internal data (like Excel sheets, PDFs, Word files, PowerPoints, emails, and chat logs) to generate comprehensive research reports. 🎯 The tasks can be seen as a needle-in-a-haystack challenge, with both supporting and distractor facts carefully planted throughout the private data. Reports are evaluated on recall, precision, factuality, and overall quality. 🙏 Huge thanks to the ServiceNow AI Research team who made this possible: Amirhossein Abaskohi, Tianyi Chen, Miguel Muñoz, Amrutha Varshini Ramesh, Étienne Marcotte, Xing Han Lu, Nicolas Chapados, Spandana Gella, Chris Pal, Alexandre Drouin
