Turing

9.7K posts

@turingcom

Accelerating superintelligence to drive economic growth.

Palo Alto, CA · Joined September 2018
2.2K Following · 16.1K Followers

Turing @turingcom
At our rooftop happy hour during @DeepLearningAI's AI Dev 26 x SF last week, the conversations went far beyond the sessions. Builders, researchers, and operators shared what’s actually working in AI right now: real deployments, real constraints, real impact. If you’re building in AI and want to be part of what’s next, we’d love to connect.

Jeffrey Weichsel @jeffreyweichsel
May the Fourth be with you! I’ve joined the Light Side at @turingcom as Head of Frontier Data - Enterprise Workflows. Excited to wield the Force of frontier data and defeat the dark side of entropy. Working on synthetic data or complex workflows? Let’s connect.

Turing @turingcom
Excited to share that @Turingcom contributed to EnterpriseOps-Gym, a top-downloaded, reality-check enterprise benchmark for agents from @ServiceNowRSRCH, and that it has been accepted to the upcoming @icmlconf in Seoul, Korea! Stay tuned for EnterpriseOps-Gym v2: more dimensions, harder setups & richer failure modes. Website & paper below.
Sai Rajeswar @RajeswarSai

📢 EnterpriseOps-Gym is now accepted to ICML 2026 🇰🇷✨
website: enterpriseops-gym.github.io
🧩 1,150 expert-curated tasks
🏢 8 enterprise domains
🧰 512 tools
✅ Deterministic verifiers (Outcome + Integrity + Compliance)
📦 Fully containerized, no enterprise instance required

📊 Fresh update: GPT-5.5 numbers are out (alongside the strongest open and closed baselines). We will keep updating as new models drop because long-horizon reliability is moving fast, and we want to stay current.

One exciting (and humbling) signal: we’ve already seen frontier lab teams experimenting with EnterpriseOps-Gym to stress-test and improve their agents, including folks at OpenAI, Mistral AI, and NVIDIA AI. 🙏

📈 Early results are promising: top open models, including NVIDIA Nemotron Super, are showing strong performance, in some cases competing with frontier models.

@shiva_malay @sagardavasam @PShravannayak @turingcom @jonsidd @ServiceNowRSRCH @Mila_Quebec


Turing @turingcom
Explore RL environments, evaluation infrastructure, benchmarks, and real-world data systems for post-training research: go.turing.com/turing-at-ai-d…

Turing @turingcom
The best AI agents complete enterprise workflows correctly only 37% of the time. That's not a model problem. It's a benchmarking problem. ServiceNow built EnterpriseOps-Gym to fix it. Here's what they found:

Most benchmarks test short, static tasks. Real enterprise work looks nothing like that.

@Turingcom designed 1,000+ prompts across 8 domains: HR, ITSM, CSM, Email, Calendar, Drive, Teams, and hybrid workflows. Each task required 7 to 30 sequential steps with real policy constraints.

Results from frontier models:
→ 37.4% task completion at best
→ Planning, not tool access, is the #1 bottleneck
→ Human-authored plans boosted performance by 14 to 35 points
→ Policy compliance was unreliable across the board

If your agent hasn't been tested on long-horizon, stateful workflows, you don't know how it actually performs. Full case study below.

Turing @turingcom
Most enterprise AI does not fail because of bad models. It fails because of missing engineering discipline.

You would not deploy a pricing engine without version control. You would not release a claims system without regression testing. You would not run a trading platform without monitoring. AI is now making those same decisions and is rarely held to the same standard.

Every SDLC discipline maps directly to AI workflows. Unit tests become evaluation suites. Release management becomes model versioning. QA becomes human-in-the-loop validation. Application logging becomes model observability. Performance testing becomes drift detection.

Without these controls, the failures are quiet. Models degrade silently for months. Outputs cannot be traced back to a specific version or input. "We did not know the model changed" is not an acceptable answer to a regulator or a legal challenge.

The hardest part is not technical. It is ownership. Data owns training. Engineering owns deployment. Compliance owns policy. AI fails in the handoffs between those teams, and that gap stays invisible until a regulatory exam or incident forces it into view.

The organizations getting this right have made one shift: they stopped treating AI as a series of experiments and started treating it as infrastructure. The tooling and frameworks already exist. What is missing is the institutional expectation that AI systems should be held to the same standard as the software they sit alongside.

Full breakdown below.
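To make the "unit tests become evaluation suites" and "release management becomes model versioning" mapping concrete, here is a minimal sketch in Python. It is not taken from Turing's breakdown: `EvalCase`, `run_eval_suite`, `release_gate`, the exact-match scoring rule, and the 0.02 regression margin are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One fixed prompt with the answer the model is expected to give (hypothetical schema)."""
    prompt: str
    expected: str


def run_eval_suite(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output matches the expected answer exactly."""
    if not cases:
        raise ValueError("evaluation suite is empty")
    hits = sum(model(c.prompt).strip() == c.expected for c in cases)
    return hits / len(cases)


def release_gate(model_version: str, candidate_score: float,
                 baseline_score: float, max_regression: float = 0.02) -> Dict[str, object]:
    """Approve a release only if the candidate stays within the allowed regression margin.

    The returned record ties the decision to a specific model version so it can be audited later.
    """
    approved = candidate_score >= baseline_score - max_regression
    return {
        "model_version": model_version,
        "candidate_score": candidate_score,
        "baseline_score": baseline_score,
        "approved": approved,
    }


# Toy stand-in for a model: answers one prompt correctly and misses the other.
cases = [EvalCase("2+2", "4"), EvalCase("capital of France", "Paris")]
toy_model = lambda prompt: {"2+2": "4"}.get(prompt, "unknown")

score = run_eval_suite(toy_model, cases)
record = release_gate("toy-v2", score, baseline_score=1.0)
```

In this toy run the candidate answers one of two cases, so the gate rejects it against a perfect baseline. A real suite would likely need a softer scoring rule than exact string match.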

Turing @turingcom
Turing built a production-ready RL Gym for training AI agents on real commercial sales workflows.

The scale:
- 100+ workflows
- 4 platforms: LinkedIn Sales Navigator, HubSpot, Outreach, Calendly
- 50+ cross-platform tasks
- Pass@3 difficulty calibration across all workflows

What’s inside:
- Sandboxed UI replicas with realistic data
- Natural-language prompts tied to structured execution steps
- Step-level verifiers with assertion-based scoring
- Standardized API for consistent reward signals
- Dockerized environments for plug-and-play training

Why it matters: Agents don't learn from screenshots. They learn from consequences.

This system enables:
• Verifiable task completion
• Step-level failure analysis
• Multi-platform coordination
• Scalable RL experimentation

This is how you move from demos to real agent performance. Learn more below.
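The step-level verifiers and Pass@3 calibration mentioned in the post can be pictured roughly as follows. This is an illustrative sketch, not Turing's actual API: `Step`, `score_run`, `pass_at_k`, the state keys, and the fraction-of-steps reward are hypothetical. Pass@3 is read here in its simplest empirical form, where a task counts as solved if any of three attempts completes it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical snapshot of the sandbox after an agent run.
State = Dict[str, Optional[str]]


@dataclass
class Step:
    """One expected step of a workflow, verified by an assertion over the final state."""
    name: str
    check: Callable[[State], bool]


def score_run(steps: List[Step], state: State) -> Dict[str, object]:
    """Evaluate every step assertion; return per-step results and a scalar reward."""
    results = {s.name: bool(s.check(state)) for s in steps}
    passed = sum(results.values())
    first_failure = next((name for name, ok in results.items() if not ok), None)
    return {
        "steps": results,                                   # step-level pass/fail map
        "reward": passed / len(steps) if steps else 0.0,    # fraction of steps satisfied
        "completed": bool(steps) and passed == len(steps),  # all assertions hold
        "first_failure": first_failure,                     # first failing step, if any
    }


def pass_at_k(run_outcomes: List[bool], k: int = 3) -> bool:
    """Empirical pass@k: True if any of the first k attempts completed the task."""
    return any(run_outcomes[:k])


# Made-up cross-platform sales workflow: find a lead, create the contact, book a meeting.
steps = [
    Step("lead_found", lambda s: s.get("lead_id") is not None),
    Step("contact_created", lambda s: s.get("crm_contact") == s.get("lead_id")),
    Step("meeting_booked", lambda s: s.get("meeting_slot") is not None),
]
state = {"lead_id": "L-1", "crm_contact": "L-1", "meeting_slot": None}
report = score_run(steps, state)
```

Here the run satisfies two of three assertions, so `report["first_failure"]` points at `meeting_booked`. That is the kind of signal step-level failure analysis relies on.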

Turing @turingcom
We’re live at @DeepLearningAI’s AI Dev 26 in San Francisco. Come see how Turing is advancing the next wave of AI, from expanding model capabilities to delivering real-world impact at scale. Stop by booth #113 to meet the team and get a closer look at what we’re building. And if you’re around this evening, join us for our rooftop happy hour from 5:00 to 8:00 PM. A great chance to connect, unwind, and keep the conversation going. Details below.