Applied Compute

134 posts

Applied Compute banner
Applied Compute

Applied Compute

@appliedcompute

We build Specific Intelligence for enterprises.

San Francisco Katılım Temmuz 2012
19 Takip Edilen3.7K Takipçiler
Applied Compute
Applied Compute@appliedcompute·
Using the Context Engine to build a pipeline on APEX-Agents produces up to 16.9% relative improvement at fixed reasoning, with consistent gains on GDPVal. For enterprises, this turns context into a compounding asset: every production rollout makes the next one better.
Applied Compute tweet media
English
1
0
12
1.6K
Applied Compute
Applied Compute@appliedcompute·
Introducing the AC Context Engine: enterprise-grade infrastructure to continuously encode nuanced institutional knowledge into a living artifact (Contextbase). We find that our Contextbases can be the unlock to moving the Pareto frontier on cost and intelligence.
Applied Compute tweet media
English
2
5
97
44.6K
Applied Compute
Applied Compute@appliedcompute·
We study three production use cases: agentic coding, code QA, and office work. For each, we capture full traces from production deployments. These workloads are long-context and long-horizon, extending into hundreds of tool call turns for each. Each row in the workload file is a single agent trace, including the input prompt, generation, and tool call lengths needed to synthetically replay against an OpenAI-compatible endpoint.
English
1
0
20
2.2K
Applied Compute
Applied Compute@appliedcompute·
Inference demand in 2026 has surged, but not for single-turn workloads that most engines are benchmarked on. Agentic workloads have a different structure: traces consist of many tool-calling turns with heavy-tailed distributions over assistant and tool output. These workloads introduce a new set of challenges for efficient serving. We pulled production traces from over 100 post-training runs and are open sourcing these workloads to help define a new target for inference engine optimization.
Applied Compute tweet media
English
6
12
133
34.1K
Applied Compute retweetledi
Moritz Stephan
Moritz Stephan@moritz_stephan·
it was a blast working with @spdling, @rhythmrg, @raymondmfeng and the rest of the @appliedcompute team. Splitting capability maximization (i.e. be good at finding bugs) and product alignment (i.e. short rollouts) into two distinct phases while training made a big difference here and can be useful for other specialized models when real-world product constraints matter
Cognition@cognition

Today we're releasing SWE-check, a specialized bug detection model we RL-trained with @appliedcompute that matches frontier performance on internal in-distribution evals and makes meaningful progress on out-of-distribution evals, all while running 10x faster.

English
0
2
43
4.3K
Applied Compute
Applied Compute@appliedcompute·
Our work on SWE-check with @cognition is a good window into how we work. We collaborate closely with the team, train a specialized model inside their real environment, and iterate from feedback. Training a specialized model gives teams the flexibility to choose where they want to sit on the cost-latency-performance Pareto frontier. In this case, we specifically optimized for cost and latency given the product requirements. Try it for yourself in Windsurf Next today, and read the technical details in the post below!
Cognition@cognition

Today we're releasing SWE-check, a specialized bug detection model we RL-trained with @appliedcompute that matches frontier performance on internal in-distribution evals and makes meaningful progress on out-of-distribution evals, all while running 10x faster.

English
2
4
68
6.2K
Applied Compute retweetledi
Bryan Lee
Bryan Lee@_brylee10·
at AC i’ve learned forward deployed work is among my favorite. a personal favorite memory was getting a high five from a customer after a day in the office and a successful prod deployment. closely collaborating with companies and diving into the nitty-gritty of their systems to make agents work is challenging but rewarding. it’s “full stack” in the sense it involves a eng, research, and understanding customer needs which makes each day different and gets me excited.
Applied Compute@appliedcompute

There is a large delta between what models can do and what they deliver in company-specific workflows. We bridge that gap through forward deployment. In a given week, our engineers might build eval frameworks from scratch, deploy a large-scale context ingestion engine, and present results to F500 leadership. We fine-tune models on proprietary data no frontier lab has seen and optimize agent performance against real-world outcomes. We're excited by engineers with rigor, high customer empathy, and a bias toward action in ambiguity. appliedcompute.com/blog/unlocking…

English
0
4
22
2.4K
Applied Compute
Applied Compute@appliedcompute·
There is a large delta between what models can do and what they deliver in company-specific workflows. We bridge that gap through forward deployment. In a given week, our engineers might build eval frameworks from scratch, deploy a large-scale context ingestion engine, and present results to F500 leadership. We fine-tune models on proprietary data no frontier lab has seen and optimize agent performance against real-world outcomes. We're excited by engineers with rigor, high customer empathy, and a bias toward action in ambiguity. appliedcompute.com/blog/unlocking…
English
0
0
45
37.4K
Applied Compute
Applied Compute@appliedcompute·
Thanks to @Wing_VC and @EricNewcomer for recognizing us in the 2026 Enterprise Tech 30 list alongside so many exceptional teams. Lots more to build.
Applied Compute tweet media
English
4
5
26
4.1K
Applied Compute
Applied Compute@appliedcompute·
We post-trained AC-Small on ~2,000 expert tasks in law, consulting, and finance. It improved on every held-out professional benchmark we tested, including GDPVal, Toolathalon, and APEX V1, with no regression in general capabilities. The strongest gain was in medicine (+13.3 pp), a domain entirely absent from training. What generalized was procedural discipline. Expert data encodes how professionals structure, verify, and revise their work, and that discipline transfers across domains. Enterprises that capture and structure the work of their best professionals will build models that can outperform general alternatives on the tasks that matter most to their business.
Mercor@mercor_ai

Does training on APEX-Agents dev set generalize beyond the benchmark? @appliedcompute post-trained GLM-4.7 on ~2,000 expert Mercor tasks and achieved state-of-the-art legal performance on APEX Agents. We then evaluated the model on other enterprise benchmarks. On GDPVal, AC-Small’s win+tie rate rose from 55.0% to 62.7% (+7.7pp), ranking 5th overall and ahead of Opus 4.5.

English
0
6
50
7.8K
Applied Compute
Applied Compute@appliedcompute·
The FDE role of the AI era has fundamentally changed. It's no longer just about building dashboards and connecting data pipes - it's about building evals, deploying agents that improve in production, winning trust across the org chart, and closing feedback loops that compound over time. We wrote about what to expect when deploying AI in the enterprise today.
Michael Chen@michaelzchen5

x.com/i/article/2037…

English
3
4
67
12.2K