Runloop Developer (@RunloopDev)
Runloop now integrates with @wandb Weave for orchestrated agent benchmarks with full traceability. Runloop runs thousands of agent tasks in parallel. @weave_wb turns the traces into something you can inspect and compare. Joint report: wandb.ai/wandb_fc/genai…
@wandb @weave_wb Agent benchmarking at scale has two problems:
1. Most benchmarks don't run in parallel, so evaluation takes days
2. The output is a pile of logs nobody can read
Runloop solves the first. Weave solves the second.
What the integration looks like in practice: Runloop orchestrates concurrent devboxes, materializes deterministic inputs, isolates the scoring harness, and exports structured traces. Weave ingests those traces and provides tool call trees, error clusters, version comparisons, and model leaderboards.
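A rough sketch of the shape of that pipeline, using only the Python standard library. It does not use the Runloop or Weave APIs: the `Trace` record, `run_task`, `run_benchmark`, and `leaderboard` names are all hypothetical, the "agent run" is a placeholder, and the pass/fail score is a deterministic dummy. It only illustrates bounded-concurrency task execution, structured trace records, and a per-model pass-rate summary.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass, asdict


@dataclass
class Trace:
    # Hypothetical trace record; not the Runloop or Weave schema.
    task_id: str
    model: str
    passed: bool
    tool_calls: int


async def run_task(task_id: str, model: str, limit: asyncio.Semaphore) -> Trace:
    # Placeholder for "start an isolated sandbox, run the agent, score the result".
    async with limit:
        await asyncio.sleep(0)  # stands in for real work
        passed = (len(task_id) + len(model)) % 2 == 0  # deterministic dummy score
        return Trace(task_id, model, passed, tool_calls=len(task_id))


async def run_benchmark(tasks, models, max_concurrency=100):
    # The semaphore caps how many tasks are in flight at once.
    limit = asyncio.Semaphore(max_concurrency)
    jobs = [run_task(t, m, limit) for m in models for t in tasks]
    return await asyncio.gather(*jobs)  # results keep the order of `jobs`


def leaderboard(traces):
    # Aggregate pass rate per model from the collected traces.
    totals = defaultdict(lambda: [0, 0])  # model -> [passed, total]
    for tr in traces:
        totals[tr.model][0] += tr.passed
        totals[tr.model][1] += 1
    return {m: p / n for m, (p, n) in totals.items()}


traces = asyncio.run(
    run_benchmark(["task-a", "task-bb"], ["model-x", "model-y"], max_concurrency=2)
)
records = [asdict(t) for t in traces]  # plain dicts, ready to export as JSON
board = leaderboard(traces)
```

In a real setup the placeholder in `run_task` would be replaced by the sandbox and agent calls, and `records` would be sent to a tracing backend rather than kept in memory.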
@wandb @weave_wb The demo in the joint report: Terminal-Bench 2, OpenCode as the agent harness, Gemini 3 Pro vs Claude Sonnet 4.6, 100 concurrent devboxes, full trace export to Weave, side-by-side comparison in one view.