Sabitlenmiş Tweet

Recently we built OTelBench – a benchmark to test how well LLMs handle OpenTelemetry instrumentation.
We tested 14 models. The best (Claude Opus 4.5) hit only 29%.
These weren't trick questions, just small subset of typical SRE tasks.
Link here:
quesma.com/blog/introduci…

English










