ingero

23 posts

@ingero_io

Free, open-source GPU observability for AI teams. Traces training and inference failures and GPU stalls via eBPF-for-CUDA. Apache 2.0 license.

Tel Aviv · Joined March 2026

2 Following · 0 Followers

ingero@ingero_io·1d

One Linux kernel, zero per-host agents. Trace AI workloads from a single eBPF binary that touches no application code. ingero.io/one-kernel-zer… #eBPF #GPU #GPUObservability #MLOps

ingero@ingero_io·4d

Same eBPF, different vendor. The same uprobe pattern that catches NVIDIA libcudart calls applies to AMD libhip. Kernel scheduler and TCP layer are silicon-agnostic. ingero.io/ebpf-libhip-ro… #eBPF #GPU #CUDA #GPUObservability

ingero@ingero_io·4d

Seven weeks, ten releases: from kernel-side TCP retransmits to a cluster-side MCP tool surface an LLM (or a Grafana panel) can drive end-to-end. ingero.io/from-tcp-retra… #eBPF #GPU #MCP #GPUObservability

ingero@ingero_io·6d

Inference benchmarks publish per-host averages. Per-rank tail latency is where the user feels the spike. eBPF captures the rank that matters. ingero.io/inference-plat… #eBPF #GPU #vLLM #GPUObservability
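A minimal sketch of how a per-host average can hide the rank a user actually feels. The latency samples, host name, and rank numbers are invented for illustration; they are not Ingero output.

```python
from statistics import mean, quantiles

# Hypothetical per-request latencies in ms, keyed by (host, rank). Made-up numbers.
latencies_ms = {
    ("host-a", 0): [20, 21, 19, 22, 20, 21, 20, 19, 22, 21],
    ("host-a", 1): [20, 22, 21, 20, 19, 21, 400, 20, 22, 21],  # one spike
}

def p99(samples):
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    return quantiles(samples, n=100, method="inclusive")[-1]

# The per-host average blends both ranks and every request into one number.
host_avg = mean(x for samples in latencies_ms.values() for x in samples)

# The per-rank tail keeps the spike visible.
rank_p99 = {rank: p99(samples) for (_, rank), samples in latencies_ms.items()}
worst_rank = max(rank_p99, key=rank_p99.get)

print(f"host average: {host_avg:.1f} ms")
print(f"worst rank: {worst_rank}, p99: {rank_p99[worst_rank]:.1f} ms")
```

With only ten samples per rank the p99 is an interpolation, so treat the exact figure as illustrative; the point is the gap between the blended average and the per-rank tail.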

ingero@ingero_io·11 May

MCP shows what the agent did. eBPF shows why the GPU stalled while it did it. Two layers, joined on the same trace database. ingero.io/mcp-what-ebpf-… #eBPF #GPU #MCP #GPUObservability

ingero@ingero_io·7 May

Vendors shipped seven MCP servers in five weeks. Agent sees a tool name; kernel sees syscalls, library loads, and CUDA driver paths. eBPF closes that gap. ingero.io/mcp-tools-ebpf… #eBPF #GPU #MCP #GPUObservability

ingero@ingero_io·6 May

One slow rank out of 8, spread across 2 hosts, made an ncclAllReduce barrier wait 290 ms. Per-host traces all read healthy. Cluster-level fan-in tells the truth. ingero.io/cluster-level-… #eBPF #GPU #GPUObservability #MLOps
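A rough sketch of the arithmetic behind a barrier stall, assuming a simplified model in which the barrier opens when the last rank arrives. The 8 ranks, 2 hosts, and 290 ms come from the post; the individual arrival times, host names, and the choice of rank 6 as the slow one are invented.

```python
# Hypothetical arrival times (ms) at the all-reduce barrier, grouped by host.
arrival_ms = {
    "host-a": {0: 100, 1: 100, 2: 100, 3: 100},
    "host-b": {4: 100, 5: 100, 6: 390, 7: 100},  # rank 6 is the assumed straggler
}

def barrier_waits(arrivals):
    """Return (release_time, {rank: wait_ms}) for a dict of rank -> arrival time."""
    release = max(arrivals.values())  # the barrier opens when the last rank arrives
    return release, {rank: release - t for rank, t in arrivals.items()}

# Cluster-level view: merge every host's ranks before looking for the outlier.
merged = {rank: t for host in arrival_ms.values() for rank, t in host.items()}
release, waits = barrier_waits(merged)
straggler = max(merged, key=merged.get)

# Host-local view of host-a: its own ranks arrive together, so nothing stands out.
host_a = arrival_ms["host-a"]
host_a_skew = max(host_a.values()) - min(host_a.values())

print(f"straggler: rank {straggler}, peers wait {waits[0]} ms")
print(f"arrival skew seen on host-a alone: {host_a_skew} ms")
```

Only the merged view shows which rank arrived late; host-a's own arrivals carry no sign of it.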

ingero@ingero_io·4 May

nvidia-smi: 60% memory used. PyTorch: CUDA out of memory. Both are right. Memory fragmentation leaves holes too small for any new alloc. eBPF trace of cudaMalloc/cudaFree shows exactly where the 40% went. ingero.io/gpu-problem-1-… #GPU #CUDA #PyTorch #eBPF
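A toy model of how 60% usage and an out-of-memory error can both be true. It assumes a single contiguous arena of 100 arbitrary units with invented block sizes; it is not how the CUDA driver or PyTorch's caching allocator actually lays out memory.

```python
def free_holes(total, allocations):
    """Return the sizes of free gaps, given non-overlapping (offset, size) blocks."""
    holes, cursor = [], 0
    for offset, size in sorted(allocations):
        if offset > cursor:
            holes.append(offset - cursor)  # gap before this block
        cursor = offset + size
    if cursor < total:
        holes.append(total - cursor)  # gap after the last block
    return holes

TOTAL = 100  # hypothetical arena size
# Ten live blocks of 6 units, each followed by a 4-unit gap left by an earlier free.
live = [(i * 10, 6) for i in range(10)]

holes = free_holes(TOTAL, live)
used_pct = 100 * sum(size for _, size in live) // TOTAL
largest_hole = max(holes)

request = 10  # a new allocation that needs 10 contiguous units
can_allocate = request <= largest_hole

print(f"used: {used_pct}%, free: {sum(holes)} units, largest hole: {largest_hole}")
print(f"request of {request} units fits: {can_allocate}")
```

The free total is large enough for the request, but no single hole is, which is the situation the post describes.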

ingero@ingero_io·4 May

GPU utilization is a counter. nvidia-smi reports the GPU had at least one queued op in the sample window, not that it did useful arithmetic. The cause sits below. ingero.io/gpu-utilizatio… #eBPF #GPU #GPUObservability #MLOps

ingero@ingero_io·27 Apr

26 seconds to find a straggler. Fleet v0.10 end-to-end on Lambda Cloud: 3-node A100 and 3-node GH200 (arm64, 64k pages). Same agent stack on both. One wrinkle on Grace Hopper. ingero.io/fleet-v0-10-en… #eBPF #GPUObservability #MLOps #FOSS

ingero@ingero_io·22 Apr

One slow GPU idles 999 peers at every AllReduce barrier. Production data: 60% of 512+ GPU jobs hit fail-slow events, adding 34% to average job time. nvidia-smi hides it. ingero.io/gpu-stragglers… #GPUObservability #eBPF #GPU #MLOps

ingero@ingero_io·21 Apr

vLLM health check: ok. nvidia-smi: 95%. User waited 11s for first token. Traced 10,869 CUDA + kernel events: prefix-cache head-of-line blocking, engine loop preempted 2.5s. ingero.io/11-second-time… #vLLM #eBPF #GPUObservability #GPU

ingero@ingero_io·21 Apr

11-Second Time to First Token on a Healthy vLLM Server
Prod vLLM server. 11s TTFT. nvidia-smi says GPU is fine. Kernel tracing of 10,869 CUDA events found prefix-cache head-of-line blocking. ingero.io/11-second-time… #vLLM #eBPF #GPUObservability #MLOps

ingero@ingero_io·20 Apr

Four GPUs, one query. eBPF fleet mode fans a single SQL query to every node and returns the straggler in under a second. No Prometheus, no Grafana, no central collector. ingero.io/distributed-gp… #GPUObservability #eBPF #GPU #MLOps
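A sketch of the fan-out idea only, assuming nothing about Ingero's actual transport, schema, or query syntax. Each "node" here is a local in-memory SQLite database; the `events` table, its columns, the node names, and the durations are all hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node trace rows: node name -> list of (op, duration_ms).
NODES = {
    "node-0": [("step", 101.0), ("step", 99.0)],
    "node-1": [("step", 100.0), ("step", 102.0)],
    "node-2": [("step", 340.0), ("step", 360.0)],  # the assumed slow node
    "node-3": [("step", 98.0), ("step", 100.0)],
}

# One query, sent unchanged to every node.
QUERY = "SELECT AVG(duration_ms) FROM events WHERE op = 'step'"

def run_on_node(node):
    # Stand-in for a remote call: build the node's local database, then query it.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (op TEXT, duration_ms REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", NODES[node])
        (avg_ms,) = conn.execute(QUERY).fetchone()
        return node, avg_ms
    finally:
        conn.close()

# Fan out in parallel, then reduce the per-node answers to a single straggler.
with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
    results = dict(pool.map(run_on_node, NODES))

straggler = max(results, key=results.get)
print(f"straggler: {straggler} ({results[straggler]:.0f} ms average step)")
```

Each node answers from its own local data and only the small per-node result travels back, which is why no central collector appears in the sketch.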

ingero@ingero_io·17 Apr

10,869 GPU kernel events. 4 MCP tool calls. 47 seconds. Claude diagnosed a vLLM bottleneck nvidia-smi kept hiding: the engine coroutine was being preempted 5,347 times on a shared CPU. ingero.io/ebpf-trace-cud… #eBPF #GPU #vLLM #MCP

ingero@ingero_io·15 Apr

MCP is becoming the interface between AI agents and infra data. Datadog wraps their dashboards. Qualys flags the security risk. We think MCP should BE the observability layer -- talking directly to kernel tracepoints, no metric pipeline in between. ingero.io/mcp-observabil…

ingero@ingero_io·13 Apr

4-node DDP job stalling. nvidia-smi: nothing. One SQL query fanned out to all nodes via eBPF found the straggler in <1s -- checkpoint I/O preempting training on one box. No central collector needed. ingero.io/distributed-gp…

ingero@ingero_io·2 Apr

Ingero now traces CUDA Graph lifecycle events: cudaStreamBeginCapture, cudaStreamEndCapture, cudaGraphInstantiate, cudaGraphLaunch. Detect re-capture spikes in vLLM and torch.compile workloads via eBPF. No code changes needed. github.com/ingero-io/inge… #CUDA #PyTorch #eBPF #GPU

ingero@ingero_io·1 Apr

PyTorch DataLoader was 124x slower than direct tensor indexing. We traced 200,000 context switches and 300,000 page allocations in 40s. The GPU wasn't slow, it was starving. ingero.io/124x-slower-py… #PyTorch #eBPF #GPU

ingero@ingero_io·31 Mar

PyTorch training 13x slower than expected. torch.profiler showed nothing. eBPF tracing found .cpu().numpy() forcing full GPU sync every batch. Fix: 2 lines of pure PyTorch. ingero.io/tracing-13x-py… #PyTorch #CUDA #eBPF
