kubesimplify

4.7K posts

@kubesimplify

Simplifying cloud native for all | for sponsorship queries contact [email protected]

India · Joined March 2022
3 Following · 12.5K Followers
Pinned Tweet
kubesimplify@kubesimplify·
Learn Kubernetes today! In this course you'll learn the core concepts, including demos of CNI, kube-proxy & CoreDNS. It's project-based learning where you deploy a multi-microservice app with a database. Go learn today & don't forget to subscribe to Kubesimplify. youtu.be/EV47Oxwet6Y?si…
kubesimplify@kubesimplify·
Hot take: most platform teams should not build their own AI platform. Kubeflow and Ray on Kubernetes exist. Use them.

Decision framework:
→ Fewer than 5 ML engineers: Vertex AI or SageMaker. Managed beats custom every time at this scale.
→ 5–50 engineers, inference-heavy workloads: Ray Serve on Kubernetes. Scales well, battle-tested.
→ Mixed training and inference at scale: Kubeflow + KServe. It handles the full ML lifecycle and has the ecosystem to back it.
→ Custom requirements beyond all of the above: you are in the 1% of teams. Build it.

Everyone else? The "we built our own AI platform" story is almost always CV-driven engineering, not user-driven product thinking. Stop reinventing. Ship models, not infrastructure.
kubesimplify@kubesimplify·
Most AI workloads on Kubernetes still use sleep loops in init containers to wait for GPU readiness, license checks or model cache warmup. There's a cleaner primitive for this: scheduling gates.

Add a scheduling gate to your Pod spec and the scheduler holds it in Pending (not Running, not even Init) until an external controller removes the gate. Your controller watches external state, clears the gate when conditions are met, and only then does the Pod enter the scheduling queue.

Why this matters for AI workloads:
- No wasted node resources while the Pod sits in init
- Clean separation between scheduling logic and container logic
- Karpenter and Cluster Autoscaler won't provision a node until the gate is removed
- Stack multiple gates for multi-condition readiness

GA since 1.30 and very underrated; most teams haven't touched it. Stop burning GPU node-seconds on sleep loops.
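A minimal Pod spec sketch of the pattern (not from the original post): the gate name, image and the controller side are assumptions; any external controller that watches your readiness signal and patches the gate away will do.

apiVersion: v1
kind: Pod
metadata:
  name: llm-worker                          # illustrative name
spec:
  schedulingGates:
    - name: example.com/model-cache-warm    # Pod stays Pending until a controller removes this
  containers:
    - name: server
      image: vllm/vllm-openai:latest        # assumed serving image
      resources:
        limits:
          nvidia.com/gpu: 1

Once the controller observes that the cache is warm, it patches the Pod to drop the gate entry and the scheduler picks it up; only then do Karpenter or Cluster Autoscaler see an unschedulable GPU pod worth provisioning for.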
kubesimplify@kubesimplify·
Inference observability is broken. Prometheus tracks CPU and memory. It misses what actually matters for LLM serving:
• Tokens/sec
• KV cache hit rate
• GPU SM occupancy
• Time-to-first-token (TTFT)

The stack that actually works in 2026:
→ NVIDIA DCGM Exporter
→ vLLM native /metrics
→ Grafana with LLM dashboards

If your AI alerts only fire on OOM, you're flying blind.
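A hedged Prometheus scrape sketch for that stack, assuming vLLM exposes /metrics on its serving port and DCGM Exporter runs as a DaemonSet on its default port; the label names used for filtering are illustrative.

scrape_configs:
  - job_name: vllm                      # vLLM serves Prometheus metrics at /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: vllm
        action: keep
  - job_name: dcgm-exporter             # GPU utilization, SM occupancy, memory, power
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: dcgm-exporter
        action: keep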
kubesimplify@kubesimplify·
Tenant isolation on shared GPU Kubernetes: the 2026 playbook. Most teams treat namespaces as isolation. They're not. They're filing cabinets with labels.

The 4 layers you actually need:
1. Network - Cilium NetworkPolicy + ClusterMesh for cross-cluster traffic. Default-deny everywhere.
2. Scheduler - PriorityClasses + ResourceQuotas + LimitRanges. One bad tenant should never starve another.
3. GPU - DRA with per-tenant DeviceClasses (stable since K8s 1.34). No more whole-card-or-nothing allocation.
4. Secrets - External Secrets Operator, per-tenant ServiceAccount bindings, no shared kubeconfig.

Bonus: vNode for hard runtime isolation when you can't trust the workload. Not all tenants are friendly.

This is the architecture winning in production. Save it.
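A minimal sketch of layers 1 and 2 for a single tenant namespace (namespace name and quota values are illustrative; the NetworkPolicy shown is the portable Kubernetes kind, which Cilium enforces):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: tenant-a
spec:
  podSelector: {}                 # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "64"
    requests.memory: 256Gi
    requests.nvidia.com/gpu: "4"  # cap GPU requests per tenant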
kubesimplify@kubesimplify·
Most “multi-tenant” K8s setups can’t handle one bad pod. Quick test: run this in a tenant namespace:

kubectl run hostile --image=alpine --restart=Never -- sh -c 'while true; do dd if=/dev/zero of=/dev/null; done'

This burns CPU hard. If other pods in different namespaces slow down, you don’t have CPU isolation. No LimitRange. No ResourceQuota. Just shared compute and a noisy neighbor.
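One possible guardrail, sketched under the assumption that the tenant namespace has no defaults today: a LimitRange so pods that declare nothing still get CPU and memory limits (namespace and values are illustrative).

apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      default:              # applied as limits when a container declares none
        cpu: "1"
        memory: 1Gi
      defaultRequest:       # applied as requests when a container declares none
        cpu: 250m
        memory: 256Mi

With that in place, the hostile pod above gets throttled to one core instead of eating the node.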
kubesimplify@kubesimplify·
Storage for AI workloads on K8s: why CSI alone isn't enough.
→ CSI / block storage: great for stateful apps, but not optimal for 100GB+ model weights.
→ Object storage (S3/MinIO): cheap and effectively infinite, but lazy-loading weights on pod start turns cold starts into a multi-minute blocker.
→ Fluid: treats datasets as first-class K8s resources, orchestrating distributed caches that preload model weights onto cluster nodes and serve them from local RAM/SSD.

Fluid hit CNCF Incubating in January 2026 (not Sandbox). With production adopters like Alibaba, Xiaomi and NetEase running it for LLM inference and training, the promotion feels overdue.
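A rough sketch of the Fluid pattern, assuming model weights sit in an S3 bucket and using the Alluxio runtime (one of several runtimes Fluid supports); the names, bucket path and cache size are placeholders.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: llama-weights
spec:
  mounts:
    - mountPoint: s3://models/llama-70b    # hypothetical bucket holding the weights
      name: llama
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: llama-weights                      # must match the Dataset name
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM                    # cache hot weights in node RAM
        path: /dev/shm
        quota: 40Gi

Pods then mount the dataset as a PVC and read weights from the local cache instead of cold-pulling from S3 on every start.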
kubesimplify@kubesimplify·
Ever kubectl expose a deployment and wonder what actually routes your traffic? The old Endpoints API couldn't scale: one object with 3,000 pod IPs means every node re-downloads the entire blob on every change.

@SaiyamPathak's new deep dive breaks down EndpointSlices:
• 100-endpoint cap per slice = linear scaling
• 3 conditions (Ready/Serving/Terminating) = graceful shutdown without dropping connections
• Topology hints = zone-aware routing that saves cross-zone egress costs

Verified against k8s 1.36 source + a live cluster demo. There's also a YouTube walkthrough embedded in the blog where you can watch him break down the system component by component. Worth the read if you want to understand the machinery under your Services 👇 blog.kubesimplify.com/how-kubernetes…
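For a feel of the shape, here is a hand-written (not cluster-generated) EndpointSlice showing the per-endpoint conditions and topology hints the post mentions; the names, IPs and zones are made up.

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-abc12
  labels:
    kubernetes.io/service-name: web    # ties the slice back to its Service
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses: ["10.0.1.15"]
    conditions:
      ready: true                      # in rotation
      serving: true                    # still able to serve, even while terminating
      terminating: false
    zone: us-east-1a
    hints:
      forZones:
        - name: us-east-1a             # topology hint: prefer same-zone traffic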
kubesimplify@kubesimplify·
NVCF is now open source under Apache 2.0: the full control plane, not just an SDK. Read @SaiyamPathak break it down:
→ the three-plane architecture
→ NATS JetStream scale-to-zero
→ multi-cluster routing and more!

It also gives you an idea of what you can actually run locally today vs. what still needs NGC access. A must-read for GPU inference infra insights: blog.kubesimplify.com/nvcf-is-now-op…
kubesimplify@kubesimplify·
The Linux Foundation’s biggest training sale of 2026 is officially LIVE! From May 12–20, get massive discounts on:
• Kubernetes certifications → CKA, CKS, KCNA, KCSA
• Linux certifications → LFCS, LFCE
• IT Professional programs
• Training bundles + Thrive-One subscriptions

💥 Offers:
65% OFF - Cert + Thrive-One (auto-applied at checkout, no code)
60% OFF - Power Bundles → MM26PBKS
60% OFF - Non-K8s Bundles & IT Pro Programs → MM26BUNKS
50% OFF - Kubernetes Bundles → MM26K8BUNKS
50% OFF - Instructor-Led Courses → MM26ILTKS
50% OFF - Individual Certs & Courses → MM26CCKS
30% OFF - Thrive-One Annual (new) → MM26TOAKS
30% OFF - Thrive-One Monthly (new) → MM26TOMKS

Whether you're starting your Kubernetes journey or stacking up to Kubestronaut, this is the window to grab the cert path you've been putting off. Sale ends May 20. Link in the comments. #Kubernetes #CNCF #LinuxFoundation #CKA #CKS #DevOps
kubesimplify@kubesimplify·
Setting imagePullPolicy: Always on a 12GB model image pulls it on every pod restart. That's egress charges on every node, every time.

The fix:
→ Use IfNotPresent
→ Pin by digest, not tag
→ Cache with a registry mirror or a DaemonSet puller

One config change. Real money saved.
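The container-level change, sketched with a placeholder image and digest:

spec:
  containers:
    - name: model-server
      # digest pin: the tag can move, the digest can't
      image: registry.example.com/llama-server@sha256:<digest>
      imagePullPolicy: IfNotPresent    # reuse the layers already on the node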
kubesimplify@kubesimplify·
🚨 Last call for discounted KubeCon India tickets. If you’ve been waiting or are still deciding about attending KubeCon + CloudNativeCon India 2026, hurry up. ⏳ Today is the final day to buy standard tickets before prices increase tomorrow. Use code 👉 KCIN26AMBF for 25% off. 🎟️ Registration link below 👇 #KubeCon #Kubernetes #CloudNative #CNCF
kubesimplify@kubesimplify·
Most "GitOps for AI" setups are just GitOps for the YAML. The model weights still live in S3 with no versioning. No rollback. No diff. That's not GitOps. That's wishing. Real GitOps for ML: → Model weights stored as OCI artifacts, signed → Artifact digests pinned in the ArgoCD Application manifest → Sync waves: artifact pull, serving deploy, traffic shift → Rollback is one git revert Most teams aren't doing this yet.
kubesimplify@kubesimplify·
Your CNI choice is silently eating your GPU training budget.

Most infrastructure teams pick a CNI once and never revisit it. On CPU workloads, that's fine. On distributed GPU training, that decision shows up directly in your utilization numbers and your cloud bill.

We recently benchmarked the same 8×H100 cluster across three configurations. Same hardware. Same interconnect. Only the data plane changed. Here's what allreduce throughput actually looked like:
🔹 Calico (default): 38 GB/s
🔹 Cilium eBPF, no RDMA: 41 GB/s
🔹 Cilium eBPF + GPUDirect RDMA: 87 GB/s

The jump from Calico to Cilium alone is marginal. The real unlock is GPUDirect RDMA. It eliminates the CPU and system memory from the data path entirely: the GPU writes directly to the NIC buffer, the NIC writes directly to the peer GPU, and the kernel is not involved. Cilium's eBPF dataplane matters because it handles network policy and routing without kube-proxy overhead, keeping latency predictable at scale. But RDMA is what moves the needle on throughput.

2.3× faster allreduce means:
→ Gradient sync finishes faster
→ GPUs sit idle less between steps
→ Effective utilization goes up without changing a single line of training code

What this requires:
- RoCEv2-capable NICs (Mellanox/NVIDIA ConnectX recommended)
- GPUDirect RDMA enabled on the nodes
- Cilium configured with RDMA-aware network policies
- NVIDIA GPU Operator handling the driver and plugin stack

The hardware cost is identical. The gap is entirely in how you wire the network layer. If you're running distributed training on Kubernetes and haven't benchmarked your CNI, start there. The fix is infrastructure, not algorithmic. What CNI are you running on your GPU clusters? 👇
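On the workload side, a small hedged sketch of the NCCL environment that usually accompanies GPUDirect RDMA on ConnectX NICs; the exact values depend on your HCA naming and PCIe topology, so treat these as starting points rather than the benchmark's configuration.

env:
  - name: NCCL_IB_HCA
    value: "mlx5"          # assumption: Mellanox/NVIDIA ConnectX adapters
  - name: NCCL_NET_GDR_LEVEL
    value: "PIX"           # allow GPU-to-NIC DMA when they sit under the same PCIe switch
  - name: NCCL_DEBUG
    value: "INFO"          # transport log lines typically show GDRDMA when the direct path is active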
kubesimplify@kubesimplify·
These three primitives exist in Kubernetes specifically for stateful, latency-sensitive workloads. An LLM inference server is exactly that. If your deployment manifest doesn't have all three, you're running a demo. Production is when it survives a node drain at 2am without anyone noticing.
kubesimplify@kubesimplify·
🔷 preStop hook with model unload. Large models do not unload instantly. When a Pod starts terminating, Kubernetes removes it from Service endpoints, runs the preStop hook, then sends SIGTERM, with the whole shutdown bounded by terminationGracePeriodSeconds. If your container exits before in-flight requests complete, those requests fail. A preStop hook gives you a window to drain connections cleanly. Pair it with a realistic terminationGracePeriodSeconds based on your model's actual shutdown time. For a 70B model, that is not 30 seconds.
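A minimal sketch of the pairing, assuming the server can finish in-flight requests within a few minutes; the drain endpoint is hypothetical, so a plain sleep is shown as the fallback.

spec:
  terminationGracePeriodSeconds: 300      # sized for what the model actually needs, not the 30s default
  containers:
    - name: llm-server
      image: vllm/vllm-openai:latest      # assumed serving image
      lifecycle:
        preStop:
          exec:
            # hypothetical drain endpoint; fall back to sleeping long enough for
            # endpoint removal to propagate and in-flight requests to finish
            command: ["sh", "-c", "curl -fsS -X POST localhost:8000/drain || sleep 60"]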
kubesimplify@kubesimplify·
Most AI inference deployments on Kubernetes are one bad rollout away from a 5-minute outage. Here's what matters 🧵