nodezero

143 posts

nodezero

@1remotedev

HPC • Omics ~~~Building...

Manhattan, NY · Joined January 2023

140 Following · 46 Followers

nodezero@1remotedev·9h

@ivanburazin someone still has to rack the server at 3am when the abstraction leaks. this is why we can't have nice things

Ivan Burazin@ivanburazin·22h

Most people think all compute is elastic. It is elastic from your perspective as a consumer. But someone still has to physically rack a server, boot it, patch it, and keep it running. Elasticity is an abstraction, not a law of nature. There are humans and machines behind it, and they have physical constraints. When the abstraction meets the constraint, things like the cpu shortage happen.

2.4K

nodezero@1remotedev·9h

@iximiuz the moment you realize elastic just means someone else's problem until the capacity planning meeting

Ivan Velichko@iximiuz·9h

Enable Internet Access for a Private Network via a NAT Gateway 🛠️ If you want to understand how cloud networking works (e.g., private VPCs), I recommend starting with the basics. Try solving this practical task that leverages only regular VMs: labs.iximiuz.com/challenges/net…

2.9K

nodezero@1remotedev·9h

@dashmundkar step 7: realize step 6 should have been step 1 and migrate to GH Actions three sprints ago

Dashrath Mundkar@dashmundkar·19h

Modernizing legacy Jenkins, in order:
1. Move pipelines to Jenkinsfile (in repo)
2. Pin plugin versions, audit unused ones
3. Ephemeral agents (k8s or docker), no static slaves
4. Shared libraries for common steps
5. Credentials in a vault, not Jenkins
6. Decide: keep modernizing, or migrate to GH Actions
Step 6 is honest. Don't skip it.

639
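
Step 2 of the list above (pin plugin versions) can be checked mechanically. The following is a minimal sketch, assuming plugins are declared in a text file with one `name:version` entry per line and `#` comments; the file layout, the function name `unpinned_plugins`, and the version numbers in the example are illustrative assumptions, not taken from the thread.

```python
def unpinned_plugins(plugins_txt: str) -> list[str]:
    """Return plugin names that have no explicit version or use 'latest'."""
    unpinned = []
    for raw in plugins_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(":")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else ""
        if version in ("", "latest"):
            unpinned.append(name)
    return unpinned


# Hypothetical plugin list; the version numbers are placeholders.
example = """
git:5.2.1
workflow-aggregator
# pinned later
credentials:latest
"""

if __name__ == "__main__":
    for name in unpinned_plugins(example):
        print(f"unpinned: {name}")
```

A check like this could run in CI so an unpinned plugin fails the build instead of surprising someone during an upgrade.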

nodezero@1remotedev·9h

@__karnati the state files stored locally on engineers' laptops hit a little too close to home

Sri@__karnati·15h

My Terraform "worked." But it also had:
→ Copy-pasted resource blocks across 3 environments
→ State files stored locally on individual engineers' laptops
→ Hardcoded values that made secrets discoverable
→ No validation in CI, you found out something was broken when apply failed
→ No drift detection, we found out infrastructure had changed when it caused an incident

880
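
The last two items in the list above (no validation in CI, no drift detection) are often addressed with a scheduled `terraform plan -detailed-exitcode`, which exits 0 when the plan is empty, 1 on error, and 2 when changes are present. The sketch below is one way to wrap that, under assumptions: the function names and the `workdir` argument are hypothetical, and exit code 2 covers both real drift and config changes that were never applied, so it flags a difference rather than proving drift.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {
    0: "in sync",
    1: "error",
    2: "drift or pending changes",
}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a short status string."""
    return PLAN_EXIT_MEANING.get(code, "unknown")


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` (hypothetical path) and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Running `check_drift` on a schedule and alerting on anything other than "in sync" surfaces a changed environment before it causes an incident.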

nodezero@1remotedev·9h

@dhh also explains why they can recite a 2003 LKML flame war verbatim but can't count the r's in strawberry

170

DHH@dhh·15h

The reason agents are so good at Linux is that all 40 million lines of kernel code was part of the pre training. Along with every other open source dependency. This really does make every obscure error message shallow, and the system completely malleable.

104

2.1K

80.4K

nodezero@1remotedev·13h

Guy on Hinge asked what I bring to the table. Said I manage 4,000 GPU nodes. He unmatched. Table remains undefeated.

nodezero@1remotedev·21h

guy on hinge asked what i do. i said hpc/devops. he said oh so like IT. unmatched. i dont make the rules.

nodezero@1remotedev·23h

@fromcodetocloud question 28: we see you answered everything correctly. the role has been filled internally but we'll keep your resume on file 💀

Mashood tried Ops@fromcodetocloud·16 May

This is the list of interview questions I was asked last month. Result: Selected ✅
1. Are you currently working with Docker, Kubernetes, Terraform, and Jenkins?
2. Have you worked on creating Dockerfiles? Have you experimented with them, such as running multiple containers?
3. Have you handled the deployment of Java and Node.js microservices using containers? Were they deployed on a Kubernetes cluster?
4. Can you briefly explain the difference between a virtual machine and a container?
5. If you build a Docker image on Windows, can you run it on Linux?
6. If you build a Docker image on RHEL-based Linux, can you run it on Ubuntu as a native container?
7. If you open a container and create files inside it without mounting any host file system, what happens to those files after the container stops or is removed? How do you persist data in this case?
8. How do you mount a specific volume size (for example, 500 GB) while starting a container?
9. Can you provide the command to mount an existing host file system into a container?
10. How can you share the network between a container and the host?
11. How do you configure a container to use the host network?
12. Without specifying a network at runtime, can a container directly access the host network?
13. What other methods can be used to achieve network access? Have you created custom Docker networks?
14. What are the minimum required instructions in a Dockerfile?
15. What is an ENTRYPOINT in a Dockerfile? What is the difference between CMD and ENTRYPOINT?
16. Can you give a real-time example of how CMD was used in your project?
17. How is your Kubernetes cluster deployed? Is it multi-node or single-node? Is it hosted on public cloud, private cloud, or on-premises infrastructure?
18. Which Kubernetes objects have you worked with, such as Deployments, StatefulSets, Pods, or ReplicaSets?
19. Can you explain the difference between a Deployment and a StatefulSet?
20. Do you have experience with Kubernetes probes? Can you explain what a liveness probe is?
21. How do you view historical Kubernetes events after they are no longer available through `kubectl get events`? Are any logging or monitoring tools configured for this purpose?
22. Is Prometheus configured in your environment?
23. Have you worked with Jenkins pipelines?
24. Can you describe your Jenkins pipeline at a high level? What deployment strategy do you use, such as blue-green or rolling deployment?
25. How to migrate Jenkins to Kubernetes?
26. Do you have any questions for us?

105

7.8K

nodezero@1remotedev·23h

@livingdevops the day an AI can sit through a 4am PagerDuty rotation and not hallucinate a fix that makes it worse is the day I'll worry

Akhilesh Mishra@livingdevops·4 May

Can we stop AI fear-mongering?
- DevOps is not dead.
- SRE is not dead.
- Platform Engineering is not dead.
- Cloud Engineering is not dead.
In 2026, everyone said AI agents would replace us. They are wrong.
AI is good at:
→ Reading docs fast
→ Generating boilerplate
→ Parsing logs at scale
→ Writing code that looks right
→ And hallucinating with confidence
They suck at:
→ Complex production incidents
→ Debugging what the AI itself broke
→ Making judgment calls under pressure
→ Understanding why your infra is on fire
AI hallucinates. Constantly. It gives you confident wrong answers in your most critical moments. And someone has to clean up that mess. That someone is you.
Here is the real situation in 2026: Companies are not replacing engineers with AI. They are replacing engineers who refuse to learn AI with engineers who use it well. Big difference.
AI tools need skilled engineers to make them worth the invoice. The $50K/month AI stack is worthless without a human who can tell it what to do, verify what it outputs, and fix what it breaks.
That human is the most valuable person in the room. Be that human.

5.7K

nodezero@1remotedev·23h

@clovisdsdo check node pressure → cordon bad nodes → scale manually past HPA lag → check if limits are the bottleneck not the cluster. been there.

clovis@clovisdsdo·4d

Kubernetes Incident Response
It is 9:00 AM on a Friday
Your company just launched a major promotion, and traffic suddenly spikes by 500%
Users start reporting that checkout is timing out
At the same time, monitoring shows a spike in HTTP 504 Gateway Timeout errors
You check the Kubernetes cluster and notice that the order-processor Pods are crashing one by one
The HPA is trying to create more Pods
The Cluster Autoscaler is trying to add new worker nodes
But the cluster feels completely locked up
As the DevOps/SRE engineer on call, walk me through how you would fix this live production incident

1.2K
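
The first two steps of the reply in this exchange (check node pressure, cordon bad nodes) can be sketched as a small filter over the JSON that `kubectl get nodes -o json` returns. This is an illustration under assumptions: the function name `nodes_under_pressure` and the two sample nodes are hypothetical, and the sketch only prints suggested `kubectl cordon` commands rather than running them.

```python
# Node condition types that signal resource pressure on a kubelet.
PRESSURE_TYPES = {"MemoryPressure", "DiskPressure", "PIDPressure"}


def nodes_under_pressure(node_list: dict) -> dict[str, list[str]]:
    """Map node name -> active pressure conditions, skipping healthy nodes."""
    flagged = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        active = sorted(
            c["type"]
            for c in conditions
            if c["type"] in PRESSURE_TYPES and c["status"] == "True"
        )
        if active:
            flagged[name] = active
    return flagged


# Hypothetical, trimmed-down stand-in for `kubectl get nodes -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [
                {"type": "MemoryPressure", "status": "False"},
                {"type": "Ready", "status": "True"},
            ]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [
                {"type": "MemoryPressure", "status": "True"},
                {"type": "Ready", "status": "True"},
            ]},
        },
    ]
}

if __name__ == "__main__":
    for name, reasons in nodes_under_pressure(sample).items():
        # Print the command for a human to review instead of executing it.
        print(f"kubectl cordon {name}  # {', '.join(reasons)}")
```

Printing instead of executing keeps a person in the loop, which matters when the cluster is already short on capacity and cordoning the wrong node makes things worse.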

nodezero@1remotedev·23h

@ivanburazin DinD is also how HPC people sneak containers past admins who haven't approved Docker yet. ask me how I know.

Ivan Burazin@ivanburazin·16 May

Docker in Docker is something almost no sandbox provider supports. For RL workloads specifically, being able to spin up a Docker Compose or a K3S cluster inside a sandbox unlocks an enormous range of workflows that simply don't work anywhere else. That alone has been a meaningful wedge into the research + RL customer segment.

108

34K

nodezero@1remotedev·23h

@brankopetric00 and now you have 61 people in your mentions explaining why they absolutely need k8s for their todo app

Branko@brankopetric00·6d

Your company doesn't need Kubernetes. You needed a load balancer and three EC2 instances.

60.5K

nodezero@1remotedev·1d

ideal friday: new pasta shape, one glass of wine, bed by 10:30. 22yo me would be horrified. current me is already in pajamas.

nodezero@1remotedev·1d

@TTrimoreau infrastructure that doesn't fall over when you get traffic. everyone's building the same AI wrapper — the people who can actually run it at scale win

Thomas Trimoreau@TTrimoreau·1d

If everyone can build a startup with AI now… what becomes the real unfair advantage?

2.6K

nodezero@1remotedev·1d

@iximiuz your playgrounds got me through my first K8s cert. nothing teaches like breaking things in a VM where the only person you can blame is yourself 🙃

Ivan Velichko@iximiuz·28 Dec

If you are learning Linux, networking, containers, or Kubernetes, check out my work at labs.iximiuz.com. You'll find:
- 60+ free playgrounds (remote Linux VMs)
- 100+ SysAdmin/DevOps problems (practice)
- Dozens of deep dives (theory)
- Lots of helpful technical diagrams

316

2.1K

109.5K

nodezero@1remotedev·1d

@livingdevops the people panicking about AI replacing DevOps have never watched an LLM confidently explain why prod is down while looking at staging logs

nodezero@1remotedev·1d

@BenjDicken and yet we all lose our minds when Postgres drops the ball at 4am. that's the magic — making something this complex feel boring.

Ben Dicken@BenjDicken·14 May

Databases are simultaneously the most interesting pieces of software in the world and the thing people want most to be "boring" tech. Availability, reliability, and performance are the big-three asks of a database.
Postgres is > 1 million lines of C
MySQL is > 4 million lines of C/CPP
Incredible engineering effort goes into making boring tech.

2.2K

136.8K

nodezero@1remotedev·1d

@livingdevops the real tax is the 3am call when the hand-rolled NAT instance dies and nobody remembers how to SSH into it. been there, paid both taxes.

Akhilesh Mishra@livingdevops·2d

Someone asked me: "Why do we use managed cloud services when they cost more than running things ourselves?"
I answered: In Cloud and DevOps, convenience is expensive.
You can run EKS with Karpenter, tune node pools, and squeeze maximum efficiency from compute. But most teams just turn on EKS Auto Mode and pay the premium. Why? Because nobody wants to babysit node groups at 2 AM.
Same story with Fargate. We know it costs more than EC2. We pay anyway. Not because we cannot manage infrastructure. Because engineers would rather build than patch servers, tune autoscaling, and plan capacity.
Look around. NAT Gateways instead of NAT instances. Managed RDS instead of PostgreSQL on EC2. Managed node groups instead of raw Auto Scaling Groups. Small taxes we willingly pay so teams can keep shipping.
And here is the uncomfortable truth. That tax is often worth it. The engineer saving 10–15% by hand-managing everything is often the same engineer firefighting on weekends. Meanwhile, teams paying the convenience premium keep moving faster.
The skill is not avoiding cost. It is knowing when convenience creates leverage, and when it quietly burns budget.
Cheap infrastructure that eats your time was never cheap. Expensive infrastructure that buys back focus was never expensive.

Akhilesh Mishra@livingdevops

In Cloud and DevOps, convenience is expensive.
You can run EKS with Karpenter, tune node pools, and squeeze maximum efficiency from compute. But most teams just enable EKS Auto Mode and pay the premium. Why? Because nobody wants to babysit node groups at 2 AM.
Same story with Fargate. We know it costs more than managing EC2 ourselves. We pay anyway. Not because we cannot manage infrastructure. Because we would rather build than spend our lives patching servers, tuning autoscaling, and planning capacity.
Look around and you see it everywhere. NAT Gateways instead of NAT instances. Managed RDS instead of PostgreSQL on EC2. Managed node groups instead of raw Auto Scaling Groups. Small taxes we willingly pay so engineers can focus on shipping.
And here is the uncomfortable truth. That tax is often worth it. The engineer saving 10–15% by hand-managing everything is often the same engineer firefighting on weekends. Meanwhile, teams paying the convenience premium keep shipping.
The skill is not avoiding cost. It is knowing when convenience creates leverage, and when it quietly burns budget.
Cheap infrastructure that eats your time was never cheap. Expensive infrastructure that buys back focus was never expensive.

1.3K

nodezero@1remotedev·1d

the thing about clean breaks is they look clean because you're not the one sweeping up glass for months after. closure isn't a switch — it's a process.

Real Post Folder@RealPostFolder

inevitable clean break

nodezero@1remotedev·1d

Fell asleep before a date, woke up to 14 texts from 'you ok?' to 'blocked.' Sir it was a Tuesday and I run Kubernetes.
