
David Rainey 🇸🇬
6.9K posts

@davidjrainey
Singapore based Scot, via Hong Kong, New Zealand, London, Glasgow & Gourock. 🏴🇬🇧🇳🇿🇸🇬🇭🇰🇸🇬🇭🇰🇸🇬





Articles we would like to commission at Works in Progress. Could you write one? worksinprogress.news/p/more-article…





“Royal New Zealand Airforce Air Commodore Andy Scott said the P-8A Poseidon aircraft had spotted the potential sanctions busting in the Yellow Sea and East China Sea.” rnz.co.nz/news/world/593…


A brand-new GPU cluster fails more often than a mature one. The first two weeks are the worst weeks the hardware will ever have. Meta's Llama 3 training paper is the number everyone cites: 419 unexpected interruptions over 54 days on 16,384 H100s, with GPUs and HBM3 memory causing the majority. That was the stable phase, after burn-in, acceptance testing, and the obvious lemons had already been culled. The follow-up reliability paper from Meta FAIR covers 150 million A100 GPU-hours and is clearer about what the ramp looks like: mean time to failure drops from roughly 47 days at 8 GPUs to 7.9 hours at 1,024 to 1.8 hours at 16,384, and is projected to fall under 15 minutes at 131,072. Failure modes also ebb and flow as new health checks uncover new patterns. New health checks do not make a cluster healthier. They make its existing problems visible. The operational consequence: if your first customer workload lands during the first two weeks, you are running on the steepest part of the bathtub curve with the thinnest monitoring coverage. That is how a quarter of a cluster ends up cordoned before anyone understands why. What failure mode surprised your team most during a new cluster's first month, and how long did it take to attribute it?
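The arithmetic behind those figures can be checked with a rough sketch. It assumes failures are independent and equally likely per GPU, so that mean time to failure shrinks in proportion to job size. That is a simplification made for this illustration, not necessarily the model either paper uses, and the function names are hypothetical.

```python
HOURS_PER_DAY = 24


def mean_hours_between_interruptions(interruptions: int, days: float) -> float:
    """Average gap between interruptions over an observation window."""
    return days * HOURS_PER_DAY / interruptions


def scale_mttf(mttf_hours: float, from_gpus: int, to_gpus: int) -> float:
    """Rescale MTTF to a different job size.

    Assumption (ours): independent, identical per-GPU failure rates,
    so MTTF is inversely proportional to the number of GPUs.
    """
    return mttf_hours * from_gpus / to_gpus


# Llama 3 figures quoted in the post: 419 unexpected interruptions in 54 days.
llama3_gap_hours = mean_hours_between_interruptions(419, 54)

# Extrapolate the quoted 1.8 hours at 16,384 GPUs to 131,072 GPUs.
projected_minutes = scale_mttf(1.8, 16_384, 131_072) * 60

print(f"Llama 3: one interruption every {llama3_gap_hours:.2f} hours")
print(f"131,072 GPUs: projected MTTF of {projected_minutes:.1f} minutes")
```

Under that assumption the Llama 3 run averaged one unexpected interruption roughly every three hours, and the eightfold jump from 16,384 to 131,072 GPUs cuts 1.8 hours to about 13.5 minutes, consistent with the "under 15 minutes" projection.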



Singapore is a high-trust society and doesn't have a homogeneous culture. Different races and ethnic groups are packed into a small space, yet you can walk around at 3am drunk with cash hanging out of your pocket and a $100k watch on your wrist and no one will do anything.





Yep


Ingraham: It looks like Trump ultimately hits the home run here, takes it to the brink. Iran blinks. Towery: When will the Democrats and some Republicans ever learn that the rhetoric he uses is done for a reason. And it works















