
Greg Val
@val__greg
Building https://t.co/7dDABA5EIO - what happens when you lose access to everything. Founder + CTO across startups and enterprise with 20y+ experience.

We experienced an outage at Coinbase last night, which is never acceptable. The root cause was a room overheating in an AWS datacenter when multiple chillers failed. We design our services to be resilient to downtime in any one AWS Availability Zone (AZ), and most of our systems worked this way last night, but not all. Our centralized exchange did not. Exchanges have unique architectures that optimize for latency and co-location of clients. It is possible to make exchanges resistant to AZ failures, but this can add undesirable latency and break customer co-location. Given this incident, we'll revisit these tradeoffs to ensure we're giving you the best possible venue to trade. At a minimum, we should be able to considerably reduce the duration of an outage when an AZ move is needed. Thank you to the AWS and Coinbase teams for working through the night to mitigate the issue. We'll share the detailed technical summary once it's ready.
