There was a problem with the file for today's episode, which has been fixed, so if you ran into an error this morning please delete and re-download the episode.
We've focused more on reliability over the last couple of years than ever before at @zendesk. I've just shared the 8 principles that have helped guide us internally:
zendesk.engineering/zen-and-the-ar…
🏗️ 10 years ago, if I were building a low-traffic, multi-tenant app with a simple domain model (< 10 "entities") that needed a nice front-end, I would have reached for @rails on @heroku.
What should I reach for today? Assume I want to learn something new *and* be productive!
I have some exciting news! This week I was promoted to Principal Engineer @github. I'm really proud of the work I've done here since joining in 2017, and excited for that work to grow from here.
@glenathan @beanieboi @yann_ck @t_crayford @nicolefv In my experience, moving to smaller but more numerous services comes with its own problems. Specifically, figuring out whether things work correctly gets an order of magnitude more difficult. Deployment complexity is a function of organization size and the amount of functionality.
@dasch @beanieboi @yann_ck @t_crayford Imagine how much more effective you could be if you could deploy smaller, faster iterations instead of having to batch up changes and waste time queuing.
There’s a lot of research about this by @nicolefv in Accelerate.
Every team’s deployment process should aim for “hit the merge button on github dot com, computers do the rest”
My current team does this for a control plane that manages millions of databases; your team can do it too.
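For the curious, here is a minimal sketch of what "hit the merge button, computers do the rest" can boil down to, assuming a hypothetical webhook handler and a hypothetical `./scripts/deploy.sh`; this is not GitHub's or any particular team's actual pipeline:

```python
# A minimal sketch (not any team's actual pipeline) of "merge to main,
# computers do the rest": a webhook handler that starts a deploy whenever
# a pull request is merged into main. run_deploy and ./scripts/deploy.sh
# are hypothetical stand-ins for whatever deploy tooling you already have.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_deploy(sha: str) -> None:
    # Hand the merged commit to the existing deploy tooling.
    subprocess.run(["./scripts/deploy.sh", sha], check=True)


class MergeWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        pr = event.get("pull_request", {})
        # GitHub's pull_request event reports a merge as action=closed plus merged=true.
        if (
            event.get("action") == "closed"
            and pr.get("merged")
            and pr.get("base", {}).get("ref") == "main"
        ):
            run_deploy(pr["merge_commit_sha"])
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8080), MergeWebhook).serve_forever()
```

In practice you would run this behind your CI provider and verify webhook signatures, but the shape is the same: merge event in, deploy out, no human in between.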
@marcdel @beanieboi @yann_ck @t_crayford I agree, and we use this heavily. Some things can’t go behind feature flags though, like library upgrades; those are inherently risky.
@dasch @beanieboi @yann_ck @t_crayford Progressive delivery (decoupling deployment from release via feature flags) is the only thing I’ve seen actually work for this once a team gets past a certain size. It requires significant investment to do right, though.
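To make "decoupling deployment from release" concrete, here is a minimal sketch of a percentage-based flag check; the flag name, rollout table, and `charge` function are hypothetical, not a particular vendor's API:

```python
# A minimal sketch of decoupling deployment from release via a percentage
# rollout. The flag name, rollout table, and charge() are hypothetical.
import hashlib

ROLLOUT_PERCENT = {"new_billing_pipeline": 5}  # released to 5% of accounts


def flag_enabled(flag: str, account_id: int) -> bool:
    percent = ROLLOUT_PERCENT.get(flag, 0)
    # Hash flag+account so each account lands in a stable bucket: ramping
    # the percentage only adds accounts, it never flips existing ones back.
    digest = hashlib.sha256(f"{flag}:{account_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent


def charge(account_id: int) -> None:
    if flag_enabled("new_billing_pipeline", account_id):
        ...  # new path: deployed everywhere, released to 5% of accounts
    else:
        ...  # old path: still the default for everyone else
```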
@glenathan @beanieboi @yann_ck @t_crayford That’s quite a blanket statement 😉 With hundreds of thousands of customers, the opportunity cost is pretty big. And there’s work that just takes time. We have ten or so production environments spread across the globe.
@dasch @beanieboi @yann_ck @t_crayford I can think of few better ways to invest than to improve the speed of deployment on an active codebase with high contention.
@glenathan @beanieboi @yann_ck @t_crayford That’s great in theory, but not always achievable or the best use of resources for large orgs with older code bases.
@dasch @beanieboi @yann_ck @t_crayford Perhaps your causality should be flipped?
What changes could you make to your system architecture or approach so that 5-10 minute deploys by small teams become viable?
@beanieboi @yann_ck @t_crayford What happens when a bad change gets deployed to production, but other changes get enqueued before that’s detected? The dev env doesn’t catch everything.
@dasch @yann_ck @t_crayford The feature branch is for development/testing. We only merge to main once it’s been tested in your dev environment (everybody has their own dev env). main is strictly for production and represents what’s running, except when we roll back, and that’s then clearly communicated to the team.
@beanieboi @yann_ck @t_crayford … the “enqueueing” of changes to deploy. Merging to e.g. `production` is basically just enqueuing a deploy+merge to master – and you need automated rebasing to get rid of tainted deploys from the queue.
@beanieboi @yann_ck @t_crayford I think most orgs deploy from master, which they also use for development. Also: do you have a separate merge from production to master? That _would_ solve most of the problem, but it’s also just what I’m describing – managing the merges to the dev branch separately from…
@beanieboi @yann_ck @t_crayford But even then, you now have all the merges in between plus the fix as a single deployable unit - you can’t rebase, because the main branch is also used for dev. With 40+ devs working on a sprawling and old code base, this happens a lot.
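A rough sketch of the deploy-queue idea being discussed here: mark a tainted change, drop it from the queue, and let CI rebuild the surviving changes on the last known-good commit before they ship. All names and types are hypothetical, not a specific merge-queue product:

```python
# A rough sketch of a deploy queue with "automated rebasing": when a queued
# change is found to be bad, drop it and keep the healthy changes, which CI
# then rebuilds on top of the last known-good commit before deploying.
# All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class QueuedChange:
    sha: str
    author: str
    tainted: bool = False


@dataclass
class DeployQueue:
    last_good_sha: str
    pending: list[QueuedChange] = field(default_factory=list)

    def enqueue(self, change: QueuedChange) -> None:
        self.pending.append(change)

    def mark_tainted(self, sha: str) -> None:
        # Called when CI or production monitoring flags a queued change as bad.
        for change in self.pending:
            if change.sha == sha:
                change.tainted = True

    def next_batch(self) -> list[QueuedChange]:
        # Ship only the healthy changes; tainted ones never reach production
        # and don't block or contaminate the changes queued behind them.
        healthy = [c for c in self.pending if not c.tainted]
        self.pending = []
        return healthy
```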