Hi, I'm Graeme Johnson, and I work on Shopify's Developer Acceleration team. Our mission is to provide tools that let developers ship fast and safely. Recently we began shipping Shopify automatically as developers hit the merge button in GitHub. This removes the final manual step, hitting the deploy button, from our deploy pipeline, which until now looked like this:
Merge → Build container → Run CI → Hit deploy button → Ship to production
We have invested a lot of engineering effort to make this pipeline fast enough to run end-to-end in about 15 minutes—still too slow for our taste—and robust enough to allow cancellation at any stage in the process. Automating the actual deploy trigger was the next logical step.
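The post doesn't describe how cancellation is implemented. As a minimal sketch, assuming each stage is a blocking Python callable and a shared `threading.Event` carries the stop request (both our assumptions, not Shipit internals), a pipeline can check for cancellation between stages so that nothing further ships once someone hits stop:

```python
import threading
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the names here are illustrative only.
Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage], cancel: threading.Event) -> List[str]:
    """Run stages in order, checking for cancellation before each one.

    Returns the names of the stages that ran to completion.
    """
    completed: List[str] = []
    for name, action in stages:
        if cancel.is_set():
            # Stop between stages: later stages (e.g. the production
            # deploy) never start once cancellation is requested.
            break
        action()
        completed.append(name)
    return completed


if __name__ == "__main__":
    stop = threading.Event()
    stages = [
        ("build container", lambda: None),
        ("run CI", stop.set),  # simulate someone cancelling while CI runs
        ("ship to production", lambda: None),
    ]
    print(run_pipeline(stages, stop))  # ['build container', 'run CI']
```

Checking only between stages keeps the sketch simple; a real pipeline would also need a way to interrupt a long-running stage such as a CI run.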
Before automatic deployment, we relied on developers to coordinate in Slack, batch their changes, and ship them in logical units, because our deploys still weren’t fast enough to ship each change individually. We encountered two main problems on a regular basis.
1. Deploy logjam
The GitHub merge button is now the last human interaction between a developer writing code and that code landing in production: our Shipit deploy robot automatically batches merges and triggers deploys on a configurable cadence. This effectively solves both the deploy logjam and unshipped changes problems.
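To make "batches merges on a configurable cadence" concrete, here is a minimal sketch, not Shipit's implementation. It assumes fixed-length cadence windows, with every merge that lands in a window shipping together when that window closes; the `cadence_s` parameter and the commit SHAs are hypothetical.

```python
from typing import Dict, List, Tuple


def batch_by_cadence(
    merges: List[Tuple[float, str]], cadence_s: float
) -> Dict[int, List[str]]:
    """Group merges into deploy batches, one batch per cadence window.

    merges: (merged_at_seconds, commit_sha) pairs in any order.
    Window k covers [k * cadence_s, (k + 1) * cadence_s) and is assumed
    to ship when it closes. Empty windows produce no deploy.
    """
    batches: Dict[int, List[str]] = {}
    for merged_at, sha in sorted(merges):
        window = int(merged_at // cadence_s)
        batches.setdefault(window, []).append(sha)
    return batches


if __name__ == "__main__":
    merges = [(10, "a1"), (590, "b2"), (610, "c3"), (1900, "d4")]
    # With a 10-minute cadence, a1 and b2 ship together; c3 and d4 ship alone.
    print(batch_by_cadence(merges, cadence_s=600))
    # {0: ['a1', 'b2'], 1: ['c3'], 3: ['d4']}
```

A shorter cadence means smaller batches, but only if each deploy finishes before the next window closes.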
Auto-deploy reinforces three important cultural standards too:
1. High expectations
If you’re asking devs to wait for their deploy (we are), the deploy machinery has to be fast and rock-solid. We have a team dedicated to making sure it is.
Because the pull request is the last gate between a developer and production, we trust our devs to deliver high-quality reviews and comprehensive tests that ensure code works as designed.
Shipping is the cool part of software development and should not be a scary proposition.
We use the standard GitHub pull request flow: developers work on a feature branch, with automatic checks running against each commit. Checks include unit tests, container builds, and code style checks, all visible as CI status annotations on the pull request:
GitHub takes care of rolling up the various checks into an overall red/green status and indicates if a merge can be done automatically:
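The roll-up rule itself is simple. The sketch below is modeled on the rule GitHub documents for its combined commit status: any `error` or `failure` makes the whole commit fail; no statuses at all, or any `pending` one, keeps it pending; otherwise it is green. The check names are hypothetical, and GitHub's newer Checks API reports richer conclusions than these four states.

```python
from typing import Dict


def rollup(statuses: Dict[str, str]) -> str:
    """Combine per-check states into one overall red/green/pending state.

    statuses maps a check name to "success", "pending", "failure", or "error".
    """
    states = set(statuses.values())
    if states & {"failure", "error"}:
        return "failure"  # a single red check turns the whole commit red
    if not states or "pending" in states:
        return "pending"  # nothing reported yet, or something still running
    return "success"


if __name__ == "__main__":
    checks = {"unit tests": "success", "container build": "pending", "style": "success"}
    print(rollup(checks))  # pending
```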
Pressing the merge button merges your code into master, where all of the checks run one more time before the code is pushed to production. The Shipit main page shows merges being tested and those staged for deploy:
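One way to decide what "staged for deploy" means, sketched here as an assumption rather than Shipit's actual policy: walk the undeployed commits on master from oldest to newest and stop at the first one that isn't green. The newest commit before that point is safe to ship, and because master is linear, deploying it also ships everything merged before it. The function name and data shape are hypothetical.

```python
from typing import List, Optional, Tuple


def deploy_target(commits: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the newest commit that can ship right now.

    commits: (sha, overall_state) pairs for undeployed commits on master,
    oldest first, where overall_state is "success", "pending", or "failure".
    Returns None when the oldest undeployed commit isn't green yet.
    """
    target: Optional[str] = None
    for sha, state in commits:
        if state != "success":
            break  # never ship past a commit that is still testing or red
        target = sha
    return target


if __name__ == "__main__":
    queue = [("a1", "success"), ("b2", "success"), ("c3", "pending"), ("d4", "success")]
    print(deploy_target(queue))  # b2: c3 is still being tested
```

Stopping at the first non-green commit is conservative: `d4` waits even though its own checks passed, because shipping it would also ship `c3`.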
A ChatOps robot, Spy, hangs out in our #operations channel and automates common actions while providing a globally visible activity log. Shipit integrates with Slack and lets everyone know when code is about to ship:
And when code has shipped successfully:
If things go wrong, a human can halt the deploy train and take corrective action. Here’s an example:
Once master is back in a known good state, the deploy train can be restarted:
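Halting and restarting the train amounts to a lock that every automatic deploy checks first. A minimal sketch, assuming a single deploy train per project; the `DeployTrain` class and its methods are hypothetical, not Shipit's API:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeployTrain:
    """Hypothetical auto-deploy train that humans can halt and restart."""

    lock_reason: Optional[str] = None
    shipped: List[str] = field(default_factory=list)

    def halt(self, reason: str) -> None:
        # Record why the train stopped so everyone can see it.
        self.lock_reason = reason

    def restart(self) -> None:
        # Called once master is back in a known good state.
        self.lock_reason = None

    def tick(self, target_sha: Optional[str]) -> bool:
        """Run on each cadence tick; ship target_sha unless halted."""
        if self.lock_reason is not None or target_sha is None:
            return False
        self.shipped.append(target_sha)
        return True


if __name__ == "__main__":
    train = DeployTrain()
    train.tick("a1")
    train.halt("checkout errors spiking after a1")
    train.tick("b2")  # skipped: the train is halted
    train.restart()
    train.tick("c3")
    print(train.shipped)  # ['a1', 'c3']
```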
To build confidence, we first tried the auto-deploy machinery on a number of smaller, lower-risk projects, such as our Slack robot. This showed that developers needed Slack notifications to follow their code’s progress through the pipeline. Then our Developer Acceleration team shipped the auto-deploy changes and watched the deploy pipeline closely for the first few weeks, both for any signs of misbehaviour and to observe the first "real" rollback.
Results After Month #1
After a month, we've noticed some tangible results. First, we're shipping faster: almost 50 times a day. Second, instead of big batches, we're now shipping smaller changes, with the average deploy size hovering between one and two changes. Smaller deploys make it easier to connect a problem to the change that caused it.
Have we had to stop the deploy machine? Yes! Developers have demonstrated that they are responsible and will spot problems and stop the deploy train if needed. The amount of time deploys are locked turned out to be a key indicator of how healthy our pipeline is and now appears on our production dashboards.
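The post doesn't say how lock time is measured. A minimal sketch, assuming lock events are recorded as hypothetical `(locked_at, unlocked_at)` timestamps in seconds: clip each lock to a reporting window, merge overlapping locks so no time is counted twice, and report the fraction of the window during which deploys were locked.

```python
from typing import List, Tuple


def locked_fraction(locks: List[Tuple[float, float]], start: float, end: float) -> float:
    """Fraction of the window [start, end) during which deploys were locked.

    locks: (locked_at, unlocked_at) pairs, possibly overlapping or
    extending beyond the window.
    """
    # Keep only locks that touch the window, clipped to its edges.
    clipped = sorted(
        (max(a, start), min(b, end)) for a, b in locks if b > start and a < end
    )
    total = 0.0
    cur_start = cur_end = None
    for a, b in clipped:
        if cur_end is None or a > cur_end:
            # Disjoint from the current run: close it out and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = a, b
        else:
            cur_end = max(cur_end, b)  # overlapping lock: extend the run
    if cur_end is not None:
        total += cur_end - cur_start
    return total / (end - start)


if __name__ == "__main__":
    # Two overlapping locks (0-20 combined) plus one more (50-60).
    print(locked_fraction([(0, 10), (5, 20), (50, 60)], 0, 100))  # 0.3
```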
Surprises After Month #1
In terms of performance impact, Shopify is fully containerized, so a deploy involves spinning up new Docker containers across the production fleet. The slight loss of capacity while containers come up, plus cold caches once they have booted, means each deploy has a minor but measurable performance impact, as you can see in the following chart:
To smooth out deploy spikes and give humans a chance to assess quality, we have a six-minute "quiet time" after each deploy so the system can return to equilibrium.
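A minimal sketch of that gate, assuming the six minutes are counted from the moment the previous deploy finished (the post doesn't specify the exact starting point); the function names are hypothetical:

```python
from typing import Optional

QUIET_TIME_S = 6 * 60  # the six-minute quiet time described above


def quiet_time_remaining(now: float, last_deploy_finished_at: Optional[float]) -> float:
    """Seconds left before the next deploy may start (0 if none)."""
    if last_deploy_finished_at is None:
        return 0.0  # nothing has shipped yet, so there is nothing to wait for
    elapsed = now - last_deploy_finished_at
    return max(0.0, QUIET_TIME_S - elapsed)


def may_start_deploy(now: float, last_deploy_finished_at: Optional[float]) -> bool:
    return quiet_time_remaining(now, last_deploy_finished_at) <= 0


if __name__ == "__main__":
    finished = 1_000.0
    print(may_start_deploy(finished + 300, finished))  # False: still in quiet time
    print(may_start_deploy(finished + 360, finished))  # True
```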
With regard to human impact, the deploy button used to be accompanied by a checklist of reminders about how to ship safely. Once deploys were automated, those warnings were no longer in your face, and it turns out people missed seeing them.
To address this concern, we taught our ChatOps robot (Spy) to announce impending deploys and remind people of their responsibilities. Spy sends a Slack direct message to individual developers as the deploy progresses. The first message appears immediately after the merge, while CI is running:
During this period, the developer watches our CI system to make sure the merged code passes all tests. Speed and test stability are critical here so that devs aren't stuck waiting too long for results. Once CI passes, Spy sends another direct message saying that a deploy including your change is going out:
We’ve found that this combination of reminders is enough to reinforce good shipping behaviour. When the deploy succeeds, Spy sends a final message:
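The three direct messages map naturally onto pipeline events. A minimal sketch, assuming each change in a batch carries its author's Slack handle and PR number; the event names, message wording, and data shape are hypothetical, and actually delivering the messages through Slack's API is left out:

```python
from typing import Dict, List, Tuple

# Hypothetical wording for the three stages described above.
TEMPLATES: Dict[str, str] = {
    "merged": "{pr} was merged and CI is running on master. Please keep an eye on it.",
    "deploying": "A deploy that includes {pr} is going out now. Watch for problems.",
    "deployed": "{pr} is live in production.",
}


def direct_messages(event: str, changes: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Build one (slack_handle, text) DM per change affected by the event."""
    template = TEMPLATES[event]
    return [(c["author"], template.format(pr=c["pr"])) for c in changes]


if __name__ == "__main__":
    batch = [{"author": "@alice", "pr": "#101"}, {"author": "@bob", "pr": "#102"}]
    for handle, text in direct_messages("deploying", batch):
        print(handle, text)
```

Building the messages per change rather than per deploy means every author in a batch hears about the deploy that carries their code.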
Automatic deploy was a successful experiment and is here to stay, but we still have work to do:
- More speed: our deploy machinery still isn’t fast enough to deploy each change individually, which would make problems even easier to trace to their cause. We’ve added even more metrics to the deploy pipeline to help us identify areas we can improve
- More stability: even though we have four 9s (99.99%) of test stability, we have so many tests (55k) that instability is still painful. A wobbly test requires a retry, wasting both machine and people time, so again we’re instrumenting the pipeline to make flaky tests easy to find and correct
- Automatic rollback: being able to spot functional or performance regressions automatically and roll back to a known good version is still an open problem. It requires adding more probes to the application and building trust in a robot having control over the rollback button
- Better deploy granularity and canaries: being able to roll out a change to a subset of production to evaluate code fitness before global deploy is on the radar
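To see why four 9s of test stability can still hurt at 55k tests, here is a back-of-the-envelope calculation. It rests on an assumption the post doesn't state: that "four 9s" means each individual test fails spuriously with probability $p = 10^{-4}$ per run, independently of the others, with $n = 55{,}000$ tests per run.

$$
\begin{aligned}
\mathbb{E}[\text{spurious failures per run}] &= n\,p = 55{,}000 \times 10^{-4} = 5.5 \\
P(\text{run with no spurious failure}) &= (1 - p)^{n} = 0.9999^{55{,}000} \approx e^{-np} = e^{-5.5} \approx 0.004
\end{aligned}
$$

Under that assumption, nearly every full CI run would hit at least one wobbly test, which is why finding and fixing individual flaky tests pays off. If "four 9s" instead describes whole-build stability, the numbers are very different.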
I hope you've found this article useful in understanding continuous deployment at Shopify. Many thanks to Jean Boussier, who contributed much of the code and helped author this post. We’ve open-sourced our Shipit deploy engine, which powers our deploy pipeline, so you can use the same tools we do.
May you ship fast and safely.