Kate Neely 04/15/2020

Software Release Culture at Shopify

A recording of the event and the additional questions are now available in the Release Culture @ Shopify Virtual Event section at the end of the post.

By Jack Li, Kate Neely, and Jon Geiger

At the end of last year, we shared the Merge Queue v2 in our blog post Successfully Merging the Work of 1000+ Developers. One question that we often get is, “why did you choose to build this yourself?” The short answer is that nothing we found could quite solve the problem in the way we wanted. The long answer is that it’s important for us to build an optimized experience for how our developers want to work and to continually shape our tooling and process around our “release culture”.

Shopify defines culture as:

“The sum of beliefs and behavior of everyone at Shopify.”

We approach the culture around releasing software the exact same way. We have important goals, like making sure that bad changes don’t deploy to production and break for our users, and that our changes can make it into production without compromises in security. But there are many ways of getting there and a lot of right answers on how we can do things.

As a team, we try to find the path to those goals that our developers want to take. We want to create experiences through tooling that can make our developers feel productive, and we want to do our best to make shipping feel like a celebration and not a chore.

Measuring Release Culture at Shopify

When we talk about measuring culture, we’re talking about a few things.

  • How do developers want to work?
  • What is important to them?
  • How do they expect the tools they use to support them?
  • How much do they want to know about what’s going on behind the scenes or under the hood of the tools they use?

Often, there isn’t one single answer to these questions, especially given the number and variety of people who deploy every day at Shopify. There are a few active and passive ways we can get a sense of the culture around shipping code. One method isn’t more important than the others, but all of them together paint a clearer picture of what life is like for the people who use our tools.

Passive and active methods of measurement
Passive and active methods of measurement

The passive methods we use really don’t require much work from our team, except to manage and aggregate information that comes in. The developer happiness survey is a biannual survey of developers across the company. Devs are asked to self-report about everything from their satisfaction with the tools they use or where they feel the most of their time is wasted or lost.

In addition, we have Slack channels dedicated to shipping that are open to anyone. Users can get support from our team or each other, and report problems they’re having. Our team is active in these channels to help foster a sense of community and encourage developers to share their experiences, but we don’t often use these channels to directly ask for feedback.

That said, we do want to be proactive about identifying pain points, and we know we can’t rely too much on users to provide that direction, so there are also active things we do to make sure we’re solving the most important problems.

The first thing is dogfooding. Just like other developers at Shopify, our team ships code every day using the same tools that we build and maintain. This helps us identify gaps in our service and empathize with users when things don’t go as planned.

Another valuable resource is our internal support team. They take on the huge responsibility of helping users and supporting our evolving suite of internal tools. They diagnose issues and help users find the right team to direct their questions. And they are invaluable in terms of identifying common pain points that users experience in current workflows, as well as potential pitfalls in concepts and prototypes. We love them.

Finally, especially when it comes to adding new features or changing existing workflows, we do UX research throughout our process:

  • to better understand user behavior and expectations
  • to test out concepts and prototypes as we develop them

We shadow developers as they ship PRs to see what else they’re looking at and what they’re thinking about as they make decisions. We talk to people, like designers and copywriters, who might not ship code at other companies (but they often do at Shopify) and ask them to walk us through their processes and how they learned to use the tools they rely on. We ask interns and new hires to test out prototypes to get fresh perspectives and challenge our assumptions.

All of this together ensures that, throughout the process of building and launching, we’re getting feedback from real users to make things better.

Feedback is a Gift

At Shopify, we often say feedback is a gift, but that doesn’t always make it less intimidating for users to share their frustrations, or easier for us to hear when things go wrong. Our goal with all measuring is to create a feedback loop where users feel comfortable talking about what’s not working for them (knowing that we care and will try to act on it), and we feel energized and inspired by what we learn from users instead of disheartened and bitter. We want them to know that their feedback is valuable and helpful for us to make both the tools and culture around shipping supportive of everyone.

Shopify’s Release Process

Let’s look at what Shopify’s actual release process looks like and how we’re working to improve it.

Release Pipeline

Happy path of the release pipeline
Happy path of the release pipeline

This is what the release pipeline looks like on the happy path. We go from Pull Request (PR) to Continuous Integration (CI)/Merge to Canary deployment and finally Production.

Release pipeline process starts with a PR and a /shipit command
Release pipeline process starts with a PR and a /shipit command

Developers start the process by creating a PR and then issue a /shipit command when ready to ship. From here, the Merge Queue system tries to integrate the PR with the trunk branch, Master.

PR merged to Master and then deployed to Canary
PR merged to Master and then deployed to Canary

When the Merge Queue determines the changes can be integrated successfully, the PR is merged to Master and deployed to our Canary infrastructure. The Canary environment receives a random 5% of all incoming requests.

Changes deployed to Production Changes deployed to Production 

Developers have tooling allowing them to test their changes in the Canary environment for 10 minutes. If there’s no manual intervention and the automated canary analysis doesn’t trigger any alerts, the changes are deployed to Production

Trust

Developers want to be trusted and have autonomy over their work. Developers should be able to own the entire release process for their PRs.

Developers own the whole process
Developers own the process

Developers own the whole process. There are no release managers, sign offs, or windows that developers are allowed to make releases in.

We have infrastructure to limit the blast radius of bad changes
We have infrastructure to limit the blast radius of bad changes

Unfortunately, sometimes things will break, and that’s ok. We have built our infrastructure to limit the blast radius of bad changes. Most importantly, we trust each of our developers to be responsible and own the recovery if their change goes bad.

Developers can fast track a fix using /shipit --emergency command
Developers can fast track a fix using /shipit --emergency

Once a fix has been prepared (either a fix-forward or revert), developers can fast track their fix to the front of the line with a single /shipit --emergency command. To help our developers make decisions quickly, we don’t have multiple recovery protocols, and instead, just have a single emergency feature that takes the quickest path to recovery.

Velocity

Developers want to ship fast.

A quick release process allows us a quick path to recovery
A quick release process allows us a quick path to recovery

Speed of release is a crucial element to most apps at Shopify. It’s a big productivity boost for developers to ship their code multiple times a day and have it reach end-users immediately. But more importantly, having a quick-release process allows us a quick path to recovery.

We’re willing to make tradeoffs in cost for a fast release process
We’re willing to make tradeoffs in cost for a fast release process

In order to invest in really making our release process fast, we’re willing to make tradeoffs in cost. In addition to dedicated infrastructure teams, we also manage our own CI cluster with multiple thousands of nodes at its daily peak.

Automate as Much as Possible

Developers don’t want to perform repetitive tasks. Computers are still better at doing things than humans—so we automate as much as possible. We use automation in places like continuous deployments and canary analysis.

Developers don't have to press deploy, we automate that
Developers don't have to press deploy, we automate that

We automated out the need for developers to press Deploy—we automatically continuously deploy to Canary and Production.

Developers can override automation
Developers can override automation

It’s still important for developers to be able to override the machinery. Developers can lock the automatic deployments and deploy manually, in cases like emergencies.

Release Culture @ Shopify Virtual Event

We held Shipit! Presents a Q&A about Release Culture @ Shopify with our guests Jack Li, Kate Neely, and Jon Geiger on April 20, 2020.  We had a discussion about the culture around shipping code at Shopify and answered your questions. We weren't able to answer all the questions during the event, so we've included answers to all the questions that we didn't get to below.

How has automation and velocity maintained uptime?

Automation has helped maintain uptime by providing assurances on a level of quality for changes on the way out, such as the automated canary analysis guaranteeing that the changes meet a certain level of production quality. Velocity has helped maintain uptime by reducing downtime; when things break, the high velocity of release means that problems are resolved quicker.

For an active monolith with many merges happening throughout the day, deploys to canary must be happening very frequently. How do you identify the "bad" merge, if there have been many recent merges on canary, and how do you ensure that bad merges don't block the release of other merges while there's a "bad" merge in the canary environment?

Our process here is still relatively immature, so triaging failures is still a manual process. Our velocity helps us ensure smaller changelists which makes triaging failures easier. As for reducing the impact of bad changes, I will defer to our blog post about the Merge Queue, which helps us ensure that we are not completely stalled when bad changes happen. 

How do you as a tooling organization handle sprawl? How do you balance enabling and controlling? That is, can teams choose to use their own languages and frameworks within those languages, or are they restricted to a set of things they're allowed to use?

Generally, we are more restrictive in technology choices. This is mostly because we want to be strategic in the technologies that we use, and so we are open to experimentation, but we have technologies that are battle-tested at Shopify that we encourage as recommended defaults (e.g. Ruby, Rails). React Native is the Future of Mobile at Shopify is an interesting article that talks about a recent technology change we have made.

What were you using before /shipit? How did that transition look? How did you measure its success?

Successfully Merging the Work of 1000+ Developers tells the story of how we got to the current /shipit system. Our two measures of success around this has been through feedback from our developer happiness survey, and from metrics around average pull request time-to-production.

How many different tools comprise the CI/CD pipeline and are they all developed in house, and are they owned by a specific team or does everyone contribute?

We work with a variety of vendors! Our biggest partners are Buildkite, which we use for scheduling CI, and GitHub which we build our development workflow around. Some more info about our tooling can be found at Stackshare.io. The tools we build are developed and owned by our Developer Acceleration team, but everyone is free to contribute and we get tons of contributions all the time. Our CD system, Shipit, is actually open source and we see contributions by community members frequently as well.

How is performance on production monitored after a feature release?

Typically this is something that teams themselves will strategize around and monitor. Performance is a big deal to us, and we have an internal dashboard to help teams track their performance metrics. We trust each team to take this component of their product seriously. Aside from that, the Production Engineering team has monitors and dashboards around performance metrics of the entire system.

How did you get into creating dev tooling for in-house teams. Are there languages/systems you would recommend learning to someone who is interested?

(Note from Jack: Interpreting this as a kind of career question) Personally, I’ve always gravitated towards the more “meta” parts of software development, focusing on long-term productivity and maintainability of previous projects, so working on dev tooling full-time felt like a perfect fit. In my opinion, the most important skill to be successful in this problem space is to be adaptable, both in adapting to new technologies and to new ideas. Languages like Ruby, Python, that allow you to focus more on the ideas behind your code can be good enablers for this. Docker and Kubernetes knowledge is valuable in this area as well.

Is development done on feature branches and entire features merged all at once, or are partial/incomplete features merged into master, but guarded by a feature flag?

Very good question, I think certain teams/features will do slightly different things, but typically releases happen via feature flags that we call “Beta Flags” in our system. This allows changes to be rolled out on a per-shop basis, or a percentage-of-shops basis.

Do you guys use Crystalball?

We forwarded this question to our test infrastructure team, their response was that we don’t use Crystallball, there was some brief exploration into this, but it wasn’t fast enough to trace through our codebase, and the test suite in our main monolith is written in minitest.

Additional Information


If this sounds like the kind of problems you want to solve, we're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions.Learn about the actions we’re taking as we continue to hire during COVID‑19