Black Friday and Cyber Monday are the biggest days of the year at Shopify by every metric. As the Infrastructure team started preparing for the upcoming seasonal traffic in the late summer of 2014, we were confident that we could cope, and determined resiliency to be the top priority. A resilient system is one that keeps functioning when one or more of its components are unavailable or unacceptably slow. Applications quickly become intertwined with their external services if not carefully monitored, and minor dependencies then turn into single points of failure.
For example, the only part of Shopify that relies on the session store is user sign-in: if the session store is unavailable, customers can still purchase products as guests. Any other behaviour would be an unfortunate coupling of components. This post is an overview of the tools and techniques we used to make Shopify more resilient in preparation for the holiday season.
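The guest-checkout behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not Shopify's implementation: the names `SessionStoreUnavailable`, `InMemorySessionStore`, `current_customer`, and `checkout` are all hypothetical, and the in-memory store stands in for whatever external service actually holds sessions.

```python
class SessionStoreUnavailable(Exception):
    """Raised by the (hypothetical) session store client when it cannot be reached."""


class InMemorySessionStore:
    """Stand-in for an external session store; `available` simulates an outage."""

    def __init__(self, sessions, available=True):
        self.sessions = dict(sessions)
        self.available = available

    def get(self, session_id):
        if not self.available:
            raise SessionStoreUnavailable("session store is down")
        return self.sessions.get(session_id)


GUEST = "guest"


def current_customer(store, session_id):
    """Return the signed-in customer, or GUEST if the session cannot be loaded."""
    try:
        customer = store.get(session_id)
    except SessionStoreUnavailable:
        # Degrade gracefully: losing sign-in must not take checkout down with it.
        return GUEST
    return customer or GUEST


def checkout(store, session_id, cart):
    """Complete a purchase whether or not the session store is reachable."""
    return {"customer": current_customer(store, session_id), "items": list(cart)}


healthy = InMemorySessionStore({"abc": "alice"})
broken = InMemorySessionStore({"abc": "alice"}, available=False)

order_when_healthy = checkout(healthy, "abc", ["t-shirt"])
order_during_outage = checkout(broken, "abc", ["t-shirt"])
```

The point of the sketch is that the failure is caught at the boundary of the dependency, so the purchase path never sees it and the customer simply checks out as a guest.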
I was recently profiling a production Shopify application server using perf and noticed a fair amount of time being spent in a particular function, st_lookup, which is used by Ruby’s MRI implementation for hash table lookups.
Hash tables are used all over MRI, and not just for the Hash object; global variables, instance variables, classes, and the garbage collector all use MRI’s internal hash table implementation, st_table. Unfortunately, what this profile did not show were the callers of st_lookup. Is this some application code that has gone wild? Is this an inefficiency in the VM?
This is the second in a series of blog posts describing our evolution of Shopify toward a Docker-powered, containerized data center. This instalment will focus on the creation of the container used in our production environment when you visit a Shopify storefront.
Read the first post in this series here.
Before we dive into the mechanics of building containers, let's discuss motivation. Containers have the potential to do for the datacenter what consoles did for gaming. In the early days of PC gaming, each game typically required video or sound driver massaging before you got to play. Gaming consoles, however, offered a different experience:
- predictable: cartridges were self-contained fun: always ready-to-run, with no downloads or updates.
- fast: cartridges used read-only memory for lightning-fast speeds.
- easy: cartridges were robust and largely child-proof; they were quite literally plug-and-play.
Predictable, fast, and easy are all good things at scale. Docker containers provide the building blocks to make our data centers easier to run and more adaptable by placing applications into self-contained, ready-to-run units much like cartridges did for console games.
This September, we quietly launched a new version of the Shopify admin. Unlike the launch of the previous major iteration of our admin, this version did not include a major overhaul of the visual design, and it will have gone largely unnoticed by most users.
Why would we rebuild our admin without providing any noticeable differences to our users? At Shopify, we strongly believe that any decision should be open to question at any time. In late 2012, we started to question whether our framework was still working for us. This post will discuss the problems in the previous version of our admin, and how we decided that it was time to switch frameworks.
This is the first in a series of posts about adding containers to our server farm to make it easier to scale, manage, and keep pace with our business.
The key ingredients are:
- Docker: container technology for making applications portable and predictable
- CoreOS: provides a minimal operating system, systemd for orchestration, and Docker to run containers
Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1,700 cores and 6 TB of RAM.
We were looking for a reliable way to collect event data and send it to our data warehouse.
We were considering a more service-oriented architecture, and needed a standardized way of message passing between the components.
We were starting to evaluate containerization of Shopify, and were searching for a way to get logs out of containers.
We were intrigued by Kafka due to its highly available design. However, Kafka runs on the JVM, and its primary user, LinkedIn, runs a full JVM stack. Shopify is mainly Ruby on Rails and Go, so we had to figure out how to integrate Kafka into our infrastructure.
A recent phenomenon has taken the tech world by storm: Dogecoin. Though goofy and grammatically unique, Dogecoin has proven to be an incredible force for good in the world through initiatives like The Dogecoin Foundation. For Shopify Hackdays, then, the development team at Shopify made a gentlepeople's wager against the Business Development and Talent Acquisition teams: that the Dev team could raise more money in Dogecoin than the so-called hustlers could by starting a Shopify business. Nothing like a good old-fashioned competition to raise some money for charity. With all...
I'm Chris Saunders, one of Shopify's developers. I like to keep journal entries about the problems I run into while working on the various codebases within the company. Recently we ran into an issue with authentication in one of our applications, and as a result I ended up learning a bit about Rack middleware. I feel that the experience was worth sharing with the world at large, so here is a rough transcription of my entry. Enjoy! I'm looking at invalid form submissions for users who were trying to log in via their Shopify stores. The issue was actually at...
Shopify has been hard at work scaling its data pipeline for quite some time now, and it had gotten to the point that plain old log files just wouldn’t cut it. We wanted to do more and more with our data, but ran into problems at every turn:
- Batch processing of logs required log rotation, which introduced unacceptable latency into other parts of the pipeline.
- Traditional log aggregation tools like Flume didn’t provide the features, reliability, or performance that we were looking for.
- Fan-out configuration was promising to become unmanageable.
We wanted anyone at Shopify to be able to use...
A month ago Shopify was at BigRubyConf where we mentioned an internal library we use for caching ActiveRecord models called IdentityCache. We're pleased to say that the library has been extracted out of the Shopify code base and has been open sourced! At Shopify, our core application has been database performance bound for much of our platform’s history. That means that the most straightforward way of making Shopify more performant and resilient is to move work out of the database layer. For many applications, achieving a very high cache ratio is a matter of storing full cached...