Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers (Part II)

In the first post of this series, I outlined Shopify’s history with flash sales, our move to Nginx and Lua to help manage traffic, and our initial attempt at throttling traffic, which didn’t sufficiently account for customer experience. We had underestimated the impact of not giving preference to customers who’d entered the queue at the beginning of the sale, and now we needed to find another way to protect the platform without ruining the customer experience.

Emil Stolarsky


Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers (Part I)

This Sunday, over 100 million viewers will watch the Super Bowl. Whether they’re catching the match-up between the Falcons and the Patriots, or there for the commercials between the action, that’s a lot of eyeballs, and that’s only counting America. But all that attention doesn’t just stay on the screen; it gets directed to the web, and if you’re not prepared, curious visitors could be rewarded with a sad error page.

The Super Bowl makes us misty-eyed because our first big flash sale happened in 2007, after the Colts beat the Bears. Fans rushed online for T-shirts celebrating the win, giving us a taste of what can happen when a flood of people converges on one site in a very short time. Since then, we’ve been continually levelling up our ability to handle flash sales, and our merchants have put us to the test: on any given day, they’ll hurl Super Bowl-sized traffic at us, often without notice.

My name is Emil Stolarsky and I work on the Performance and Capacity Planning team at Shopify. This series (with part one today, and part two next week) shares the problems we faced due to overwhelming traffic from flash sales and the thrifty (and nifty!) solution we created that allowed merchants to continue running sales without requiring a major overhaul of our platform.

While not every company faces flash sales, many need to handle high-traffic events that can overload their systems, and we hope this post provides inspiration for solutions that can be implemented with a small team and some elbow grease.

Emil Stolarsky
