Maple Ong 09/23/2020

Enforcing Modularity in Rails Apps with Packwerk

On September 30, 2020 we held ShipIt! presents: Packwerk by Shopify. A video for the event is now available for you to learn more about our latest open source tool for creating packages with enforced boundaries in Rails apps. Click here to watch the video.

The Shopify core codebase is large, complex, and growing by the day. To better understand these complex systems, we use software architecture to create structural boundaries. Ruby doesn't come with a lot of boundary enforcements out of the box. Ruby on Rails only provides a very basic layering structure, so it's hard to scale the application without any solid pattern for boundary enforcement. In comparison, other languages and frameworks have built-in mechanisms for vertical boundaries, like Elixir’s umbrella projects.

As Shopify grows, it’s crucial we establish a new architecture pattern so large scale domains within the monolith can interact with each other through well-defined boundaries, and in turn, increase developer productivity and happiness. 

So, we created an open source tool to build a package system that can be used to guide and enforce boundaries in large scale Rails applications. Packwerk is a static analysis tool used to enforce boundaries between groups of Ruby files we call packages.

High Cohesion and Low Coupling In Code

Ideally, we want to work on a codebase that feels small. One way to make a large codebase feel small is for it to have high cohesion and low coupling.

Cohesion measures how strongly the elements in a module or class belong together. For example, functional cohesion is when code is grouped together in a module because it all contributes to a single task. Code that is related changes together and therefore should be placed together.

On the other hand, coupling refers to the level of dependency between modules or classes. Elements that are independent of each other should also be independent in location of implementation. When a certain domain of code has a long list of dependencies on unrelated domains, there’s no separation of boundaries.

Boundaries are barriers between code. An example of a code boundary is to have a separate repository and service. For the code to work together in this case, network calls have to be made. In our case, a code boundary refers to different domains of concern within the same codebase.

With that, there are two types of boundaries we’d like to enforce within our applications—dependency and privacy. A class can have a list of dependencies of constants from other classes. We want an intentional and ideally small list of dependencies for a group of relevant code. Classes shouldn’t rely on other classes that aren’t considered their dependencies. Privacy boundaries are violated when there’s external use of private constants in your module. Instead, external references should be made to public constants, where a public API is established.

A Common Problem with Large Rails Applications

If there are no code boundaries in the monolith, developers find it harder to make changes in their respective areas. You may remember making a straightforward change that shockingly resulted in the breaking of unrelated tests in a different part of the codebase, or digging around a codebase to find a class or module with more than 2,000 lines of code. 

Without any established code boundaries, we end up with anti-patterns such as spaghetti code and large classes that know too much. As a codebase with low cohesion and high coupling grows, it becomes harder to develop, maintain, and understand. Eventually, it’s hard to implement new features, scale and grow. This is frustrating to developers working on the codebase. Developer happiness and productivity when working on our codebase is important to Shopify.

Rails Is Like an Open-concept Living Space

Let’s think of a large Rails application as a living space within a house without any walls. An open-concept living space is like a codebase without architectural boundaries. In an effort to separate concerns of different types of living spaces, you can arrange the furniture in a strategic manner to indicate boundaries. This is exactly what we did with the componentization efforts in 2017. We moved code that made sense together into folders we call components. Each of the component folders at Shopify represents a domain of commerce, such as orders and checkout.

In our open-concept analogy, imagine having a bathroom without walls—it’s clear where the bathroom is supposed to be, but we would like it to be separate from other living spaces with a wall. The componentization effort was a great first step towards modularity for the great Shopify monolith, but we are still far from a modular codebase—we need walls. Cross-component calls are still being made, and Active Record models are shared across domains. There’s no wall imposing those boundaries, just an agreed upon social contract that can be easily broken.

Boundary Enforcing Solutions We Researched

The goal is to find a solution for boundary enforcement. The Ruby we all know and love doesn't come with boundary enforcements out of the box. It allows specifying visibility on the class level only and loads all dependencies into the global namespace. There’s no difference between direct and indirect dependencies.

There are some existing ways of potentially enforcing boundaries in Ruby. We explored a combination of solutions: using the private_constant method to set private constants, creating gems to set boundaries, using tests to prevent cross-boundary associations, and testing out external gems such as Modulation.
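A minimal sketch of that last point. The three modules are hypothetical stand-ins for an application (Checkout), its direct dependency (Payments), and an indirect dependency (Ledger), all defined in one file so the example runs on its own:

```ruby
# Hypothetical stand-ins: Checkout depends on Payments, and Payments
# depends on Ledger. Checkout never declares Ledger as a dependency.
module Ledger
  def self.record(amount)
    "recorded #{amount}"
  end
end

module Payments
  def self.charge(amount)
    Ledger.record(amount)
  end
end

module Checkout
  # Intended usage: go through the direct dependency.
  def self.complete(amount)
    Payments.charge(amount)
  end

  # Nothing stops Checkout from reaching past Payments into Ledger,
  # because every loaded constant lives in the same global namespace.
  def self.complete_with_shortcut(amount)
    Ledger.record(amount)
  end
end

puts Checkout.complete(10)               # goes through Payments
puts Checkout.complete_with_shortcut(10) # bypasses Payments, no error raised
```

Both calls succeed, and Ruby gives no signal that the second one crosses a boundary the author of Payments may have intended to keep.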

Setting Private Constants

The private_constant method is built into Ruby and makes a constant private so it cannot be accessed outside of its namespace. A constant’s namespace is the module or class in which it’s nested and defined. In other words, using private_constant provides visibility semantics for constants on a namespace level, which is desirable. We want to establish public and private constants for a class or a group of classes.
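As a small illustration, the sketch below uses a hypothetical Billing module with one public class and one private class. The names are made up for this example:

```ruby
module Billing
  class Invoice
    def total
      # References from inside the namespace are allowed.
      TaxCalculator.new.apply(100)
    end
  end

  class TaxCalculator
    def apply(amount)
      amount + 15
    end
  end

  # Hide TaxCalculator from code outside of Billing.
  private_constant :TaxCalculator
end

puts Billing::Invoice.new.total

begin
  Billing::TaxCalculator
rescue NameError => e
  # An explicitly scoped reference from outside the namespace is rejected.
  puts "NameError: #{e.message}"
end
```

Invoice can still use TaxCalculator internally, while the qualified reference from outside raises a NameError at runtime.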

However, there are drawbacks of using the private_constant method of privacy enforcement. If a constant is privatized after it has been defined, the first reference to it will not be checked. It is therefore not a reliable method to use.

There’s no trivial way to tell if there’s a boundary violation using private_constant. When declaring a constant private to your class, it is hard to determine if the use of the constant is getting bypassed or used appropriately. Plus, this is just a solution for privacy issues and not dependency.

Overall, only using private_constant is insufficient to enforce boundaries across large domains. We want a tool that is flexible and can integrate into our current workflow. 

Establishing Boundaries Through Gems

The other method of creating a modular Rails application is through gems. Ruby gems are used to distribute and share Ruby libraries between Rails applications. People may place relevant code into an internal gem, separating concerns from the main application. The gem may also eventually be extracted from the application with little to no complications.

Gems provide a list of dependencies through the gemspec which is something we wanted, but we also wanted the list of dependencies to be enforced in some way. Our primary concern was that gems don't have visibility semantics. Gems make transitive dependencies available in the same way as direct dependencies in the application. The main application can use any dependency within the internal gem as it would its own dependency. Again, this doesn't help us with boundary enforcement.

We want a solution where we’re able to still group code that’s relevant together, but only expose certain parts of that group of code as public API. In other words, we want to control and enforce the privacy and dependency boundaries for a group of code—something we can’t do with Ruby gems.

Using Tests to Prevent Cross-component Associations

We added a test case that rejects any PRs that introduce Active Record associations across components, which is a pattern we’re trying to avoid. However, this solution is insufficient for several reasons. The test doesn’t account for the direction of the dependency. It also isn’t a complete test. It doesn’t cover use cases of Active Record objects that aren’t associations and generally doesn’t cover anything that isn’t Active Record.
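To make the idea concrete, here is a rough sketch of what such a check could look like. It is not the test from our codebase: the model names, component names, and data structures are hypothetical, and a real test would read associations from Active Record reflection rather than from plain hashes.

```ruby
# Hypothetical data: which component owns each model, and which
# associations each model declares.
COMPONENT_OF = {
  "Order"    => "orders",
  "LineItem" => "orders",
  "Checkout" => "checkout",
}.freeze

ASSOCIATIONS = {
  "Order"    => ["LineItem", "Checkout"],
  "LineItem" => ["Order"],
  "Checkout" => [],
}.freeze

# Returns [source, target] pairs whose models live in different components.
def cross_component_associations(component_of, associations)
  associations.flat_map do |source, targets|
    targets
      .reject { |target| component_of[source] == component_of[target] }
      .map { |target| [source, target] }
  end
end

offenders = cross_component_associations(COMPONENT_OF, ASSOCIATIONS)
offenders.each { |source, target| puts "#{source} -> #{target} crosses components" }
```

Note that the sketch shares the limitation described above: it flags any association that crosses components, without knowing whether a dependency in that direction is acceptable.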

The test was good enforcement, but lacked several key features. We wanted a solution that determined the direction of dependencies and accounted for different types of Active Record associations. Nonetheless, the test case still exists in our codebase as we still found it helpful in triggering developer thought and discussions about whether or not an association between components is truly needed.

Using the Modulation Ruby Gem

Modulation is a Ruby gem for file-level dependency management within the Ruby application that was experimental at the time of our exploration. Modulation works by overriding the default Ruby code loading, which is concerning, as we’d have to replace the whole autoloading system in our Rails application. It also adds complexity to the code and to runtime application behaviour, because dependency introspection is performed at runtime.

There are obvious risks that come with modifying how our monolith works for an experiment. If we went with Modulation as a solution and had to change our minds, we’d likely have to revert changes to hundreds of files, which is impractical in a production codebase. Plus, the gem works at file-level granularity, which is too fine for the scale of the problem we were trying to solve.

Creating Microservices?

The idea of extracting components from the core monolith into microservices in order to create code boundaries is often brought up at Shopify. In our monolith’s case, creating more services in an attempt to decouple code is solving code design problems the wrong way.

Distributing code over multiple machines is a topology change, not an architectural change. If we try to extract components from our core codebase into separate services, we introduce the added concern of networked communication and create a distributed system. A poorly designed API within the monolith will still be a poorly designed API within a service, but now with additional complexities. These complexities include a stateless network boundary, serialisation between the systems, and reliability issues with networked communications. Microservices are a great solution when the service is isolated and unique enough to justify the tradeoff of the network boundary and the complexities that come with it.

The Shopify core codebase still stands as a majestic modular monolith, with all the code broken up into components and living in a singular codebase. Now, our goal is to advance our application’s modularity to the next step—by having clear and enforced boundaries.

Packwerk: Creating Our Own Solution

Taking our learnings from the exploration phase of the project, we created Packwerk. Packwerk checks for two types of violations: dependency and privacy. A dependency violation occurs when a package references a constant from a package that it hasn’t declared as a dependency. A privacy violation occurs when code outside a package references one of that package’s private constants. Constants within the package’s public folder, app/public, can be accessed without causing a privacy violation.

How Packwerk Works 

Packwerk parses and resolves constants in the application statically with the help of an open-sourced Shopify Ruby gem called ConstantResolver. ConstantResolver uses the same assumptions as Zeitwerk, the Rails code loader, to infer the constant's file location. For example, Some::Nested::Model will be resolved to the constant defined in the file path, models/some/nested/model.rb. Packwerk then uses the file path to determine which package defines the constant.
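The two steps can be sketched in plain Ruby. This is a simplified illustration and not ConstantResolver's or Packwerk's actual code: the helper names, the autoload root, the package roots, and the "longest matching root wins" rule are all assumptions made for the example.

```ruby
# Simplified version of the naming convention: CamelCase becomes snake_case,
# and "::" becomes a directory separator. Acronyms and custom inflections
# are not handled here.
def underscore(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Infers where a constant should be defined, relative to an autoload root.
def inferred_path(constant_name, autoload_root)
  relative = constant_name
    .delete_prefix("::")
    .split("::")
    .map { |part| underscore(part) }
    .join("/")
  File.join(autoload_root, "#{relative}.rb")
end

# Hypothetical package roots. The most specific (longest) matching root wins,
# and "." stands for the root package that owns everything else.
PACKAGE_ROOTS = ["components/online_store", "components/orders", "."].freeze

def owning_package(path, package_roots)
  package_roots
    .select { |root| root == "." || path.start_with?("#{root}/") }
    .max_by(&:length)
end

path = inferred_path("Some::Nested::Model", "components/orders/app/models")
puts path
puts owning_package(path, PACKAGE_ROOTS)
```

Because the lookup is purely a function of the constant name and the configured paths, no application code has to be loaded or executed to answer the question "which package defines this constant?".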

Next, Packwerk will use the resolved constants to check against the configurations of the packages involved. If all the checks are enforced (i.e. dependency and privacy), references from Package A to Package B are valid if:

  1. Package A declares a dependency on Package B, and
  2. The referenced constant is a public constant in Package B.
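The two conditions can be expressed as a small check. The sketch below is illustrative only: the Package struct, its fields, and the package and constant names are hypothetical and do not reflect Packwerk's internal API.

```ruby
# Hypothetical in-memory model of a package.
Package = Struct.new(:name, :dependencies, :public_constants, keyword_init: true)

# Returns the list of violated boundaries for a reference from
# `source` to `constant`, which is defined in `target`.
def violations_for(source, target, constant)
  # References within the same package never cross a boundary.
  return [] if source.name == target.name

  violations = []
  violations << :dependency unless source.dependencies.include?(target.name)
  violations << :privacy unless target.public_constants.include?(constant)
  violations
end

orders = Package.new(
  name: "components/orders",
  dependencies: ["components/shipping"],
  public_constants: []
)
shipping = Package.new(
  name: "components/shipping",
  dependencies: [],
  public_constants: ["::Shipping::RateQuote"]
)

p violations_for(orders, shipping, "::Shipping::RateQuote")  # declared dependency, public constant
p violations_for(orders, shipping, "::Shipping::CarrierApi") # declared dependency, private constant
p violations_for(shipping, orders, "::Orders::Order")        # undeclared dependency, private constant
```

A reference is valid only when the returned list is empty, which matches the two numbered conditions above.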

Ensuring Application Validity

Before diving into further details, we have to make sure that the application is in a valid state for Packwerk to work correctly. To be considered valid, an application has to have a valid autoload path cache, package definition files and application folder structure. Packwerk comes with a command, packwerk validate, that runs on a continuous integration (CI) pipeline to ensure the application is always valid.

Packwerk also checks for any cyclic dependencies within the application. According to the Acyclic Dependencies Principle, no cycles should be allowed in the component dependency graph. If packages depend on each other in a cycle, making a change to one package will create a domino effect and force a change on all packages in the cycle. This dependency cycle will be difficult to manage.

In practical terms, imagine working on a domain of the codebase concurrently with 100 other developers. If your codebase has cyclic dependencies, your change will impact the components that depend on your component. When you are done with your work, you want to merge it into the main branch, along with the changes of other developers. This code will create an integration nightmare because all the dependencies have to be modified in each iteration of the application.

Packages in an application with an acyclic dependency graph can be tested and released independently, without the entire application having to change at the same time.
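One way to detect such cycles, sketched here with Ruby's standard TSort library, is to look for strongly connected components in the dependency graph. This is an illustration of the general technique, not necessarily how Packwerk implements its check, and the package names are hypothetical.

```ruby
require "tsort"

# Hypothetical dependency graph: package name => packages it depends on.
DEPENDENCIES = {
  "orders"   => ["shipping"],
  "shipping" => ["billing"],
  "billing"  => ["orders"], # closes the cycle orders -> shipping -> billing -> orders
  "themes"   => [],
}.freeze

# Returns groups of packages that depend on each other in a cycle.
def dependency_cycles(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node, []).each(&block) }

  TSort.strongly_connected_components(each_node, each_child).select do |component|
    # A component of more than one package is a cycle; so is a package
    # that lists itself as a dependency.
    component.size > 1 || graph.fetch(component.first, []).include?(component.first)
  end
end

cycles = dependency_cycles(DEPENDENCIES)
cycles.each { |cycle| puts "Cycle between: #{cycle.sort.join(', ')}" }
```

A validation step can then fail whenever the returned list is non-empty.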

Creating a Package 

A package is defined by a package.yml file at the root of the package folder. Within that file, specific configurations are set. Packwerk allows a package to declare the type of boundary enforcement that the package would like to adhere to. 

Additionally, other useful package-specific metadata can be specified, like team and contact information for the package. We’ve found that having granular, package-specific ownership makes it easier for cross-team collaboration compared to ownership of an entire domain.
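For illustration, here is what a package.yml might contain, loaded through Ruby's YAML library so the result can be inspected. The enforce_dependencies, enforce_privacy, dependencies, and metadata keys follow our recollection of Packwerk's documented format at the time of writing, so check the documentation for the version you use. The dependency path and metadata values are made up.

```ruby
require "yaml"

# Example contents of a package.yml. The dependency path and metadata
# values are hypothetical.
PACKAGE_YML = <<~YML
  enforce_dependencies: true
  enforce_privacy: true
  dependencies:
    - components/platform
  metadata:
    team: Online Store
    contact: online-store@example.com
YML

config = YAML.safe_load(PACKAGE_YML)

puts "Dependency checks enabled: #{config['enforce_dependencies']}"
puts "Privacy checks enabled:    #{config['enforce_privacy']}"
puts "Declared dependencies:     #{config['dependencies'].join(', ')}"
puts "Owned by:                  #{config.dig('metadata', 'team')}"
```

Each enforcement setting is opt-in per package, which lets teams turn on dependency and privacy checks independently.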

Enforcing Boundaries Between Packages

Running packwerk check

Packwerk enforces boundaries between packages through a check that can be run both locally and on the CI pipeline. To perform a check, run the command packwerk check. We also included this in Shopify’s CI pipeline to prevent any new violations from being merged into the main branch of the codebase.

Enforcing Boundaries in Existing Codebases

Because of the lack of code structure in Rails apps, legacy large scale Rails apps tend to have existing dependency and privacy violations between packages. If this is the case, we want to stop the bleeding and prevent new violations from being added to the codebase.

Users can still enforce boundaries within the application despite existing violations, ensuring the list of violations doesn't continue to increase. This is done by generating a deprecated references list for the package.

We want to allow developers to continue with their workflow, but prevent any further violations. The list of deprecated references can be used to help a codebase transition to a cleaner architecture. It iteratively establishes boundaries in existing code as developers work to reduce the list.
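The underlying idea is a set difference between the violations that were recorded and the violations found by the latest check. The sketch below illustrates that logic only; it does not reproduce Packwerk's file format or internals, and every file path and constant name in it is hypothetical.

```ruby
require "set"

# A violation is identified by the file, the referenced constant, and the
# kind of boundary that was crossed.
Violation = Struct.new(:file, :constant, :type)

# Violations that were recorded when enforcement was switched on.
RECORDED = Set[
  Violation.new("components/orders/app/models/order.rb", "::RetailStore", :privacy),
  Violation.new("components/orders/app/models/refund.rb", "::RetailStore", :privacy),
]

# Violations found by the latest check: one old one was fixed, one is new.
CURRENT = Set[
  Violation.new("components/orders/app/models/order.rb", "::RetailStore", :privacy),
  Violation.new("components/orders/app/models/cart.rb", "::RetailStore", :dependency),
]

# Violations that are not on the recorded list should fail the check.
def new_violations(recorded, current)
  current - recorded
end

# Recorded violations that no longer occur can be removed from the list.
def resolved_violations(recorded, current)
  recorded - current
end

new_violations(RECORDED, CURRENT).each do |v|
  puts "New #{v.type} violation: #{v.file} -> #{v.constant}"
end
resolved_violations(RECORDED, CURRENT).each do |v|
  puts "Resolved: #{v.file} -> #{v.constant}"
end
```

Existing violations are tolerated, new ones are rejected, and the recorded list can only shrink as developers fix references.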

List of deprecated references for components/online_store

The list of deprecated references contains some useful information about the violation within the package. In the example above, we can tell that there was a privacy violation in the following files that are referring to the ::RetailStore constant that was defined in the components/online_store package.

By surfacing the exact references where the package’s boundaries are being breached, we essentially have a to-do list that can be worked off.

Originally, the deprecated references list was meant to let developers start enforcing the boundaries of an application immediately despite existing violations, and then use it to pay down the technical debt. However, the Shipping team at Shopify found success using this list to extract a domain out of their main application into its own service. The list can also be used if the package is being extracted into a gem. Ultimately, we make sure to let developers know that the list of deprecated references should be used to refactor the code and reduce the number of violations in the list.

The purpose of Packwerk would be defeated if we merely added to the list of violations (though, we’ve made some exceptions to this rule). When a team is unable to add a dependency in the correct direction because the pattern doesn’t exist, we recommend adding the violation to the list of deprecated references. Doing so will ensure that when such a pattern exists, we eventually refactor the code and remove the violation from the list. This results in a better alternative than creating a dependency in the wrong direction.

Preventing New Violations 

After creating packages within your application and enforcing boundaries for those packages, Packwerk should be ready to go. Packwerk will display violations when packwerk check is run either locally or on the CI pipeline.

The error message as seen above displays the type of violation, location of violation, and provides actionable next steps for developers. The goal is to make developers aware of the changes they make and to be mindful of any boundary breaking changes they add to the code.

The Caveats 

Statically analyzing Ruby is complex. If a constant is not autoloaded, Packwerk ignores it. This ensures that the results produced by Packwerk won’t have any false positives, but it can create false negatives. If we get most of the references right, it’ll be enough to shift the code quality in a positive direction. The Packwerk team made this design decision as our strategy to handle the inaccuracy that comes with Ruby static analysis. 

How Shopify Is Using Packwerk

There was no formal push for the adoption of Packwerk within Shopify. Several teams were interested in the tool and volunteered to beta test before it was released. Since its release, many teams and developers have been adopting Packwerk to enforce boundaries within their components.

Currently Packwerk runs in six Rails applications at Shopify, including the core monolith. Within the core codebase, we have 48 packages with 30 boundary enforcements within those packages. Packwerk integrates in the CI pipeline for all these applications and has commands that can run locally for packaging-related checks.

Since Packwerk was released for use within the company, new conversations related to software architecture have been sparked. As developers worked on removing technical debt and refactoring the code using Packwerk, we noticed there’s no established pattern for decoupling of code and creating single-direction dependencies. We’re currently researching and discussing inversion of control and establishing patterns for dependency inversion within Rails applications.

Start Using Packwerk. It’s Open Source!

Packwerk is now out in the wild and ready for you to try it out!

To get Packwerk installed in your Rails application, add it as a gem and simply run the command packwerk init. The command will generate the configuration files needed for you to use Packwerk.

The Packwerk team will be maintaining the gem and we’re stoked to see how you will be using the tool. You are also welcome to report bugs and open pull requests in accordance with our contribution guidelines.

Credits

Packwerk is inspired by Stripe’s internal Ruby packages solution with its idea adapted to the more complex world of Rails applications.
