Transforming to Microservices with Monorepos

Introduction

The project we started as a simple job board had grown into a full-blown job application tracking system with workflows, user controls and more. But the large monolithic codebase was a problem. Not only was it taking longer for new team members to understand the code, there was also a constant fear of breaking something.

As developers kept pushing their changes, we saw a rising number of issues and a clear failure of the team to collectively control overall code quality. Both the reliability and the velocity of code changes had gone down.

These concerns led us to evaluate options for improving our codebase. Breaking the monolith into smaller micro services, adding CI/CD pipelines, and adding monitoring tools for performance measurement seemed like a promising combination.

Micro services would give us loosely coupled components, so a change in one service could be made independently of the others. CI/CD would allow automatic integration, testing and deployment of changes. Monitoring tools would help identify hard-to-find bugs in application behaviour. System resource monitoring and observability tools could also help us manage outages.

But for all this to work, we needed to refactor our code to break it into smaller components.

Refactoring the code into micro services took us about a couple of months, but it was a big improvement over the monolith we had built earlier. Our coarsely defined micro services provided a bounded context in which the team could understand and develop each service independently of the others. We made extensive use of the Strimzi Kafka operator to streamline inter-service communication. Containerization and Kubernetes helped us automate deployment and testing.

Mono repo vs Multi repo

As we moved to micro services, another question was whether to manage the refactored code in multiple repositories or in a single repository. One repo per micro service, i.e. multi repo for the project, seemed to be the natural design pattern, so we first chose multi repo. Multi repo enabled the team to develop independently, deploy independently, and distribute responsibility independently. But there were shortcomings too.

Multi repo was hard to test in an integrated manner. There was code duplication, and code reviews were becoming tough because the context was sometimes missing. What we needed was the independence that multi repo provides, together with easy component sharing and code review without the added complexity. This is where mono repo shines.

Mono repo keeps components as separate, independent packages within a single repo. This way the whole project is right there for people to look at, understand and get the bigger picture. On further exploration we found that mono repo is quite a popular approach at large organizations like Facebook, Google, and others.

However, mono repo requires good tooling to manage dependencies, automation and other aspects, and there are several tooling systems such as Bazel, Buck, GVFS and G3. This is when we found Yarn workspaces and Lerna for managing a multi-package mono repo.

Lerna

Lerna optimises the workflow around managing multi-package repositories with git and npm. It helps us with package management, cross-package orchestration, versioning, listing, testing, and building. In fact, with one Lerna command you can iterate through all the packages, running a series of operations (such as linting, testing, and building) on each package. Another big reason for its popularity is that it is easy to get started with Lerna: most of the work is done by four simple commands, namely publish, changed, diff, and run.
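To make the idea of running one operation across every package more concrete, here is a minimal TypeScript sketch. It is not Lerna's implementation: the PackageInfo shape, the package names and the runOnAll helper are hypothetical and exist only for illustration. It assumes each package lists the other in-repo packages it depends on, and it visits dependencies before the packages that use them.

```typescript
// Hypothetical package description: only the fields this sketch needs.
interface PackageInfo {
  name: string;
  localDependencies: string[]; // names of other packages in the same repo
}

// Return package names so that every package comes after its local dependencies.
function topologicalOrder(packages: PackageInfo[]): string[] {
  const byName: Record<string, PackageInfo> = Object.create(null);
  for (const pkg of packages) byName[pkg.name] = pkg;

  const state: Record<string, "visiting" | "done"> = Object.create(null);
  const ordered: string[] = [];

  const visit = (name: string): void => {
    if (state[name] === "done") return;
    if (state[name] === "visiting") {
      throw new Error("Dependency cycle detected at " + name);
    }
    const pkg = byName[name];
    if (pkg === undefined) return; // external dependency, not part of the repo
    state[name] = "visiting";
    for (const dep of pkg.localDependencies) visit(dep);
    state[name] = "done";
    ordered.push(name);
  };

  for (const pkg of packages) visit(pkg.name);
  return ordered;
}

// Run one operation (for example lint, test or build) on every package in order.
function runOnAll(packages: PackageInfo[], operation: (name: string) => void): void {
  for (const name of topologicalOrder(packages)) operation(name);
}

// Made-up packages for a job application tracking system.
const examplePackages: PackageInfo[] = [
  { name: "job-api", localDependencies: ["shared-models", "auth-client"] },
  { name: "auth-client", localDependencies: ["shared-models"] },
  { name: "shared-models", localDependencies: [] },
];

const buildLog: string[] = [];
runOnAll(examplePackages, (name) => buildLog.push("build " + name));
console.log(buildLog);
```

In this sketch the shared package is handled first and the package that depends on everything else is handled last, which is the kind of cross-package orchestration we wanted from a tool rather than from hand-written scripts.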

Yarn Workspaces

Yarn workspaces help us manage dependencies efficiently. They help us build individual packages and manage dependency version conflicts. By default in a monorepo, if package A depends on packages B and C, we need to build all three packages just to make a change in package A. Because many packages reside within a single monorepo, this can be a very slow process compared to a multi repo setup, where you would just pull down pre-built versions of B and C when you install. With Yarn workspaces you can build package A individually.
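The cost described above can be sketched in a few lines of TypeScript. This is only an illustration of the reasoning, not how Yarn works internally: the RepoPackage shape and the packagesToBuild helper are hypothetical, and the sketch assumes that nothing is available pre-built, so a package and all of its in-repo dependencies have to be built from source.

```typescript
// Hypothetical description of a package inside the mono repo.
interface RepoPackage {
  name: string;
  localDependencies: string[]; // other packages from the same repo
}

// Everything that has to be built from source before `target` is usable,
// assuming no pre-built versions exist: the target plus its transitive
// local dependencies.
function packagesToBuild(packages: RepoPackage[], target: string): string[] {
  const byName: Record<string, RepoPackage> = Object.create(null);
  for (const pkg of packages) byName[pkg.name] = pkg;

  const seen: Record<string, boolean> = Object.create(null);
  const pending: string[] = [target];
  const result: string[] = [];

  while (pending.length > 0) {
    const name = pending.pop() as string;
    if (seen[name]) continue;
    seen[name] = true;
    const pkg = byName[name];
    if (pkg === undefined) continue; // external package: installed, not built
    result.push(name);
    for (const dep of pkg.localDependencies) pending.push(dep);
  }
  return result.sort();
}

// The A, B, C example from the text, plus an unrelated package D.
const repo: RepoPackage[] = [
  { name: "A", localDependencies: ["B", "C"] },
  { name: "B", localDependencies: [] },
  { name: "C", localDependencies: [] },
  { name: "D", localDependencies: ["A"] },
];

console.log(packagesToBuild(repo, "A"));
console.log(packagesToBuild(repo, "B"));
```

Without pre-built versions of B and C, touching A means building three packages. If B and C can instead be installed pre-built, only A itself is left to build.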

Yarn workspaces also help in dealing with dependency version conflicts. Imagine that two packages A and B both depend on the same external package, but A depends on external package@1.0.0 while B depends on external package@2.0.0. A and B can no longer share a copy of the external package because their versions are incompatible. A Yarn focused install ensures that package A and package B each get the appropriate version of their dependencies.
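The conflict case can be sketched as follows. This is a deliberately simplified model and not Yarn's actual resolution algorithm: it compares exact version strings rather than semver ranges, and the Workspace and InstallPlan types, the planInstall helper and the package names are all hypothetical. A dependency that every package requests at the same version is shared once; a dependency requested at different versions gets a separate copy per package.

```typescript
// Hypothetical workspace description: external package name -> exact version.
interface Workspace {
  name: string;
  dependencies: Record<string, string>;
}

interface InstallPlan {
  shared: Record<string, string>;                      // one copy for the whole repo
  perPackage: Record<string, Record<string, string>>;  // private copies per package
}

function planInstall(workspaces: Workspace[]): InstallPlan {
  // Collect the distinct versions requested for each external package.
  const requested: Record<string, string[]> = Object.create(null);
  for (const ws of workspaces) {
    for (const dep of Object.keys(ws.dependencies)) {
      const version = ws.dependencies[dep];
      const versions = requested[dep] || (requested[dep] = []);
      if (versions.indexOf(version) === -1) versions.push(version);
    }
  }

  const plan: InstallPlan = { shared: {}, perPackage: {} };
  for (const ws of workspaces) {
    for (const dep of Object.keys(ws.dependencies)) {
      const version = ws.dependencies[dep];
      if (requested[dep].length === 1) {
        plan.shared[dep] = version; // everyone agrees: share a single copy
      } else {
        // Versions differ: this package gets its own copy.
        const own = plan.perPackage[ws.name] || (plan.perPackage[ws.name] = {});
        own[dep] = version;
      }
    }
  }
  return plan;
}

// A and B disagree on "external-package" but agree on a made-up "logger".
const installPlan = planInstall([
  { name: "A", dependencies: { "external-package": "1.0.0", "logger": "3.2.1" } },
  { name: "B", dependencies: { "external-package": "2.0.0", "logger": "3.2.1" } },
]);
console.log(JSON.stringify(installPlan, null, 2));
```

Here the made-up logger dependency is shared, while A and B each keep their own version of the conflicting package, which is the outcome the paragraph above describes.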

To sum up, our learning was that while cloud native gave us the paradigm to build separate, smaller services, mono repo gave us the repository scheme that helps us manage code for growing projects and teams. The cloud native + monorepo setup considerably improved the reliability and agility of our product.
