Monorepo versus Polyrepo

05 Nov 2021

Let us start by defining what a monorepo is. The term is used for one or more of the following practices:

Keep multiple projects in a single source control system. [1]
Acquire as many third-party and in-house dependencies as possible for a build from the same source-control repository/branch, and in the same update/pull/sync operation. [2]
Keep all teams in agreement on the versions of third-party and in-house dependencies via lock-step upgrades. [2]
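
The last point, lock-step upgrades, is something you can check mechanically. Here is a minimal sketch, assuming each project declares its dependencies as a name-to-version mapping. The project names, library names and the function `find_version_conflicts` are made up for illustration and do not come from any particular tool.

```python
from collections import defaultdict


def find_version_conflicts(projects):
    """Return the dependencies that are pinned to more than one version.

    `projects` maps a project name to a {dependency: version} mapping.
    The result maps each conflicting dependency to {version: [projects]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for project, deps in projects.items():
        for dep, version in deps.items():
            seen[dep][version].append(project)
    return {
        dep: {version: sorted(users) for version, users in versions.items()}
        for dep, versions in seen.items()
        if len(versions) > 1  # more than one version means no lock-step
    }


# Hypothetical projects living in the same repository.
projects = {
    "billing": {"json-lib": "2.1.0", "http-lib": "1.4.0"},
    "shipping": {"json-lib": "2.1.0", "http-lib": "1.3.2"},
}
conflicts = find_version_conflicts(projects)
```

In this example `http-lib` is reported because the two projects disagree on its version, while `json-lib` is not. A check like this could run in the build to enforce the lock-step rule.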

Plausible restrictions

Keeping third-party binary dependencies in source control was popular in the .NET world before NuGet and in the Java world before Maven repositories. We have come to the point where it’s preferable to avoid this practice. Let us assume that only source code and similar sources are stored in the repository.

Language & infrastructure

Languages that do not have good package managers naturally lend themselves to monorepos. The main languages I’ve used that have this property are C and C++.

Debugging

To debug binary packages that you don’t have the source for, some languages support separate debug packages and source references. [3] Even if these tools are available, they might not work with your organization’s source control or package repository.

If you write libraries for business code (spreading domain knowledge across package artifacts that are referenced from an internal package repository), then restrictions on how you can debug that code can be a pain for developers.

Pain points

Matt Klein draws attention to the problems of tight coupling and performance in “Monorepos: Please don’t!”.

Tight coupling

You want to homogenize the use of internal or third-party libraries. Is it OK with other teams that you refactor their code while they are busy with, say, a crunch and want to avoid unnecessary merge conflicts?

Do changes to internal or third-party libraries imply a subtle change in behavior? Even if the library code is well tested on its own, the code that uses it can break. You might want to increase the usage of a new library version gradually.

One reason to merge multiple repositories into fewer is that you have a monolith in disguise: multiple services and libraries are changed for one issue in what should be one business subdomain.
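
A rough way to spot a monolith in disguise is to look at which issues needed changes in more than one repository. The sketch below assumes you can export (issue, repository) pairs from your commit history; the issue ids, repository names and the function `cross_repo_issues` are hypothetical.

```python
def cross_repo_issues(commits):
    """Group commits by issue and keep the issues that touched several repositories.

    `commits` is an iterable of (issue_id, repository_name) pairs.
    """
    repos_by_issue = {}
    for issue, repo in commits:
        repos_by_issue.setdefault(issue, set()).add(repo)
    return {
        issue: sorted(repos)
        for issue, repos in repos_by_issue.items()
        if len(repos) > 1
    }


# Hypothetical export of commit history.
commits = [
    ("ISSUE-1", "billing-service"),
    ("ISSUE-1", "billing-contracts"),
    ("ISSUE-1", "billing-client"),
    ("ISSUE-2", "shipping-service"),
]
coupled = cross_repo_issues(commits)
```

If most issues show up in the result with the same set of repositories, those repositories are probably one system and could be candidates for merging.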

Performance

For very large repositories, organizations use source control systems that support “sparse checkouts” of only a subset of the repository, since the full code base takes too long to clone.
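
To illustrate the idea of a sparse checkout, here is a minimal sketch that selects only the files under a set of wanted directories. It is a simplification for illustration and not how any specific source control system matches paths; the paths and the function `select_sparse` are made up.

```python
from pathlib import PurePosixPath


def select_sparse(paths, wanted_dirs):
    """Return the paths that live inside one of the wanted directories."""
    wanted = [PurePosixPath(d) for d in wanted_dirs]
    selected = []
    for p in paths:
        path = PurePosixPath(p)
        # Compare whole path components so that "services/billing"
        # does not accidentally match "services/billing-old".
        if any(w == path or w in path.parents for w in wanted):
            selected.append(p)
    return selected


# Hypothetical repository content.
all_paths = [
    "services/billing/app.py",
    "services/billing-old/app.py",
    "libs/money/money.py",
    "README.md",
]
checkout = select_sparse(all_paths, ["services/billing", "libs/money"])
```

A team working on billing would then only have `services/billing/app.py` and `libs/money/money.py` on disk, instead of the whole repository.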

Focus

Clear boundaries that you don’t need to cross can increase focus on a single business subdomain.

Being able to browse the history of just the code for one business subdomain lets people see what’s going on. Too much noise from unrelated changes drowns out the team’s own work.

Ownership

Everybody has access to everything. Who reviews what?
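
One possible answer is to map directories to owning teams and derive the reviewers from the changed paths. The following is a minimal sketch under that assumption; the directory layout, team names and the functions `owner_for` and `reviewers_for_change` are hypothetical.

```python
def owner_for(path, rules):
    """Return the team owning `path`, or None if nobody owns it.

    `rules` maps a directory prefix to a team. The longest matching
    prefix wins, so a subdirectory can override its parent.
    """
    best_prefix, best_team = "", None
    for prefix, team in rules.items():
        directory = prefix.rstrip("/") + "/"
        if path.startswith(directory) and len(directory) > len(best_prefix):
            best_prefix, best_team = directory, team
    return best_team


def reviewers_for_change(changed_paths, rules):
    """Return the sorted list of teams that should review a change."""
    teams = {owner_for(path, rules) for path in changed_paths}
    teams.discard(None)  # unowned files do not add a reviewer
    return sorted(teams)


# Hypothetical ownership rules for a shared repository.
rules = {
    "services": "team-platform",
    "services/billing": "team-billing",
}
reviewers = reviewers_for_change(
    ["services/billing/invoice.py", "services/search/index.py", "README.md"],
    rules,
)
```

Here the change touches billing and search code, so both `team-billing` and `team-platform` are asked to review, while the unowned `README.md` adds nobody.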

Conclusion

Should you have a single repository for a team? I think it makes sense if the team only works on one product/system and the choice suits the system and the team.

You will probably want to merge and split repositories as time goes on. It’s best to keep your options open and make a pragmatic choice for your team. Keeping separate repositories can make it clear that the source is supposed to be separate.

References

[1] Wikipedia: Monorepo

[2] Trunk Based Development: Monorepos

[3]
