Monolithic Repos Are Evil

We all keep our code in Git version control repositories.
The question is whether we should create a new repository for each
new module or try to keep as much as possible in a single so-called “monolithic” repo.
Market leaders, like Facebook and Google, advocate the second approach.
I believe they are wrong.

Let’s use the following JavaScript function as an example.
It downloads a JSON document from a Zold
node (using jQuery)
and places part of its content on the HTML page.
Then it colors the data according to its value.
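A minimal sketch of such a function could look like this; the node URL, the score field, and the color thresholds here are my assumptions, not the original code:

```js
// A sketch, assuming jQuery is loaded on the page: fetch a JSON
// document from a Zold node and print one of its fields, colored
// according to its value. The URL, the field name, and the
// thresholds are illustrative assumptions.
function main() {
  $.getJSON('http://b1.zold.io/', function (json) {
    var $body = $('body');
    // Put the value on the page.
    $body.text(json.score);
    // Color it: red by default, orange if decent, green if high.
    var color = 'red';
    if (json.score > 500) {
      color = 'green';
    } else if (json.score > 100) {
      color = 'orange';
    }
    $body.css('color', color);
  });
}
```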

Now, let me refactor it. Let me break it into two pieces. The first
piece will load the data and the second one will be a jQuery plugin to colorize
HTML content according to the data it contains. This is how the
plugin will look:
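Again, a sketch under the same assumptions; the plugin name, colorize, and its default options are mine:

```js
// A sketch of the plugin half: it reads the numeric content of each
// matched element and sets its CSS color according to configurable
// thresholds. The plugin name and the defaults are assumptions.
(function ($) {
  $.fn.colorize = function (options) {
    var settings = $.extend({
      high: 500,            // above this the value is "good"
      low: 100,             // above this it is merely "acceptable"
      good: 'green',
      acceptable: 'orange',
      bad: 'red'
    }, options);
    return this.each(function () {
      var $this = $(this);
      var value = parseFloat($this.text());
      var color = settings.bad;
      if (value > settings.high) {
        color = settings.good;
      } else if (value > settings.low) {
        color = settings.acceptable;
      }
      $this.css('color', color);
    });
  };
}(jQuery));
```

With the plugin in place, the loading piece shrinks to a one-liner: $.getJSON('http://b1.zold.io/', function (json) { $('body').text(json.score).colorize(); });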

Now, back to the monorepo advocates. In a nutshell, they all claim that productivity is higher
with a monolithic repo because the number of operations one has to perform in order to make a change is smaller.
Indeed, in a monorepo there will be a single branch, a single set of commits, a single pull request,
a single merge, a single deploy and release. Testing also becomes easier, both
manual and automated. Continuous integration is easier to configure,
and so on and so forth.

All these “reasonable” arguments remind me of what I hear when preaching
object decomposition and suggesting that multiple objects are better than
a single large one. Imagine a large class of 3,000 lines of code that
does many things, all of them tightly coupled. It’s “easy” to test it,
to make changes, to deploy, to review, etc. Because everything stays in one
file, right? We don’t need to jump from class to class in order to understand
the design. We just look at one screen, scroll it up and down, and that’s it.
Right? Totally wrong!

I guess I don’t need to explain why it’s wrong. We don’t design our software
that way anymore. We know that tight coupling is a bad idea. We know that
a set of smaller components is better than a larger solid piece.

Why can’t we apply the same logic to repositories? I believe we can.
Of course, just like in object-oriented programming, a fine-grained design
requires more skill and time. Look at what I had to do with this small
jQuery plugin. I spent hours coding and thinking. I even had to learn
Gulp and Jasmine,
which I will most probably never use again. But the benefits we are getting
from it are enormous. This is my short list of them:

Encapsulation.
Each repo encapsulates a single problem, hiding its details from everybody
else. Thanks to that, the scope each repo has to deal with gets smaller.
The smaller the scope, just like in OOP, the easier it is to maintain and
modify. The easier to maintain, the cheaper the development. I guess Google
guys don’t really worry about costs. On the contrary, they want their salaries
to grow. A large unmaintainable monolithic repo is a perfect tool to make
it happen.

Fast Builds.
When a repo is small, its automated build takes little time. Look at the
time Travis spends
on my jQuery plugin: 51 seconds. That’s fast.
We all know
that the faster the build, the better it is for productivity, since it’s easier to use
the build as a tool for development.

Accurate Metrics.
I don’t know whether you rely on metrics in your projects, but we at
Zerocracy do pay attention to numbers, like
lines of code, hits of code,
number of commits, classes, methods, cohesion,
coupling, etc. It’s always a question whether the metrics are accurate.
Calculating lines of code for a large repository doesn’t make any sense, since
the number will include a lot of files from completely different parts of
the application. Moreover, there will be different languages and file formats.
Say a repo has 200K lines of Java, 150K lines of XML, 50K lines of JavaScript,
and 40K lines of Ruby. Can you say something specific about this repo? Is it
large? Is it a Java repo? And, more importantly, can it be compared with other
repositories? Not really. It’s just a big messy storage of files.

Homogeneous Tasks.
Smaller repositories tend to have smaller tech stacks, meaning that each of
them uses just a few languages and frameworks or, in the preferred
situation, one language or technology per repository. Thanks to this,
the management of programmers becomes easier, since any ticket/problem can
be assigned to anybody. It’s easier to make tasks similar in size and complexity.
This obviously means better manageability of the project.

Single Coding Standard.
It’s easier to standardize the coding style if the repo is small. When it’s
large, various parts of the code base will have different styles and
it will be almost impossible to put everybody on the same page. In other
words, smaller repositories look more beautiful than larger ones.

Short Names.
Each repository, inevitably, will have its own namespace. For example, in the
JS repository I just created, I only have two files: colorizejs.js and test-colorizejs.js.
I don’t really care about the naming inside them, since the namespace
is very small. I can even use global variables.
Shorter names and smaller namespaces mean better maintainability.

Simple Tests.
The larger the code base, the more dependencies it has, which are difficult
to mock and test. Very large code bases become fundamentally untestable since
they require a lot of integration tests, which are difficult to maintain.
Smaller libraries, frameworks and modules are easier to keep at the level
of simple and fast unit testing.
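To make the point concrete: a Jasmine spec for the colorize plugin sketched earlier could stay this small, with no mocks and no fixtures (again, the option names are my assumptions):

```js
// A sketch of a Jasmine spec for the colorize plugin sketched above;
// it needs only jQuery and the plugin itself, nothing to mock.
describe('colorize', function () {
  it('paints large values green', function () {
    var $el = $('<span>750</span>').colorize({ high: 500, low: 100 });
    expect($el.css('color')).toBe('green');
  });
  it('paints small values red', function () {
    var $el = $('<span>42</span>').colorize();
    expect($el.css('color')).toBe('red');
  });
});
```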

Thus, I believe that the smaller the repositories and modules, the better.
Ideally, I would say, the largest acceptable size for a code base is 50,000 lines of code.
Everything that goes above this line is a perfect candidate for decomposition.

What do you think is better: a big code repository with everything inside, or many smaller ones, each with its own builds, dependencies, issues, and pull requests?