Qafoo GmbH - passion for software quality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Benjamin Eberlei
:Date: Mon, 22 May 2017 08:24:58 +0200
:Revision: 9
:Copyright: All rights reserved
===========================================================
How You Can Successfully Ship New Code in a Legacy Codebase
===========================================================
:Abstract:
Usually the problems software needs to solve get more complex over time. As
the software itself needs to model this increased complexity it is often
necessary to replace entire subsystems with more efficient or flexible
solutions. Instead of starting from scratch whenever this happens
(often!), a better solution is to refactor the existing code and therefore
reducing the risk of losing existing business rules and knowledge. A good
strategy for this kind of change is called "Branch By Abstraction". While
the term is certainly clumsy and overloaded, the idea itself is genious.
Let's see how it works…
:Description:
Usually the problems software needs to solve get more complex over time. As
the software itself needs to model this increased complexity it is often
necessary to replace entire subsystems with more efficient or flexible
solutions. Instead of starting from scratch whenever this happens
(often!), a better solution is to refactor the existing code and therefore
reducing the risk of losing existing business rules and knowledge. A good
strategy for this kind of change is called "Branch By Abstraction". While
the term is certainly clumsy and overloaded, the idea itself is genious.
Let's see how it works…
:Keywords:
Refactoring, Branch By Abstraction, Legacy Code
The greek philosopher Heraclitus already knew that "change is the only
constant" and as software developers we know this to be true for much
of software development and business requirements.
Usually the problems software needs to solve get more complex over time. As the
software itself needs to model this increased complexity it is often necessary
to replace entire subsystems with more efficient or flexible solutions.
Instead of starting from scratch whenever this happens (often!), a better
solution is to refactor the existing code and therefore reducing the risk of
losing existing business rules and knowledge.
A good strategy for this kind of change is called "Branch By Abstraction".
While the term is certainly clumsy and overloaded, the idea itself is genious.
Instead of introducing a long running branch in your version control system
(VCS) where you spend days and months of refactoring, you instead introduce an
abstraction in your code-base and implement the branching part by selecting
different implementations of this abstraction at runtime.
Explaining Branch By Abstraction without an example is not a good idea, so
lets take a look at three different examples to get an understanding how to
make use of branch by abstraction:
Example 1: Replacing the Backend in a CMS
-----------------------------------------
Up to eZPublish version 4, the popular CMS had a very relational-database
centric API and model where all the abstractions made it quite clear that the
underlying database is relational. In addition the content repository was leaky
about another relational abstraction, the fact that one attribute is stored in
one row because of the use of a highly sophisticated Entity-Attribute-Value
model.
Those assumptions in the API and the inability to implement different data
models made it very hard to scale eZPublish 4 beyond a certain number of
content objects. With the rise of Solr and ElasticSearch customers wanted to
use the powerful search engines, but the API limitations made this nearly
impossible.
With eZPublish 5 a new API was introduced with a fully object-oriented
interface and a new content model abstraction.
The idea was to allow developers to work towards switching the storage model
step by step. You could replace usages of the old API with the new API which
would already allow to benefit of NoSQL search based implementations, with a
fallthrough to the legacy database schema. After a full switch to the new API
it would also be possible to replace the legacy database schema entirely.
Critical in using branch by abstraction in such a scenario is finding the right
API that is not a leaky abstraction for the abstracted storage engine anymore.
::
new Criterion\LogicalAnd([
new Criterion\ContentTypeIdentifier("article");
new Criterion\Field("status", Operator::EQ, "xyz"),
new Criterion\FullText("Hello World");
new Criterion\Visibility(Criterion\Visibility::VISIBLE),
])
]);
$searchResult = $searchService->findContent($query);
You can see how this API exposes query functionality generically, but makes
sure to add semantic meaning such as the FullText criterion. This allowed eZ
Systems to implement the API both with the old legacy database schema as a
first implementation, but also experiment with Solr and other search
technologies to achieve much better performance and scalability.
Example 2: Rewriting a submodule without changing public API
------------------------------------------------------------
Very early in my career I was responsible for maintaining a fairly large
newsletter system that had pretty bad APIs and was a large mess of spaghetti
code.
One center piece of this system was the parser that took free form HTML,
some configuration flags and text input and generated an HTML email that
included tracking data and links, inline CSS styles, images uploaded to a CDN
and mangled the HTML such that Outlook and the likes rendered it correctly.
The code for this large system was hidden in a single god-function inside a
class::
class Mailing
{
public function generateMail()
{
// 2000-3000 lines of PHP parsing goodness!
$this->save(array('generated_mail' => $parsedHtmlBody));
}
}
The code was hard to test and was a frequent source of small and nasty bugs.
When we onboarded a second customer onto this system and he needed different
configuration inputs everything fell apart.
We built a new mail parser from scratch with a nice new API. It was entirely
stateless and accepted a large object of inputs and produced a large object of
outputs that could then be put into the existing database as required.
The code was feature flagged to our new customer first::
class Mailing
{
public function generateMail()
{
if (Config::getTenant() === 'new_customer') {
$parser = new MailParser(/* some dependencies */);
$parsedMail = $parser->parse(
new MailDraft(array(/* tons of inputs */))
);
$this->save(array('generated_mail' => $parsedMail->body));
return;
}
// old 2000-3000 lines of PHP goodness!
}
}
The combination of highly unit tested code and some weeks of production
experience with a subset of customers finally gave us confidence to get rid of
the old code entirely and we ended up with a shiny nice API within this legacy
code base, but entirely decoupled from it.
Example 3: Github reimplements Merge button
-------------------------------------------
In December 2015, `Github blogged `_
about their use of branch by abstraction to replace how the merge button works
across the whole site. This critical feature of Github needed extensive testing
so they used a library called Scientist to devise an experiment.
They refactored the old code into a dedicated method and then wrote new code
with the same method signature. The scientist library then allows them to run
both new and old code after each other. The library checks if the response of
both methods is the same and logs an error if its not the case.
This strategy is a very powerful use-case of branch by abstraction.
The Process
-----------
The process of implementing Branch By Abstraction usually follows a certain set
of steps while each step already provides you with a sensible outcome. The
basic we have to work with looks something like this:
.. image:: /blog/images/branch_by_abstraction_01.png
:target: /blog/images/branch_by_abstraction_01.png.svg
We have several "Clients" (classes, functions) which use and probably implement
an implicitly defined concept like in one of the examples above. This might be
inline code (Example 2) or already code in other classes which is called using
APIs which do not describe the actual domain well (Example 1). Usually it is a
combination of both.
Step 1: Refactoring The First Client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We start by picking one of those "Clients" (not the most complex one) and see
how this Client uses the implicitly defined concept. We create a new API for
this. This is very similar to `Extract Methods`__ or `Extract Classes`__
refactorings. While moving the old code we also create an abstraction
(interface, set of interfaces).
Do not over-analyze in this step. Just extract the Facade which is required by
this one Client and do not try to already cover all other Clients concepts. We
will get there. It is best to just *move* the old code behind a Facade
(implementing the abstraction) to make sure the system behaves like before and
we do not break something. `Functional Tests`__ are really `useful when doing
such refactorings`__.
.. image:: /blog/images/branch_by_abstraction_02.png
:target: /blog/images/branch_by_abstraction_02.png.svg
Already after this step the code in the first Client will be much more
readable. You already did something beneficial to the software.
__ /blog/098_extract_method.html
__ /blog/099_extract_service_class.html
__ /blog/081_phpunit_mink_functional_tests.html
__ /blog/085_how_to_refactor_without_breaking.html
Step 2: Iterate Across Clients
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We continue to do this with every client. You will have to adapt the Facade and
the abstraction during this process. There will always be edge cases in in some
clients you will not have thought of initially. But this is also the beauty
behind this approach. We slowly discover and understand the implicit concept /
our domain.
.. image:: /blog/images/branch_by_abstraction_03.png
:target: /blog/images/branch_by_abstraction_03.png.svg
Step 3: Finish The Clients
^^^^^^^^^^^^^^^^^^^^^^^^^^
We do this until all clients are refactored. The clients are done now and so is
our abstraction. We now understood the concept and have modelled it with a new
set of interfaces. This is great!
.. image:: /blog/images/branch_by_abstraction_04.png
:target: /blog/images/branch_by_abstraction_04.png.svg
But there still is this Facade which contains all the old ugly code. If the
code is working well enough and we do not have to change it often it is valid
to just keep it there and ignore it. It will not be maintainable, but if there
are no requirements to change, adapt or extend it – then this is fine. Feel
free to just stop here.
Step 4: Starting a new Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If we decided that we need a new implementation because the old one is just
unmaintainable we can start with this now. We have an abstraction which we
"just" need to implement. Please do this test-driven. With tests we can be
pretty sure that the new implementation will succeed and fulfil all the
requirements of our abstraction. Since the Facade, the Verifier and the new
implementation all fulfill the same interface we should be able to replace the
implementation used by the Clients using our Dependency Injection Container or
with some Factory.
.. image:: /blog/images/branch_by_abstraction_05.png
:target: /blog/images/branch_by_abstraction_05.png.svg
But we only really know that the new implementation is a success if we are
testing it in production. And with the given abstraction we can implement a
*Verifier* which ensures exactly this.
The Verifier knows the legacy Facade and the new code – it also implements the
abstraction. The Verifier now always calls the Facade wrapping the old code and
the new implementation and compares their output. It can optionally even
compare additional metrics like speed and memory usage. By using this in
production for some time and logging all differences we can be pretty sure that
the new implementation will be ready for production.
.. image:: /blog/images/branch_by_abstraction_06.png
:target: /blog/images/branch_by_abstraction_06.png.svg
In this step you might also find bugs in the old code, just like Github did in
the example linked before.
Step 5: Delete the Old Code
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now comes the most beautiful step: Deleting the old code.
.. image:: /blog/images/branch_by_abstraction_07.png
:target: /blog/images/branch_by_abstraction_07.png.svg
Once we are sure everything works there is no need to keep the Verifier nor the
Facade wrapping the old code. You can throw everything away and now only
maintain and adapt the new clean code.
.. note::
Qafoo experts can `kick-start your team with a continuous refactoring`__
process. We can show you how to improve your source code quality on the go
and help you to get rid of the big quality chuckholes in your construction
site.
__ /services/workshops/refactoring
Conclusion
----------
Branch by abstraction is not an easy skill to master and the implementation is
always extremely specific to the use-case. But the benefits of avoiding
long-running branches and complex merges, being able to test old vs new code
and gradually rolling out the new functionality to only a subset of users are
so large, that you should definately learn about it. To be honest, for me it is
one reason why refactoring in a large legacy code can be so much fun!
..
Local Variables:
mode: rst
fill-column: 79
End:
vim: et syn=rst tw=79
Trackbacks
==========
Comments
========