Learnings, Insights and Notes from Microxchg 2016

I was really happy when Microxchg 2016 conference was announced. Last year
the conference was great, I learned a lot, talked to nice people and had really
great coffee. So, let’s go to Berlin again!

Today, back in Frankfurt, I can say that the coffee was again great ;-). And not
only the coffee: the food, the venue and basically everything except the WiFi
was great. Thanks to everyone involved in organizing this conference and to all
the speakers! Hope to see all of you next year!

So, next are my notes, key learnings and insights.

Key learnings and observations

When people are talking about microservices they probably mean different things.

For some, microservices just means dividing a system into services. It doesn’t
have to be many services, maybe 5 or 10, and they might be big, meaning more
than 1,000 or even more than 10,000 lines of code.

Others are talking about fine-grained services. These are most likely small,
about 10 to 1,000 lines of code, and such organizations usually run hundreds or
even thousands of different services, and even more service instances, in data
centers all over the world. That’s the scale at which Netflix, Amazon and
companies of that kind operate.

I think this is important to notice because the challenges are different in the
two scenarios. Imagine all the communication between services is encrypted.
Managing SSL certificates for thousands of services surely is a problem you
need to solve. With only 5 services it is not such a big problem and you
probably don’t even need tooling.

It seems to be a common experience that you end up with a big ball of mud when
one or more teams work on a monolith.

The stories are always the same, something along these lines: We started with
$your_favorite_framework, most likely Rails, and then we had feature teams
adding one feature after another, working on code across the whole architecture
without ever really owning anything. After some months or years the developers
start complaining because there is too much code and the code is too complex.
There are dependencies all across the codebase which make refactorings an
adventure. Deployment doesn’t feel good because they are not confident that
everything works. Onboarding new developers takes ages. In the end it slows
them down.

Nobody has solved all the problems you might encounter in a microservices
architecture yet.

This is also a common theme. Even the big players are still working on
monitoring solutions, for example, because collecting metrics and making sense
of them at large scale isn’t that easy. And that is only one example. Security
is another weak spot: how do you manage SSL certificates for thousands of
servers?

These questions, among many others, are still to be answered, and yet some
people are doing microservices, and doing it successfully.

What I learned: Start with coarse-grained, high-level services; learn, and
split them up into fine-grained services later if needed. You don’t need to
solve all problems upfront.

The talk was mainly about the basics you have to know when you do microservices,
their context, why they decided to do microservices, some challenges and advice.

Susanne said that her team wanted to split because working with the monolith
felt clumsy and slow. Everything took too long, meetings and decisions alike,
and they had no well-defined responsibilities, so it often took some time until
someone picked up a bug and fixed it. So they decided to divide their product
into 5 apps and their one big team into three.

Other drivers were that they wanted different technology stacks and also ease
of deployment. The latter is especially important for them in order to get early
feedback from customers so they might decide to pivot.

What I learned: You don’t get the consistency guarantees you think you get. Your
database most likely doesn’t support the isolation level you are assuming.
There is more than ACID and BASE. Find out which consistency requirements you
really need.

Uwe made the point that ACID doesn’t mean serializability, although most
developers think that each ACID transaction is perfectly isolated from every
other. That’s not true unless you choose the isolation level serializable,
which most databases don’t support and which you probably wouldn’t want to use
even if they did. With all other isolation levels, anomalies happen in
production: something weird happens which shouldn’t happen and can’t be
reproduced. For example, two concurrent transactions update X and Y of one
dataset, each update in isolation is valid, but changing both together violates
a constraint.
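The anomaly Uwe describes is essentially the classic “write skew”. A minimal
sketch, simulating snapshot isolation in plain Python (no real database; the
on-call scenario and all names are my own invented illustration): two
transactions each check the constraint “at least one doctor must stay on call”
against the same snapshot, each passes, and together they still break it.

```python
# Two doctors are on call; the invariant is that at least one stays on call.
snapshot = {"alice_on_call": True, "bob_on_call": True}

def go_off_call(snapshot, me, other):
    # Each transaction reads from its own consistent snapshot ...
    if snapshot[me] and snapshot[other]:  # constraint check passes
        return {me: False}                # ... and writes only its own row
    return {}

db = dict(snapshot)
# Both transactions start from the same snapshot, so neither sees the
# other's pending write -- each check is valid in isolation.
db.update(go_off_call(snapshot, "alice_on_call", "bob_on_call"))
db.update(go_off_call(snapshot, "bob_on_call", "alice_on_call"))

# The invariant "at least one doctor on call" is now violated.
print(db)  # {'alice_on_call': False, 'bob_on_call': False}
```

Under isolation level serializable one of the two transactions would have been
aborted; under snapshot isolation both commit.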

Also very interesting: he said that consistency is something you can no longer
push into the infrastructure. We made our software resilient by addressing that
concern at the application level, with circuit breakers for example. Now we
probably need to do the same with consistency instead of just relying on the
datastore to handle it.

What I learned: We as developers need a more rational approach to making
technical decisions. The fact that something is a new technology shouldn’t be
the main driver when choosing it. Microservices are complex, but there are
tools to tackle this complexity. A monolith is a dead end; there is no
technology that can help you deal with a big ball of mud.

He proposed patterns, and pattern languages in particular, to make rational
decisions instead of emotional ones. Do you know the Gartner Hype Cycle? It
basically describes how each new technology first looks good and everyone is
excited about it, but after a while people get frustrated because it turns out
that the new technology introduced other problems. This is when people usually
learn about the drawbacks and understand the trade-offs.

The nice thing about patterns is that they usually describe more than just a
solution. A pattern first describes a context, the situation we are in. Then it
states the problem it tries to solve and the constraints that apply. This
information is important in order to make an informed and rational decision
about whether the presented solution fits our context or not. A pattern also
describes what the author ended up with and how it relates to other patterns.

What I learned: Even if you aren’t a security specialist you can do a good job
of securing your software, just as you don’t need a medical degree or a
background in biology to know that washing your hands is a good idea.

Sam explained the concept of an attack tree, an approach to modelling security
threats. Imagine you have a safe. In this case a threat is that the safe could
be opened, and surprisingly there are many ways to open a safe: attackers could
pick the lock or learn the combination, for example. Once these are identified,
dig deeper and ask how they could learn the combination. Maybe because someone
wrote it down and an attacker found it in the trash? And once you have this,
decide how to mitigate. You could buy a safe with a lock that cannot be picked,
or at least one for which the required equipment would cost more than what is
in the safe.
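An attack tree is just that: a tree of threats refined into sub-threats, with
mitigations attached where you have decided on one. A hedged sketch of the safe
example as a data structure (the node and field names are my own, not from the
talk) could look like this:

```python
# Hypothetical attack tree: nodes refine a threat into sub-threats; leaves
# either have a mitigation decided or still need one.
from dataclasses import dataclass, field

@dataclass
class Node:
    threat: str
    mitigations: list = field(default_factory=list)
    children: list = field(default_factory=list)

safe = Node("open the safe", children=[
    Node("pick the lock", mitigations=["buy a pick-resistant lock"]),
    Node("learn the combination", children=[
        Node("find it written down in the trash",
             mitigations=["never write the combination down"]),
        Node("watch the owner enter it"),  # no mitigation decided yet
    ]),
])

def unmitigated(node):
    """Walk the tree and collect leaf threats without a mitigation."""
    if not node.children:
        return [] if node.mitigations else [node.threat]
    return [t for child in node.children for t in unmitigated(child)]

print(unmitigated(safe))  # ['watch the owner enter it']
```

Walking the tree for leaves without mitigations gives you the open ends you
still have to address, or consciously accept.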

Apart from that he also described four things you should consider when thinking
about security. First is prevention: in order to prevent attacks, keep the
attack surface small, use authentication to keep attackers from doing bad
things, and encrypt network traffic, because especially in cloud environments
you can’t trust the network. Detection is about how we know that we are
affected. The advice here is to subscribe to newsgroups and the like to keep up
to date with vulnerabilities which affect your tech stack and infrastructure.
This can be tricky with microservices because you may have more than one tech
stack and thus there is more to monitor. Response is the next thing you should
think about: what happens if there was a security breach and how do we handle
it? Sam suggests thinking about this before it happens in order to have a plan
in place. And finally recovery, which is about going back to normal operations.
For example, burn all your VM images and services, bake new images and put them
into production in order to be sure that attackers didn’t leave backdoors or
other surprises.

What I learned: They built a service registry which lists all services as well
as the developers who worked on them, in order to quickly find a person to talk
to about a service. Have private APIs for internal clients and a public
general-purpose API for external clients.

Bora talked a lot about Soundcloud’s journey towards a microservices
architecture. They started with a monolith, which they now call the mothership,
and then switched to microservices. This monolith provided one public API which
internal clients, like the web frontend and the iOS and Android apps, as well
as external clients were talking to. It quickly turned out that their internal
clients have very specific requirements which shouldn’t be part of a
general-purpose API, and thus they decided to move towards multiple APIs: one
public and many private APIs, one per client. This is what is known as a
Backend for Frontend. (Btw, Phil Calcado described this journey in great detail
some time ago.)

When they started implementing the first microservice, many developers were
sceptical whether this could work out. After one year they started loving it.

Now they are facing other challenges, like: how many services do we have and
what exactly are they doing? Who can I talk to if I have to change a service?
Who is the owner? Bora showed us their Service Registry, a tool which lets you
search for services and then shows who last contributed to a service, who
handled its last incident and who worked on the latest story the service was
involved in. All very useful information for finding someone who can answer
questions. Martin Fowler described this as a Humane Registry.

They quickly learned that it makes no sense to let the developers maintain that
information manually on their own, because they often don’t have time to
contribute. Also, “Developers are not famous for documenting things”, as Bora
said :-). Thus, they use data sources like GitHub, agile planning tools and
deployment logs to automate this.
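The core of such a registry is just aggregation: fold several data sources into
one record per service. A minimal sketch under my own assumptions (invented
function and field names, simple lists standing in for the GitHub and incident
feeds) might look like this:

```python
# Hypothetical humane service registry: aggregate "who touched this service
# last" from automated data sources instead of hand-written docs.
def build_registry(commits, incidents):
    """commits/incidents: lists of (service, person) tuples, oldest first.
    Later entries overwrite earlier ones, so each field ends up 'most recent'."""
    registry = {}
    for service, author in commits:        # e.g. fed from GitHub commit logs
        registry.setdefault(service, {})["last_committer"] = author
    for service, handler in incidents:     # e.g. fed from incident tooling
        registry.setdefault(service, {})["last_incident_handler"] = handler
    return registry

reg = build_registry(
    commits=[("playlists", "ada"), ("playlists", "bora")],
    incidents=[("playlists", "grace")],
)
print(reg["playlists"])
# {'last_committer': 'bora', 'last_incident_handler': 'grace'}
```

Because the inputs come from systems developers use anyway, the registry stays
current without anyone having to remember to update it.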

Looking at their microservices architecture, they now treat the mothership as a
microservice. They have BFFs for each internal client and, below those, two
layers of microservices. The lowest layer is the foundation layer, where the
more CRUD-like services live, which are not allowed to communicate with other
services, only with their respective datastores. On top of this layer they put
the value-added layer. This is where the services live which actually implement
features. At this layer services are allowed to communicate with each other and
with foundation services. So those typically load some data from several other
services, transform it, and return an aggregated view, which the BFF then uses
to build a response to a client request.
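To make the layering concrete, here is a hedged sketch in which plain functions
stand in for remote services (the service names, data shapes and the whole
example are my own invention, not Soundcloud’s actual design):

```python
# Foundation layer: CRUD-like, talks only to its own datastore (stubbed here).
def track_service(track_id):
    return {"id": track_id, "title": "Some Track"}

def like_service(track_id):
    return {"track_id": track_id, "likes": 42}

# Value-added layer: implements a feature by aggregating foundation services;
# only this layer may call other services.
def track_page(track_id):
    track = track_service(track_id)
    likes = like_service(track_id)
    return {**track, "likes": likes["likes"]}  # aggregated view for the BFF

print(track_page(7))  # {'id': 7, 'title': 'Some Track', 'likes': 42}
```

The BFF would then take this aggregated view and shape it for its one specific
client, web or mobile.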

What I learned: Be comfortable with not building the perfect system, which is
impossible. Teams who try to build the perfect system are on a death march.
Accept that there will be failures.

And instead of trying to avoid failure, ask what an acceptable failure rate is.
Even laser scanners in a supermarket have a failure rate, and it is acceptable.
In reality things will fail, and you can still run a successful business.

Richard is also one of the folks from the message-driven, event-based systems
camp. So he explained why this is the way to go, how easy it is to plug new
services into a messaging infrastructure, and so on. What I found interesting
about that approach is that he described the messages as JSON documents with
some metadata, a command and an optional payload. Services then use pattern
matching to find the messages which are relevant to them.
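A minimal sketch of that routing idea, under my own assumptions (the field
names, the subscription table and the matching rule are invented for
illustration, not taken from the talk):

```python
# A message: metadata, a command, and an optional payload.
message = {
    "meta": {"id": "42", "ts": "2016-02-05T10:00:00Z"},
    "cmd": "user.created",
    "payload": {"name": "Ada"},
}

# Each service declares a pattern describing the messages it cares about.
subscriptions = {
    "mailer":  {"cmd": "user.created"},     # wants new-user events
    "billing": {"cmd": "invoice.created"},  # wants invoice events
}

def matches(pattern, msg):
    """A service is interested if every pattern field equals the message's."""
    return all(msg.get(k) == v for k, v in pattern.items())

interested = [name for name, pattern in subscriptions.items()
              if matches(pattern, message)]
print(interested)  # ['mailer']
```

The nice property is that adding a new service is just adding a new pattern;
the sender never needs to know who consumes its messages.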

He also pointed out that delivery guarantees do not exist. So I guess, again
this has to be handled at the application level.

What I learned: If you can show results, like a nice metrics dashboard, chances
are that business people will really like it and give you more time to work on
it. A very helpful definition of how big a microservice should be. Also a
reminder to document decisions.

Usually when we talk about the size of a microservice we use lines of code as
the metric. Praveena said that services should be small and simple in order to
keep the context of a change minimal and thus be able to make changes fast. But
instead of measuring lines of code she suggests using the time it would
probably take to rewrite the service. It should be at most two weeks; if it
would take longer you might want to split the service.

Documenting decisions, and the reasons why something is implemented the way it
is, can be really important and helpful: for new team members in order to
understand why the system is the way it is, and for the team to check whether
assumptions made earlier are still valid. It enables them to learn quickly and
on their own.

You’re reading CodeCraft; an online publication about Technology and
Software Craftsmanship by
@VaamoTech.