Service Discovery and Configuration Management for Node & Express with ZooKeeper

by Stefan Fidanov

Node is very easy to get started with, especially when you are
using Express. During development you run your app, your database
and anything else you need on the same machine.

In fact, when you start you often have the same setup in production. Everything
is on the same machine. To modify your configuration you just log in to the server
and quickly change whatever you need.

When your app grows, your needs grow too. You begin to scale. You have multiple
front end servers running your application and you also have multiple other servers
running your database and many other services.

It is not easy anymore. You cannot just log in to one server. You have to make the
change everywhere, and it has to happen quickly.

What you are facing are two very well known problems called configuration
management and service discovery.

Configuration management tells your Node app how it should be configured and also
how all other services, like the database for example, are configured.

Service discovery tells your Express app which services are available and where.
For example, it can tell you that there are 3 Redis instances, one
available only for writing and two available only for reading.
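
To make the Redis example concrete, here is a minimal sketch of what such discovery data could look like once it reaches your app, and how the app might choose an instance. The addresses, the role field and the function names are hypothetical, chosen only for illustration.

```javascript
// Hypothetical registry: the addresses and the "role" field are made up for this sketch.
const redisInstances = [
  { host: '10.0.0.11', port: 6379, role: 'write' },
  { host: '10.0.0.12', port: 6379, role: 'read' },
  { host: '10.0.0.13', port: 6379, role: 'read' },
];

// Return every instance that serves the requested role.
function instancesFor(registry, role) {
  return registry.filter((instance) => instance.role === role);
}

// Pick one instance for the role at random, or null if none is available.
function pickInstance(registry, role) {
  const candidates = instancesFor(registry, role);
  if (candidates.length === 0) return null;
  return candidates[Math.floor(Math.random() * candidates.length)];
}
```

Service discovery is then about keeping a list like redisInstances up to date without editing it by hand.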

What is so hard?

Initially, you have all this information in your code. It knows everything it
needs to know. In your code you store which databases you have, their passwords,
their IP addresses and anything else that you might need.

Security

Even when everything is running on the same machine there is already a security
problem. Your Git repository is no place for secrets. Even after you remove
them, they remain available in older commits.

This can be easily solved by using a separate file or environment variables, both
outside of your source control. This works with one or two servers, but when you
have a hundred it is no longer sufficient.

Each time you have to change something you need to log in everywhere, which is
impossible when you have hundreds or even more servers.
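
The environment variable approach mentioned above could look roughly like this. It is only a sketch: the variable names DB_HOST, DB_PORT, DB_USER and DB_PASSWORD and the default values are assumptions, not something your setup has to use.

```javascript
// Build the database settings from environment variables.
// DB_HOST, DB_PORT, DB_USER and DB_PASSWORD are hypothetical names chosen for this sketch.
function loadDbConfig(env = process.env) {
  if (!env.DB_PASSWORD) {
    // Fail early instead of connecting with an empty password.
    throw new Error('DB_PASSWORD is not set');
  }
  return {
    host: env.DB_HOST || 'localhost',
    port: Number(env.DB_PORT || 27017),
    user: env.DB_USER || 'app',
    password: env.DB_PASSWORD,
  };
}
```

The password never appears in the repository, but every server still needs its own copy of these variables, which is exactly what stops scaling.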

Immediate Configuration Changes

This brings us to the next problem. When you make a change to your configuration,
for example setting a new database password, all front end servers should know
about it immediately.

It is just not an option to go and change each and every server manually.

Service Availability & Discovery

Often the bottleneck of a web application is the database. Especially at peak
traffic you might be required to add more database replicas for reading.

How do you let your front end servers know about those new replicas? How do they
find out when a replica is no longer available?

Again, changing all files manually is super slow and error prone.

Wouldn’t it be nice if your front end servers were automatically and immediately
aware when some configuration changes or when you add another database server?

Fortunately, there are some tools that will help us with that. They are ZooKeeper
and zkfarmer.

ZooKeeper

ZooKeeper is a simple solution that can help you
with all of the problems I mentioned above.

First of all, if you are working with Node & Express, you are probably not a big
fan of Java, but ZooKeeper runs on Java.

Don’t worry, it will not hurt your scalability. Not only is ZooKeeper fast, but
in this setup it will never be touched by any user requests and it will contain
very little data. Therefore it will not use too much memory or processing power.

Actually, we will use ZooKeeper to keep many different files in sync.

What is ZooKeeper?

ZooKeeper is a simple database, with a few very simple features.

First, it stores its data like a file tree. The base node is /, then you can
have a child node /child and even deeper children such as /child/deep.

Each node can have children, just like a folder, but each node can also contain
data.

However, a node is not intended to hold a lot of data. It is limited to 1 MB,
and ZooKeeper nodes usually hold much less than that. This is more than enough
for a simple JSON structure, for example.

Another critical feature is ephemeral nodes. An ephemeral node exists only while
the client that created it is still connected to ZooKeeper. As soon as the client
is disconnected the node ceases to exist. Those nodes cannot have children, but
they can have data.
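
If you find it easier to think in code, the rules above can be modelled with a toy in-memory tree. To be clear, this is not ZooKeeper's API or implementation. The class name, method names and the idea of an "owner" client id are all made up for this sketch, and the size limit is simply the 1 MB figure mentioned above.

```javascript
const MAX_DATA_BYTES = 1024 * 1024; // roughly the 1 MB per-node limit mentioned above

class ToyZnodeTree {
  constructor() {
    // path -> { data, owner }; owner is null for regular (non-ephemeral) nodes
    this.nodes = new Map([['/', { data: '', owner: null }]]);
  }

  // Parent of "/child/deep" is "/child"; parent of "/child" is "/".
  static parentOf(path) {
    const index = path.lastIndexOf('/');
    return index === 0 ? '/' : path.slice(0, index);
  }

  // Create a node. Pass an owner (a client id) to make it ephemeral.
  create(path, data = '', owner = null) {
    if (this.nodes.has(path)) throw new Error(`${path} already exists`);
    const parent = this.nodes.get(ToyZnodeTree.parentOf(path));
    if (!parent) throw new Error(`parent of ${path} does not exist`);
    if (parent.owner !== null) throw new Error('ephemeral nodes cannot have children');
    if (Buffer.byteLength(data, 'utf8') > MAX_DATA_BYTES) throw new Error('data too large');
    this.nodes.set(path, { data, owner });
  }

  // Direct children of a node, as full paths.
  children(path) {
    return [...this.nodes.keys()].filter(
      (candidate) => candidate !== '/' && ToyZnodeTree.parentOf(candidate) === path
    );
  }

  // Simulate a client disconnecting: all of its ephemeral nodes vanish.
  disconnect(owner) {
    for (const [path, node] of this.nodes) {
      if (node.owner === owner) this.nodes.delete(path);
    }
  }
}
```

A server could register itself as an ephemeral node under a services node. When it crashes or loses its connection, its node disappears and everybody watching the parent knows the server is gone.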

Also, with ZooKeeper it is easy to set up multiple instances which automatically
synchronize between themselves.

These are just some of the great features that ZooKeeper provides and there are
a few more, which you can read about if you are interested at
zookeeper.apache.org.

How do I connect to ZooKeeper?

You can connect directly from Node, just like with any other database. On the
ZooKeeper web site they talk only about Java and C interfaces, but there are
actually clients for many more languages and platforms, including Node and JavaScript.

Another option is to use the client utility which ships with ZooKeeper.

zkfarmer

However, today we are not going to do any programming against ZooKeeper. Instead
we are going to use another great tool, developed by the nice people at
Dailymotion, called zkfarmer.

It is a Python tool, but again, don’t worry. It will not hurt your scalability,
as no user traffic will ever touch it.

It is a tool which will sync JSON files and also folders with different nodes on
ZooKeeper, without any programming on your side. Isn’t this nice?
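
On the Node side, consuming such a synced JSON file can stay very simple. The following is a minimal sketch, assuming some tool keeps a local JSON file up to date. The function names and the one second polling interval are my own choices for illustration, not something zkfarmer prescribes.

```javascript
const fs = require('fs');

// Read and parse a JSON file synchronously; returns the parsed object.
function loadJson(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Keep an in-memory copy of the file and refresh it when the file changes.
// Returns an object holding the current value and a stop() function.
function watchJson(filePath, onChange = () => {}) {
  const state = { current: loadJson(filePath) };
  const listener = () => {
    try {
      state.current = loadJson(filePath);
      onChange(state.current);
    } catch (err) {
      // The file may be half-written or temporarily missing; keep the last good copy.
    }
  };
  // fs.watchFile polls the file, so it also notices files that are replaced atomically.
  fs.watchFile(filePath, { interval: 1000 }, listener);
  state.stop = () => fs.unwatchFile(filePath, listener);
  return state;
}
```

Your Express handlers would then read state.current on every request, so a change in the file is picked up without restarting the app.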

Getting your hands dirty

No more talking, let’s do something. Our example project will have the following
servers:

1 ZooKeeper server

1 Mongo server for writing

2 Mongo servers for reading

2 Front end servers for running your Node & Express app

It might sound a little bit complicated, but actually ZooKeeper really starts to
shine when you are using it with more machines.

We are not going to look at how to install Mongo or deploy your Node code here. You
can find most of this information in my other articles.

On the ZooKeeper server, once you are inside the ZooKeeper folder, create a file
conf/zoo.cfg with the following content:

tickTime=2000
dataDir=/var/zookeeper-data
clientPort=2181

It tells ZooKeeper where to listen for incoming connections and where to store
its data. tickTime is the base time unit in milliseconds that ZooKeeper uses for
things like heartbeats and timeouts. Don’t worry about it.

Return to the ZooKeeper folder and run the following command

$ bin/zkServer.sh start

Congratulations, ZooKeeper should be up and running!

Let’s connect to it with the command-line client that ships with ZooKeeper,
bin/zkCli.sh, run from the ZooKeeper folder.

In the client session you can play with ZooKeeper a little. First, list all nodes
under the base / node with ls /. Then create a /services node with the string
just_small_data as its data, using create /services just_small_data. Finally, create
a child of that node called mongo holding a simple {} string, using
create /services/mongo {}.

We’ve just prepared our zkfarmer data for the rest of this article.

Installing zkfarmer

Next we are going to install zkfarmer on the ZooKeeper server and also on each
of the three Mongo servers.

Python

Fortunately, Python comes with most Linux distributions, so getting it onto your
servers should not be difficult. You will also need Python setuptools.

Service discovery

The enabled file says that the server is accepting connections. The write file
says that it accepts writes, and the read file says that this Mongo server does
not accept reads, therefore it should be used only for writing.
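
To illustrate how a folder of such flag files maps to a decision in code, here is a small sketch. It assumes each file contains "1" for yes and anything else for no. That encoding, and the function names, are assumptions made for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Read every regular file in a folder into { fileName: trimmedContent }.
function readFlags(dir) {
  const flags = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.isFile()) {
      flags[entry.name] = fs.readFileSync(path.join(dir, entry.name), 'utf8').trim();
    }
  }
  return flags;
}

// Assumption: "1" means yes and anything else (or a missing file) means no.
function describeServer(flags) {
  const on = (name) => flags[name] === '1';
  if (!on('enabled')) return 'disabled';
  if (on('write') && on('read')) return 'read-write';
  if (on('write')) return 'write-only';
  if (on('read')) return 'read-only';
  return 'idle';
}
```

With enabled set to 1, write set to 1 and read set to 0, describeServer reports the server as write-only, which matches the Mongo server for writing in our example setup.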
