If you have never been exposed to software system design challenges, you might be totally lost on even where to begin. Dive into this post to find out about what matters when it comes to software architecture and system design and how you can get your grip in this wide area of software engineering.

I believe in finding the limits to a certain extent first and then getting your hands dirty. You can start by finding some interesting products or services (ideally ones you are a fan of) and learning about their implementations. You will be surprised that, however simple they may look, they most probably involve a great deal of complexity. Don’t forget: simple is usually complex and that’s OK™.

I believe the biggest suggestion I can give you when approaching system design challenges is this: don't assume anything! You should pin down the facts and the expectations of the system first. Here are some good questions to ask which will help you start this process:

What is the problem you are trying to solve?

What is the peak volume of users that will interact with your system?

What are the data write and read patterns going to be?

What are the expected failure cases, and how do you plan to mitigate them?

What are the availability and consistency expectations?

Do you need to worry about any auditing or regulatory aspects?

What type of sensitive data are you going to be storing?

These are just a few questions that have worked for me and the teams I have worked with over the years. Once you have answers to these questions (or any others which are relevant to the context you are in), you can start diving into the technical side of the problem.

Setting Your Baseline

What do I mean by the baseline here? Well, in this era of software development, most problems "can" be solved with existing techniques and technologies. Knowing these to a certain extent will give you a head start when you are faced with similar problems. Remember, we write software to solve the problems of our business and our users, and the aim is to do that in the most straightforward and simple way from a user experience point of view. Why do you need to remember this? You may feel that you should solve problems in unique ways, thinking "what's the point of me writing software if I am only here to follow a pattern?". The craft here is in the decision-making process that defines where to do what. Surely, we will face challenging, unique problems at certain times. However, if our baseline is solid, we will know whether to direct our efforts into finding ways to solve the problem or into understanding its depth further.

I believe I have convinced you by now that solid knowledge of how some of the exciting systems are architecturally shaped is critical for building both an appreciation of the craft and a solid baseline.

However, before jumping into this, you might want some insight into what matters most in architectural challenges. This is important because there are A LOT of aspects involved in disambiguating a gnarly, ambiguous problem and solving it within the guidelines of a defined system. Jackson Gabbard, an ex-Facebook employee, has a 50-minute video on system design interviews based on his experience of interviewing hundreds of candidates at Facebook. Even though it focuses on the system design interview and what success looks like there, it's still a very comprehensive resource on what matters most when it comes to system design. There is also a write-up of this video.

Start Building up Your Data Storage and Retrieval Knowledge

Most of the time, how you decide to persist and serve data will play a crucial role in the performance of your system. Therefore, you should first understand the expectations around data writes and reads in your system. Then, you should be able to assess these and convert that assessment into a choice. However, you can only do this effectively if you know the existing storage patterns, which essentially means having good knowledge of the database choices available.

Databases are really scalable and durable data structures. So, all your knowledge of data structures should be really beneficial in understanding the various database choices. For example, Redis is a data structures server which supports different kinds of values. It allows you to work with data structures such as sets and lists, and to manage that data through well-known algorithms such as LRU (used for evicting keys), in a durable and highly available fashion.
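To make the LRU idea concrete, here is a minimal in-memory sketch in Python. It is not how Redis implements eviction (Redis approximates LRU and applies it when a memory limit is reached); the class name and the capacity parameter are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache; `capacity` is an illustrative parameter."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading a key marks it as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key

    def keys(self):
        return list(self._items)


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")       # "a" is now the most recently used key
lru.put("c", 3)    # the cache is full, so "b" is evicted
print(lru.keys())  # ['a', 'c']
```

The ordered dictionary keeps the least recently used key at the front, so both reads and evictions stay cheap.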

Once you have a good grip on the various data storage patterns, it's time to get into data consistency and availability land. The CAP theorem is the first thing you should try to get a good grip on, and you can polish that off by looking deeper into established consistency and availability patterns. These will give you a wide perspective and help you understand that data writes and reads are very separate concerns with separate challenges associated with them. By embracing several consistency and availability patterns, you can gain a lot of performance while serving data to your applications.

Finally, around data storage needs, you should also be aware of caching. Should it be on both the client and the server? What data will you cache, and why? How will you invalidate the cache? Will it be based on time? If so, how long? This section of system-design-primer should be a good starting point on this topic.
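As a rough illustration of time-based invalidation, the sketch below expires entries after a fixed time to live. The class name, the 30-second TTL and the injectable clock are assumptions made for the example; a real system would also need to decide on size limits and explicit invalidation on writes.

```python
import time


class TTLCache:
    """Caches values for `ttl_seconds`; expired entries are dropped lazily on read."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the example can control time
        self._items = {}     # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self.ttl_seconds)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # time-based invalidation
            return default
        return value

    def invalidate(self, key):
        """Explicit invalidation, e.g. after the underlying data changes."""
        self._items.pop(key, None)


now = [0.0]  # a fake clock, so the example does not have to sleep
ttl_cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
ttl_cache.put("user:1", "Ada")
print(ttl_cache.get("user:1"))  # Ada
now[0] = 31.0                   # 31 seconds later the entry has expired
print(ttl_cache.get("user:1"))  # None
```

How long the TTL should be depends on how stale the data is allowed to be, which ties back to the consistency expectations discussed above.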

Communication Patterns

Systems are composed of various components, which can be different processes living inside the same physical node or different machines sitting in separate parts of your network. Some of these resources might be private within your network, but some need to be accessed publicly by your consumers.

These resources need to be able to communicate with each other and with the outside world. In the context of system design, this introduces another set of unique challenges. Understanding how asynchronous workflows can help you, and which communication protocols are available, such as TCP, UDP and HTTP (which typically sits on top of TCP), will help you understand the breadth of the problem space and the solutions currently available.

When dealing with communication with the outside world, security is always another concern that you need to be aware of and actively deal with.

Connection Distribution

I am not sure if this logical grouping makes sense here. I will go with it anyway since it’s the closest term that reflects what I want to cover here.

Systems are formed by gluing multiple components together, and how they communicate with each other is often designed through well-established protocols such as TCP and UDP. However, these protocols are often not enough on their own to cover the needs of today’s systems, which can face high load and high demands from consumers. We often need ways to distribute connections in order to handle the high load on our system.

Domain Name System (DNS) sits at the core of this distribution. DNS translates a domain name such as www.example.com to an IP address. Besides this, some DNS services can route traffic through various methods, such as weighted round robin and latency-based routing, to help distribute the load.
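Weighted round robin can be sketched in a few lines of Python. This is a deliberately naive version that repeats each target according to its weight; real DNS services typically interleave targets more smoothly. The IP addresses (taken from a documentation range) and the weights are made up.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator in which each target appears in proportion to its weight."""
    if not weights or any(weight <= 0 for weight in weights.values()):
        raise ValueError("weights must be a non-empty mapping of positive integers")
    # Repeat each target `weight` times, then cycle through that schedule forever.
    schedule = [target for target, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(schedule)


# Hypothetical targets: the first should receive three times the traffic of the second.
picker = weighted_round_robin({"203.0.113.10": 3, "203.0.113.20": 1})
first_eight = list(itertools.islice(picker, 8))
print(first_eight.count("203.0.113.10"), first_eight.count("203.0.113.20"))  # 6 2
```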

Load balancing is vital, and nearly every major system on the Web we interact with today sits behind one or more load balancers. Load balancers help us distribute incoming client requests to multiple instances of resources. There are both hardware and software load balancers, but you will often see software-based ones such as HAProxy and ELB in use. Reverse proxies are also very similar in concept to load balancers, though with some distinctive differences. These differences will affect your choice based on your needs.
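The core of a load balancer can be sketched as a round-robin picker that skips unhealthy backends. This is only an illustration of the idea, not how HAProxy or ELB work internally; the class, its method names and the backend names are hypothetical.

```python
class RoundRobinBalancer:
    """Hands out backends in turn, skipping any that are marked as down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
before = [balancer.pick() for _ in range(4)]
balancer.mark_down("web2")  # e.g. after a failed health check
after = [balancer.pick() for _ in range(2)]
print(before)  # ['web1', 'web2', 'web3', 'web1']
print(after)   # ['web3', 'web1']
```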

Content Delivery Networks (CDN) are also something which you should be aware of. A CDN is a globally distributed network of proxy servers, serving content from locations closer to the user. CDNs are usually preferred when you are serving static files such as JavaScript, CSS and HTML. It’s also common to see cloud services offer traffic managers (such as Azure Traffic Manager) which give you global distribution and reduced latency benefits for your dynamic content. However, these services are mostly beneficial if you have stateless web services.

What About My Business Logic? Structuring Business Logic, Workflows and Components

Thus far, we have talked about the infrastructure-related aspects of a system. These are the parts of your system which your users probably have no idea about and, to be frank, don't give a damn about. What they care about is how they interact with your system, what they can achieve by doing so, and how the system acts on their behalf to make certain decisions and process their data.

As you might guess from this post’s title, I intended this blog post to be about software architecture and system design. Therefore, I wasn’t going to cover the software design patterns which are concerned with how the components are built. However, thinking about this more and more, it’s clear to me that the line between them is very blurred and both sides are usually interconnected. Take Event Sourcing for example. Once you adopt this software architecture pattern, it pretty much affects most parts of your system: how you persist data, what level of consistency you choose for your system’s clients to deal with, how you shape the components within your system, and so on. Therefore, I decided to touch on some of the design and architectural patterns which directly concern your business logic. Even if it’s just scratching the surface, it should be useful for giving you some ideas. Here are a few of them:
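To give a feel for Event Sourcing, mentioned above, here is a minimal Python sketch in which the current state is never stored directly but is rebuilt by replaying an append-only list of events. The bank-account domain and all names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply_event(balance, event):
    """Return the new state after applying a single event to the current state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def replay(events, initial=0):
    """Rebuild the current state by folding over the full event history."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


# The event log is the source of truth; the balance is derived from it.
event_log = [Deposited(100), Withdrawn(30), Deposited(5)]
print(replay(event_log))  # 75
```

Because the log is the source of truth, questions such as "what was the balance after the second event?" can be answered by replaying only a prefix of it.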

Collaboration Approaches

It's highly unlikely that you are going to be the only one involved in a project where you need to be part of a system design process. Therefore, you need to be able to collaborate with other folks in your team, both inside and outside of your job function. There is also breadth and depth to this surface area and, as the technical leader, you should be able to address the concerns at each level by going into the required depth. The activities here may involve evaluating technology choices together, or pinning down the business needs and understanding how the work needs to be parallelised.

First and foremost, you need to have an accurate and shared understanding of what you are trying to achieve as a business goal and what moving parts are involved. Group modeling techniques such as event storming are powerful methods to accelerate this process and increase your chances of success. You may get into this process before or after you define your service boundaries, depending on the maturity stage of your product or service. Based on the level of alignment you see here, you may want to facilitate a separate activity to define the Ubiquitous Language for the bounded context you are operating in. When it comes to communicating the architecture of your system, you may find the C4 model for software architecture from Simon Brown useful, especially for understanding what level of depth you should go into while visualising what you are trying to convey.

There are most probably other mature techniques available in this space. However, all of them tie back to your understanding of the domain, and your experience with and knowledge of Domain-Driven Design will prove handy.

Some Other Resources

Here are some resources which may help you. These are not in any particular order.

A long time ago (about 5 years, at least), I contributed an article to the SignalR wiki about scaling a SignalR application with Redis. You can still find the article here. I also blogged about it here. However, over time, the pictures got lost there. I got a few requests from my readers to refresh those images and I was lucky enough to find them :) I decided to publish that article here so that I would have much better control over the content. So, here is the post :)

Please keep in mind that this is a really old post and lots of things have evolved since then. However, I do believe the concepts still resonate and it’s valuable to show how to achieve this within a cloud provider’s context.

SignalR with Redis Running on a Windows Azure Virtual Machine

This wiki article will walk you through how you can run your SignalR application on multiple machines with Redis as your backplane, using Windows Azure Virtual Machines for scale-out scenarios.

Creating the Windows Azure Virtual Machines

First of all, we will spin up our virtual machines. What we want here is two Windows Server 2008 R2 virtual machines for our SignalR application, which we will name Web1-08R2 and Web2-08R2. We will have IIS installed on both of these servers and, at the end, we will load balance the requests on port 80.

Our third virtual machine will be another Windows Server 2008 R2 only for our Redis server. We will call this server Redis-08R2.

Creating a virtual machine running Windows Server 2008 R2 is explained here in detail. We followed the same steps to create our first VM, named Web1-08R2.

The second VM is created in a slightly different way from the first one. Under the hood, every virtual machine is a cloud service instance, and we want to put our second VM (Web2-08R2) under the same cloud service that our first web VM is running under. To do that, we need to follow the same steps as explained in the previously mentioned article but, when we come to the 3rd step in the creation wizard, we should choose the Connect to existing Virtual Machine option this time and select the first VM we have just created.

As the last step, we now need to create our redis VM which will be named Redis-08R2. We will follow the same steps as we did when we were creating our second web VM (Web2-08R2).

Setting Up Redis as a Windows Service

After you build the project, you will have all the files you need under the msvs\bin\release path as zip files. The redisbin.zip file contains the redis server, the redis command line interface and some other stuff. The rediswatcherbin.zip file contains the msi file to install redis as a Windows service. You can just copy those zip files to your Redis VM and extract redisbin.zip under c:\redis\bin. Then follow these steps:

Copy this redis.conf file and put it under c:\redis\bin directory. Open it up and add a password by adding the following line of code:

requirepass 1234567

Take this note into consideration when you are setting up your redis password:

Warning: since Redis is pretty fast an outside user can try up to 150k passwords per second against a good box. This means that you should use a very strong password otherwise it will be very easy to break.

Then, extract rediswatcherbin.zip somewhere and run InstallWatcher.msi to install the service.

Navigate to C:\Program Files (x86)\RedisWatcher directory. You will see a file named watcher.conf inside this directory. Open this file up and replace the entire file with the following text. Only difference here is that we are supplying the redis.conf file directory for the server to use:

Create a folder named inst1 under c:\redis because we have specified this folder as working directory for our redis instance.

When you do a search against windows services in PowerShell, you will see RedisWatcherSvc service is installed.

Run the following PowerShell command to start the service for the first time.

(Get-Service -Name RedisWatcherSvc).Start()

Now we have a Redis server running on our VM. To test if it is actually running, open up a windows command window under c:\redis\bin and run the following command (assuming you set your password 1234567):

redis-cli -h localhost -p 6379 -a 1234567

Now, you have a redis client running.

Ping the redis to see if you are really authenticated:

Now, we are nearly set. As a last step in our redis server, we need to open up TCP port 6379 for external communication. You can do this under Windows Firewall with Advanced Security window as explained here.

Communicating Through Internal Endpoints Between Windows Azure Virtual Machines Under Same Cloud Service

When you are inside one of your web VMs, you can simply look up the redis VM by hostname.

The hostname will resolve to DIP (Dynamic IP Address) which Windows Azure will use internally. We can configure public endpoints through Windows Azure Management Portal easily but in that case, we would be opening redis to the whole world. Also, if we communicate to our redis server through VIP (Virtual IP Address), we would always go through the load balancer which has its own additional cost.

So, we can easily connect to our redis server from any other connected VM by hostname.

The SignalR Application with Redis

Our SignalR application will not be that much different from a normal SignalR application thanks to SignalR.Redis project. All you need to do is to add the SignalR.Redis nuget package into your application and configure SignalR to use Redis as the message bus inside the Application_Start method in Global.asax.cs file:

I put the application under IIS on both of our web servers (Web1-08R2 and Web2-08R2) and configured them to run under a .NET Framework 4.0 integrated application pool.

For this demo, I am using the Redis.Sample chat application included inside the SignalR.Redis project.

Let's test them quickly before going public. I fired up both web applications inside the servers and here is the result:

Perfectly running! Let's open them up to the world.

Opening up Port 80 and Load Balancing the Requests

Our requirement here is to make our application reachable over HTTP and, at the same time, to load balance the requests between our two web servers.

To do that, we need to go to Windows Azure Management portal and set up the TCP endpoints for port 80.

First, we navigate to the dashboard of our Web1-08R2 VM and hit Endpoints from the dashboard menu:

From there, hit the Add Endpoint icon at the bottom of the page:

A wizard is going to appear on the screen:

Click the right-arrow icon to go to the next step, which is the last one, and enter the port details there:

After that, our endpoint will be created:

Follow the same steps for the Web2-08R2 VM as well and open the Add Endpoint wizard. This time, we will be able to select Load-balance traffic on an existing port. Choose the previously created port and continue:

At the last step, enter the proper details and hit save:

We will see that our new endpoint is being created, but this time the Load Balanced column indicates Yes.

As we configured our web applications without a host name and they are exposed through port 80, we can directly reach our application through the URL or the Public Virtual IP Address (VIP) provided to us. When we run our application, we should see it running as below:

No matter which server a message goes to, it will be broadcast to every client because we are using Redis as a message bus.
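The backplane idea can be illustrated with a small in-memory simulation in Python. It uses neither SignalR nor Redis: the Backplane class merely stands in for the Redis message bus, and all class, method and client names are hypothetical.

```python
class Backplane:
    """Stands in for the Redis message bus: fans every message out to all servers."""

    def __init__(self):
        self._servers = []

    def subscribe(self, server):
        self._servers.append(server)

    def publish(self, message):
        for server in self._servers:
            server.deliver(message)


class WebServer:
    """One web node holding its own set of connected clients."""

    def __init__(self, name, backplane):
        self.name = name
        self.clients = {}  # client id -> list of messages received
        self._backplane = backplane
        backplane.subscribe(self)

    def connect(self, client_id):
        self.clients[client_id] = []

    def send(self, message):
        # Hand the message to the backplane instead of broadcasting locally,
        # so that clients connected to other servers receive it too.
        self._backplane.publish(message)

    def deliver(self, message):
        for inbox in self.clients.values():
            inbox.append(message)


bus = Backplane()
web1 = WebServer("Web1-08R2", bus)
web2 = WebServer("Web2-08R2", bus)
web1.connect("alice")  # the load balancer happened to send alice to Web1
web2.connect("bob")    # and bob to Web2
web1.send("hello")
print(web1.clients["alice"], web2.clients["bob"])  # ['hello'] ['hello']
```

Without the shared bus, a message sent through Web1 would only reach the clients connected to Web1.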

A while ago, I wrote about graphs and gave a few examples of their application to real-world problems. I absolutely love graphs as they are so powerful for modelling the data behind several key computer science problems. In this post, I want to talk about one of the most common graph algorithms, Depth-first search (DFS), and how and where it could be useful.

What is Depth-First Search (DFS)?

DFS is a specific algorithm for traversing and searching a graph data structure. Depending on the type of graph, the algorithm might differ. However, the idea is actually quite simple for a Directed Acyclic Graph (DAG):

You start with a source vertex (let's call it "S")

You visit the first neighbour vertex of that node (let's call this "N")

You do the same for "N" and you keep going till you end up at a leaf vertex (L) (which is a vertex that has no edges to another vertex)

Then you visit the second neighbour of L's parent vertex.

You would be done once you exhaust all the vertices.

I must admit that this is a somewhat simplified version of the algorithm, even for a DAG. For instance, we didn't touch on the fact that we might end up visiting the same vertex multiple times if we don't take this into account in our algorithm. There is a really good visualization of this algorithm here, where you can observe how the algorithm works through a logical graph representation.
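The steps above, together with the bookkeeping needed to avoid visiting a vertex twice, can be sketched in Python as follows. This is one possible iterative formulation; the adjacency-list representation and the vertex names in the sample graph are assumptions for the example.

```python
def dfs(graph, source):
    """Return the vertices reachable from `source` in depth-first order."""
    visited = set()
    order = []
    stack = [source]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue  # reached earlier through another path
        visited.add(vertex)
        order.append(vertex)
        # Push neighbours in reverse so that the first neighbour is explored first.
        for neighbour in reversed(graph.get(vertex, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


# A small DAG as an adjacency list; "K" is reachable from both "N" and "M".
dag = {"S": ["N", "M"], "N": ["L", "K"], "M": ["K"], "L": [], "K": []}
print(dfs(dag, "S"))  # ['S', 'N', 'L', 'K', 'M']
```

The visited set is what stops "K" from being reported twice, even though two different vertices point to it.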

This is also a good resource which lists out different real world applications of DFS.

Other Graph Traversal Algorithms

As you might guess, DFS is not the only known algorithm for traversing a graph data structure. Breadth-First Search (BFS) is another well-known graph traversal algorithm which has similar semantics to DFS but, instead of going in depth on a vertex, it prefers to visit all the neighbours of the current vertex first. Bidirectional search is another traversal algorithm, mainly used to find a shortest path from an initial vertex to a goal vertex in a directed graph.
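For comparison, here is a minimal BFS sketch in Python. Swapping DFS's stack for a queue is what makes the traversal go level by level; the sample graph and its vertex names are made up for the example.

```python
from collections import deque


def bfs(graph, source):
    """Return the vertices reachable from `source` in breadth-first order."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        # Enqueue every unseen neighbour before going any deeper.
        for neighbour in graph.get(vertex, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


sample_graph = {"S": ["N", "M"], "N": ["L", "K"], "M": ["K"], "L": [], "K": []}
print(bfs(sample_graph, "S"))  # ['S', 'N', 'M', 'L', 'K']
```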

Easily setting up realistic non-production (e.g. dev, test, QA, etc.) environments is really critical in order to reduce the feedback loop. In this blog post, I want to talk about how you can achieve this if your application relies on MongoDB Replica Set by showing you how to set it up with Docker for non-production environments.

Hold on! I want to watch, not read!

I got you covered there! I have also recorded a ~5-minute video covering the content of this blog post, where I also walk you through the steps visually. If you find this option useful, let me know through the comments below and I can aim harder to repeat that :)

What are we trying to do here and why?

If you have an application which works against a MongoDB database, it’s very common to have a replica set in production. This approach ensures high availability of the data, especially for read scenarios. However, during development and testing, applications mostly end up working against a single MongoDB instance because setting up a Replica Set in isolation is a tedious process. As mentioned at the beginning of the post, we want the environments in which we develop and test the software to reflect production as much as possible. The reason for that is to catch unexpected behaviour which may only occur in a production-like environment. This approach is valuable because it allows us to reduce the feedback loop on those exceptional cases.

Docker makes this all easy!

This is where Docker enters the picture! Docker is a containerization technology and it gives us a repeatable process to provision environments in a declarative way. It also gives us a try-and-tear-down model where we can experiment and easily start again from the initial state. Docker can also help us easily set up a MongoDB Replica Set. Within our Docker host, we can create a Docker network, which gives us isolated DNS resolution across containers. Then we can start creating the MongoDB Docker containers. They will initially be unaware of each other. However, we can initialise the replication by connecting to one of the containers and running the replica set initialisation command. Finally, we can deploy our application container under the same Docker network.

There are a handful of advantages to setting up this with Docker and I want to specifically touch on some of them:

It can be automated easily. This is especially crucial for test environments which are provisioned on demand.

It’s repeatable! The declarative nature of the Dockerfile makes it possible to end up with the same environment setup even if you run the scripts months after your initial setup.

Familiarity! Docker is a widely known tool which is used for lots of other purposes, so familiarity with it is high. Of course, this may depend on your development environment.

Let’s make it work!

First of all, I need to create a Docker network. I can achieve this by running the “docker network create” command and giving it a unique name.

docker network create my-mongo-cluster

The next step is to create the MongoDB Docker containers and start them. I can use the “docker run” command for this. MongoDB also has an official image on Docker Hub, so I can reuse that to simplify the acquisition of MongoDB. For convenience, I will name the containers with a number suffix. Each container also needs to be tied to the network we previously created. Finally, I need to specify the name of the replica set for each container.
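As a rough sketch of what these steps can look like, the Python snippet below only builds and prints the commands rather than running them. The container names (mongo-node1 to mongo-node3) and the replica set name (my-mongo-set) are assumptions made for illustration, not necessarily the values used in this post; only the network name comes from the command above. The last printed line is the initialisation command you would run from a mongo shell connected to one of the containers.

```python
import json

NETWORK = "my-mongo-cluster"  # created by the "docker network create" step above
REPLICA_SET = "my-mongo-set"  # assumed replica set name
NODES = ["mongo-node1", "mongo-node2", "mongo-node3"]  # assumed container names


def docker_run_command(container_name):
    """Build the argument list which starts one MongoDB container."""
    return [
        "docker", "run", "--detach",
        "--name", container_name,
        "--network", NETWORK,
        "mongo",  # the official image on Docker Hub
        "mongod", "--replSet", REPLICA_SET,
    ]


def replica_set_config():
    """Build the document passed to rs.initiate() from a mongo shell."""
    return {
        "_id": REPLICA_SET,
        "members": [
            # Container names double as host names inside the Docker network.
            {"_id": index, "host": f"{name}:27017"}
            for index, name in enumerate(NODES)
        ],
    }


for name in NODES:
    print(" ".join(docker_run_command(name)))
print(f"rs.initiate({json.dumps(replica_set_config())})")
```

Members can address each other by container name because they share the same Docker network, which is what makes the isolated DNS resolution mentioned earlier useful.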

You will notice that the server I am connected to will be elected as the primary in the replica set shortly. By running “rs.status()”, I can view the status of other MongoDB servers within the replica set. We can see that there are two secondaries and one primary in the replica set.

.NET Core Application

As a scenario, I want to run my .NET Core application, which writes data to a MongoDB database and then starts reading it in a loop. This application will be connecting to the MongoDB replica set which we have just created. This is a standard .NET Core console application which you can create by running the following script:

Notice that I have two interesting dependencies there. Polly is used to retry the read calls to MongoDB based on defined policies. This bit is interesting as I would expect the MongoDB client to handle that for read calls. However, it might be also a good way of explicitly stating which calls can be retried inside your application. Bogus, on the other hand, is just here to be able to create fake names to make the application a bit more realistic :)
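The retry idea itself is independent of Polly. Here is a minimal Python sketch of it, not the C# code from this post: a helper that retries an operation with exponential backoff on a chosen exception type. The function name, its parameters and the flaky read are all made up for illustration.

```python
import time


def retry(operation, attempts=3, base_delay=0.1,
          retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `operation`, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))


# A fake read which fails twice (e.g. while a new primary is being elected).
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary is unavailable")
    return "Ada Lovelace"


delays = []  # record the delays instead of actually sleeping
result = retry(flaky_read, sleep=delays.append)
print(result, delays)  # Ada Lovelace [0.1, 0.2]
```

Wrapping only the read calls in such a helper is one way of stating explicitly which operations are safe to retry.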

This is not the most beautiful and optimized code ever but should demonstrate what we are trying to achieve by having a replica set. It's actually the GetRandom method on the MongoDB collection object which handles the retry:

When it starts, we can see that it will output the result to the console:

Prove that It Works!

In order to demonstrate the effect of the replica set, I want to take down the primary node. First of all, we need to have a look at the output of the rs.status command we ran previously in order to identify the primary node. We can see that it’s node1!

Secondly, we need to get the container id for that node.

Finally, we can kill the container by running the “docker stop” command. Once the container is stopped, you will notice that the application gracefully recovers and continues reading the data.

I'm quite happy to tell you that I'll be speaking at SQL in the City 2017 on the 13th of December about the latest SQL Compare features and support for SQL Server 2017, with my colleague and fellow MVP, Steve Jones.

SQL in the City is Redgate's annual virtual event, and this year's livestream focuses on enabling you to be more productive. Technical sessions will dive into the latest Microsoft SQL Server releases and cover topical issues such as data compliance, protection and privacy.