As a Data Engineer at Drivy, one of my main challenges has been to import data from various datasources into our data warehouse. Working with various datasources is often very hard, because they are inherently different in terms of connection method, freshness level, trust, maturity and stability.

I’ve been talking on our company blog about the need for data quality checkers: tools which check and enforce a high level of quality and consistency for data. If you are interested in data quality, data warehousing, testing and alerting, this should be an interesting blog post.

One day, I came across the Go Concurrency Patterns talk given by Rob Pike (one of the creators of Golang) and I found it fascinating. After this talk, I wanted to explore the concept of the Google Search code given at the end of the talk a bit more.

The goal is to find a behaviour that could be used by a search engine to handle a search query. We have got 3 services (web, images and videos – no ads ahah!) and we want to perform a search on each service according to the query. The goal is to respond as fast as possible.

Architecture

We have got multiple instances of each service. We are going to send the search query in parallel to the available instances of web servers, image servers and video servers. For each service we will take the first returned search result, to meet our goal of responding as fast as possible.

Hyperparameters

We will assume that each server answers a query in a time that follows a normal distribution (the mean is explicitly given and is referred to as latency, the standard deviation is inferred from the latency). A search also has a timeout, which represents the number of milliseconds we are willing to wait for search results before exiting (it is possible that not all services have returned their results by then). This is referred to as the timeout parameter.

Finally, we can control how many instances of each service we have available. This is referred to as the replicas parameter.

Execution samples

To test how varying the different parameters influences the number of results and when they are returned, you can find below some executions and their results:

There is nothing unexpected in these results: they can all be verified by computing probabilities on multiple independent normal distributions.

Improvements

The existing code is super simple and is definitely not ready for a real-life scenario. We could, for instance, improve the following points:

I assume that all replicas are always available. The notion of an available replica is hard to define. We don’t want to send requests to replicas that are not healthy, down or are already overwhelmed

I assume that the number of replicas is the same for each service

I assume that the response time of every replica follows a normal distribution, and is query independent

And countless other things I didn’t think of in a 2-minute window.

Code

Putting aside all the improvements I just listed, I still find the existing code interesting because it shows how to use advanced concurrency patterns in Go. The code is available on GitHub, and the main logic resides in the file core/core.go.

What is Updown?

Over the weekend, I’ve been working on creating a Go client for updown.io. Updown lets you monitor websites and online services for an affordable price. Checks can be performed for HTTP, HTTPS, ICMP and a custom TCP connection down to every 30s, from 4 locations around the globe. They also offer status pages, like the one I use for Teen Quotes. I find the design of the application and status pages really slick. For all these reasons, I use Updown for personal and freelance projects.

A Go REST client

I think this is the first time I have written a REST API client in Go, and I am pretty happy with it. My inspiration for the package came from the Godo package, the Go library for DigitalOcean. It helped me start and structure my files, structures and functions.

The source code is available on GitHub, under the MIT license. Here is a small glance at what you can do with it.

Enjoying working with Go again

I particularly enjoyed working with Go again, after a few months without touching it. I really like the integration with Sublime Text, the fast compilation, static typing, and the golint (a linter for Go code that even takes into account variable names and comments) and go fmt (automatic code formatting) commands. I knew, and I experienced once again, that developing with Go is fast and enjoyable. You rapidly end up with code that is nice to read, tested and documented.

Feedback

As always, feedback, pull requests, or kudos are welcome! I did not achieve 100% coverage as I was quite lazy and opted for integration tests, meaning tests actually hit the real Updown API when they are performed.

GitHub does not let you use the same SSH key as a deploy key for several projects. Knowing this, you’ve got 2 choices: edit the configuration of your 1st project and say that this SSH key is no longer a deploy key, or find another solution.

Deleting the deploy key of the existing project

To find out which project is associated with your deploy key, you can run the command ssh -T -ai ~/.ssh/id_rsa git@github.com (adjust the path to your SSH key if necessary). GitHub will then greet you with something like:

This configuration maps a fake GitHub subdomain to the root domain and says that when connecting to the fake subdomain, we should automatically use the previously created SSH key.

Add the newly created SSH public key as a deploy key to the repository of your choice

Clone your Git repository with the fake subdomain: instead of using the URL given by GitHub (git clone git@github.com:vendor/foo-project.git) you will use git clone [email protected]_foo-project.github.com:vendor/foo-project.git

From now on, running git pull will connect to GitHub with the appropriate SSH key and GitHub will not complain 🙂

If you’ve already cloned the Git repository before, you can always change the remote URL to the Git server by editing the file .git/config of your project.

Mentor what?

For the last 3 months, I have been a mentor for a few students on OpenClassrooms. OpenClassrooms is a French MOOC platform, visited by 2.5M people each month and they currently offer more than 1000 courses. They focus on technology courses for now: web development, mobile development, networking, databases for example. A course can be composed of textual explanations, videos, quizzes, practical sessions…

Courses are free, but you can pay a monthly fee to become a “Premium Plus” student, and thanks to this you will have a weekly 45 minutes / 1 hour session with someone experienced (student, professional, teacher…) to help you achieve your goals: getting certifications, finding an internship or starting your career in web development for instance. As a mentor, your primary goal is not to teach a course. Instead, you’re here as a support for students: you can help them understand a difficult part of a course, give them additional exercises, share with them valuable resources, look at their code and do a basic code review.

Mathieu Nebra (co-founder of OpenClassrooms) in a mentoring session

About “my students”

As an engineering student in a well recognised school in France, I’m used to being surrounded by lucky people: they are intelligent, they have good grades and one day they will get an engineering degree. This means that they will have a job nearly no matter what, and a well paid one. At OpenClassrooms, this is very different: a fair number of students have had difficulties (left school early, were not interested in their first years at university, did some small jobs here and there to pay the rent…) and now they are working hard to improve their lives. Web development is a fantastic opportunity: you can learn it from home, you only need a computer (and a cheap one is perfectly okay) and you can find a lot of learning resources for free on the Internet. The job market is not too crowded, and there is a good chance that you can find a job in a local web agency if you know HTML5, CSS3, a PHP framework and some basic jQuery. No need to work long hours, to wake up during the night, to fight to find a part-time job to pay your rent; you can make a living by typing text in a text editor.

It has been a very valuable experience for me to listen to people who have had bad times and troubles in their lives and who are now dedicated to getting better and learning. They just need advice to achieve what they want.

I am a mentor, but I learn

I’m helping my students mostly around web technologies. And this means that I’m supposed to know a lot of stuff about HTML5 (canvas, you know it?), CSS3 (flexbox anyone?), naked PHP (good ol’ PDO API) and JavaScript. Clearly, this is not the case. I don’t even do web development on a monthly basis. At first, I was a bit worried: am I going to be able to remember how I did it, a few years ago? How can you do this feature without a framework? Can I still read a mix of HTML / CSS / PHP, all in the same file? I was surprised, but the answer was yes, and it was very interesting to witness how my brain can actually remember things I did years ago, and how fast I can retrieve this information (just by thinking or by doing the right Google query).

I was also surprised by how broad my role is. Sure, students have some difficulties understanding every aspect of object-oriented principles, and I have to go over some concepts multiple times, but who doesn’t? What they really need is not a simple technical advisor. They need to hear from someone experienced that it is perfectly fine to not understand OOP in just 2 weeks, that it is fine to forget method names or to mix up language syntaxes when you write HTML, CSS, JavaScript and PHP for the first time, all in the same day.

They need to hear from someone that they are doing great, and to remember what they have learned during the last month or so. I found that it helps them a lot to keep a simple schedule somewhere: “for next week, I want to have done these sections from this course, and I need to start looking at this also”. When they look back, they are happy to see that they have indeed finished and successfully completed quizzes / activities for multiple courses recently. It is a tremendous achievement for students to know that they have learned something, that they are actually getting somewhere and that their knowledge is growing.

What next?

So far, it has been an incredible experience and I think I have learned a lot, and I do hope that students have learned valuable things thanks to me. I am feeling good because I see that I can help people, I can give back to the community and I can share my passion with people that are interested and deeply motivated.

Today, I ran into an issue. I wanted to test that a function logged a fatal error when something bad happened. The problem with a fatal log message is that it calls os.Exit(1) after logging the message. As a result, if you try to test this by calling your function with the required arguments to make it fail, your test suite is just going to exit.

Well, as explained above, this is not so easy. It turns out that the solution is to start a subprocess to test that the function crashes. The subprocess will exit, but not the main test suite. This is explained in a talk about testing techniques given in 2014 by Andrew Gerrand. If you want to check that the fatal message is something specific, you can inspect the standard error by using the os/exec package. Finally, the code to test the crashing part of the previous function would be the following:

Recently, I was working on a package that was doing network requests inside goroutines and I encountered an issue: the program was really fast to finish, but the results were awful. This was because the number of goroutines running at the same time was too high. As a result, the network was congested, too many sockets were opened on my laptop and the final performance was degraded: requests were slow or failing.

In order to keep the network healthy while maintaining some concurrency, I wanted to limit the number of goroutines making requests at the same time. Here is a sample main file to illustrate how you can control the maximum number of goroutines that are allowed to run concurrently.

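package main

import (
	"flag"
	"fmt"
	"sync"
	"time"
)

// Fake a long and difficult work.
func DoWork() {
	time.Sleep(500 * time.Millisecond)
}

func main() {
	maxNbConcurrentGoroutines := flag.Int("maxNbConcurrentGoroutines", 5, "the number of goroutines that are allowed to run concurrently")
	nbJobs := flag.Int("nbJobs", 100, "the number of jobs that we need to do")
	flag.Parse()

	// Dummy channel to coordinate the number of concurrent goroutines.
	// This channel should be buffered otherwise we will be immediately blocked
	// when trying to fill it.
	concurrentGoroutines := make(chan struct{}, *maxNbConcurrentGoroutines)
	// Fill the dummy channel with maxNbConcurrentGoroutines empty struct.
	for i := 0; i < *maxNbConcurrentGoroutines; i++ {
		concurrentGoroutines <- struct{}{}
	}

	// NOTE: the listing is cut off at this point. What follows is one minimal
	// way to finish it (an assumption, not necessarily the author's code).
	var wg sync.WaitGroup
	for id := 1; id <= *nbJobs; id++ {
		// Take a token: this blocks while maxNbConcurrentGoroutines jobs are running.
		<-concurrentGoroutines
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			DoWork()
			fmt.Printf("ID %d: all done!\n", id)
			// Give the token back so that another job can start.
			concurrentGoroutines <- struct{}{}
		}(id)
	}
	// Wait for the last jobs to finish.
	wg.Wait()
}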

As a student, I quite often look at companies to see what they are doing, to understand the market and to discover trends. As an engineering student, I am on the lookout for technical content written by engineers. I recently discovered that I value openness in engineering teams a lot. Being open can be done in different ways:

Having a technical blog. You can understand this in multiple ways. First, you can have a blog where you talk about new features, new releases of your API / SDK. This one is quite common. The second one is really rare and very valuable to me: you talk about your engineering process, your hiring process, you share reports of outages. If you have open source projects, you have a blog post to let the technical community know about it.

Involvement in communities. You can be involved in communities in multiple ways: regularly sending members of your team to local meetups (not just to attend, if you can: presenting and volunteering are awesome), being visible at conferences, giving explicit credit to the open source solutions you are using (or giving money to them if you can afford to), hosting hackathons or hack days at your office. Be explicit about causes you care about and defend them.

Open source. Whether you contribute to open source projects or you open source some of your projects, involvement in the community is a great way to gain some exposure, let people know which technologies you are using and give back to the community.

An update to the Joel Test?

Maybe some of these points will be in an updated “Joel Test” in the future, even if some people already say that it is partially antiquated. Personally, I would add the following questions to an updated version of the Joel Test:

Do you support developer education by attending conferences, purchasing books (or something equivalent)?

Do you have a simple, documented process to adopt new tools your team uses?

Do you have an engineering blog where you talk about your processes, ideas, beliefs and failures?

You can’t have it all

Being able to answer “Yes” to every question above seems fairly difficult, and really impossible for small engineering teams. If your company is 1 year old and you are 2 engineers, you cannot put all these things in place. But as they say, “practice makes perfect”, so try to keep these goals in mind. Giving an awesome work environment to your engineers will make them productive, happy to work and so much more! Great engineering teams attract great engineers.

Following my latest post about a Go package to validate UK bank account numbers, I wanted to offer a public API to let people check if a UK bank account number is valid or not. I know that offering a Go package is not ideal for everyone because, for the moment, Go is not everywhere in the tech ecosystem, and it’s always convenient to have an API you can send requests to, especially in a frontend context. My goal was to offer a JSON API, supporting authentication through an HTTP header and with rate limits. With this, in the future you could adapt rate limits to some API keys, if you want to allow a larger number of requests for some clients.

Packages I used

I wanted to give cloudflare/service a go because it lets you quickly build JSON APIs with some default endpoints for heartbeat, version information, statistics and monitoring. I used etcinit/speedbump to offer the rate limiting functionality and it was very easy to use. Note that the rate limiting functionality requires a Redis server to store request counts. Finally, I used the famous codegangsta/negroni to create middlewares that handle API authentication and rate limits, keeping my only controller relatively clean.

Deploying behind Nginx

My constraints were the following:

The API should only be accessible via HTTPS and HTTP should redirect to HTTPS.

The Golang server should run on a port > 1024 and the firewall will block access to everything but ports 22, 80 and 443

The only endpoints that should be exposed to the public are /verify, /version and /heartbeat. Statistics and monitoring should be accessible by administrators on localhost through HTTP

I ended up with this Nginx virtual host to suit my needs. I’m not sure if it can be simpler:

With this, I can still access the reserved endpoints by first opening an SSH tunnel with ssh -L4242:127.0.0.1:80 [email protected] and then going to http://localhost.antoine-augusti.fr:4242/stats.

Note that the Golang server is running on port 8080 and it should be monitored by Supervisor or whatever you want to use.

Grabbing the code and a working example

First of all, the API is available on GitHub under the MIT license so that you can deploy and adapt it yourself. If you want to test it first, you can use the API key foo against the base domain https://modulus.antoine-augusti.fr. Here is a cURL call for the sake of the example:

As I was reading through the SEPA specification, I found that it was not that simple to check if a UK bank account number was valid or not. If you’re not familiar with UK banks, they don’t use IBAN to transfer money within the UK, but a combination of a sort code and an account number. A sort code identifies the bank’s branch and each account has an account number. A sort code is a 6-digit number and an account number can be between 6 and 11 digits, but most of them are 8 digits long.

For example, here is a valid UK bank account:

Sort code: 107999

Account number: 88837491

Algorithms to check if a UK bank account is valid

A very common way to check if a number (bank account, credit card, parking ticket…) is valid is to apply a modulus algorithm. You perform an operation on each digit (addition, multiplication by a weight, substitution…), and when you reach the end you divide by a specific number and check that the remainder of the division is equal to something. Seems easy, right? Well, this is not that simple for UK bank accounts. In fact, if you go through the official specification on the Vocalink website, you will see that they use 2 algorithms, but they also have 15 exceptions to take into account (and some of them are weird or tricky to handle!). You will also need to adapt the way you compute the modulus value according to a weight table.
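As a rough illustration of the general idea only, here is a sketch of a weighted modulus check in Go. It is not the Vocalink algorithm: the function name is hypothetical, the weights and modulus in the example are made up, and none of the exceptions from the specification are handled.

```go
package main

import (
	"errors"
	"fmt"
)

// weightedModulusValid multiplies each digit by the weight at the same
// position, sums the products and checks that the sum is a multiple of
// modulus. This is the generic technique, without any exception handling.
func weightedModulusValid(number string, weights []int, modulus int) (bool, error) {
	if len(number) != len(weights) {
		return false, errors.New("number and weights must have the same length")
	}
	if modulus <= 0 {
		return false, errors.New("modulus must be positive")
	}
	sum := 0
	for i, c := range number {
		if c < '0' || c > '9' {
			return false, fmt.Errorf("invalid digit %q at position %d", c, i)
		}
		sum += int(c-'0') * weights[i]
	}
	return sum%modulus == 0, nil
}

func main() {
	// Made-up weights, for illustration only.
	weights := []int{1, 2, 3, 4, 5}

	// 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55, and 55 % 11 == 0.
	ok, _ := weightedModulusValid("12345", weights, 11)
	fmt.Println(ok) // prints: true

	// 1*1 + 2*2 + 3*3 + 4*4 + 6*5 = 60, and 60 % 11 == 5.
	ok, _ = weightedModulusValid("12346", weights, 11)
	fmt.Println(ok) // prints: false
}
```

A real implementation would look up the weights and the algorithm to apply from the weight table, based on the sort code, before running a check of this kind.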

From the specification to a package

Reading the specification was interesting, but what really motivated me to code a Go package to solve this problem was the fact that test cases were provided in the specification! What a dream: the specification offers you 34 test cases, and they cover nearly all the exceptions. I jumped on the opportunity: it’s not that often that you are offered a way to check that what you have done is actually right. In fact, I followed a Test Driven Development approach and it really guided me during the development and especially the refactoring.

Getting the code

The code is available on GitHub under the MIT license and should be well documented and tested. As always, pull requests and bug reports are welcome!