Url Shortener

Feb 23, 2014 | Reading time: 5 min(s)

This article is a 10-minute guide to building your own url shortener. A url
shortener is a service that turns a long url, e.g.
http://partechsystems.com/blog/url-shortener, into a short one such as
https://goo.gl/ec1oLT. Some url shorteners that you may be familiar with are

A shortened url has two parts: the domain name (e.g. goo.gl) and the url
string (e.g. ec1oLT). In this article we will concentrate on building the second
part, the url string. Then, we will build a working url shortener in Node.js.

Some mathematics: Set theory

The task is to create an injective, non-surjective function from the domain of
full urls to shortened urls. An additional constraint is to minimize the length
of the shortened url. As shown in the figure, this class of functions has two
properties:

Each element in the domain (bubble on the left) maps to exactly one element in
the co-domain (bubble on the right), and no two elements in the domain map to
the same element in the co-domain.

There may be elements in the co-domain that no element in the domain maps to.

Note: The domain is the list of urls that we have shortened to date.
This is NOT the list of ALL the urls on the internet. The domain will grow as
more and more urls are shortened using our service.
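The two properties above can be made concrete with a small sketch. It assumes a hypothetical in-memory store; the names urlToCode, codeToUrl and register are illustrative and not part of the service built later in this article.

```javascript
// Hypothetical in-memory store (illustration only, nothing is persisted).
const urlToCode = new Map(); // domain: the long urls shortened so far
const codeToUrl = new Map(); // the part of the co-domain that is in use

function register(longUrl, code) {
  // A function gives each domain element exactly one image, so a url that
  // was already shortened keeps its existing code.
  if (urlToCode.has(longUrl)) {
    return urlToCode.get(longUrl);
  }
  // Injectivity: two different long urls may never share a code.
  if (codeToUrl.has(code)) {
    throw new Error('code ' + code + ' is already taken');
  }
  urlToCode.set(longUrl, code);
  codeToUrl.set(code, longUrl);
  return code;
}
```

Codes that were never handed out are simply absent from codeToUrl, which is what makes the mapping non-surjective.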

The challenge is coming up with a scheme to ensure that this property is
maintained. Hashing algorithms such as the MD5 and SHA series come close to a
one-to-one mapping in practice, but they are not strictly one-to-one: a digest
has a fixed length, so collisions are possible, although for a full digest the
probability is very small. Truncating a digest to a few characters makes
collisions far more likely.
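As a rough sketch of the hashing approach, the snippet below derives a short string from an MD5 digest using Node's built-in crypto module. The function name hashCode and the 5-character truncation are assumptions made for illustration, not the scheme this article ends up using.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: hash the long url and keep only the first few
// hexadecimal characters of the digest.
function hashCode(longUrl, length = 5) {
  const hex = createHash('md5').update(longUrl).digest('hex');
  return hex.slice(0, length);
}

console.log(hashCode('http://partechsystems.com/blog/url-shortener'));
```

The same url always produces the same string, but 5 hexadecimal characters allow only 16 ** 5 (about one million) distinct values, so a service built this way would still have to detect and resolve collisions.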

In addition to the 62 alphanumeric characters, we have 33 other characters
(safe, extra, national, punctuation, reserved) to work with, making a total
of 95 characters. If we use all of these characters in a url, a 5 character
url string will be able to represent 95**5, or over 7.7 billion, unique urls. We
will use only the alphanumeric characters in this tutorial, as they are easier
for most people to work with. Since there are 62 alphanumeric characters, we
can shorten about 916 million urls with a 5 character string (62**5).
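The arithmetic above is easy to check. The helper name capacity is made up for this sketch.

```javascript
// Number of distinct strings of a given length over a given alphabet.
function capacity(alphabetSize, length) {
  return alphabetSize ** length;
}

console.log(capacity(95, 5)); // 7737809375 (over 7.7 billion)
console.log(capacity(62, 5)); // 916132832 (about 916 million)
```

Each extra character multiplies the capacity by the alphabet size, so a 6 character alphanumeric string would cover 62 times as many urls.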

We can spend time coming up with an awesome function. Or, we can use incremental
values.

Using incremental values

We are going to use a global auto-increment number to reduce the length of the
generated string. This means the first url shortened will have the “short” url
00001, the second 00002, and so on.
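A minimal sketch of the incremental idea follows. It assumes the counter lives in memory and that codes are zero-padded to 5 digits; the names counter and nextCode are illustrative, and a real service would need to persist the counter.

```javascript
// Hypothetical global counter; it resets whenever the process restarts.
let counter = 0;

// Return the next value of the counter as a zero-padded string.
function nextCode(width = 5) {
  counter += 1;
  return String(counter).padStart(width, '0');
}

console.log(nextCode()); // 00001
console.log(nextCode()); // 00002
```

Because every call returns a new number, two different urls can never receive the same string, which gives the injective property without any hashing.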