The best Erlang yet

Today’s Erlang/OTP 17.0 release is ‘the best Erlang yet’ and contains two significant language changes: maps and named funs.

Erlang uses wxWidgets, a cross-platform GUI library, for its GUI tools. This build dependency was hard to get working pre-17, especially for 64-bit Erlang. However, 17.0 brings double rainbows and care bears for everyone who reads this HOWTO. So enjoy!

Set correct Xcode path for compilation

As far as I know you need to have Xcode installed to compile Erlang from source. You can download Xcode via the Mac App Store.

If you have multiple versions of Xcode installed (betas, for example), make sure the Command Line Tools are installed and point to the correct Xcode version.

Initiating an install of the Xcode Command Line Tools:

```bash
$ xcode-select --install
```

Then verify that the Command Line Tools point to the correct Xcode install:

```bash
$ xcode-select -s /Applications/Xcode.app/Contents/Developer
```

Install wxWidgets

wxWidgets is a cross-platform GUI library that’s used by Erlang for applications like Observer.

Execute this line and get some coffee, walk the dog, take out the trash and/or play with your kids. Compilation takes a while.
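The build steps below follow the official Erlang/OTP installation HOWTO for OS X; the wxWidgets version number is only an example, so check for the current release after downloading and unpacking a source tarball:

```sh
$ cd wxWidgets-3.0.0                          # unpacked source release (version is an example)
$ ./configure --with-cocoa --prefix=/usr/local
$ make                                        # this is the long part
$ sudo make install
```

The `--with-cocoa` flag selects the native OS X backend, as recommended in the OTP installation HOWTO.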

In this HOWTO I’ll show you how to set up a bleeding-edge Erlang development VPS and how to run your first Erlang program.

Main ingredient: Cores

Erlang’s main strength is its concurrency support. It likes cores, so for our ‘Hello World’ program we obviously need cores. Lots! Not 4, not 8, but 20!

Create an account on Digital Ocean if you don’t have one yet (love them), and we’re going to boot up their biggest instance. It’s a steal at less than a dollar per hour. Just make sure you destroy it when you’re done.

64GB and 20 cores will make our Hello World so snappy!

Pick a datacenter location near you.

Select the latest version of Ubuntu: 13.10 x64.

Create the Droplet.

And ssh to your Droplet with the credentials received from Digital Ocean: ssh root@your_ip_address.

Bleeding Edge Erlang

We’re going to compile Erlang from the master branch of its GitHub repository. At the time of writing that’s a few commits after R17 release candidate 2, which comes with a HiPE LLVM backend, maps and named funs. If that doesn’t make any sense, no worries; just remember it’s the fastest Erlang yet. And fast is good.

Start up Emacs with emacs. It will complain that it can’t find projmake-mode. Let’s fix that:

[ESC]-x package-install [Enter] projmake-mode

Exit emacs:

[CTRL]-x [CTRL]-c

Start up Emacs again with emacs. Great! We can finally start writing our “Hello World” program. Oh no, wait. First, we create a projmake file. The file is needed by projmake-mode, a Flymake-inspired mode that compiles your program on every save and shows build errors and warnings inline. Really useful!

[CTRL]-x [CTRL]-f projmake [Enter]

Add these lines and save the file:

```cl
(projmake
 :name "Hello"
 :shell "erlc +native hello.erl")
```

Ok, now we can really start writing our “Hello World” program and put those 20 cores and 64GB RAM to good use.
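A hello.erl in that spirit might look like the sketch below (this is my illustration, not code from the original post): it spawns one process per scheduler, so every one of those 20 cores gets to say hello.

```erlang
-module(hello).
-export([start/0]).

%% Spawn one process per scheduler (by default, one scheduler per core)
%% and collect a greeting from each of them.
start() ->
    N = erlang:system_info(schedulers),
    Parent = self(),
    [spawn(fun() -> Parent ! {hello, Id} end) || Id <- lists:seq(1, N)],
    [receive
         {hello, Id} ->
             io:format("Hello world from process ~p!~n", [Id])
     end || Id <- lists:seq(1, N)],
    ok.
```

On the 20-core droplet, erlang:system_info(schedulers) should report 20, so you get 20 concurrent greetings. Overkill, as promised.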


Update 11 Feb 2014: Dan Gudmundsson pointed out that starting with R17, both 32- and 64-bit Erlang will work with wxWidgets. I’ve updated part of this blog post with the instructions found in the official Erlang/OTP installation HOWTO.


Erlang/OTP is designed for building large,
scalable, soft-realtime systems with high availability. Testing such systems is non-trivial, useful automated testing even more so. That’s why Erlang comes with some advanced testing libraries.

The three most important methods are explained here by a few simple examples:

Unit testing

Quickcheck

Common test

First clone the project from Github using the command:

```sh
$ git clone git@github.com:wardbekker/ci_quickstart.git
```

For compiling and executing the project we use Rebar, a sophisticated build-tool for Erlang projects that follows OTP principles. Steps to build rebar:

```sh
$ git clone git://github.com/basho/rebar.git
$ cd rebar
$ ./bootstrap
Recompile: src/getopt
...
Recompile: src/rebar_utils
==> rebar (compile)
Congratulations! You now have a self-contained script called "rebar" in
your current working directory. Place this script anywhere in your path
and you can use rebar to build OTP-compliant apps.
```

Unit testing with EUnit

Let’s start with the simplest test method: EUnit, Erlang’s unit testing library. A unit test checks whether a function returns the expected result for a given input. In the example below, the function addition is defined in the module ci_quickstart_math and two assertions are used:
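The test module could look like the sketch below (the exact file in the repository may differ; only the module and function names come from the text above):

```erlang
-module(ci_quickstart_math_tests).
-include_lib("eunit/include/eunit.hrl").

%% Two assertions: one positive, one negative. EUnit picks up
%% any function whose name ends in _test.
addition_test() ->
    ?assertEqual(4, ci_quickstart_math:addition(2, 2)),
    ?assert(ci_quickstart_math:addition(1, 1) /= 3).
```

Run it from the shell with eunit:test(ci_quickstart_math_tests).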

Did all tests pass? Excellent! Now the bad news. The actual value of this type of test is quite low. Are we sure the addition function works correctly for all possible input? We are currently only certain of these cases:

addition(2,2) == 4

addition(1,1) /= 3

And even then, when I change the body of the addition function into something obviously and totally wrong:

```erlang
addition(X, Y) ->
    4.
```

The tests will still pass!

So, with unit tests our assertions may be correct, but the function body of addition can be a steaming pile of canis faeces.

It’s even worse. In this case, the arguments of addition are 64-bit small integers, which have a range of -576460752303423489 – 576460752303423488. With two arguments, that is a humongous number of inputs we would have to test to be really sure our function works correctly. And in the example unit test we check only two. Even adding twenty more cases, hard worker that you are, has very little value.

Depressed already? On to the good stuff.

QuickCheck

Continuing with the addition example: what we actually want is a test method that generates arbitrary inputs and checks the result. Erlang has this, and the method is called QuickCheck. There are even multiple QuickCheck-style libraries available; this post uses PropEr.

A QuickCheck test, also called a property, for the addition function looks like this:

```erlang
prop_sum() ->
    ?FORALL(
       {X, Y},
       {int(), int()},
       addition(X, Y) - Y == X
      ).
```

Test this example from the command line by executing ./shell.sh. You will enter the Erlang shell. Then execute proper:quickcheck(ci_quickstart_math:prop_sum()).

If we look at the implementation of the QuickCheck test, notice that we are not testing specific numbers. We are testing a property of the addition function: when we add integers X and Y, and subtract Y from the result of the addition, we should be left with X again.

The code {int(), int()} specifies that QuickCheck should generate tuples of two random integers. Each generated tuple is bound to the pattern {X, Y} by Erlang pattern matching. QuickCheck generates 100 combinations by default. With the numtests option we can increase this considerably: proper:quickcheck(ci_quickstart_math:prop_sum(), [{numtests, 10000}]).

The challenge when using QuickCheck-style testing is to come up with good function properties. This is much harder than writing unit tests; it can even be more difficult to reason about a function’s properties than to write the function itself. So why bother?

You need to reason about your code on a deeper level which improves your understanding of the problem you are solving, which tends to result in better code.

Common Test

As you might know, Erlang is a very good fit for building concurrent, distributed and fault-tolerant systems. Testing whether what you build actually has those properties is quite complex.

For that, Erlang offers Common Test. This test framework can do the heavy lifting required for meaningful system tests. The inherent complexity of concurrent, distributed and fault-tolerant systems is reflected in Common Test, so in this introduction we only take a very quick glance. In this example we mimic the initial unit test, using pattern matching for assertions.
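A minimal suite in that style could look like this sketch (the suite in the repository may differ; only the module under test comes from the earlier examples):

```erlang
-module(ci_quickstart_math_SUITE).
-include_lib("common_test/include/ct.hrl").
-export([all/0, addition_test_case/1]).

%% Common Test discovers test cases through all/0.
all() -> [addition_test_case].

%% Pattern matching doubles as the assertion: if addition/2 returns
%% anything but 4, the badmatch exception fails the test case.
addition_test_case(_Config) ->
    4 = ci_quickstart_math:addition(2, 2),
    ok.
```

Run it with ct_run -suite ci_quickstart_math_SUITE from the project directory.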

Continuous integration with Travis-CI

During development, you run your Erlang automated tests on your own workstation. But there comes a point where that’s no longer feasible because of long duration or high load. Or you work in a team where it’s important that only working code is integrated. In those cases, and for several other good reasons, you need a continuous integration system.

There are several continuous integration systems that allow you to run automated tests for Erlang. In this example we use Travis-CI. It’s a free hosted continuous integration service for the open source community. Travis-CI integrates with the popular Github.

Let’s add our example project to Travis-CI.

Preparation

The build process of Travis-CI is configured with a .travis.yml file in the root of our repository:
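For an Erlang project the file can be very small. The sketch below shows the general shape (the OTP release names and the rebar invocation are examples, not necessarily the exact file in the repository):

```yaml
language: erlang
otp_release:
  - R16B03
  - R16B02
script: ./rebar compile eunit
```

Travis-CI runs the script once per listed OTP release, so you get a build matrix across Erlang versions for free.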

How to install Erlang/OTP

Writing Erlang

I use Aquamacs, an Emacs for Mac users, with the Erlang mode provided by Erlang/OTP. Combined with Eric B. Merritt’s projmake-mode and the Mochiweb reloader, this makes for a productive development environment.

Documentation

The official Erlang documentation is pretty good, but the writing style and structure take a while to get used to. The info is certainly there, though.

I keep a local copy of Erldocs on my development machine for quick access. Unfortunately it doesn’t have an R16 copy and function signatures are not always shown correctly, but it works for me.

Learn You Some Erlang is a free online guide (also available as a dead-tree version). It’s a very good introduction to Erlang.

Best places to ask for help

The Erlang Questions mailing list is the best place to ask your Erlang questions. Don’t be surprised if your question is answered by the Erlang inventors themselves!

As with other programming languages, Stack Overflow is also a great place to get answers to your pressing Erlang questions.

Erlang books

Compared to Java, the quantity of Erlang books is low, but the quality is pretty good! And a little birdie told me that some great new books will be published in the near future. Warning: affiliate links to Amazon ahead. You will be sponsoring my caffeine intake. Thank you.

Conferences & User groups

The Erlang Factory conferences are the best places to meet professional Erlang developers. I’ve attended a few of them, and I am always amazed by the quality of the speakers and the hallway discussions. Pro tip: make sure you have a substantial lunch and then stay for the drinks.


The printout of fprof analyse is a text dump of the result, which can grow past 1,000 lines and contains a lot of noise, making it hard to locate the bottlenecks. Below is a truncated sample of an actual fprof trace.
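For reference, the commands that produce such a dump look roughly like this (not the sample itself; the module, function and file names are examples):

```erlang
%% Run a function under fprof tracing, then turn the trace into
%% the (large) text analysis file discussed above.
fprof:apply(mymodule, myfunction, [Arg]),     % trace one call
fprof:profile(),                              % process the raw trace data
fprof:analyse([{dest, "fprof.analysis"}]).    % write the text dump to a file
```

Opening fprof.analysis and searching for the functions with the highest own time is usually the quickest way through the noise.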

OTP Supervision Tree

Looking at the OTP application supervision tree is a good way to understand the architecture of an OTP application.

The application supervisor async_sup starts up the following supervisors:

keyword_sup. A keyword_ser process is created for every unique word in the StackExchange posts. This keyword_ser is linked to the keyword_sup supervisor (a simple_one_for_one supervisor). The keyword_ser child process maintains a list of document positions of a keyword (an inverted index).

facet_sup. A keyword_ser process is also created for every unique facet category in the StackExchange posts. This keyword_ser process is linked to the facet_sup supervisor (a simple_one_for_one supervisor as well). The keyword_ser child process maintains a list of facet values with the IDs of the documents the facets appear in.

The application supervisor also starts the following gen_server singleton processes:

document_ser. This server holds a copy of all documents, so it can return the original title and body of matching Stack Overflow posts in the results.

query_ser. This server's task is to run the actual query and return results.

websocket_ser. This server provides an HTTP frontend for the search engine.

No attention is given to fault tolerance (apart from the basic restart strategies), thus parts of the search index are lost if a keyword_ser process terminates.

Demo Data Import

The StackExchange data is provided as XML. Since some of the documents are quite large, it’s not recommended to load the full XML documents into memory. The solution is to use a SAX parser, which treats an XML file as a stream and triggers events when new elements are discovered. The search server uses the excellent SAX parser from the Erlsom library by Willem de Jong.

In the example below erlsom:parse_sax reads the XML file from FilePath and calls the function sax_event if an XML element is found.

When the element is a row element (i.e. a post element), attributes like Id, Title and Body are stored in a dictionary. For every post, a copy of all its attributes is saved in document_ser; this is used for returning the actual posts for a query match. After that, the add_attribute_tokens function is called:

The add_attribute_tokens function does two things. It calls add_facet (discussed later) and it creates a list of tuples with all the words and their position in the document. This process is called tokenization. Each token/position tuple is then submitted to the add_keyword_position function of the keyword_ser for indexing.
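The tokenization step can be sketched like this (a simplified version for illustration, not the exact code from the post; real code would also strip punctuation and markup):

```erlang
%% Split a text into lowercase words and pair each word with its
%% position in the document, e.g. "Hello Erlang world" ->
%% [{"hello",1}, {"erlang",2}, {"world",3}].
tokenize(Text) ->
    Words = string:tokens(string:to_lower(Text), " \t\n"),
    lists:zip(Words, lists:seq(1, length(Words))).
```

Each of the resulting {Word, Position} tuples is then handed to keyword_ser for indexing, as described above.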

Indexing

Indexing of the tuples, or keywords, is handled by keyword_ser. For every unique word a keyword_ser process is started, if one is not already present. The state of a keyword_ser process is a dictionary with the document ID as key and a list of positions as value. The document ID corresponds to the ID of the Stack Overflow post.

The keyword_server_name function generates a unique name under which the keyword_ser process is registered, so the module can check if a keyword already has a process or a new process needs to be created.

Stemming

Stemming is the process of reducing inflected words to their base form: computing and computer are both stemmed to comput. So when a user searches for computing, the query also matches text that contains computer. This makes it possible to return results that are relevant but do not exactly match the query.

erlang:phash2 is used to transform the stemmed name to a hash, to make sure the registered process name is valid.
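The idea can be sketched like this (keyword_server_name is the function named earlier; its body here is my assumption, not the post’s exact code):

```erlang
%% Derive a valid, unique registered name from a stemmed keyword.
%% erlang:phash2/1 maps an arbitrary term to a small non-negative
%% integer, which is always safe to embed in an atom.
keyword_server_name(StemmedWord) ->
    list_to_atom("keyword_ser_" ++ integer_to_list(erlang:phash2(StemmedWord))).
```

With a deterministic name like this, the module can use whereis/1 to check whether a keyword already has a process or a new one needs to be started.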

Faceting

Faceted search is an important navigation feature for search engines. A user can drill down the search results by filtering on pre-defined attributes, like in this example of a digital camera search on CNET:

As mentioned above, during the data import the add_attribute_tokens function also calls the add_facet function. Using pattern matching, the Tags and Creationdate attributes are selected for faceting. Tags is a so-called multi-value facet, as a Stack Overflow post can have one or more tags assigned. For every tag and creation date, the facet_ser:add_facet_value function is called.

facet_ser works very similarly to keyword_ser. For every facet category (Tag or Creationdate in our case) a facet_ser process is started. The state of a facet_ser is a dictionary with the Tag or Creationdate values as keys and their document IDs as values.

Querying and Relevance Ranking

The previous sections showed:

how the XML demo data is parsed.

how this data is stemmed and indexed by creating a keyword_ser process for every unique keyword.

how this data is indexed for faceted search by creating a facet_ser process for every facet category.

With the function stackoverflow_importer_ser:import() these steps are executed, and your Erlang node is now ready for querying. So how does that work?

Querying

Querying is handled by passing the user’s query terms to the do_async_query function of the singleton query_ser server. When calling this function, you specify the module, function and an optional reference that will be used when query results are available.
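A call in that style might look like the sketch below; the argument order and arity are my assumption based on the description above, not a confirmed signature:

```erlang
%% Ask query_ser to run the query asynchronously; results are later
%% delivered by invoking the given callback module and function,
%% tagged with the reference so replies can be matched to requests.
query_ser:do_async_query(["erlang", "supervisor"],  % the user's query terms
                         websocket_ser,             % callback module
                         query_results,             % callback function
                         make_ref()).               % optional reference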

In the handle_cast the following steps are executed:

keyword_ser:do_query returns all document IDs that contain one or more of the user’s query terms, along with the relevance ranking score, which is discussed below.

All original documents were stored during indexing in a document_ser process, from which all matching documents are collected.

The callback function is invoked with the matching documents and their ranking scores as arguments.

Facet results are retrieved for any FacetCategories that are specified by calling facet_ser:get_facets.

And the callback function is invoked a second time with the facet results as arguments.

Relevance Ranking

Relevance in this context denotes how well a retrieved document matches the user's search query. Most fulltext search-engines use the BM25 algorithm to determine the ranking score of each document, so let's use that too.

BM25 calculates a ranking score based on the frequency of the query terms in each document, weighted by how rare each term is across the whole collection and by the document’s length.
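For reference, the standard form of the BM25 scoring function is:

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length, avgdl is the average document length in the collection, and k_1 and b are free parameters (commonly around 1.2 and 0.75).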

Displaying the Search Results

As discussed, query_ser:do_async_query can be called to query our full-text search engine. To let users send queries and see the results, the websocket_ser module is created. This singleton gen_server starts up a Misultin HTTP server on port 3000. If you browse to http://localhost:3000 you will see a search box. Communication with the search engine is done through WebSockets.

So, when a user posts a query, the message is received by the websocket_ser:handle_websocket receive block. The query_ser:do_async_query function is called, and query results are delivered to the websocket_ser:query_results function.

The query_results function formats the results as HTML and sends this through the websocket. When received, the HTML is appended to the user's page.

A similar process is executed when the facet results are received:

Improvements

Some obvious features are lacking from this sample application:

Pretty much no attention is given to performance or memory usage.

Fault tolerance for the index data: when a server containing index state dies, it will not be revived.

Tuple structures passed between modules are not specified. It would be nice to use record syntax for them.

The author of this post is an Erlang newbie. Corrections and suggestions to the code are most welcome. You can send them to <ward@tty.nl>.