There's an updated version of Hadoopi, a Hadoop distribution for the Raspberry Pi. Hadoopi supports various components of the Hadoop ecosystem including HBase, Hive, and Spark. The new release has wired networking (for improved performance and reliability) and adds metrics collection with Prometheus, with those metrics visualised in Grafana dashboards.

Hadoopi is a project on GitHub that has the configuration files and Chef code to configure a cluster of five Raspberry Pi 3s as a working Hadoop distribution running Hue.

If the idea of running Hadoop on Raspberry Pis sounds unlikely, it follows a number of earlier experiments doing similar things with Hadoop and Raspberry Pis. It is also an illustration of just what the Raspberry Pi is capable of, along with other eye-catching uses we've covered such as the Raspberry Pi cluster created by GCHQ, the UK equivalent of the NSA, and the simulation of the Turing-Welchman Bombe.

The notion of using Raspberry Pis to learn Hadoop makes a lot of sense. Hadoop has a distributed architecture that needs multiple computers to work, so if you're interested in big data and want to get to grips with Hadoop, you either need to spend a lot of money on hardware or use a cloud-based Hadoop distribution. Cloud-based systems mask the interaction between the software and the hardware, making it harder to get an in-depth understanding of how the cluster works.

One of the first Raspberry Pi-based Hadoop systems was demonstrated by Jamie Whitehorn at the Strata + Hadoop World conference in 2013, and Andy Burgin, the creator of Hadoopi, acknowledges Whitehorn as one of the inspirations for the project.

On GitHub, Burgin points out that while Hadoopi runs most of the Hadoop ecosystem, there's no support for Impala because the Pi hasn't really got enough power while running other Hadoop components. Security is HDFS-based rather than using Sentry, and the hardware limitations of 1GB of memory and only four cores mean you can really only run one task at a time, and it's not fast. Burgin says:

"Its slowwwwwwwww - the combination of teeny amount of RAM and only 4 cores means this is not built for speed, be realistic with your expectations!"

The fact that the Raspberry Pi is ARM-based means you have to compile Hadoop (with the correct version of the protobuf libraries), Oozie and Hue.

The updated version of Hadoopi adds support for Grafana dashboards that are built using Prometheus metrics collected by a number of exporters. The Node exporter provides metrics about the state of each node; the MySQL exporter provides metrics for the MySQL server; and the JMX exporter exposes selected JMX metrics.
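To give a feel for what those exporters produce, here is a minimal Python sketch that parses a few lines in the Prometheus text exposition format, the plain-text format exporters serve for Prometheus to scrape. The sample text and the parse_metrics helper are illustrative assumptions rather than anything taken from the Hadoopi repository, and the parser is deliberately simplified (it does not handle escaped quotes or braces inside label values).

```python
import re

# Hypothetical sample of the kind of text a Node exporter might serve.
# The values here are made up for illustration.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 3.1457280e+08
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

# metric_name{optional="labels"} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In a running cluster, Prometheus does this scraping itself and Grafana queries Prometheus, so a script like this is only useful for seeing what the raw metrics look like.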

The Hadoopi page on GitHub has full details of how to set up and run the cluster.