My motivation

I was so curious about excellence of the image recognition with TensorFlow on Raspberry Pi. Also, the Jupyter notebook is very convenient to instantly code as a quick prototype. So, in terms of error rate of the image classification, Inception V3(3.46%) is more excellent than human(5.1%) whereas raspberry pi’s processing speed is very slow compare to my laptop.

Enabling PWM output on GPIO pins.

Available PINS

The following object depicts available pins for all revisions of raspberry-pi, the key is the actual number of the physical pin header on the board,the value is the GPIO pin number assigned by the OS, for the pins with changes between board revisions, the value contains the variations of GPIO pin number assignment between them (eg.rev1, rev2, rev3).

You should just be concerned with the key (number of the physical pin header on the board), Cylon.JS takes care of the board revision and GPIO pin numbers for you, this full list is for reference only.

I work as a robotics teacher in Sydney. I want to introduce my AI robot to my students in my class next month. In addition, I’m joining NASA Open Innovation Initiative (also known as NASA Space Apps Challenge) with my AI robot to measure the space environment such as temperature, humidity, and pressure. So, I’m so excited!!

Introduction

The IBM Watson Cloud Robot can recognize a human face, voice, and text like a human. The robot clearly recognized the celebrity (Elon Musk) and who he was. Also, it recognized my voice & any text. (YouTube)

This instructable will cover the basic steps that you need to follow to get started with open sources such as Watson nodes (Visual Recognition V3, Speech To Text, Text To Speech) for IBM Bluemix, Node-RED, MQTT v3.1. MQTT(Message Queueing Telemetry Transport) is a Machine-To-Machine(M2M) or Internet of Things (IoT) connectivity protocol that was designed to be extremely lightweight and useful when low battery power consumption and low network bandwidth is at a premium. It was invented in 1999 by Dr. Andy Stanford-Clark and Arlen Nipper and is now an Oasis Standard .

(7) Try dragging & dropping any node from the left-hand side to right-hand side. It’s really easy to code. ( You can conveniently use the visual editor offline as well as online. ) Download all files at Download list. (1) Click the number (1) at the right-hand side corner shown in NodeRED on web-browser. (2) Click the Import button on the drop down menu. (3) Open the Clipboard shown in the above 1st picture. (4) Lastly, paste the given JSON format text of ‘____ver0.1.txt’ (Download List) in Import nodes editor.

Step 5: Setting up MQTT v3.1 on Raspberry Pi2

There are two options such as using eclipse paho, installing a mosquitto sever. Also, you can use (1) option instead of (2) opption.

(1) Using “iot.eclipse.org”.

Click each MQTT node and Type it.

iot.eclipse.org

(2) Setting up MQTT v3.1 on Raspberry Pi2

This message broker(Mosquitto) is supported by MQTT v3.1 and it is easily installed on the Raspberry Pi and somewhat less easy to configure. Next we step through installing and configuring the Mosquitto broker. We are going to install & test the MQTT “mosquitto” on terminal window. Click that.

Step 6: Checking your NodeRED codes with MQTT on Raspberry Pi2

When you will use the JSON format of the ‘NodeRED_Text_files_ver0.1.txt’ (Download List) on Node-RED, it’s automatically set up & coded each data. I have already set up the each data in each node.

(1) Click each node.

(2) Check information inside each node has been prefilled.

(3) Please don’t change the set data. (The above can be customized for more advanced users.)

Step 7: Adding & Setting up PID node, Dashboard on Raspberry Pi2

Searching the Nodes

Node-RED comes with a core set of useful nodes, but there are a growing number of additional nodes available for installing from both the Node-RED project as well as the wider community. You can search for available nodes in the Node-RED library or on the npm repository .

For example, we are going to search ‘node-red-node-pidcontrol’ at the npm web. Click here .

Then, we are going to install npm package, node-red-node-pidcontrol, node-red-dashboard on Raspberry Pi.

To add additional nodes you must first install the npm tool, as it is not included in the default installation. The following commands install npm and then upgrade it to the latest 2.x version.

sudo apt-get update

sudo apt-get install npm

sudo npm install -g npm@2.x

hash -r

cd /home/pi/.node-red

For example, ‘npm install node-red-{example node name}’

Copy the ‘npm install node-red-node-pidcontrol’ from the npm web. Paste it on a terminal window.

Step 8: Configuring the PS3 EYE camera with microphone

This Sony PS3 eye USB camera that can achieve up to 187 frames per second can be found for under $8 on Amazon.com that should make it quite a bargain for those wishing to experiment with CV projects. The PlayStation Eye camera for the PS3 is similar to a web camera but can also be used for computer vision and gesture recognition tasks. The PlayStation Eye has been supported by the Linux kernel since the late Linux 2.6 days but with a future update (Linux 3.20 or later given that the 3.19 merge window is closed) will support higher modes.

(1) Install a USB driver on Raspberry Pi.

sudo apt-get install fswebcam

(2) Take a picture and then check the ‘visionImage.jpg’ file in the /home/pi

(3) Don’t forget to put the Bluemix service credentials for Watson Services such as Visual recognition, Speech to Text, and, Text to Speech. ( How to use the IBM Bluemix platform: https://console.ng.bluemix.net/docs/ )

(4) Make an image file (jpg) server for every boot.

<p>cd /etc/xdg/autostart/</p>

<p>sudo nano imageFileServer.desktop</p>

Type the description below or put the ‘imageFileServer.desktop’ file into /etc/xdg/autostart/ folder.

You need to explicitly enable the serial port on the GPIO pins. The reason for this is a change with the Pi 3 to use the hardware serial port for Bluetooth and instead use a slightly different software serial port for the GPIO pins. A side effect of this change is that the serial port will actually change speed as the Pi CPU clock throttles up and down–this will unfortunately cause problems for most serial devices like GPS receivers!

Luckily there’s an easy fix detailed in this excellet blog post to force the Pi CPU into a fixed frequency which prevents speed changes on the serial port. The Pi might not perform as well but it will have a stable serial port speed.

Introduction

This instructable will cover the basic steps that you need to follow to get started with open sources such as Node-RED, MQTT v3.1, and Watson NodeRED for IBM Bluemix. MQTT(Message Queueing Telemetry Transport) is a Machine-To-Machine(M2M) or Internet of Things (IoT) connectivity protocol that was designed to be extremely lightweight and useful when low battery power consumption and low network bandwidth is at a premium. It was invented in 1999 by Dr. Andy Stanford-Clark and Arlen Nipper and is now an Oasis Standard.

The different tutorials can cause a great deal of confusion, which is why I have tried to make the easiest setup possible. Specifically, this instructable will cover how to code the Node-RED on Raspberry Pi2 as an MQTT client by connecting to your home wireless network and how to send sensor data. When you finish this project, I suggest you another M2M communication approach.

Step 5: Setting up MQTT v3.1 on Raspberry Pi2

Setting up MQTT v3.1 on Raspberry Pi2

This message broker(Mosquitto) is supported by MQTT v3.1 and it is easily installed on the Raspberry Pi and somewhat less easy to configure. Next, we step through installing and configuring the Mosquitto broker.

We are going to install & test the MQTT “mosquitto” on a terminal window.

curl -O http://repo.mosquitto.org/debian/mosquitto-repo.gpg.key

sudo apt-key add mosquitto-repo.gpg.key

rm mosquitto-repo.gpg.key

cd /etc/apt/sources.list.d/

sudo curl -O http://repo.mosquitto.org/debian/mosquitto-jessie.list

sudo apt-get update

Next install the broker and command line clients:

mosquitto – the MQTT broker (or in other words, a server)

mosquitto-clients – command line clients, very useful in debugging

python-mosquitto – the Python language bindings

sudo apt-get install mosquitto mosquitto-clients python-mosquitto

As is the case with most packages from Debian, the broker is immediately started. Since we have to configure it first, stop it.

sudo /etc/init.d/mosquitto stop

Now that the MQTT broker is installed on the Pi we will add some basic security.

Create a config file:

cd /etc/mosquitto/conf.d/
sudo nano mosquitto.conf

Let’s stop anonymous clients connecting to our broker by adding a few lines to your config file. To control client access to the broker we also need to define valid client names and passwords. Add the lines:

allow_anonymous false

password_file /etc/mosquitto/conf.d/passwd

require_certificate false

Save and exit your editor (nano in this case).

From the current /conf.d directory, create an empty password file:

sudo touch passwd

We will use the mosquitto_passwd tool to create a password hash
for user pi:

sudo mosquitto_passwd -c /etc/mosquitto/conf.d/passwd pi

You will be asked to enter your password twice. Enter the password you wish to use for the user you defined.

Testing Mosquitto on Raspberry Pi

Now that Mosquitto is installed we can perform a local test to see if it is working:

Open three terminal windows. In one, make sure the Mosquitto broker is running:

mosquitto

In the next terminal, run the command line subscriber:

mosquitto_sub -v -t 'topic/test'

You should see the first terminal window echo that a new client is connected.

In the next terminal, run the command line publisher:

mosquitto_pub -t 'topic/test' -m 'helloWorld'

You should see another message in the first terminal window saying another client is connected. You should also see this message in the subscriber terminal:

topic/test helloWorld

We have shown that Mosquitto is configured correctly and we can both publish and subscribe to a topic.

When you finish testing all, let’s set up below that.

sudo /etc/init.d/mosquitto start

Step 6: Checking your NodeRED codes with MQTT on Raspberry Pi2

When you have already used the JSON format of the ‘SmartGasValve_NodeRED.txt’ on Node-RED, it’s automatically set up & coded each data. I have already set up the each data in each node.

Searching the Nodes

Node-RED comes with a core set of useful nodes, but there are a growing number of additional nodes available for install from both the Node-RED project as well as the wider community.
You can search for available nodes in the Node-RED library or on the npm repository.

For example, we are going to search Twilio at the npm web. Click here.

Next step, we are going to install Twilio on the raspberry pi.

Installing npm packaged node

To add additional nodes you must first install the npm tool, as it is not included in the default installation. The following commands install npm and then upgrade it to the latest 2.x version.

sudo apt-get update

sudo apt-get install npm

sudo npm install -g npm@2.x

hash -r

cd /home/pi/.node-red

For example, ‘npm install node-red-{example node name}’

Copy the ‘npm install node-red-node-twilio’ from the npm web. Paste it on terminal.

Then, we are going to install both node-red-node-watson and node-red-contrib-play-audio.

Introduction

My motivation for PID Control For CPU Temperature of Raspberry Pi came for many reasons such as very hot CPU, very noisy fan’s sound and fast battery consumption because the hot CPU makes the system really unstable while using Raspberry Pi for a long time. So, I have optimized the failing by using PID node on Node-RED. It’s visually helpful for a trainee to understand the PID control system for an educational purpose.

This will cover the basic steps that you need to follow to get started with open sources like PID node, MQTT node in the Node-RED. Also, it’s really painful and hard to tune 3 gains like KP, KI, and KD as manual tuning(Trial and error) method. There are many tuning methods such as Manual tuning, Ziegler–Nichols, Tyreus Luyben, Cohen–Coon, Åström-Hägglund and Software tools such as Simulink in Matlab or Excel PID Simulator (enclosed). I’ve already provided my source codes in the Download List but If you use a different fan, you should tune PID gains because most physical fan’s characteristics are different. You can get more information from the linked web (PID controller).

MQTT(Message Queueing Telemetry Transport) is a Machine-To-Machine(M2M) or Internet of Things (IoT) connectivity protocol that was designed to be extremely lightweight and useful when low battery power consumption and low network bandwidth is at a premium. It was invented in 1999 by Dr. Andy Stanford-Clark and Arlen Nipper and is now an Oasis Standard. I’ve already published about how to approach the MQTT below the linked webs.

(7) Try dragging & dropping any node from the left-hand side to right-hand side. It’s really easy to code. ( You can conveniently use the visual editor offline as well as online. ) Download the ‘PID_Control_For_CPU_TEM_ver0.5.txt’ file. (1) Click the number (1) at the right-hand side corner shown in NodeRED on web-browser. (2) Click the Import button on the drop down menu. (3) Open the Clipboard shown in the above 1st picture. (4) Lastly, paste the given JSON format text of ‘PID_Control_For_CPU_TEM_ver0.5.txt’ in Import nodes editor.

Step 5: Setting up MQTT v3.1 on Raspberry Pi2

Setting up MQTT v3.1 on Raspberry Pi2

This message broker(Mosquitto) is supported by MQTT v3.1 and it is easily installed on the Raspberry Pi and somewhat less easy to configure. Next we step through installing and configuring the Mosquitto broker. We are going to install & test the MQTT “mosquitto” on terminal window.

curl -O http://repo.mosquitto.org/debian/mosquitto-repo.gpg.key

sudo apt-key add mosquitto-repo.gpg.key

rm mosquitto-repo.gpg.key

cd /etc/apt/sources.list.d

sudo curl -O http://repo.mosquitto.org/debian/mosquitto-jessie.list

sudo apt-get update

Next install the broker and command line clients:

mosquitto – the MQTT broker (or in other words, a server)

mosquitto-clients – command line clients, very useful in debugging

python-mosquitto – the Python language bindings

sudo apt-get install mosquitto mosquitto-clients python-mosquitto

As is the case with most packages from Debian, the broker is immediately started. Since we have to configure it first, stop it.

sudo /etc/init.d/mosquitto stop

Now that the MQTT broker is installed on the Pi we will add some basic security.

Create a config file:

cd /etc/mosquitto/conf.d/
sudo nano mosquitto.conf

Let’s stop anonymous clients connecting to our broker by adding a few lines to your config file. To control client access to the broker we also need to define valid client names and passwords. Add the lines:

We will to use the mosquitto_passwd tool to create a password hash for user pi:

sudo mosquitto_passwd -c /etc/mosquitto/conf.d/passwd pi

You will be asked to enter your password twice. Enter the password you wish to use for the user you defined.

Testing Mosquitto on Raspberry Pi

Now that Mosquitto is installed we can perform a local test to see if it is working: Open three terminal windows. In one, make sure the Mosquitto broker is running:

mosquitto

In the next terminal, run the command line subscriber:

mosquitto_sub -v -t 'topic/test'

You should see the first terminal window echo that a new client is connected.In the next terminal, run the command line publisher:

mosquitto_pub -t 'topic/test' -m 'helloWorld'

You should see another message in the first terminal window saying another client is connected. You should also see this message in the subscriber terminal:

topic/test helloWorld

We have shown that Mosquitto is configured correctly and we can both publish and subscribe to a topic.When you finish testing all, let’s set up below that.

sudo /etc/init.d/mosquitto start

Step 6: Checking your NodeRED codes with MQTT on Raspberry Pi2

When you will use the JSON format of the ‘PID_Control_For_CPU_TEM_ver0.5.txt‘ on Node-RED, it’s automatically set up & coded each data. I have already set up the each data in each node.

(1) Click each node.

(2) Check information inside each node has been prefilled.

(3) Please don’t change the set data. (The above can be customized for more advanced users.)

Step 7: Adding & Setting up PID node, Dashboard on Raspberry Pi2

Searching the Nodes

Node-RED comes with a core set of useful nodes, but there are a growing number of additional nodes available for installing from both the Node-RED project as well as the wider community. You can search for available nodes in the Node-RED library or on the npm repository.

For example, we are going to search ‘node-red-node-pidcontrol‘ at the npm web. Click here.

Then, we are going to install npm package, node-red-node-pidcontrol, node-red-dashboard on Raspberry Pi.

To add additional nodes you must first install the npm tool, as it is not included in the default installation. The following commands install npm and then upgrade it to the latest 2.x version.

sudo apt-get update

sudo apt-get install npm

sudo npm install -g npm@2.x

hash -r

cd /home/pi/.node-red

For example, ‘npm install node-red-{example node name}’

Copy the ‘npm install node-red-node-pidcontrol’ from the npm web. Paste it on a terminal window.

(7) The chart to display on the web browser is same as the gauge (1 – 9 steps).

Step 9: Tuning PID controller

There are many tuning methods such as manual tuning, Ziegler–Nichols, Tyreus Luyben, Cohen–Coon, Åström-Hägglund and software tools such as Simulink in Matlab or Excel PID Simulator(enclosed). I’ve used 2 tuning methods like manual tuning, Ziegler-Nichols method and software tools such as Matlab, Simulink, and Excel. (According to Wikipedia: PID Controller)

Manual Tuning(Trial and error)

How do the PID parameters affect system dynamics?

We are most interested in four major characteristics of the closed-loop step response. They are

– Rise Time: the time it takes for the plant output y to rise

– Overshoot: how much the peak level is higher than the steady state, normalized against the steady – state.– Settling Time: the time it takes for the system to converge to its steady state.

– Steady-state Error: the difference between the steady-state output and the desired output.

(NT: No definite trend. Minor change.)

How do we use the table?

Typical steps for designing a PID controller are Determine what characteristics of the system needs to be improved.

– Use KP to decrease the rise time.

– Use KD to reduce the overshoot and settling time.

– Use KI to eliminate the steady-state error.

– This works in many cases, but what would be a good starting point? What if the first parameters we choose are totally crappy? Can we find a good set of initial parameters easily and quickly?

Ziegler–Nichols method

– Ziegler and Nichols conducted numerous experiments and proposed rules for determining values of KP, KI, and KD based on the transient step response of a plant.

– They proposed more than one methods, but we will limit ourselves to what’s known as the first method of Ziegler-Nichols in this tutorial. It applies to plants with neither integrators nor dominant complex-conjugate poles, whose unit-step response resemble an S-shaped curve with no overshoot. This S-shaped curve is called the reaction curve. This S-shaped curve is called the reaction curve.

– The S-shaped reaction curve can be characterized by two constants, delay time L and time constant T, which are determined by drawing a tangent line at the inflection point of the curve and finding the intersections of the tangent line with the time axis and the steady-state level line.

– The Ziegler-Nichols Tuning Rule Table

Using the parameters L and T, we can set the values of KP, KI, and KDaccording to the formula shown in the table above.

These parameters will typically give you a response with an overshoot about 25% and good settling time. We may then start fine-tuning the controller using the basic rules that relate each parameter to the response characteristics. KP, KI, and KD based on the transient step response of a plant.

PID tuning software

– Matlab: PID Controller Tuning

– Simulink: PID Controller Tuning

– Excel PID simulator

– Etc

PID control VS On/Off control

– On/Off control: An on-off controller is the simplest form of temperature control device. The output from the device is either on or off, with no middle state. An on-off controller will switch the output only when the temperature crosses the setpoint. For heating control, the output is on when the temperature is below the setpoint, and off above setpoint. Since the temperature crosses the setpoint to change the output state, the process temperature will be cycling continually, going from below setpoint to above, and back below. In cases where this cycling occurs rapidly, and to prevent damage to contactors and valves, an on-off differential, or “hysteresis,” is added to the controller operations. This differential requires that the temperature exceeds setpoint by a certain amount before the output will turn off or on again. On-off differential prevents the output from “chattering” or making fast, continual switches if the cycling above and below the setpoint occurs very rapidly. On-off control is usually used where a precise control is not necessary, in systems which cannot handle having the energy turned on and off frequently, where the mass of the system is so great that temperatures change extremely slowly, or for a temperature alarm. One special type of on-off control used for alarm is a limit controller. This controller uses a latching relay, which must be manually reset, and is used to shut down a process when a certain temperature is reached.

– PID control: This controller provides proportional with integral and derivative control, or PID. This controller combines proportional control with two additional adjustments, which helps the unit automatically compensate for changes in the system. These adjustments, integral and derivative, are expressed in time-based units; they are also referred to by their reciprocals, RESET, and RATE, respectively. The proportional, integral and derivative terms must be individually adjusted or “tuned” to a particular system using trial and error. It provides the most accurate and stable control of the three controller types, and is best used in systems which have a relatively small mass, those which react quickly to changes in the energy added to the process. It is recommended in systems where the load changes often and the controller is expected to compensate automatically due to frequent changes in setpoint, the amount of energy available, or the mass to be controlled.

R on the Raspberry Pi

I’m interested in running R on the Raspberry Pi, and on Raspbian in particular. There are loads of Debian packages for R, and I’m hoping that many of these find there way into Raspbian eventually. Right now it is possible to install and run R from Raspbian, but relatively few packages are available. However, the package r-base can be installed, and that is enough to get up and running with a basic R installation. So,

% sudo apt-get install r-base
% R

should be enough to get started. Indeed, here’s a little Raspbian session to illustrate R running on the Pi:

Nice! Base graphics such as scatter plots and histograms all work fine, and can be piped to a remote X server if needed. So even without all the add-on packages it is a perfectly reasonable platform for basic data analysis. To benchmark it, I used my standard Gibbs sampling script, gibbs.R

Unfortunately, this takes over 400 minutes, which is around 3 times slower than the equivalent python benchmarking script that I have run on Raspbian. On Intel, R is around half the speed of python, so there’s a bit of a gap there, but actually python runs slower than it should on the Pi anyway. Comparing against R on Intel, on my fast i7 laptop, this R script takes around 7 minutes, and on my Atom based netbook, it takes around 57 minutes. This is consistent with my other findings – namely that the speed difference between C and higher level languages is greater on the Pi than on Intel. Nevertheless, for many basic data analysis tasks, speed isn’t that much of an issue, and it’s certainly going to be very convenient to have R on the Pi.

Carsten Mönning and Waldemar SchillerHadoop has developed into a key enabling technology for all kinds of Big Data analytics scenarios. Although Big Data applications have started to move beyond the classic batch-oriented Hadoop architecture towards near real-time architectures such as Spark, Storm, etc., [1] a thorough understanding of the Hadoop & MapReduce & HDFS principles and services such as Hive, HBase, etc. operating on top of the Hadoop core still remains one of the best starting points for getting into the world of Big Data. Renting a Hadoop cloud service or even getting hold of an on-premise Big Data appliance will get you Big Data processing power but no real understanding of what is going on behind the scene.To inspire your own little Hadoop data lab project, this four part blog will provide a step-by-step guide for the installation of open source Apache Hadoop from scratch on Raspberry Pi 2 Model B over the course of the next three to four weeks. Hadoop is designed for operation on commodity hardware so it will do just fine for tutorial purposes on a Raspberry Pi. We will start with a single node Hadoop setup, will move on to the installation of Hive on top of Hadoop, followed by using the Apache Hive connector of the free SAP Lumira desktop trial edition to visually explore a Hive database. We will finish the series with the extension of the single node setup to a Hadoop cluster on multiple, networked Raspberry Pis. If things go smoothly and varying with your level of Linux expertise, you can expect your Hadoop Raspberry Pi data lab project to be up and running within approximately 4 to 5 hours.We will use a simple, widely known processing example (word count) throughout this blog series. No prior technical knowledge of Hadoop, Hive, etc. is required. Some basic Linux/Unix command line skills will prove helpful throughout. We are assuming that you are familiar with basic Big Data notions and the Hadoop processing principle. If not so, you will find useful pointers in [3] and at: http://hadoop.apache.org/. Further useful references will be provided in due course.
Part 1 – Single node Hadoop on Raspberry Pi 2 Model B (~120 mins)

The initial Raspberry setup procedure is described by, amongst others, Jonas Widriksson athttp://www.widriksson.com/raspberry-pi-hadoop-cluster/. His blog also provides some pointers in case you are not starting off with a Raspberry Pi accessory bundle but prefer obtaining the hard- and software bits and pieces individually. We will follow his approach for the basic Raspbian setup in this part, but updated to reflect Raspberry Pi 2 Model B-specific aspects and providing some more detail on various Raspberry Pi operating system configuration steps. To keep things nice and easy, we are assuming that you will be operating the environment within a dedicated local wireless network thereby avoiding any firewall and port setting (and the Hadoop node & rack network topology) discussion. The basic Hadoop installation and configuration descriptions in this part make use of [3].
The subsequent blog parts will be based on this basic setup.

Raspberry Pi setup

Powering on your Raspberry Pi will automatically launch the pre-installed NOOBS installer on the SD card. Select “Raspbian”, a Debian 7 Wheezy-based Linux distribution for ARM CPUs, from the installation options and wait for its subsequent Installation procedure to complete. Once the Raspbian operating system has been installed successfully, your Raspberry Pi will reboot automatically and you will be asked to provide some basic configuration settings using raspi-config. Note that since we are assuming that you are using NOOBS, you will not need to expand your SD card storage (menu Option Expand Filesystem). NOOBS will already have done so for you. By the way, if you want or need to run NOOBS again at some point, press & hold the shift key on boot and you will be presented with the NOOBS screen.

Basic configuration

What you might want to do though is to set a new password for the default user “pi” via configuration optionChange User Password. Similarly, set your internationalisation options, as required, via optionInternationalisation Options.

More interestingly in our context, go for menu item Overclock and set a CPU speed to your liking taking into account any potential implications for your power supply/consumption (“voltmodding”) and the life-time of your Raspberry hardware. If you are somewhat optimistic about these things, go for the “Pi2” setting featuring 1GHz CPU and 500 MHz RAM speeds to make the single node Raspberry Pi Hadoop experience a little more enjoyable.

Under Advanced Options, followed by submenu item Hostname, set the hostname of your device to “node1”. Selecting Advanced Options again, followed by Memory Split, set the GPU memory to 32 MB.

Finally, under Advanced Options, followed by SSH, enable the SSH server and reboot your Raspberry Pi by selecting<Finish> in the configuration menu. You will need the SSH server to allow for Hadoop cluster-wide operations.
Once rebooted and with your “pi” user logged in again, the basic configuration setup of your Raspberry device has been successfully completed and you are ready for the next set of preparation steps.

Network configuration

To make life a little easier, launch the Raspbian GUI environment by entering startx in the Raspbian command line.(Alternatively, you can use, for example, the vi editor, of course.) Use the GUI text editor, “Leafpad”, to edit the /etc/network/interfaces text file as shown to change the local ethernet settings for eth0 from DHCP to the static IP address 192.168.0.110. Also add the netmask and gateway entries shown. This is the preparation for our multi-node Hadoop cluster which is the subject of Part 4 of this blog series.

Java environment

Hadoop is Java coded so requires Java 6 or later to operate. Check whether the pre-installed Java environment is in place by executing:

java –version
You should be prompted with a Java 1.8, i.e. Java 8, response.

Hadoop user & group accounts

Set up dedicated user and group accounts for the Hadoop environment to separate the Hadoop installation from other services. The account IDs can be chosen freely, of course. We are sticking here with the ID examples in Widriksson’s blog posting, i.e. group account ID “hadoop” and user account ID “hduser” within this and the sudo user groups.

sudo addgroup hadoop

sudo adduser –-ingroup hadoop hduser

sudo adduser to group sudo

SSH server configuration

Generate a RSA key pair to allow the “hduser” to access slave machines seamlessly with empty passphrase. The public key will be stored in a file with the default Name “id_rsa.pub” and then appended to the list of SSH authorised keys in the file “authorized_keys”. Note that this public key file will need to be shared by all Raspberry Pis in an Hadoop cluster (Part 4).

su hduser

mkdir ~/.ssh

ssh-keygen –t rsa –P “”

cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Verify your SSH server access via: ssh localhost
This completes the Raspberry Pi preparations and you are all set for downloading and installing the Hadoop environment.

Hadoop installation & configuration

Similar to the Rasbian installation & configuration description above, we will talkyou through the basic Hadoop installation first, followed by the various
environment variable and configuration settings.

Basic setup

You need to get your hands on the latest stable Hadoop version (here: version 2.6.0) so initiate the download from any of the various Apache mirror sites (here: spacedump.net).

sudo tar –xvzf hadoop-2.6.0.tar.gz -C /opt/
Following extraction, rename the newly created hadoop-2.6.0 folder into something a little more convenient such as “hadoop”. cd /opt

sudo mv Hadoop-2.6.0 hadoop
Running, for example, ls –al, you will notice that your “pi” user is the owner of the “Hadoop” directory, as expected. To allow for the dedicated Hadoop user “hduser” to operate within the Hadoop environment, change the ownership of the Hadoop directory to “hduser”. sudo chown -R hduser:hadoop hadoop
This completes the basic Hadoop installation and we can proceed with its configuration.

Environment settings

Switch to the “hduser” and add the export statements listed below to the end of the shell startup file ~/.bashrc. Instead of using the standard vi editor, you could, of course, make use of the Leafpad text editor within the GUI environment again. su hduser

export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
This way both the Java and the Hadoop installation as well as the Hadoop binary paths become known to your user environment. Note that you may add the JAVA_HOME setting to the hadoop-env.sh script instead, as shown below.
Apart from these environment variables, modify the /opt/hadoop/etc/hadoop/hadoop-env.sh script as follows. If you are using an older version of Hadoop, this file can be found in: /opt/hadoop/conf/. Note that in case you decide to relocate this configuration directory, you will have to pass on the
directory location when starting any of the Hadoop daemons (see daemon table below) using the –config option. vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Hadoop assigns 1 GB of memory to each daemon so this default value needs to be reduced via parameterHADOOP_HEAPSIZE to
allow for Raspberry Pi conditions. The JAVA_HOME setting for the location of the Java implementation may be omitted if already set in your shell environment, as shown above. Finally, set the datanode’s Java virtual machine to client mode. (Note that with the Raspberry Pi 2 Model B’s ARMv7 processor, this
ARMv6-specific setting is not strictly necessary anymore.) # The java implementation to use. Required, if not set in the home shell

Hadoop daemon properties

With the environment settings completed, you are ready for the more advanced Hadoop daemon configurations. Note that the configuration files are not held globally, i.e. each node in an Hadoop cluster holds its own set of configuration files which need to be kept in sync by the administrator using, for example, rsync.
Modify the following files, as shown below, to configure the Hadoop system for operation in pseudodistributed mode. You can find these files in directory /opt/hadoop/etc/hadoop. In the case of older Hadoop versions, look for the files in:/opt/hadoop/conf

core-site.xml

Common configuration settings for Hadoop Core.

hdfs-site.xml

Configuration settings for HDFS daemons:
The namenode, the secondary namenode and the datanodes.

mapred-site.xml

General configuration settings for MapReduce
daemons. Since we are running MapReduce usingYARN, the MapReduce jobtracker and tasktrackers are replaced with a single resource manager running on the namenode.

Hadoop Data File System (HDFS) creation

HDFS has been automatically installed as part of the Hadoop installation. Create a tmp folder within HDFS to store temporary test data and change the directory ownership to your Hadoop user of choice. A new HDFS installation needs to be formatted prior to use. This is achieved via -format. sudo mkdir -p /hdfs/tmp

sudo chown hduser:hadoop /hdfs/tmp

sudo chmod 750 /hdfs/tmp

hadoop namenode -Format

Launch HDFS and YARN daemons

Hadoop comes with a set of scripts for starting and stopping the various daemons. They can be found in the /bindirectory. Since you are dealing with a single node setup, you do not need to tell Hadoop about the various machines in the cluster to execute any script on and you can simply execute the following scripts straightaway to launch the Hadoop file system (namenode, datanode and secondary namenode) and YARN resource manager daemons. If you need to stop these daemons, use the stop-dfs.sh and stop-yarn.sh script, respectively. /opt/hadoop/sbin/start-dfs.sh

/opt/hadoop/sbin/start-yarn.shCheck the resource manager web UI at http://localhost:8088 for a node overview. Similarly,http://localhost:50070 will provide you with details on your HDFS. If you find yourself in need for issue diagnostics at any point, consult the log4j.log file in the Hadoop installation directory/logs first. If preferred, you can separate the log files from the Hadoop installation directory by setting a new log directory in HADOOP_LOG_DIR and adding it to script hadoop-env.sh.

With all the implementation work completed, it is time for a little Hadoop processing example.

An example

We will run some word count statistics on the standard Apache Hadoop license file to give your Hadoop core setup a simple test run. The word count executable represents a standard element of your Hadoop jar file. To get going, you need to upload the Apache Hadoop license file into your HDFS home directory.

hadoop fs -copyFromLocal /opt/hadoop/LICENSE.txt /license.txt
Run word count against the license file and write the result into license-out.txt.

hadoop jar /opt/hadoop-examples-2.6.0.jar wordcount /license.txt /license-out.txt
You can get hold of the HDFS output file via:

hadoop fs -copyToLocal /license-out.txt ~/Have a look at ~/license-out.txt/part-r-00000 with your preferred text editor to see the word count results. It should look like shown in the extract below.

We will build on these results in the subsequent parts of this blog series on Hive QL and its SAP Lumira integration.

Following on from the Hadoop core installation on a Raspberry Pi 2 Model B in Part 1 of this blog series, in this Part 2, we will proceed with installing Apache Hive on top of HDFS and show its basic principles with the help of last part’s word count Hadoop processing example.
Hive represents a distributed relational data warehouse featuring a SQL-like query language, HiveQL, inspired by the MySQL SQL dialect. A high-level comparison of the HiveQl and SQL is provided in [1]. For a HiveQL command reference, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual.
The Hive data sits in HDFS with HiveQL queries getting translated into MapReduce jobs by the Hadoop run-time environment. Whilst traditional relational data warehouses enforce a pre-defined meta data schema when writing data to the warehouse, Hive performs schema on read, i.e., the data is checked when a query is launched against it. Hive alongside the NoSQL data warehouse HBase represent frequently used components of the Hadoop data processing layer for external applications to push query workloads towards data in Hadoop. This is exactly what we are going to do in Part 3 of this series when connecting to the Hive environment via the SAP Lumira Apache Hive standard connector and pushing queries through this connection against the word count output file.

First, let us get Hive up and running on top of HDFS.

Hive installation
The latest stable Hive release will operate alongside the latest stable Hadoop release and can be obtained from Apache Software Foundation mirror download sites. Initiate the download, for example, from spacedump.net and unpack the latest stable Hive release as follows. You may also want to rename the binary directory to something a little more convenient.
cd ~/
wget http://apache.mirrors.spacedump.net/hive/stable/apache-hive-1.1.0-bin.tar.gz
tar -xzvf apache-hive-1.1.0-bin.tar.gz
mv apache-hive-1.1.0-bin hive-1.1.0
Add the paths to the Hive installation and the binary directory, respectively, to your user environment.
cd hive-1.1.0
export HIVE_HOME={{pwd}}
export PATH=$HIVE_HOME/bin:$PATH
export HADOOP_USER_CLASSPATH_FIRST=true
Make sure your Hadoop user chosen in Part 1 (here: hduser) has ownership rights to your Hive directory.
chown -R hduser:hadoop hive
To be able to generate tables within Hive, run the Hadoop start scripts start-dfs.sh and start-yarn.sh (see also Part 1). You may also want to create the following directories and access settings.
hadoop fs -mkdir /tmp
hadoop fs -chmod g+w /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse
Strictly speaking, these directory and access settings assume that you are intending to have more than one Hive user sharing the Hadoop cluster and are not required for our current single Hive user scenario.
By typing in hive, you should now be able to launch the Hive command line interface. By default, Hive issues information to standard error in both interactive and noninteractive mode. We will see this effect in action in Part 3 when connecting to Hive via SAP Lumira. The -S parameter of the hive statement will suppress any feedback statements.
Typing in hive –service help will provide you with a list of all available services [1]:

Hive equivalent to hadoop jar. Will run Java applications in both the Hadoop and Hive classpath.

metastore

Central repository of Hive meta data.

If you are curious about the Hive web interface, launch hive –service hwi, enter http://localhost:9999/hwi in your browser and you will be shown something along the lines of the screenshot below.

If you run into any issues, check out the Hive error log at /tmp/$USER/hive.log. Similarly, the Hadoop error logs presented in Part 1 can prove useful for Hive debugging purposes.

An example (continued)

Following on from our word count example in Part 1 of this blog series, let us upload the word count output file into Hive’s local managed data store. You need to generate the Hive target table first. Launch the Hive command line interface and proceed as follows.
create table wcount_t(word string, count int) row format delimited fields terminated by ‘\t’ stored as textfile;
In other words, we just created a two-column table consisting of a string and an integer field delimited by tabs and featuring newlines for each new row. Note that HiveQL expects a command line to be finished with a semicolon.

The word count output file can now be loaded into this target table.
load data local inpath ‘~/license-out.txt/part-r-00000’ overwrite into table wcount_t;
Effectively, the local file part-r-00000 is stored in the Hive warehouse directory which is set touser/hive/warehouse by default. More specifically, part-r-00000 can be found in Hive Directoryuser/hive/warehouse/wcount_t and you may query the table contents.
show tables;

select * from wcount_t;
If everything went according to plan, your screen should show a result similar to the screenshot extract below.

If so, it means you managed to both install Hive on top of Hadoop on Raspberry Pi 2 Model B and load the word count output file generated in Part 1 into the Hive data warehouse environment. In the process, you should have developed a basic understanding of the Hive processing environment, its SQL-like query language and its interoperability with the underlying Hadoop environment.

In the next part of this series, we will bring the implementation and configuration effort of Parts 1 & 2 to fruition by running SAP Lumira as a client against the Hive server and will submit queries against the word count result file in Hive using standard SQL with the Raspberry Pi doing all the MapReduce work. Lumira’s Hive connector will translate these standard SQL queries into HiveQL so that things appear pretty standard from the outside. Having worked your way through the first two parts of this blog series, however, you will be very much aware of what is actually going on behind the scene.