Archive for January 2018

Introduction

Raspbian is the main operating system for the Raspberry Pi, but there are quite a few alternatives. Raspbian is based on Debian Linux and there has been a good uptake on the Raspberry Pi which means that most Linux applications have ARM compiled packages available through the Debian APT package manager. As a consequence it’s quite easy to create a Raspberry Pi Linux distribution, so there are quite a few of them. Kali Linux is a specialist distribution that is oriented to hackers (both black and white hat). It comes with a large number of hacking tools for gaining access to networks, compromising computers, spying on communications and other fun things. One cool thing is that Kali Linux has a stripped down version for the Raspberry Pi that is oriented towards a number of specialized purposes. However with the apt-get package manager you can add pretty well anything that has been left out.

If you watch the TV show Mr. Robot (highly recommended) then you might notice that all the cool in-the-know people are running Kali Linux. If you want to get a taste of what it’s all about and you have a Raspberry Pi then all you need is a free micro-SD card to give it a try.

Are You a Black or White Hat?

If you are a black hat hacker looking to infiltrate or damage another computer system, then you probably aren’t reading this blog. Instead you are somewhere on the darknet reading much more malicious articles than this one. This article is oriented more to white hat hackers or to system administrators looking to secure their computing resources. The reason it’s important for system administrators to know about this stuff is that they need to know what they are really protecting against. Hackers are very clever and are always coming up with new hacks and techniques. So it’s important for the good guys to know a bit about how hackers think and to have defenses and protections against the imaginative attacks that might come their way. This now includes the things the bad guys might try to do with a Raspberry Pi.

Anyone that is responsible for securing a network or computer has to test their security and certainly one easy way to get started is to hit it with all the exploit tools included with Kali Linux.

Why Kali on the Pi?

A lot of hacking tasks like cracking WiFi passwords take a lot of processing power. Cracking WPA2 passwords is usually done on very powerful computers with multiple GPU cards and very large dictionary databases. Accomplishing this on a Raspberry Pi would pretty much take forever if it could actually do it. Many hacking tasks are of this nature, very compute intensive.

The Raspberry Pi is useful due to its low cost and small size. The low cost makes it disposable, if you lose it then it doesn’t matter so much and the small size means you can hide it easily. So for instance one use would be to hide a Raspberry Pi at the site you are trying to hack. Then the Raspberry Pi can monitor the Wifi traffic looking for useful data packets that can give away information. Or even leave the Pi somewhere hidden connected to a hardwired ethernet connection. Then Kali Linux has tools to get this information to external sources in a secretive way and allows you to remotely control it to direct various attacks.

Many companies build their security like eggs with a hard to penetrate shell and often locating a device on their premises can bypass their main security protections. You can then run repeated metasploit attacks looking for weaknesses from the inside. Remember your security should be more like an onion with multiple nested layers, so getting through one doesn’t give an attacker access to everything.

Installing Kali Linux

The Kali Linux web site includes a complete disk image of the Raspberry Pi version. You just need to burn this to a micro-SD card using a tool like ApplePi Baker. They you just put the micro-SD in your Raspberry Pi, turn it on and off you go. However there are a few necessary steps to take before you really start:

The root password is toor, so change this first time you boot up.

The Kali Linux instructions point out you need to refresh your SSH certificates since otherwise you get the ones included with the image. The download page has instructions on how to do this.

The image is configured for 8Gig so if you have a larger SD card then you need to repartition it to get all the free space. I used the GParted program for this which I got via “apt-get install gparted”. Note that to use apt-get you need to connect to WiFi or the Internet. Another option is to get Raspbian’s configuration program and use that, it works with most variants of Debian Linux and allows you to do some other things like setup overclocking. You can Google for the wget command to get this.

Update the various installed programs via “apt-get update” and “apt-get upgrade”. (If you aren’t still logged on as root you need to sudo these).

Now you are pretty much ready to play. Notice that you are in a nice graphical environment and that the application menu is full of hacking tools. These aren’t as many hacking tools as the full Kali distribution, but these are all ones that work well on the Raspberry Pi. They als limited the number so you can run off a really cheap 8 Gig micro-SD card.

I see people say you should stick to command line versions of the tools on the Pi due to its processing power and limited memory, but I found I could add the GUI versions and these worked fine. For instance the nmap tool is installed, but the zenmap graphical front end isn’t. Adding zenmap is easy via “apt-get install zenmap” and then this seems to work fine. I think the assumption is that most people will use the Raspberry version of Kali headless (meaning no screen or keyboard) so it needs to be accessed via remote control software like secure shell which means you want to avoid GUIs.

Summary

Installing Kali Linux on a micro-SD card for your Raspberry Pi si a great way to learn about the various tools that hackers can easily use to try and penetrate, spy on or interfere with people’s computers. There are quite a few books on this as well as many great resources on the Web. Kali’s website is a great starting point. Anyway I find it quite eye opening the variety of readily tools and how easy it is for anyone to use them.

Introduction

Previously I wrote an article on an introduction to Assembler programming on the Raspberry Pi. This was quite a long article without much of a coding example, so I wanted to produce an Assembler language version of the little program I did in Python, Scratch, Fortran and C to flash three LEDs attached to the Raspberry Pi’s GPIO port on a breadboard. So in this article I’ll introduce that program.

This program is fairly minimal. It doesn’t do any error checking, but it does work. I don’t use any external libraries, and only make calls to Linux (Raspbian) via software interrupts (SVC 0). I implemented a minimal GPIO library using Assembler Macros along with the necessary file I/O and sleep Linux system calls. There probably aren’t enough comments in the code, but at this point it is fairly small and the macros help to modularize and explain things.

Main Program

Here is the main program, that probably doesn’t look structurally that different than the C code, since the macro names roughly match up to those in the GPIO library the C function called. The main bit of Assembler code here is to do the loop through flashing the lights 10 times. This is pretty straight forward, just load 10 into register r6 and then decrement it until it hits zero.

GPIO and Linux Macros

Now the real guts of the program are in the Assembler macros. Again it isn’t too bad. We use the Linux service calls to open, write, flush and close the GPIO device files in /sys/class/gpio. Similarly nanosleep is also a Linux service call for a high resolution timer. Note that ARM doesn’t have memory to memory or operations on memory type instructions, so to do anything we need to load it into a register, process it and write it back out. Hence to copy the pin number to the file name we load the two pin characters and store them to the file name memory area. Hard coding the offset for this as 20 isn’t great, we could have used a .equ directive, or better yet implemented a string scan, but for quick and dirty this is fine. Similarly we only implemented the parameters we really needed and ignored anything else. We’ll leave it as an exercise to the reader to flush these out more. Note that when we copy the first byte of the pin number, we include a #1 on the end of the ldrb and strb instructions, this will do a post increment by one on the index register that holds the memory location. This means the ARM is really very efficient in accessing arrays (even without using Neon) we combine the array read/write with the index increment all in one instruction.

If you are wondering how you find the Linux service calls, you look in /usr/include/arm-linux-gnueabihf/asm/unistd.h. This C include file has all the function numbers for the Linux system calls. Then you Google the call for its parameters and they go in order in registers r0, r1, …, r6, with the return code coming back in r0.

Makefile

Here is a simple makefile for the project if you name the files as indicated. Again note that WordPress and Google Docs may mess up white space and quote characters so these might need to be fixed if you copy/paste.

IDE or Not to IDE

People often do Assembler language development in an IDE like Code::Blocks. Code::Blocks doesn’t support Assembler language projects, but you can add Assembler language files to C projects. This is a pretty common way to do development since you want to do more programming in a higher level language like C. This way you also get full use of the C runtime. I didn’t do this, I just used a text editor, make and gdb (command line). This way the above program has no extra overhead the executable is quite small since there is no C runtime or any other library linked to it. The debug version of the executable is only 2904 bytes long and non debug is 2376 bytes. Of course if I really wanted to reduce executable size, I could have used function calls rather than Assembler macros as the macros duplicate the code everywhere they are used.

Summary

Assembler language programming is kind of fun. But I don’t think I would want to do too large a project this way. Hats off to the early personal computer programmers who wrote spreadsheet programs, word processors and games entirely in Assembler. Certainly writing a few Assembler programs gives you a really good understanding of how the underlying computer hardware works and what sort of things your computer can do really efficiently. You could even consider adding compiler optimizations for your processor to GCC, after all compiler code generation has a huge effect on your computer’s performance.

Introduction

I predicted that 2018 would be a very bad year for data breaches and security problems, and we have already started the year with the Intel x86 specific Meltdown exploit and the Spectre exploit that works on all sorts of processors and even on some JavaScript systems (like Chrome). Since my last article was on Assembler programming and most of these type exploits are created in Assembler, I thought it might be fun to look at how Spectre works and get a feel for how hackers can retrieve useful data out of what seems like nowhere. Spectre is actually a large new family of exploits so patching them all is going to take quite a bit of time, and like the older buffer overrun exploits, are going to keep reappearing.

I’ve been writing quite a bit about the Raspberry Pi recently, so is the Raspberry Pi affected by Spectre? After all it affects all Android and Apple devices based on ARM processors. The main Raspberry Pi operating system is Raspbian which is variant of Debian Linux optimized for the Pi. A recent criticism of Raspbian is that it is still 32-Bit. It turns out that running the ARM in 32-bit mode eliminates a lot of the Spectre attack scenarios. We’ll discuss why this is in the article. If you are running 64-Bit software on the Pi (like running Android) then you are susceptible. You are also susceptible to the software versions of this attack like those in JavaScript interpreters that support branch prediction (like Chromium).

The Spectre hacks work by exploiting how processor branch prediction works coupled with how data is cached. The exploits use branch prediction to access data it shouldn’t and then use the processor cache to retrieve the data for use. The original article by the security researchers is really quite good and worth a read. Its available here. It has an appendix at the back with C code for Intel processors that is quite interesting.

Branch Prediction

In our last blog post we mentioned that all the data processing assembler instructions were conditionally executed. This was because if you perform a branch instruction then the instruction pipeline needs to be cleared and restarted. This will really stall the processor. The ARM 32-bit solution was good as long as compilers are good at generating code that efficiently utilize these. Remember that most code for ARM processors is compiled using GCC and GCC is a general purpose compiler that works on all sorts of processors and its optimizations tend to be general purpose rather than processor specific. When ARM evaluated adding 64-Bit instructions, they wanted to keep the instructions 32-Bit in length, but they wanted to add a bunch of instructions as well (like integer divide), so they made the decision to eliminate the bits used for conditionally executing instructions and have a bigger opcode instead (and hence lots more instructions). I think they also considered that their conditional instructions weren’t being used as much as they should be and weren’t earning their keep. Plus they now had more transistors to play with so they could do a couple of other things instead. One is that they lengthed the instruction pipeline to be much longer than the current three instructions and the other was to implement branch prediction. Here the processor had a table of 128 branches and the route they took last time through. The processor would then execute the most commonly chosen branch assuming that once the conditional was figured out, it would very rarely need to throw away the work and start over. Generally this larger pipeline with branch prediction lead to much better performance results. So what could go wrong?

Consider the branch statement:

if (x < array1_size) y = array2[array1[x] * 256];

This looks like a good bit of C code to test if an array is in range before accessing it. If it didn’t do this check then we could get a buffer overrun vulnerability by making x larger than the array size and accessing memory beyond the array. Hackers are very good at exploiting buffer overruns. But sadly (for hackers) programmers are getting better at putting these sort of checks in (or having automated or higher level languages do it for them).

Now consider branch prediction. Suppose we execute this code hundreds of times with legitimate values of x. The processor will see the conditional is usually true and the second line is usually executed. So now branch prediction will see this and when this code is executed it will just start execution of the second line right away and work out the first line in a second execution unit at the same time. But what if we enter a large value of x? Now branch prediction will execute the second line and y will get a piece of memory it shouldn’t. But so what, eventually the conditional in the first line will be evaluated and that value of y will be discarded. Some processors will even zero it out (after all they do security review these things). So how does that help the hacker? The trick turns out to be exploiting processor caching.

Processor Caching

No matter how fast memory companies claim their super fast DDR4 memory is, it really isn’t, at least compared to CPU registers. To get a bit of extra speed out of memory access all CPUs implement some sort of memory cache where recently used parts of main memory are cached in the CPU for faster access. Often CPUs have multiple levels of cache, a super fast one, a fast one and a not quite as fast one. The trick then to getting at the incorrectly calculated value of y above is to somehow figure out how to access it from the cache. No CPU has a read from cache assembler instruction, this would cause havoc and definitely be a security problem. This is really the CPU vulnerability, that the incorrectly calculated buffer overrun y is in the cache. Hackers figured out, not how to read this value but to infer it by timing memory accesses. They could clear the cache (this is generally supported and even if it isn’t you could read lots of zeros). Then time how long it takes to read various bytes. Basically a byte in cache will read much faster than a byte from main memory and this then reveals what the value of y was. Very tricky.

Recap

So to recap, the Spectre exploit works by:

Clear the cache

Execute the target branch code repeatedly with correct values

Execute the target with an incorrect value

Loop through possible values timing the read access to find the one in cache

This can then be put in a loop to read large portions of a programs private memory.

Summary

The Spectre attack is a very serious new technique for hackers to hack into our data. This will be like buffer overruns and there won’t be one quick fix, people are going to be patching systems for a long time on this one. As more hackers understand this attack, there will be all sorts of creative offshoots that will deal further havoc.

Some of the remedies like turning off branch prediction or memory caching will cause huge performance problems. Generally the real fixes need to be in the CPUs. Beyond this, systems like JavaScript interpreters, or even systems like the .Net runtime or Java VMs could have this vulnerability in their optimization systems. These can be fixed in software, but now you require a huge number of systems to be patched and we know from experience that this will take a very long time with all sorts of bad things happening along the way.

The good news for Raspberry Pi Raspbian users, is that the ARM in the older 32-Bit mode isn’t susceptible. It is only susceptible through software uses like JavaScript. Also as hackers develop these techniques going forwards perhaps they can find a combination that works for the Raspberry, so you can never be complacent.

Introduction

Most University Computer Science programs now do most of their instruction in quite high level languages like Java which don’t require you to need to know much about the underlying computer architecture. I think this tends to create quite large gaps in understanding which can lead to quite bad program design mistakes. Learning to program C puts you closer to the metal since you need to know more about memory and low level storage, but nothing really teaches you how the underlying computer works than doing some Assembler Language Programming, The Raspbian operating system comes with the GNU Assembler pre-installed, so you have everything you need to try Assembler programming right out of the gate. Learning a bit of Assembler teaches you how the Raspberry Pi’s ARM processor works, how a modern RISC architecture processes instructions and how work is divided by the ARM CPU and the various co-processors which are included on the same chip.

A Bit of History

The ARM processor was originally developed to be a low cost processor for the British educational computer, the Acorn. The developers of the ARM felt they had something and negotiated a split into a separate company selling CPU designs to hardware manufacturers. Their first big sale was to Apple to provide a CPU for Apple’s first PDA the Newton. Their first big success was the inclusion of their chip design in Apple’s iPods. Along the way many chip makers like TI which had given up competing on CPUs built single chip computers around the ARM. These ended up being included in about every cell phone including those from Nokia and Apple. Nowadays pretty up every Android phone is also built around ARM designs.

ARM Assembler Instructions

There have been a lot of ARM designs from early simple 16 bit processors up through 32 bit processors to the current 64bit designs. In this article I’m just going to consider the ARM processor in the Raspberry Pi, this processor is a 64 bit processor, but Raspbian is still a 32 bit operating system, so I’ll just be talking about the 32 bit processing used here (and I’ll ignore the 16 bit “thumb” instructions for now).

ARM is a RISC processor which means the idea is that it executes very simple instructions very quickly. The idea is to keep the main processor simple to reduce complexity and power usage. Each instruction is 32 bits long, so the processor doesn’t need to think about how much to increment the program counter for each instruction. Interestingly nearly all the instructions are in the same format, where you can control whether it sets the status bits, can test the status bits on whether to do anything and can have four register parameters (or 2 registers and an immediate constant). One of the registers can also have a shift operation applied. So how is all this packed into on 32 bit instruction? There are 16 registers in the ARM CPU (these include the program counter, link return register and the stack pointer. There is also the status register that can’t be used as a general purpose register. This means it takes 4 bits to specify a register, so specifying 4 registers takes 16 bits out of the instruction.

Below are the formats for the main types of ARM instructions.

Now we break out the data processing instructions in more detail since these comprise quite a large set of instructions and are the ones we use the most.

Although RISC means a small set of simple instructions, we see that by cleverly using every bit in those 32 bits for an instruction that we can pack quite a bit of information.

Since the ARM is a 32-Bit processor meaning among other things that it can address a 32-Bit address space this does lead to some interesting questions:

The processor is 32-Bits but immediate constants are 16-Bits. How do we load arbitrary 32-Bit quantities? Actually the immediate instruction is 12-Bits of data and 4 Bits of a shift amount. So we can load 12 Bits and then shift it into position. If this works great. If not we have to be tricky by loading and adding two quantities.

Since the total instruction is 32-Bits how do we load a 32-Bit memory address? The answer is that we can’t. However we can use the same tricks indicated in number 1. Plus you can use the program counter. You can add an immediate constant to the program counter. The assembler often does this when assembling load instructions. Note that since the ARM is pipelined the program counter tends to be a couple of instructions ahead of the current one, so this has to properly be taken into account.

Why is there the ability to conditionally execute all the data processing instructions? Since the ARM processor is pipelines, doing a branch instruction is quite expensive since the pipeline needs to be discarded and then reloaded at the new execution point. Having individual instructions conditionally execute can save quite a few branch instructions leading to faster execution. (As long as your compiler is good at generating RISC type code or you are hand coding in Assembler).

The ARM processor has a multiply instruction, but no divide instruction? This seems strange and unbalanced. Multiply (and multiply and add) are recent additions to the ARM processor. Divide is still considered too complicated and slow, plus is it really used that much? You can do divisions on either the floating point coprocessor or the Neon SIMD coprocessor.

A Very Small Example

Assembler listings tend to be quite long, so as a minimal set let’s start with a program that just exits when run returning 17 to the command line (you can see this if you type echo $@ after running it). Basically we have an assembler directive to define the entry point as a global. We move the immediate value 17 to register R0. Shift it left by 2 bits (Linux expects this). Move 1 to register R7 which is the Linux command code to exit the program and then call service zero which is the Linux operating system. All calls to Linux use this pattern. Set some parameters in registers and then call “svc 0” to do the call. You can open files, read user input, write to the console, all sorts of things this way.

Here is a simple makefile to compile and link the preceding program. If you save the above as model.s and then the makefile as makefile then you can compile the program by typing “make” and run it by typing “./model”. as is the GNU-Assembler and ld is the linker/loader.

Note that make is very particular about whitespace. It must be a true “tab” character before the commands to execute. If Google Docs or WordPress changes this, then you will need to change it back. Unfortunately word processors have a bad habit of introducing syntax errors to program listings by changing whitespace, quotes and underlines to typographic characters that compilers (and assemblers) don’t recognize.

Summary

Although these days you can do most things with C or a higher level language, Assembler still has its place in certain applications that require performance or detailed hardware interaction. For instance perhaps tuning the numpy Python library to use the Neon coprocessor for faster operation of vector type operations. I do still think that every programmer should spend some time playing with Assembler language so they understand better how the underlying processor and architecture they use day to day really works. The Raspberry Pi offers and excellent environment to do this with the good GNU Macro Assembler, the modern ARM RISC architecture and the various on-chip co-processors.

Just the Assembler introductory material has gotten fairly long, so we won’t get to an assembler version of our flashing LED program. But perhaps in a future article.