
Description

ChipWhisperer is the first open-source toolchain for embedded hardware security research including side-channel power analysis and glitching. The innovative synchronous capture technology is unmatched by other tools, even from commercial vendors. Similar commercial equipment is too expensive ($30k+), and being closed-source limits usefulness for academics. Instead this project bridges the gap between academic research and in-the-trenches engineering. Several peer-reviewed publications describe the design, matched with hours of hands-on tutorials for getting started.

The objective of ChipWhisperer is nothing short of revolutionizing the entire embedded security industry. Every designer who uses encryption in their design should be able to perform a side-channel attack, and understand the ramifications of these attacks on their designs. The open-source nature of the ChipWhisperer makes this possible, and my hope is that it becomes the start of a new era of hardware security research.

Details

Links to project details (GIT, Wiki, Docs) are on the left. The laziest intro to this project is the 2-minute video created for THP Quarterfinals (i.e. the first video I made):

Or even better, see my full 5-min final video (there is also the semifinal video, see link at left):

The biggest commitment is reading this page! Well, if you are still reading, strap yourself in...

What's This About?

Lots of people have tried to design secure systems, and alas there are lots of failures. But what if you did everything correctly: no buffer overflows, no unsanitized inputs, no default passwords? Unfortunately this isn't good enough - even perfectly implemented encryption algorithms such as AES-256 will reveal encryption keys. It's not due to incorrect implementation; it's a fundamental artifact of the design.

This has been known for a long time - the first paper on this was published in 1998. But if you are an engineer or independent researcher, the tools to get started are expensive, or require you to do a lot of work yourself scripting together lower-cost tools. This project is my attempt to eliminate this problem.

I'm eliminating the problem for good by making my tools open source. Because this whole area is an active research area, the tools need to be open source. This isn't a case of attempting to seem sexy by adding the word 'open-source', but placing something of commercial value into the open-source domain, in the hope it spurs a larger community. Think of something like Wireshark - it's extremely valuable, and could easily be sold as a high-end product. But most of that value comes from it being open source, and hence containing a huge array of protocol dissectors, far beyond what a commercial vendor could support. For my designs, part of the larger community includes hours of tutorials on this area - the objective of ChipWhisperer is not just the engineering that went into the software and hardware, but having tutorials and documentation that could be used as a complete course in side-channel analysis and glitching.

It's also worth stressing that there are no 'tricks' to the open-source nature of this project. It's not just part of the design that's open source, I'm not using a restrictive non-commercial license, and I've already had other people build these units from PCB design files. The objective of this project is to open up this area of research to a much wider audience, and the commercial value I lose from limiting how much I can charge for the tools (since anyone can make them) is more than offset by the greater value added to the community.

It's useful to point out how critical this field of embedded security has become, and why it's interesting to see attacks against AES (which I tend to focus on in my demos). The 'Internet of Things' requires some wireless communication network - be it IEEE 802.15.4, ZigBee (which uses 802.15.4), or Bluetooth Low Energy. Since these are wireless protocols, security is of paramount importance - and the designers acknowledge that. Attacks against AES are interesting because all three of the previous protocols use AES-128 for security. Unfortunately AES-128 isn't just a "check box" that indicates your system is secure, despite one document listing that because Bluetooth low energy has 128 bit AES, it's "secure against attack and hacking" (see page 45). The idea that implementations are secure because the underlying algorithm is secure will cost somebody a lot of money when it blows up in their face, and they have to fix millions of already deployed devices.

Assuming designers aren't foolish enough to send encryption keys over SPI (see Travis Goodspeed's attacks), and have actually done the implementation correctly, and haven't introduced backdoors, we can still break the AES implementation. This isn't a theoretical attack, but a real-world attack that every embedded designer needs to understand. It's clear that very few designers are aware of this issue, based on how infrequently it is brought up when looking over datasheets, design specifications, and application notes. And no, it's not enough to use hardware accelerators - an attack has been demonstrated against the XMEGA crypto engine (presentation slides, details on page 77 of thesis, article at ACM behind paywall). See the 2684 pages of Bluetooth specification for example, not a hit for 'side channel' to be found:

ChipWhisperer won't secure the internet of things. But it will hopefully jolt people into believing that "secure because math" isn't a good enough answer. Even these theoretically unbreakable cryptographic algorithms have great weaknesses during implementation, and they may be much easier to break than you ever assumed. So let's start looking into how this works.

Side channel analysis takes advantage of the fact that changing the state of a digital line uses a small amount of power. Switching from a 'zero' to a 'one' takes a small charge, for example. Many digital ICs will also push the lines into a 'pre-charge' state in-between transitions to reduce the worst-case time delay, such that on every cycle the bus goes from an intermediate state to a final state. For us this means we can almost directly infer the Hamming weight (number of ones) on a digital bus based on the power consumption.

So what does that give us? Consider that we had the following system, which is a simple XOR of some input data with a secret key, where we don't see the final output:

While we can build the following matrix, given some known inputs, along with the associated hamming weights based on the power measurement:

Then one can simply guess what the secret key was! Based on our guess we can determine which guess best aligns with the real measurements. In the following example if the secret key was 0xEF, we would end up with the hamming weight matching our observations:

Finally, the reason this works so well is that it allows us to break a single byte of the encryption key at a time! Thus the minimal guess-and-check means guessing 256 possibilities for each byte, and doing that 16 times:
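To make the guess-and-check idea concrete, here is a minimal Python sketch of the XOR example. It assumes an idealized, noise-free measurement that returns the exact Hamming weight; real power traces are noisy, so a practical attack ranks guesses statistically rather than demanding an exact match. The function names and the chosen inputs are illustrative only and are not part of the ChipWhisperer software.

```python
def hamming_weight(value: int) -> int:
    """Number of one bits in value."""
    return bin(value).count("1")


def measure(inputs, secret_key):
    """Stand-in for the power measurement (assumption: noise-free).

    Returns the Hamming weight of (input XOR key) for each known input.
    """
    return [hamming_weight(x ^ secret_key) for x in inputs]


def recover_key_byte(inputs, observed):
    """Return every key guess whose predicted Hamming weights match all observations."""
    return [
        guess
        for guess in range(256)
        if all(hamming_weight(x ^ guess) == hw for x, hw in zip(inputs, observed))
    ]


# Known inputs: zero plus each single-bit value, so every key bit is probed once.
inputs = [0x00] + [1 << bit for bit in range(8)]
observed = measure(inputs, secret_key=0xEF)
candidates = recover_key_byte(inputs, observed)

# Attacking one byte at a time: 16 bytes x 256 guesses each.
total_guesses = 16 * 256
print([hex(c) for c in candidates], total_guesses)
```

With these inputs only one guess survives, and the total work is 16 x 256 = 4096 guesses rather than the 2^128 needed to brute-force a full 128-bit key.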

Glitching is another devious attack on embedded systems. This takes advantage of the fact that at some point in your code you'll have a test of the input password, signature, or whatever else. So consider we have this code:
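As a stand-in for that kind of check, here is a toy Python model. It is a sketch under loud assumptions: real firmware would be compiled code running on a microcontroller, and `run_check` and its `skipped_step` parameter are hypothetical names invented here to mimic a single skipped instruction.

```python
from typing import Optional


def run_check(supplied: str, stored: str, skipped_step: Optional[int] = None) -> bool:
    """Toy password check executed as a list of discrete steps.

    skipped_step is a hypothetical knob: the index of the one step that
    never executes, imitating an instruction skipped by a glitch.
    """
    state = {"granted": True}  # optimistic default, cleared on mismatch

    def compare():
        if supplied != stored:
            state["granted"] = False

    steps = [compare]
    for index, step in enumerate(steps):
        if index == skipped_step:
            continue  # the glitch: this "instruction" is skipped
        step()
    return state["granted"]


print(run_check("guess", "secret"))                  # normal run, wrong password
print(run_check("guess", "secret", skipped_step=0))  # same input, comparison skipped
```

In this model the wrong password is rejected in a normal run, but accepted when the comparison step is skipped.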

It's actually possible to manipulate the system to cause that check to fail, or for instructions to be skipped. One method of doing this is inserting a quick glitch into the clock, as the following example from the ChipWhisperer shows:

Even more interesting is the fact that you can do this with 'power glitching'. This means inserting some sort of low-voltage spike into the VCC line of the device you are targeting. This works even for advanced devices, like a Raspberry Pi or Android smartphone. The VCC line glitch might look like this:

This can cause a user-land application to fail on something like an Android smartphone. Here is an example, taken from my project log update, where I'm causing an incorrect calculation:

There is a full ChipWhisperer VCC glitching tutorial which targets an AVR microcontroller, in the same fashion as the clock glitching tutorial. Now that you get an idea of why these attacks are so interesting, let's look at what ChipWhisperer can do.

The system is a fusion of closely operating FPGA blocks and a Python interface communicating over a high-speed USB 2.0 interface. It even uses partial reconfiguration to reprogram the Spartan 6 FPGA during operation to fine-tune certain parameters that would otherwise be fixed when implementing the FPGA. Remote database storage of traces is used to power high-performance analysis, levelling the playing field for the independent researcher who doesn't have access to costly computing hardware.

Having the computer connectivity of the hardware is fundamental to the operation of this device. In addition it's possible (and sometimes required) to have the device split over several locations via a network. This can mean the ChipWhisperer is running on one computer, with data being saved to a larger network store. Even for researchers who do have local access to a high-performance computer, the remote storage is often useful, since the physical attack may be occurring at a different spot from the analysis computer.

The blocks themselves can be implemented into many different FPGAs - this system is not limited to the capture hardware created as part of this project.

This project has spawned a number of useful modules, some of which are already being used in other open source projects. The following section briefly summarizes some of the hardware modules, software modules, and techniques which I created for ChipWhisperer (but are useful for a variety of open-source projects).

Synchronous Sampling: The synchronization of the sample clock to the device clock fundamentally differentiates the ChipWhisperer from commercial solutions, even the extremely expensive ones. This allows the ChipWhisperer to break systems that would otherwise require 5GS/s or faster oscilloscopes according to published academic papers. Currently the ChipWhisperer is the only solution (commercial or otherwise) using synchronous sampling with variable phase offset, allowing it to attack devices with internal oscillators or with varying-clock countermeasures. The use of Synchronous Sampling is the basis for three academic papers (including a journal article), demonstrating the innovation of this technique. More details will be presented later.

OpenADC: The OpenADC was the first module created, and is the high-speed ADC block. In addition I've published the FPGA code for storing samples and downloading those samples to the computer via Python. Besides my academic papers using the OpenADC, I've found a few other papers (1,2, 3) using the OpenADC for doing research into low-power wireless networks and crypto. It's extremely exciting to see my work being used already! More details of the OpenADC are given later in this description.

PyQtGraph Parameter Tree Updates: This project uses PyQtGraph both for graphing and for setting parameters across almost the entire project. This involved some updates to the PyQtGraph implementation, specifically the ability for a parameter change to be automatically downloaded to the hardware, and for the setting to be verified in hardware.

FPGA Project File Generation: The Xilinx ISE Project navigator files are an XML based format, but have a serious problem when attempting to commit them to GIT: they change for every version of ISE! In addition you need different project files for each FPGA device supported. This causes many headaches: commit conflicts for different versions, along with maintaining multiple files for each project. ChipWhisperer uses a simple text file to automatically generate both the ISE Project file and associated COREGen files, see details in the log post.

FPGA SAD Trigger: The Sum of Absolute Difference (SAD) trigger FPGA block performs real-time pattern matching of a stored pattern to the incoming waveform. This means the pattern matching runs at the ADC speed (i.e. 105MS/s), and was successfully implemented in a low-cost (i.e. fairly slow) Spartan 6 FPGA. This would be trivial to do in software, but unacceptably slow and with jitter relative to the device clock. The FPGA block is able to detect a match exactly six sample clocks after the final sample of the pattern being digitized. More details of this are presented later.
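The matching itself is simple to express in software, which makes the idea easy to see even though software cannot meet the timing requirements. The following is a minimal Python sketch of a sliding Sum of Absolute Difference match; the `threshold` parameter and the sample values are assumptions made for illustration, and this is not the FPGA implementation.

```python
def sad(pattern, window):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - w) for p, w in zip(pattern, window))


def find_triggers(samples, pattern, threshold):
    """Return the index of the final sample of every window matching the pattern.

    A window matches when its SAD against the stored pattern is <= threshold
    (threshold is an assumed tuning parameter).
    """
    n = len(pattern)
    hits = []
    for end in range(n - 1, len(samples)):
        window = samples[end - n + 1 : end + 1]
        if sad(pattern, window) <= threshold:
            hits.append(end)
    return hits


samples = [0, 1, 0, 5, 9, 4, 0, 1, 5, 8, 4, 1]
pattern = [5, 9, 4]
print(find_triggers(samples, pattern, threshold=1))
```

Here the pattern appears exactly once (ending at index 5) and approximately once more (ending at index 10, with a SAD of 1), so a threshold of 1 reports both positions while a threshold of 0 reports only the exact match.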

FPGA Dynamic Clock Blocks: FPGAs provide various blocks for clock control, but typically expose a fairly complex interface. As part of my project I designed several modules that simplify this interface, allowing you to access the dynamic phase shift and dynamic frequency generation blocks. There is even Python code for automatically configuring the blocks given a desired output frequency for example, and the proper parameters are dynamically downloaded to the blocks. In addition this system supports an advanced feature called Partial Reconfiguration to allow you to dynamically tune all features of the clock module blocks, even a number that according to Xilinx are fixed at design-time.
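As a rough illustration of what automatic configuration for a desired output frequency involves, the sketch below brute-forces a multiplier/divider pair for a frequency synthesizer of the form f_out = f_in * M / D. The function name and the allowed ranges for M and D are assumptions chosen for the example; they are not taken from the ChipWhisperer code, and the real limits should be checked against the FPGA documentation.

```python
def best_clock_settings(f_in_hz, f_target_hz,
                        mul_range=range(2, 257), div_range=range(1, 257)):
    """Find (multiply, divide, f_out) minimizing |f_in * M / D - f_target|.

    mul_range and div_range are placeholder limits (assumption).
    Ties are resolved in favour of the first pair found.
    """
    best = None
    for mul in mul_range:
        for div in div_range:
            f_out = f_in_hz * mul / div
            error = abs(f_out - f_target_hz)
            if best is None or error < best[0]:
                best = (error, mul, div, f_out)
    _, mul, div, f_out = best
    return mul, div, f_out


mul, div, f_out = best_clock_settings(10_000_000, 25_000_000)
print(mul, div, f_out)
```

For a 10 MHz input and a 25 MHz target this search settles on multiply by 5, divide by 2. A real implementation would also have to respect limits on intermediate frequencies, which this sketch ignores.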

FPGA USB Interface: A classic problem in FPGA designs is where one needs to control a few registers from a computer. I designed my own interface for this, which can run on almost any other FPGA development board, provides the ability to almost max out high-speed USB when downloading data, and has a simple Python interface. More details of this are presented later.

Waveform Plotting: The ChipWhisperer requires high-speed plotting of many waveforms. This is primarily handled by the PyQtGraph library, but that library has been extended to support additional features such as a dock with a toolbar for accessing various plot functions. Like most of the ChipWhisperer's source it's all very modular, meaning you can rip that waveform display code out for something else.

When starting this project, it was destined to be open source. ChipWhisperer does not aim to be just a complete tool, but also a useful platform for further research. For example I assume 99.5% of users will never modify the FPGA code, and won't care that it's open source. But there is still that 0.5% - and the value of the open-source code to that 0.5% is what makes it worthwhile. So who are the 0.5%? I assume they would mostly be researchers; the area of embedded hardware security is an extremely active research area. There are a number of conferences and journals in this area, and researchers in this area are no strangers to FPGA work, or even designing custom chips. For that 0.5% of users, this project could save them from months to years of work (since they don't need to redo my work).

The core Python code is more likely to be modified (since it requires less effort than the FPGA design), but even then I've tried to make it as easy to 'hack in' extra modules as possible. I know from experience that of those that DO wish to modify the code, they will mostly want to get something working quickly. This is part of the reason the code supports all sorts of dynamic Python execution and loading of external modules (discussed in detail later).

By sharing the ChipWhisperer design, it provides a useful starting point for these researchers to build upon. If you decided to work on a real-time analysis algorithm implemented in an FPGA, the ChipWhisperer would be a perfect platform for your work. In addition the platform is commercially available, meaning that when researchers disseminate results based on the ChipWhisperer, it's trivial for someone else to duplicate or verify the results.

I've already received feedback from people using portions of this project. The ADC board (OpenADC) has been used in a number of other projects, and I've even had people in other countries thank me for providing the designs, as it's difficult for them to import PCBs in their country. But since they had the design files, they could have them made locally without issue.

Having previous experience with open-source projects, I'm familiar with many of the issues that hit these projects. In particular documentation is often a problem. Documentation doesn't seem as sexy as hard-core engineering, but unless the project is well documented it has zero hope of continuing once the main developer moves on or is hit by a bus. ChipWhisperer has a massive documentation repository, and it's still growing as this project is in its infancy! Let's look at that next.

There are several main sources of documentation:

Full Project Documentation: This is the major documentation, and includes both Python API documentation and detailed instructions for installing Python modules, using the hardware, etc. This documentation aims to be a polished resource.

The Wiki: The wiki contains additional detail such as most recent releases, instructions for building hardware, BOMs, various small notes, and example traces. This aims to be a 'living' documentation so is subject to frequent changes, and has many short and simple pages such as PCB errata.

Presentations & Whitepapers: There are a number of presentations I've previously given. The link to the left includes a few of the earlier Blackhat presentations, which have a long whitepaper too!

Videos: I've got several hours of video tutorials. See the full list later on in the 'video' section of this document.

The GIT Repository: Some people believe code is self-documenting. I've tried to help it along with docstrings and whatnot, but the GIT repository is the ultimate source for all things about this project.

Here are a few pictures of the documentation:

Some of the blocks on the main PCB are shown below. The OpenADC is my open-source ADC board which was designed as part of this project. The rest of the chips have various glue logic for easing interface to the FPGA, and a USB-connected AVR for 'additional stuff'. This can mean using it to program a target, talking some specific protocol, etc.

Target IO Interface

Twelve IO lines pass through level translators for use in a connection to the target device. Two of them use high-speed translators, which can be used for generating a clock to the target device, triggering a glitch, or receiving a clock from the target device. The ChipWhisperer can even be used as a simple clock generator for digital devices too - from the GUI a requested clock frequency is automatically generated by the internal clock module. Normally the output is fed over a standard ribbon cable. While not an impedance matched connector, for many experiments this performs 'well enough' in practice. The following shows some figures after 8 inches of ribbon cable. Note the 'near end' waveform taken at the back-side of the connector for the 198 MHz test frequency shows less duty cycle distortion compared to the far end waveform. This suggests using a shorter cable or designing a breakout board to plug into the header with SMA cables might be successful for high frequencies to reduce duty cycle distortion. The oscilloscope used in these tests had a 350 MHz analog bandwidth, meaning the 198 MHz waveforms don't have all the detail present (overshoot/undershoot + edges attenuated severely).

AVR-USB Connection

The AVR-USB connection is an AT90USB162 device. It can be programmed with an AVR-ISP MK2 clone firmware from the LUFA project, or can be programmed with other interface code such as my example USB-SPI driver. This allows for a complete development system, since you can use this device to program new cryptographic code into the device being tested.

FPGA Module Power Supplies

Originally, I wasn't sure if the LX25 FPGA would be powerful enough, so the system was designed to accept larger FPGA modules with everything up to a Spartan 6 LX150. These larger FPGAs require higher current sourcing capabilities, so the supplies were originally designed to meet these higher current limits.

As an example the 2.5V rail is being tested with an electronically switched (via a relay) load in the following figures. Some contact bounce of the relay can be seen, but notice there is little change in the noise on the supply rail even at these high currents.

Details of the test jig are shown in my project log update.

External PLL

Due to limits in the FPGA clock blocks, an external PLL is also present. Whereas the Spartan 6 clock blocks are spec'd down to an input frequency of ~5MHz, the external PLL chip can operate down to ~1MHz. This allows an extended input frequency range, in addition to providing an LVDS input path for the clock.

Multi-Target Victim Board

The multi-target victim board is a simple demonstration platform. This can be programmed with various cryptographic algorithms, and provides the ability to monitor power consumption and insert clock glitches. It can be used stand-alone with a normal oscilloscope (i.e. it is not tied to the ChipWhisperer Capture hardware) because of the Low Noise Amplifiers which can boost the small signals to levels a regular oscilloscope can measure.

OpenADC

The OpenADC board is a modular ADC board. It uses the 'PMOD' connectors which are supported by a bunch of FPGA boards, especially those from Digilent. It's been designed to be fairly low-noise, and I've had a lot of positive feedback from that design! It's a simple 2-layer board, although it's been carefully routed such that the bottom layer is almost entirely ground plane, check it out:

There's no separate analog/digital ground; instead the layout tries to keep the analog and digital portions separated such that digital ground currents won't flow over the analog portions. I'd love to hear your feedback, but it seemed in my research that separating them can add issues with ground loops when the separation isn't 100% perfect (i.e. you run a digital trace over the analog ground, causing the digital return current to take a much longer path than it would have with a single plane).

The 3.0V analog supply for the ADC comes from an on-board LDO regulator, which filters the 3.3V input supply. The LNA chip required a 5.0V supply so there is also a 3.3V to 5.0V switched-capacitor based DC-DC on board. You'll see a number of ferrite beads (look at the Lx parts) that form supply filters.

The OpenADC has already been used in other academic publications besides my own. I have no connection with the following authors; I happened to discover their papers while searching my own references: (1), (2), (3).

Links to Schematics, Gerbers, BOM, Assembly Instructions

Everything is done with 2-layer PCBs to keep cost down. The following is a list of most of the hardware design files involved in this project, although see the GIT repository for full project design files, including beta/incomplete boards. Some of the links go to the GIT repo, and you have to hit the "Download" link to get a .zip of that folder.

The following are the "core" files which are used to build the ChipWhisperer Capture Rev2:

Project Logs

It's happening! For the past few months I've been working on making a lower-cost version of the ChipWhisperer. I've now completed enough prototype tests to be confident of my ChipWhisperer-Lite:

Because the cool thing to do is have crowd-funding, I'm running a Kickstarter to make this widely available! The assembled/tested boards will go for about $180 USD. The single-unit Digikey parts cost is about $90 USD and it's a 4-layer PCB, so it's a pretty reasonable price (I think). Of course it's still open-source so you can build your own or do whatever else you want.

I'll have more details of this coming up, but the Kickstarter is live so please check it out! I've got some prototype boards made up already so plan on having a give-away if there is enough to spare (got to check they all work first).

Hard to believe it's been 2+ months since my last update! Anyway I've been working on an updated software release, along with firmware for the new ChipWhisperer-Lite. But to go with the new software release I wanted to (finally) have signed drivers, so you can do something like this:

No more "unsigned driver" warning! Unfortunately it just costs money to do this - there's no easy way around it, so I had put it off for the longest time. There's a great guide by David Grayson to the whole process which I used.

If you also need to repeat this I posted some step-by-step instructions to my own blog too. They are basically just the notes I made while following David's guide, so they don't add much. So in conclusion, that is the least fun way to spend $562 (the cost of a 3-year certificate). On to more fun ways to spend money!

Lately I've also been having fun with a combination of the T962A oven with open-source firmware for soldering, and a Silhouette Cameo for cutting solder paste stencils. It's a much faster way of assembling boards than the hand-soldering or hot-air tool I'd mostly been using in the past. I hope to have a complete video up on that - for now I've got a quick video on the stencil cutter part:

This also gives you a view of the assembly of the ChipWhisperer-Lite board! Hope you find it interesting.

For a while now I've been planning the low-cost version of ChipWhisperer, which I call 'ChipWhisperer Lite'. The idea is to integrate everything onto a single board, and use a lower-cost FPGA (Spartan 6 LX9 instead of LX25). Some features will have to be dropped (such as the SAD trigger) to fit this device, but for most users it'll be perfect. I also plan on bringing this hardware with me for the training course I'll be running.

Here's a quick screen-shot of the basic PCB parts placement. The right half is the 'Target Board', which will be a break-away from the capture hardware (left half):

One of the more interesting aspects of the design will be the change of USB interface chip. To keep costs down I'm not using the Cypress EZ-USB chip. Instead I'll be using an Atmel SAM3U, and providing my own firmware. I'm hoping to publish the firmware for use in other projects - lots of people want high-speed USB connectivity for their FPGA project. I think this can be made into a nice generic solution, and will be cheaper than anything else (including the FTDI FT2232H).

I've got a SAM3U-EK dev-kit which I've done some experiments on, and it seems to be 'sufficiently fast'. It won't be as high-bandwidth as the EZ-USB solution which can almost max out the USB transfer bandwidth, but considering that this board doesn't have on-board SDRAM, that won't be an issue (not as much data to transfer!).

As a side-note: I've got some longer-term plans to release a whole bunch of USB firmware solutions for low-cost chips. Both Atmel & Microchip make some nice USB microcontrollers which are very low-cost - with the right firmware they could be turned into USB-Serial, USB-Parallel Port, USB-Keyboards, etc. But back to the project at hand:

Current estimates of the BOM cost are about $100 in single-unit quantities (the analog portion alone is ~$40), so I would expect it to sell for around $300 (BOM x3 is a fairly reasonable estimate). But this version won't have BGA parts - all TQFP (albeit small pitch), so it should be well within the realm of 'hand assembly' possible. This will make it the lowest cost version available by a long shot, and I really want to keep a version of this hardware at the "hand-assembly" level.

I also appreciate all the kind remarks about this project getting second place for HaD Prize too! It's been an honour to make it so far in the face of such intense competition, and having seen a *lot* of the other cool projects that were involved (beyond just the final five), I was very pleased with those results. It's been a fun ride of course, but lots of work still to do, so no time to rest yet ;-)

As a reminder: I'm headed to CARDIS next week in Paris, so hopefully I'll get a chance to meet some people in person!

Mooltipass

I had a few questions about if this project could attack the Mooltipass project. I haven't had a lot of time to look it over in detail, but the actual AES-256 implementation isn't too critical for that project. That's because the only time the key is in use is when you physically have the card inserted into the device - without the card the system won't be processing encryptions.

From a side-channel analysis perspective, attacking the Mooltipass main unit would be pointless. If you had physical access to tamper with the hardware, there are easier things to do. The real question is how secure the Smart Card used for storing the secret key is. This is something I haven't investigated... I know there were previously published attacks against these devices, see for example these slides and paper [2011], also this paper [2012] and this paper [2010]. I've barely had time to read those in depth, so will reserve any judgement until I've got more time! In the mean-time I ordered a bunch of the smartcards used by the Mooltipass to play with. The most likely scenario is that it's possible to break the card, but it takes long enough that you would (hopefully) notice it missing, and could invalidate the affected passwords. If someone really wanted to get your passwords they could always use a big wrench anyway, so it's all about managing the threat.

EDIT: Something else that's worth pointing out is that it's trivial to purchase smart cards with higher security levels. They are considerably more expensive than the ones being used in the Mooltipass, but if you really wanted to ensure your Smart Card is secure it's possible to do. This might require some work on both the smart card & the Mooltipass itself, but one of the nice parts of open hardware is nothing is set in stone. If this was a commercial solution they would probably only ever support the one card, but you aren't locked in like that due to the open-source nature.

0.08 Release

I pushed the 0.08 release. This includes a few goodies, the most interesting is a tutorial on replicating the XMEGA attack. It also includes some fixes for VISA-connected scopes along with documentation for them. That's all for now!

Long ago when I started prototyping ideas, I was using an $89 FPGA Board (Avnet LX9 board). The ChipWhisperer system has always supported this board, but it's not as well documented as my newer, fancier hardware. I wanted to make a log post to demonstrate this system still working, because if you have a 'passing interest' in side channel analysis (SCA), this setup is a MUCH cheaper method of getting started. It's a lot slower than the full ChipWhisperer Capture Rev 2 hardware and due to the smaller FPGA doesn't support any features besides simple analog capture. It's all the same FPGA code (using my project management described in another log file), just with stuff missing to fit into the LX9 device.

This isn't the only cheap option: the Papilio Pro can also be used, which is even cheaper ($85), and there is a cheaper OpenADC you can build with a more limited sample rate called the OpenADC Lite. Here are some of the cheap options:

Setting up the LX9 board with the homemade OpenADC looks like this, where the total cost will be in the $150 vicinity ($89 for FPGA, about $50 to build the OpenADC, $10 for random stuff):

You can also build an AVR target on a breadboard, here shown with a 'commercial' OpenADC module:

For your viewing pleasure here is a video version of the complete setup & attacking AES on the AVR. You can see it still only takes about 30 measurements of the power consumption to break AES, so it's the same resulting attack performance as my full-blown hardware. The main difference you see here is the slower capture, but it still takes under 2 minutes to perform all 50 power measurements, so it's far from an unreasonable speed!

Fun with Plot.ly

Through other entries in the HaD Prize I became aware of plot.ly, which is basically something akin to a 'cloud-based' plotting service. This sounds somewhat insane at first - of all the things I want in the cloud, plotting complicated mathematical graphs probably isn't one of them.

But when combined with distributed analysis and remote solving, this might be more reasonable than you expect. In particular if you have multiple processes running, at some point you need to collect and plot (or otherwise analyze) this data. Due to the mentioned embarrassingly parallel nature of the side-channel analysis problem (see project details for more on this), you can frequently plot the output of each analyser instance independently. Since each instance is probably headless, and you almost definitely wouldn't want to manually combine this data, using a cloud-based service actually might be easier than writing a protocol to collect the data and then plot it locally. In addition if you are using something like Amazon EC2 anyway, using plot.ly means you would be communicating from Amazon's network directly to plot.ly, which is probably going to be much faster than my internet connection downloading data from all the independent nodes!

So despite my misgivings about cloud-based plotting services, I do see some limited applications where it is easier than plotting locally on my computer. So if you are willing to be dragged kicking and screaming through a cloud-based plotting example, I'll show you how it works. The objective is to end up with a web-based graph which looks like this for example (correlation output vs. sample number):

If you want to see this yourself go to https://plot.ly/~coflynn/1 (and ignore the attempt to get you to sign up). This is showing you that for Byte 0, the maximum correlation was detected around sample point 38, with the correct key-byte hypothesis of 0x2B. I'll demonstrate how I created this graph on my attack system.
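Reading the peak off a graph like this can also be done numerically. Here is a minimal sketch, assuming the correlation output is available as a 256 x N NumPy array with one row per key-byte hypothesis. The data is synthetic and the name `find_peak` is made up for illustration; it is not part of the ChipWhisperer API.

```python
import numpy as np

def find_peak(corr):
    """Return (key_guess, sample_index, value) of the largest |correlation|.

    corr: array of shape (256, n_samples), one row per key-byte hypothesis.
    """
    abs_corr = np.abs(corr)
    key_guess, sample = np.unravel_index(np.argmax(abs_corr), abs_corr.shape)
    return int(key_guess), int(sample), float(abs_corr[key_guess, sample])

# Synthetic data standing in for real attack output (an assumption):
# low-level noise plus one strong peak for hypothesis 0x2B at sample 38.
rng = np.random.default_rng(0)
corr = rng.uniform(-0.2, 0.2, size=(256, 100))
corr[0x2B, 38] = 0.9

key_guess, sample, value = find_peak(corr)
print(f"best guess 0x{key_guess:02X} at sample {sample} (|corr| = {value:.2f})")
```

The absolute value is used so that a strong negative correlation would also count as a peak.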

First off, we'll again use the Python-based analysis script. Once the attack is complete, one of the default API calls the script will attempt is doneAnalysis(). It's at this point we'll plot the required data. The default template does not have a doneAnalysis() call, so you can simply add one as follows:

Inside that doneAnalysis() function, you can paste the following, where the user & token can be seen by going to the example plot page when you are signed in:

Pretty easy! Most of the above code was adapted from my OutputVsTime class in the ResultsPlotting.py file, including the min/max fill code. This is required since you'd otherwise be plotting 256 lines on top of each other, which is very slow when displaying on plot.ly (see a test if you don't believe me). You can of course just plot all the lines if you want, but this way you only have to plot 3, which is much...
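The idea behind the min/max reduction can be sketched with NumPy. This is only an illustration on synthetic data, not the actual code from ResultsPlotting.py. The function name `envelope_traces`, the 256 x N array layout, and the choice to leave the correct key out of the envelope are all assumptions.

```python
import numpy as np

def envelope_traces(corr, correct_key):
    """Collapse 256 correlation traces into three lines to plot.

    Returns (upper, lower, correct): the per-sample max and min over all
    wrong hypotheses, plus the trace for the correct key byte.
    """
    wrong = np.delete(corr, correct_key, axis=0)  # drop the correct row
    upper = wrong.max(axis=0)                     # top edge of the fill
    lower = wrong.min(axis=0)                     # bottom edge of the fill
    return upper, lower, corr[correct_key]

# Synthetic stand-in for real attack output (an assumption).
rng = np.random.default_rng(0)
corr = rng.uniform(-0.2, 0.2, size=(256, 100))
corr[0x2B, 38] = 0.9

upper, lower, correct = envelope_traces(corr, 0x2B)
print(upper.shape, lower.shape, correct.shape)
```

The three returned arrays are what would be handed to the plotting service: two lines bounding the filled region for the wrong guesses, and one line for the correct guess standing out above it.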

First off - I decided to write some FAQs. I was going to add them to the main project details, but I've hit the maximum character limit in my details! So for now I've copy/pasted them here, sorry for the big old wall of text! I also wanted to talk about my FPGA project management, see after the FAQs for those details.

FAQs

Q: Why did you use an FPGA? They are pretty complicated; can't you get away with using a Raspberry Pi or something?

A: Unfortunately it's not possible. You need to capture at a very fast rate; even something like the HackRF would be very borderline on capture speed. The objective is to capture on every clock edge, so even a 7.37 MHz micro needs at least 7.37 MS/s, but I usually sample at 4x the clock rate, requiring about 29.5 MS/s. And that isn't even a medium-speed ARM-type device!

It's possible to use Software Defined Radio (SDR) tools, but it's far from ideal, and you'll be limited in your attack targets. The SDR tools without an FPGA cannot perform the advanced pattern-based trigger, and may not even support a basic edge-type trigger.

If you are capturing with a regular oscilloscope instead of my synchronous capture method, the bandwidth requirement is about 10x the clock rate of the DUT, and sometimes (much) higher for hardware encryption implementations. In addition the FPGA adds a lot of features, such as triggering based on an analog pattern and super-high-resolution glitch generation, which are impossible without the FPGA.
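The numbers in this answer are simple multiplication, and a quick sketch makes it easy to plug in your own target clock. The function name is made up for illustration, and the 10x factor is just the rule of thumb quoted above.

```python
def required_rate(clock_hz, samples_per_clock=1):
    """Sample rate (samples/s) for synchronous capture of a target clock."""
    return clock_hz * samples_per_clock

clock_hz = 7.37e6                      # the 7.37 MHz micro from the answer above

minimum = required_rate(clock_hz)      # one sample per clock cycle
typical = required_rate(clock_hz, 4)   # 4x the clock rate
scope_bandwidth = 10 * clock_hz        # rough 10x rule for a regular scope

print(f"minimum synchronous rate: {minimum / 1e6:.2f} MS/s")
print(f"typical synchronous rate: {typical / 1e6:.2f} MS/s")
print(f"regular scope bandwidth: about {scope_bandwidth / 1e6:.1f} MHz")
```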

Q: Couldn't you just have used a Red Pitaya for the hardware?

A: Had the Red Pitaya existed when I first started this project, I probably would have tried to use it, as it's a pretty nice development environment! Unfortunately it still cannot be used as-is, so I still would have ended up with something like the ChipWhisperer in its current form. The front-end of my OpenADC adds up to +55dB gain which is software adjustable, whereas the Red Pitaya has only a fixed small range which would require an external amplifier. The OpenADC also has a huge analog bandwidth (beyond the sample rate for undersampling applications, which the synchronous sampling can exploit), whereas the Red Pitaya limits you to the Nyquist rate. Finally the ChipWhisperer still adds a bunch of extra features such as the target programming interface, power supplies for probes, high-slew-rate level translators for clock input and output, an external PLL chip for low-frequency operation, and a high-speed USB interface.

Q: Couldn't you have just used GNURadio as a software framework?

A: My original intention was to use GNURadio as a GUI, since it has a lot of built-in blocks such as filters which would be useful. But the requirements of side-channel analysis are different enough from communications systems to make the use of GNURadio seem like a hack, and it would require considerably more work than my current GUI.

In addition the install procedure for GNURadio on Windows can be pretty complicated. I really wanted as straight-forward an application as possible, and I felt that the barrier to entry for most people would be considerably raised if using GNURadio. Long-term, I may end up with blocks for GNURadio that perform some of the analysis.

Q: What's the deal with the commercial version?

A: When I first started this project, one of the main requests was from people who wanted to just buy the finished hardware. Spinning out of that is the complete commercial version with a fancy waterproof case etc. It's no different from what you can build yourself, but comes fully tested and in a nice storage case!

The ChipWhisperer name is a registered trademark, and this was done in part to allow me to reliably sell a commercial version of the project. Only products sold via NewAE Technology Inc are allowed to use the ChipWhisperer name, but this doesn't stop someone else from selling a version under a different name. I want to ensure that the commercial version is of the highest quality, and wanted to avoid someone selling a cheap poorly-assembled version that at first appearance...

Lots of fun updates! First, I'm honoured to have made it to the final five for the Hackaday Prize. There has been a ton of work showing up on all these projects, so it must have been a razor-thin margin between the projects that made it and the ones that didn't, and a lot of time spent by the judges! Thanks to all involved for this.

Next, I've pushed the 0.08RC1 release, available on the software release page. The 0.08RC1 release supports all the features you need for the AES-256 bootloader attack, so go ahead and download the sample traces and break the bootloader yourself. This includes the Python-based analysis script feature which I think will become one of the core functions for the CW-Analyzer due to its flexibility. I'm still hoping to add some additional support for remote networked capture boxes in the coming week or two.

Finally, two academic papers that might be of interest to you (I swear). The first explores the use of clock recovery with the ChipWhisperer. You can read a version of it on IACR EPrint, and I'm happy to announce it was accepted into the Journal of Cryptographic Engineering, so will appear there at some point in the future. This paper goes to show that having open hardware and software will make it easier than ever for researchers to duplicate my work... it's simply not possible to have this level of transparency in how I obtained my research results without invoking the open-source model.

The second is a paper based on my work attacking the AES-256 bootloader, which is still in progress. The pre-print is available from IACR E-Print service (not sure where the final version will end up yet, if anywhere). But it goes to show the results of attacking the XOR operation in the I.V. of the AES CBC mode.
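To give a feel for why an XOR with a secret value is attackable at all, here is a toy simulation. In CBC decryption the first plaintext block is the block-cipher output XORed with the IV, so once the key is known the IV is the only unknown going into that XOR. The sketch assumes the device leaks the Hamming weight of the XOR result plus Gaussian noise, which is a common modelling assumption and not something stated in this post. All names and data are made up, and this is not the method or the results from the paper.

```python
import numpy as np

# Hamming weight lookup table for all byte values.
HW = np.array([bin(v).count("1") for v in range(256)])

def recover_xor_byte(known, leakage):
    """Return the guess whose predicted Hamming weight best matches leakage.

    known:   known byte fed into the XOR for each trace
    leakage: one simulated power measurement per trace
    """
    guesses = np.arange(256)
    # predictions[g, i] = HW(known[i] XOR g)
    predictions = HW[known[None, :] ^ guesses[:, None]]
    corr = [np.corrcoef(predictions[g], leakage)[0, 1] for g in guesses]
    # Signed maximum: the bitwise complement of the right guess gives an
    # equally strong but negative correlation, so abs() would be ambiguous.
    return int(np.argmax(corr))

# Simulated measurements (assumption: Hamming-weight leakage plus noise).
rng = np.random.default_rng(1)
secret_iv_byte = 0xA7
known = rng.integers(0, 256, size=1000)
leakage = HW[known ^ secret_iv_byte] + rng.normal(0, 0.5, size=1000)

recovered = recover_xor_byte(known, leakage)
print(f"recovered IV byte: 0x{recovered:02X}")
```

Taking the signed maximum matters here because XOR with the complemented guess flips every bit, which exactly inverts the predicted Hamming weight.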

Speaking of papers, I recently discovered that my OpenADC project has been used by a few other researchers doing work into low-power wireless networks and ended up in some published papers. It's definitely cool to see these tools propagating 'in the wild'. I've added links to my main project description with those papers.

Just a quick update - I've got a major chunk of the documentation for the AES-256 bootloader attack online. This is uploaded as part of the online ChipWhisperer documentation. The fun thing is you don't need hardware to follow along - I've uploaded an example capture (part of the larger set of examples). You need to clone the ChipWhisperer GUI in GIT for this to work and follow the installation instructions elsewhere in the online documentation. I'm going to shortly make a release of the ChipWhisperer software which includes the features required for this tutorial. I hope to upload a video tutorial of this when I get some more time - so far the online documentation isn't quite complete (doesn't include the IV example yet or signature), but the cool part is there at least!

So go ahead - spend an afternoon breaking an implementation of the AES-256 decryption! What fun! If you get stuck leave a comment here or register on the NewAE Forum. And if you'll be around Paris hunt me down at CARDIS 2014, as I'll be attending that next month.

First off - I'm planning on attending CARDIS 2014 which is Nov 5-7 in Paris, France. If you happen to be going let me know, as I will have my ChipWhisperer platform with me. I'll try to bring some blank PCBs to give away, although I've got to check on my stock status!

The second thing I wanted to show is the iterations of my 'product test boards'. This part of the blog post isn't part of the ChipWhisperer open-source project but is related to it. As part of selling a commercial version, I want to ensure that the commercial version meets certain quality specifications. So I needed ways to test the board, and I wanted to catch even small errors (wrong part values mounted, etc). My first test jig was just to verify the switching regulators on the main board. This used mechanical switches to switch a load on-off, and a mask limit test on an oscilloscope for pass/fail:

This was transitioned to a board under USB control, using a USB-HID implementation via the LUFA library. This now meant that the loads could be switched on/off electronically, with the same idea of an oscilloscope performing the mask limit testing. The left half of my test board provides this feature, but it also adds a lot more. Here is the automated power supply test jig:

There are still a lot more things that need testing! In particular there are 6 GPIO connections. These are critical to the ChipWhisperer's function, as they are expected to pass everything from communications to high-speed glitches. A bad solder joint or missing decoupling capacitor on the driver might allow it to pass a 'simple' function test (high/low), but fail when you try to use it in real life. So I use a bank of relays to allow me to turn on/off a termination resistor for each output, and also route the pins together. Since they are I/O pins I need to test both input and output. These allow me to check:

Shorted input/output pins (one pin is turned into an output at a time, the rest are inputs. Toggling the output pin shouldn't affect the inputs; if it does, that suggests a solder bridge on the translation IC)
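The walking-output check for shorted pins can be sketched in software. Everything here is hypothetical: `FakeBoard` is a stand-in that simulates solder bridges, and its `drive`/`read` methods are invented for the example, whereas the real jig uses relays and a USB-controlled test board.

```python
class FakeBoard:
    """Software stand-in for the test hardware; `shorts` lists bridged pin pairs."""

    def __init__(self, n_pins, shorts=()):
        self.n_pins = n_pins
        self.shorts = {frozenset(pair) for pair in shorts}
        self.driver = None   # index of the pin currently set as output
        self.level = 0

    def drive(self, pin, level):
        """Make `pin` the only output and set its level."""
        self.driver, self.level = pin, level

    def read(self, pin):
        # An input follows the driven level only if it is bridged to the
        # driver; otherwise it rests low (assumes pull-downs on the inputs).
        if pin == self.driver or frozenset((pin, self.driver)) in self.shorts:
            return self.level
        return 0


def find_shorts(board):
    """Toggle one pin at a time and report inputs that follow it."""
    found = set()
    for out in range(board.n_pins):
        inputs = [p for p in range(board.n_pins) if p != out]
        board.drive(out, 0)
        low = {p: board.read(p) for p in inputs}
        board.drive(out, 1)
        high = {p: board.read(p) for p in inputs}
        for p in inputs:
            if low[p] != high[p]:          # input moved with the output
                found.add(frozenset((out, p)))
    return sorted(tuple(sorted(pair)) for pair in found)


print(find_shorts(FakeBoard(6)))                   # healthy board
print(find_shorts(FakeBoard(6, shorts=[(2, 3)])))  # bridge between pins 2 and 3
```

Each bridge is seen twice (once from each side), so the pairs are collected in a set before being reported.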

In addition there is circuitry to check the 'AVR + XMEGA' programmer built into the ChipWhisperer, which is just a pair of chips mounted on the test board. As part of the test script it reprograms/verifies them multiple times at the maximum supported SPI speed (~2MHz). There shouldn't be any errors, or it suggests problems with the physical signals.

Finally there is an LVDS driver to check the LVDS clock input, and resistive loads to check the +7V/-7V power supply output voltage & ripple. The following photograph shows the rest of the connections to the test board. I haven't shown all the oscilloscope probes in either of these photos - if testing a number of boards all of the outputs are wired up to a scope, and a mask on each channel makes testing a lot quicker!

Anyway I thought that might be of interest to some people! It's part of the answer to the question "how do you make money with open hardware?". In this case my answer is that a lot of people (companies, universities, etc) want to buy a finished product that they know works. For them they would rather buy something that is known to work, and not spend even half a day fixing/building it themselves!

Powering the board up should allow you to test the power supply sections. You can also program the AVR-USB at this point as detailed at http://www.newae.com/sidechannel/cwdocs/hwcapturerev2.html [NB: the hackaday.io instructions web link settings aren't working for me often, so apologies on the lack of clickable links].

Step 2

You can then decide to mount the parts in the OpenADC section, or build a separate OpenADC PCB. My examples always use a separate PCB, but the exact same circuitry is placed onto the PCB itself. See detailed build instructions on the OpenADC Wiki [link: https://www.assembla.com/spaces/openadc/wiki/Building_the_OpenADC], also adding a 2x30 pin male header to mate with the 2x30 female header on the ChipWhisperer baseboard.

The full BOM for the OpenADC is available at http://www.assembla.com/spaces/openadc/documents/aCyIcwKlur4RUaacwqjQYw/download/aCyIcwKlur4RUaacwqjQYw

Step 3

Add the ZTEX LX25 FPGA Module, which with the OpenADC, should now look like this, using a 0.375" hex stand-off with 4-40 screws for the OpenADC:

Hi Colin. Just discovered your work in this area; excellent work all around. I'm sure you get this question all the time, but I am interested in the full version of the ChipWhisperer (kit) for embedded security research, but it appears to be out of stock. I understand you're concentrating your efforts on the Lite right now, but is there any way I can purchase the older ChipWhisperer kit within the next month or so?

Hi Vik - unfortunately it will be a while yet, the one FPGA module went out of stock, and because of that we decided not to order more of the main baseboards. At this time the CW-Lite will be ready faster, so we are switching entirely to that! There will be a more advanced kit using the CW-Lite too, which will fulfill almost the same tasks as the CW full kit. Thanks for your interest!

still picking my jaw up off the floor! this is the most professional assemblage of information and good hardware design for side-channel analysis i've seen. IMHO, this is the benchmark/gold standard of open source hardware/documents/software projects. your tool could/should be used iteratively to develop smoothing/obfuscation steps in new crypto-CODEC algorithms as well as the traditional fault analysis.

have you considered using a PIC32MZ2048ECM for an evaluation target? it has a built-in crypto engine which supports a plethora of analysis worthy algorithms in hardware:

Thanks for your kind words! I'm starting to see a lot of interest in this area, and I really think it takes a lot of open-source tools to make people start caring. I haven't checked out that chip yet - seems there is a dev kit, so will add onto my list of interesting targets. The datasheet doesn't talk about side-channel so assume it's similar to XMEGA in leakage...

Hello coflynn, I think you've hit most of the requirements to be considered for the next round of the Hackaday Prize, but I couldn't see links to code repositories, libraries, licenses or permissions needed for your project. Please add these before August 20th.
Thanks for entering and good luck!

Thanks! If you DIY everything I think it's about $300-$400 depending how much of it you build. The FPGA board is $200 which is the main cost, although it's possible to build part of the FPGA file for cheaper boards (Spartan 6 LX9 boards). These versions have less features but still useful...