DataStore: Fast, Low-Power Persistent Data Storage

Introduction

Three kinds of flash memory are available to an eMote:

On-chip flash. This is part of the processor chip and is used to store the eMote code as well as user program code.

On-board flash. This is a separate component on the board and is available for user data.

External flash. This is available via an SD flash memory connector.

Our focus is the on-board flash, which is accessed through the DataStore class.

This on-board flash component uses NOR technology, which means that, in contrast to NAND technology, access is relatively fast and an arbitrary position can be re-written without erasing a whole block (see Wikipedia's article on flash memory). The eMote NOR flash has a total capacity of 16 MBytes, divided into 126 blocks of 128 KBytes.

There are two classes involved: DataStore manages the flash as a whole with such methods as initialize, delete and erase current data. DataReference manages a particular unit of data, with such methods as read and write. When there is no confusion, we let DataStore stand for both classes.

Design Criteria

DataStore was designed with the following goals in mind.

Spread wear evenly over the memory to prolong its lifetime. A flash memory location can only be written a limited number of times (typically around 100,000) before it fails. DataStore tries to spread writes evenly over the memory space so that no locations are rewritten much more often than others.

Make access fast and energy-efficient.

Make it quick and easy to retrieve and update data.

Make the data persistent, so that it can be recovered after a reboot.

How It Works

Initially, the flash memory consists entirely of unused space. When you use DataStore, you're going to want to write, read, update and delete data. A unit of data is called a valid allocation (or just "allocation" if there's no confusion); it consists of a fixed-length array of type byte, short, int or another value data type. The eMote Class & Member Documentation for DataStore and DataReference provides the details. So you'll create an allocation consisting of, say, 7 bytes, populate a 7-byte array with the data, and then write the data to the allocation. You can do this repeatedly for other allocations, which can consist of different-sized arrays of different types. Later you can read an allocation's data back into an array. If you want, you can modify the data and update the allocation by writing it again.

What DataStore does with an update is important. If the update were to put the allocation back to the same location in the flash, there's the potential for wearing out that part of the flash storage. So instead, DataStore invalidates the allocation at that point in the flash by marking it as invalid and writes the data to unused space. This tends to spread the writes out, satisfying our need to avoid uneven wear.
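The update behavior described above can be illustrated with a small in-memory model. This is only a sketch: the class name LogStore, its methods and its record layout are invented for illustration and are not the DataStore API, and the model ignores headers and block boundaries.

```python
# Toy model of out-of-place updates. Hypothetical names; not the DataStore implementation.
class LogStore:
    def __init__(self, capacity):
        self.capacity = capacity   # total bytes of simulated flash
        self.log = 0               # offset of the next unused byte
        self.records = []          # [offset, key, data, valid], in write order
        self.index = {}            # key -> position of the valid copy in self.records

    def write(self, key, data):
        """Write data for key. An earlier copy is marked invalid, never overwritten."""
        data = bytes(data)
        if self.log + len(data) > self.capacity:
            raise MemoryError("no unused space left; garbage collection would run here")
        if key in self.index:
            self.records[self.index[key]][3] = False   # invalidate the old copy
        self.records.append([self.log, key, data, True])
        self.index[key] = len(self.records) - 1
        self.log += len(data)

    def read(self, key):
        """Return the data of the valid copy for key."""
        return self.records[self.index[key]][2]


store = LogStore(capacity=64)
store.write("a", [1, 2, 3, 4, 5, 6, 7])   # fresh 7-byte allocation at offset 0
store.write("a", [9, 9, 9, 9, 9, 9, 9])   # the update lands at offset 7
offsets = [(r[0], r[3]) for r in store.records]
print(offsets)   # [(0, False), (7, True)]
```

In this model the first copy stays where it was, flagged invalid, and the updated copy occupies fresh space further along, which is the property that spreads writes across the memory.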

If you update an entire allocation, then the time and energy cost of doing so is not much more than the cost of a fresh write. But DataStore also lets you update just a part of an allocation; say, one integer in the middle of an allocation of 13 integers. In this case, the update will read the unchanged values from the flash and write them, together with the changed value, to an unused location. Since reading from flash takes more time and energy than reading from main memory, this will be more costly than updating all 13 values from main memory.
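To make the cost difference concrete, here is a rough count of the flash traffic for the 13-integer example. The function update_cost is hypothetical, the 4-byte integer size is an assumption, and the allocation header is ignored; the sketch counts bytes only and says nothing about actual time or energy.

```python
INT_SIZE = 4  # assumed size of one integer in bytes

def update_cost(total_items, changed_items, item_size=INT_SIZE):
    """Return (bytes_read_from_flash, bytes_written_to_flash) for one update."""
    unchanged = total_items - changed_items
    flash_reads = unchanged * item_size      # unchanged values must be fetched from flash
    flash_writes = total_items * item_size   # the whole allocation is rewritten elsewhere
    return flash_reads, flash_writes

full = update_cost(13, 13)     # all 13 values supplied from main memory
partial = update_cost(13, 1)   # one value supplied, 12 read back from flash
print(full, partial)   # (0, 52) (48, 52)
```

Both updates write the same number of bytes; the partial update additionally reads 48 bytes back from flash, which is where the extra cost comes from.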

Eventually, of course, you'll run out of unused space for a new allocation, in which case the space occupied by invalid allocations has to be converted back to unused space. The DataStore garbage collector, described below, is invoked to reclaim that space.

We want to be able to recover our data after a reboot. DataStore does not keep a directory structure in the flash. Instead, it can scan the flash, recover the allocations and make them available to you. So you can, for example, have one program that collects data, storing it with DataStore, and have another program that reads the data and exfiltrates it to a host PC.

Flash Memory Structure

In the discussion so far, we've mentioned valid allocations, invalid allocations, and unused space. For technical reasons, an allocation cannot span a block boundary. There are some implications that arise from this. First, the maximum allocation size is that of a block, minus the size required for an allocation header (16 bytes). Second, as a block fills up, there might not be room in it for the next new allocation. In that case, the new allocation will go at the beginning of the next block. However, the space that's left over in the previous block must be dealt with. If it's at least the size of an allocation header, it will be marked as a dummy allocation. If it's smaller, then the leftover space will be added to the allocation (whether invalid or not) that's just before it in memory. Later, during garbage collection, the dummy allocations will be converted back to unused space.
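The placement rules above can be sketched as a small decision function. The block size and header size come from the text; the function place and its return values are hypothetical and only model the decision, not any actual flash access.

```python
BLOCK_SIZE = 128 * 1024                 # bytes per block, from the text
HEADER_SIZE = 16                        # bytes per allocation header, from the text
MAX_PAYLOAD = BLOCK_SIZE - HEADER_SIZE  # largest allocation that fits in one block

def place(offset_in_block, payload_size):
    """Decide where a new allocation goes, given the write position in the current block.

    Returns (where, leftover_action); leftover_action is None when nothing is left over.
    """
    if payload_size > MAX_PAYLOAD:
        raise ValueError("allocation cannot fit in one block")
    needed = HEADER_SIZE + payload_size
    leftover = BLOCK_SIZE - offset_in_block
    if needed <= leftover:
        return "current block", None
    if leftover == 0:
        return "next block", None
    if leftover >= HEADER_SIZE:
        return "next block", "dummy allocation"
    return "next block", "extend previous allocation"

print(MAX_PAYLOAD)                  # 131056
print(place(0, 100))                # ('current block', None)
print(place(BLOCK_SIZE - 40, 100))  # ('next block', 'dummy allocation')
print(place(BLOCK_SIZE - 10, 100))  # ('next block', 'extend previous allocation')
```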

As you can see, the allocation and updating processes proceed forward in flash until such time as garbage collection is called for. Hence this approach is sometimes called a "log file structure" because it resembles a logging system that always appends new data without going back and rewriting earlier entries.

With this understanding, we can define the DataStore Invariant.

DataStore Invariant

The following conditions must hold whenever the DataStore is not being actively modified:

Every location in the flash is included in exactly one of the following:

A valid allocation.

An invalid allocation.

A dummy allocation.

Unused space.

If a block begins with unused space then the entire block is unused space.

The last condition is present for the sake of efficiency. When scanning the memory for existing allocations, this lets DataStore check the first byte of a block and decide whether to scan it or skip it.

Of course, the invariant can be temporarily violated during the execution of a DataStore method that modifies the memory.
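As an illustration, the invariant can be written as a check over a simplified model of the flash. The representation used here (each block as a list of (kind, length) segments) and the function names are assumptions made for this sketch, not how DataStore stores or scans its data.

```python
KINDS = {"valid", "invalid", "dummy", "unused"}

def check_invariant(blocks, block_size):
    """Return True if every block satisfies both conditions of the invariant."""
    for segments in blocks:
        # Every location belongs to exactly one known kind of region:
        # segments are non-empty, of a known kind, and tile the block exactly.
        if any(kind not in KINDS or length <= 0 for kind, length in segments):
            return False
        if sum(length for _, length in segments) != block_size:
            return False
        # A block that begins with unused space must be entirely unused.
        if segments[0][0] == "unused" and any(k != "unused" for k, _ in segments):
            return False
    return True

def blocks_to_scan(blocks):
    """Indices of blocks worth scanning: those that do not begin with unused space."""
    return [i for i, segments in enumerate(blocks) if segments[0][0] != "unused"]

good = [[("valid", 40), ("invalid", 24)], [("unused", 64)]]
bad = [[("unused", 16), ("valid", 48)]]
print(check_invariant(good, 64), check_invariant(bad, 64))   # True False
print(blocks_to_scan(good))                                  # [0]
```

The second function shows why the last condition helps: looking at the start of each block is enough to decide whether the block can be skipped.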

Persistence and Starting Fresh

Since the DataStore flash is persistent, its contents are preserved across power cycles of the mote. A program can recover the existing contents of the flash by using the ReadAllDataReferences method, which returns the references in user-specified batches. So an exfiltration program would generally just get a batch of references and, for each one, read the allocation data and transmit it to a host.
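The batch-by-batch pattern can be sketched generically. The helper in_batches below is hypothetical and merely stands in for the batching; it does not reproduce the signature or behavior of ReadAllDataReferences, and plain integers take the place of real references.

```python
from itertools import islice

def in_batches(references, batch_size):
    """Yield successive lists of at most batch_size references."""
    it = iter(references)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for an exfiltration loop: "transmit" each reference by recording it.
batches = list(in_batches(range(7), 3))
sent = [ref for batch in batches for ref in batch]
print(batches)   # [[0, 1, 2], [3, 4, 5], [6]]
```

Working in batches keeps the amount of main memory needed at any one time bounded by the batch size rather than by the number of allocations in the flash.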

At other times, a program wants to start fresh. In this case, any existing allocations must be removed. There are two ways to do this. The method DeleteAllData marks all allocations as invalid. New allocations will, as usual, go into the next suitable unused space and eventually the garbage collector will recover the original allocations. The other way is to use the method EraseAllData. This converts all the flash storage to unused space.

Which method is better to use? DeleteAllData will generally provide better wear leveling for the flash since new allocations will start off from where the previous program left off. On the other hand, each of the allocations has to be inspected and if necessary marked as invalid, and garbage collection to reclaim memory is less efficient than a bulk erase. EraseAllData will be faster if there are only a few references present at the beginning of the flash. On the other hand, this will start the program from the beginning of the flash, imposing more wear on the beginning part of the flash.
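The effect of the two methods on where the next write lands can be seen in a toy model. The class Flash and its methods are hypothetical stand-ins, not the DataStore API; the model tracks only offsets and validity flags and assumes, as the text describes, that a bulk erase restarts writing at the beginning of the flash.

```python
class Flash:
    """Toy model: a list of allocations plus the offset of the next write."""
    def __init__(self):
        self.allocations = []   # [offset, size, valid]
        self.log = 0            # offset where the next allocation will be written

    def write(self, size):
        """Append an allocation and return the offset it was written at."""
        offset = self.log
        self.allocations.append([offset, size, True])
        self.log += size
        return offset

    def delete_all(self):
        """Mark every allocation invalid; the write position does not move."""
        for allocation in self.allocations:
            allocation[2] = False

    def erase_all(self):
        """Turn everything back into unused space and start from the beginning."""
        self.allocations.clear()
        self.log = 0

deleted, erased = Flash(), Flash()
for flash in (deleted, erased):
    for _ in range(3):
        flash.write(10)         # three allocations at offsets 0, 10 and 20

deleted.delete_all()
erased.erase_all()
after_delete = deleted.write(10)
after_erase = erased.write(10)
print(after_delete, after_erase)   # 30 0
```

After the delete, new data continues from where the previous program stopped; after the erase, it starts again at offset 0, which is the wear-leveling trade-off discussed above.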

Garbage Collection

Since DataStore does not update allocations in place and requires that allocations occupy contiguous storage, a garbage collector (GC) is needed to convert invalid and dummy allocations to unused space so that it can be reused, and to compact that space so that new (or updated) allocations will fit. Doing this for the entire flash storage is time-consuming, so our approach is incremental: when unused space runs low, GC will free up just enough to let processing continue. The implementation uses three pointers; the two described here are the log pointer and the erase pointer.

The log pointer indicates the location in flash storage where the next write operation can take place.

The erase pointer indicates the last block that was erased.

The flash storage region between the log and the erase pointers is where new and updated allocations are written. The allocations are written to the log pointer, which is then updated to indicate the next available flash storage location.

DataStore keeps one block of unused space reserved for GC. Eventually, when a new or updated allocation is to be written, there will not be enough unused space available except for the reserved block. In that case, GC is started. GC compacts the oldest used block by copying its valid allocations to the reserved empty block. When that is done, the oldest block is erased and becomes available to store allocations. In addition, GC checks the subsequent blocks and erases them until it finds one that contains a valid allocation. The erase pointer is updated to indicate the last block erased.

This copy-and-erase process continues until enough unused space has been created to hold the new allocation, which is then written. GC returns an error if, having processed all blocks, the allocation still cannot be written.
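One copy-and-erase step can be sketched on a simplified model. Everything here is an assumption made for illustration: blocks are Python lists of (name, size, valid) tuples, the function gc_step is hypothetical, and the sketch does not model the pointers, how a new reserved block is chosen afterwards, or the error case.

```python
def gc_step(blocks, oldest, reserved):
    """Compact blocks[oldest] into blocks[reserved]; return the indices of erased blocks.

    Assumes oldest != reserved and that the reserved block is empty.
    """
    assert not blocks[reserved], "reserved block must be empty"
    # Copy only the valid allocations; invalid and dummy ones are dropped.
    blocks[reserved] = [a for a in blocks[oldest] if a[2]]
    blocks[oldest] = []                      # erase the compacted block
    erased = [oldest]
    nxt = (oldest + 1) % len(blocks)
    # Keep erasing the following blocks until one holds a valid allocation.
    while nxt != reserved and not any(a[2] for a in blocks[nxt]):
        blocks[nxt] = []
        erased.append(nxt)
        nxt = (nxt + 1) % len(blocks)
    return erased

blocks = [
    [("a", 10, False), ("b", 20, True)],   # oldest used block
    [("c", 30, False)],                    # holds only an invalid allocation
    [("d", 5, True)],                      # holds a valid allocation
    [],                                    # reserved empty block
]
erased = gc_step(blocks, oldest=0, reserved=3)
print(erased)   # [0, 1]
print(blocks)   # [[], [], [('d', 5, True)], [('b', 20, True)]]
```

In this run the one valid allocation from the oldest block survives in the formerly reserved block, and two blocks are returned to unused space: the compacted block and the following block that held nothing valid.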