Pure speed

WhiteDB is a lightweight NoSQL database library written in C, operating fully in main memory.
There is no server process. Data is read and written directly from/to shared memory; no sockets are used between
WhiteDB and the application program.

Data storage

Data is kept in shared memory by default, making all the data accessible to separate processes.

Each database record is a tuple of N elements, encoded in WhiteDB's simple compact format.
You can store both conventional datatypes and direct pointers to records: the latter
enables highly efficient traversal of complex data.

Supported features

indexes (T-tree)

persistence through logging and memory dumps

concurrency through locking

limited queries (conjunctive only)

CSV and RDF support

Linux and Windows

Python bindings

command line utility tools

Built with WhiteDB

Roboswarm

An early version of WhiteDB was used in the Roboswarm EU project
enhancing the (cooperative) intelligence of iRobot Roombas functioning
as a swarm.

All the external commands and data arriving from the robot sensors are stored in a WhiteDB
database onboard the Roomba, running on a tiny Linux computer. The reasoner generates new
tasks for the Roomba reactively, in real time, using rules and the
WhiteDB contents.

Telemedicine

Intelligent telemedicine systems developed at eliko use WhiteDB running on
a small MIPS-type CPU for storing and analysing sensor data.

The principles of using whiteboard systems, with WhiteDB as a core tool for fast interprocess communication
in multi-agent systems, are described
in the Ph.D. thesis "Whiteboard Architecture for the Multi-agent Sensor Systems".

Technology

Direct memory access

Each record is stored as an array (N-tuple) of integers: configurable as either 32 or 64 bits.
The integers in the tuple encode values directly or as pointers. Columns have no type: any encoded
value can be stored in any field.

You can always get a direct pointer to a record, store it into a field of another record,
or use it in your own program directly. A record pointer can thus serve as an automatically
assigned id of the record: accessing the record through it requires no search at all.

To search for a record, either scan the chain of all records, scan a sublist or tree you
have built yourself, or perform an index search on an indexed field.

Data encoding

The low bits of an integer in a record indicate the type of the data. Anything that does not fit into the
remaining bits is allocated separately and pointed to by the same integer.

Long strings are allocated uniquely, i.e. using the same string in many fields does not
take up additional space and allows a fast string equality check.

A record pointer is a persistent offset of the record, usable as an automatic id of the record.
Pointers allow fast traversal of complex data without search.

Allocation and garbage collection

Conventional malloc does not function in shared memory, since we have to use offsets instead
of conventional pointers. Hence WhiteDB uses its own implementation of malloc for
shared memory.

A record and a uniquely kept long string can be pointed to from several fields.
Hence we use reference counting garbage collection embedded into
our allocation algorithm when deleting records and long strings.
Reference counting is incremental and does not cause long pauses.

Locking

We use a database-level lock, implemented via a task-fair
atomic spinlock queue, for concurrency control,
but alternative faster and simpler preference policies can be configured: either a
reader-preference or a writer-preference spinlock.

Generally, a database-level lock has very low overhead but
the maximum possible contention.
This means that processes should spend as little time as possible
between acquiring a lock and releasing it.

We provide safe atomic updates of simple values without taking a write lock.

Indexes

The simplest index provided is a T-tree index on any field containing
any mixture of objects (integers, strings, etc.). The index is automatically maintained when records
are added, deleted or changed.

The efficiency of indexing can be greatly enhanced by using template indexes, which
index only the records having a given value in a given field. For example, you can
create an index on column 0 that only covers records where column 2 equals 6.

Persistent storage

Two mechanisms are available for storing the shared memory database to disk. First, the whole
database can be dumped and restored. Since the database uses offsets instead of conventional
pointers, the absolute address locations are not important.

Second, all inserts, deletions and updates can be logged to a file. The compact log thus created can be played
back to restore the contents of the database (normally on top of the last dump). Logging can be switched
on and off, depending on the data criticality and performance requirements.