TeaFiles

Millions of Values Per Second


The bare speed of binary files. With sugar.

TeaFile is a file format to store time series in binary flat files. An optional header holds a description of the file contents, including a description of the item type layout (schema). The file format is designed to be simple so that APIs can be created easily. DiscreteLogics publishes the format and releases APIs for C#, C++ and Python under the GPL.

TeaFiles provide fast read/write access to time series data from any software package on any platform.
Time series are treated as homogeneous collections of items, ordered by their timestamp. Items are stored
in raw binary format, so that the data can be memory mapped for fast read/write
access. To ensure correct data interpretation when data is exchanged between
multiple applications, TeaFiles optionally embed a description of the data layout
in the file header, along with other optional descriptions of the file's contents.
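The following is a minimal C++ sketch that makes "raw binary format" concrete. It stores fixed-size items back to back in a flat file and reads them back. This is not the TeaFiles API and it writes no header. The Tick struct and the functions writeTicks and readTicks are hypothetical names chosen for illustration, and the int64 millisecond timestamp is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical item type for illustration. The time field is assumed to be
// an int64 count of milliseconds; this is not the TeaFile time format.
struct Tick {
    int64_t time;
    double price;
    int32_t volume;
};

// Writes items to a flat file as raw bytes: no text conversion, no
// delimiters. Each item occupies exactly sizeof(Tick) bytes, padding included.
void writeTicks(const std::string& path, const std::vector<Tick>& ticks) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(ticks.data()),
              static_cast<std::streamsize>(ticks.size() * sizeof(Tick)));
}

// Reads all items back. Because items have a fixed size, the item count
// follows from the file size, and item i starts at byte i * sizeof(Tick).
std::vector<Tick> readTicks(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    std::streamoff bytes = in.tellg();
    std::vector<Tick> ticks(static_cast<std::size_t>(bytes) / sizeof(Tick));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(ticks.data()),
            static_cast<std::streamsize>(ticks.size() * sizeof(Tick)));
    return ticks;
}
```

Every item has the same size, so item i can be located without scanning the file. This property also makes flat files easy to memory map.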

Design

Performant

The most performant way to write time series to persistent media is to write them to flat files.
The most efficient way to read time series data from persistent media is to memory-map the file, possibly enhanced by a read-ahead mechanism.
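Here is a minimal POSIX sketch of memory-mapped reading. It assumes a file that holds nothing but back-to-back Tick items (no header), written on a platform with the same struct layout and byte order. The function sumPricesMapped is a hypothetical name. The calls mmap, posix_madvise and munmap are standard POSIX. POSIX_MADV_SEQUENTIAL is only a hint, which the kernel may use to read ahead.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size item; any trivially copyable struct works alike.
struct Tick {
    int64_t time;
    double price;
    int32_t volume;
};

// Maps a flat file of Tick items into memory and sums their prices.
// Returns false if the file cannot be opened, is empty, or cannot be mapped.
bool sumPricesMapped(const char* path, double& sum, std::size_t& count) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }
    std::size_t length = static_cast<std::size_t>(st.st_size);
    void* base = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    // Tell the kernel we will scan front to back, so it may read ahead.
    posix_madvise(base, length, POSIX_MADV_SEQUENTIAL);

    // The mapped bytes are used directly as an array of items:
    // no copy into a buffer, no parsing.
    const Tick* items = static_cast<const Tick*>(base);
    count = length / sizeof(Tick);
    sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += items[i].price;

    munmap(base, length);
    return true;
}
```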

Simple & Solid

To us, simple time series persistence means easy-to-use APIs, a simple file layout and well-understood technologies. The file system
is just that: simple and rock solid on every operating system.

Self contained

The drawback of binary files is their opaqueness: reading them requires knowledge about the structure of their content. TeaFiles overcome this by packing
metadata into the file that describes its items, making the file self-contained and self-describing. Every TeaFile can therefore be opened
without any further knowledge about its content and structure.
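A self-describing file can be sketched as a small schema header written before the items. The layout below is hypothetical and chosen for illustration: item size, field count, then for each field a type code, offset and length-prefixed name. It is not the actual TeaFile header, which is defined in the File Format Spec. The names FieldType, FieldDesc, writeSchema and readSchema are also hypothetical.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical type codes for the field types used in this sketch.
enum class FieldType : uint8_t { Int32 = 1, Int64 = 2, Double = 3 };

// One schema entry: the field's name, its type and its byte offset in the item.
struct FieldDesc {
    std::string name;
    FieldType type;
    uint32_t offset;
};

// Write/read a fixed-size value in host byte order.
template <typename T>
void writeRaw(std::ostream& out, T value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}
template <typename T>
T readRaw(std::istream& in) {
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Writes the schema header; the raw items would follow it in the file.
void writeSchema(std::ostream& out, uint32_t itemSize,
                 const std::vector<FieldDesc>& fields) {
    writeRaw<uint32_t>(out, itemSize);
    writeRaw<uint32_t>(out, static_cast<uint32_t>(fields.size()));
    for (const FieldDesc& f : fields) {
        writeRaw<uint8_t>(out, static_cast<uint8_t>(f.type));
        writeRaw<uint32_t>(out, f.offset);
        writeRaw<uint16_t>(out, static_cast<uint16_t>(f.name.size()));
        out.write(f.name.data(), static_cast<std::streamsize>(f.name.size()));
    }
}

// Reads the schema back without any compile-time knowledge of the item type.
std::vector<FieldDesc> readSchema(std::istream& in, uint32_t& itemSize) {
    itemSize = readRaw<uint32_t>(in);
    uint32_t n = readRaw<uint32_t>(in);
    std::vector<FieldDesc> fields;
    for (uint32_t i = 0; i < n && in; ++i) {
        FieldDesc f;
        f.type = static_cast<FieldType>(readRaw<uint8_t>(in));
        f.offset = readRaw<uint32_t>(in);
        uint16_t len = readRaw<uint16_t>(in);
        f.name.resize(len);
        in.read(&f.name[0], len);
        fields.push_back(f);
    }
    return fields;
}
```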

Versatile

Analysing time series data often involves more than a single program or tool, such as R, Octave/Matlab, or
custom C++, Java or C# programs. TeaFiles provide a simple, very loosely coupled way to make these
programs work together: the file is the interface. Number and time formats have been carefully examined to provide such universal accessibility.
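As an illustration of a time format that any of these programs can decode, the sketch below stores timestamps as signed 64-bit counts of milliseconds since 1970-01-01 00:00 UTC. This encoding is an assumption made for the example, not necessarily the one TeaFiles uses. The helper names toEpochMillis and fromEpochMillis are hypothetical. std::chrono::system_clock measures from that epoch in practice, and C++20 guarantees it.

```cpp
#include <chrono>
#include <cstdint>

// Assumed portable time encoding for illustration: int64 milliseconds since
// the Unix epoch. Decoding it needs only integer arithmetic, so R, Matlab,
// Java or C# code can read it without a C++-specific date type.
using Clock = std::chrono::system_clock;

int64_t toEpochMillis(Clock::time_point t) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               t.time_since_epoch()).count();
}

Clock::time_point fromEpochMillis(int64_t ms) {
    return Clock::time_point(std::chrono::duration_cast<Clock::duration>(
        std::chrono::milliseconds(ms)));
}
```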

Open for all programs and operating systems

To allow data exchange between arbitrary programs, the file format was designed to be as simple as possible,
so that writing access libraries (APIs) for new targets remains easy.

File Format Spec

TeaFile APIs

TeaFiles can be read and written using the raw file I/O methods available in every programming environment. APIs make access to TeaFiles more convenient.
We provide several open source APIs, introduced below, all licensed under the GPL.
More detailed information about them can be found in the corresponding repositories.

TeaFiles.Net

Create a TeaFile and write values

```csharp
// the time series item type
struct Tick
{
    public Time Time;
    public double Price;
    public int Volume;
}

// create file and write some values
using (var tf = TeaFile<Tick>.Create("silver.tea"))
{
    tf.Write(new Tick { Price = 5, Time = DateTime.Now, Volume = 700 });
    tf.Write(new Tick { Price = 15, Time = DateTime.Now.AddHours(1), Volume = 1700 });
    // ...
}
```

The call to TeaFile<Tick>.Create() does all the work the C# API for TeaFiles provides here:
it analyzes the Tick struct to find the field names, types and offsets, and writes this description into the file header.
We have just written our first TeaFile, so let's read it.

Read the file - typed

Notably, the type expected to be stored in the file was provided up front, in the call to TeaFile<Tick>.OpenRead().
This is perfectly fine if we have this knowledge. But what if we do not? "Untyped reading" allows us to open a file
without knowing the type of the items inside:

Read the file - untyped

```csharp
// read untyped - we know nothing about the type of item in the file
using (var tf = TeaFile.OpenRead("silver.tea"))
{
    foreach (Item item in tf.Items)
    {
        Console.WriteLine(tf.Description.ItemDescription.GetNameValueString(item));
    }
}
```

output:
Price=5 Time=20.8.2011 23:50
Price=15 Time=21.8.2011 00:50

This time the call to TeaFile.OpenRead() returns the untyped version of a TeaFile,
which exposes a description of the item stored in the file. TeaFile is thus the anonymous sibling
of TeaFile<T>. From the C# point of view the two classes are unrelated, but logically
they belong together: both serve as an interface to the contents of a TeaFile, one untyped and one typed.

The item values are returned as a collection of Item instances, each holding a collection of values, one for each
field in the item struct. The ItemDescription instance in turn offers the GetNameValueString
method, which transforms an item into a pretty-printed string. Such
anonymous file reading can be used in two ways. Either you actually access the data
by iterating the collection of Item values, which is much slower than
accessing the file data the typed way. Or you simply open the file and inspect its ItemDescription,
which gives all information about the stored items, then use this information to create
a matching struct in C# and use that struct to instantiate a typed TeaFile<T>.
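Untyped reading can be sketched as decoding each field from the raw item bytes according to a field description, then printing name=value pairs. The FieldDesc struct and the nameValueString function below are hypothetical. They loosely imitate the output shown above for GetNameValueString and are not the TeaFiles.Net implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

enum class FieldType { Int32, Int64, Double };

// Hypothetical field description, as it might be read from a file header.
struct FieldDesc {
    std::string name;
    FieldType type;
    std::size_t offset;
};

// Copies one field out of the raw item bytes; memcpy avoids misaligned
// access and strict-aliasing problems.
template <typename T>
T load(const unsigned char* item, std::size_t offset) {
    T v;
    std::memcpy(&v, item + offset, sizeof v);
    return v;
}

// Formats one raw item as "Name=value Name=value ..." using only the
// field descriptions; no compile-time knowledge of the item type is needed.
std::string nameValueString(const unsigned char* item,
                            const std::vector<FieldDesc>& fields) {
    std::ostringstream s;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const FieldDesc& f = fields[i];
        if (i > 0) s << ' ';
        s << f.name << '=';
        switch (f.type) {  // dispatch on the runtime type code, per field
            case FieldType::Int32:  s << load<int32_t>(item, f.offset); break;
            case FieldType::Int64:  s << load<int64_t>(item, f.offset); break;
            case FieldType::Double: s << load<double>(item, f.offset); break;
        }
    }
    return s.str();
}
```

This sketch dispatches and copies every field of every item at runtime. That is one plausible reason why untyped access is slower than typed access, where the compiler knows each field's type and offset in advance.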

This looks quite similar to the .Net version. The read code now uses memory mapping, which is considerably faster
than normal file reading. There is a big difference, however, that is not yet visible here: the file holds
only a rudimentary description of the item. In particular, the layout of the item is not included in the header
of the TeaFile. This makes it impossible to read the file untyped or even to inspect its content. In other words,
the file we wrote is not really self-describing (only a little information is included: the name of the
item type, "Tick", and its size). We will improve this.
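A small C++ sketch shows why an item name and size are not enough. MinimalHeader and matches are hypothetical names. The check mirrors the only kind of validation possible with a header that stores just "Tick" and the item size.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical minimal header: only the item type's name and size,
// no field layout.
struct MinimalHeader {
    std::string itemName;
    uint32_t itemSize;
};

// A typed reader can verify that name and size match the type it expects...
template <typename T>
bool matches(const MinimalHeader& h, const std::string& expectedName) {
    return h.itemName == expectedName && h.itemSize == sizeof(T);
}

// ...but two structs with the same name and size can differ in layout.
struct TickA { double price; int64_t time; };  // price stored first
struct TickB { int64_t time; double price; };  // time stored first
```

Both structs pass the check, yet a reader using TickB would interpret TickA's price bytes as a timestamp. Only a description of the field layout can catch such a mismatch.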

Reflection in C++

To make our file self-describing, we give our API code more knowledge about the type. Since C++ still lacks
serious reflection capabilities, we help out a bit as follows:

This allows the C++ API to analyze the current struct and perform the same checks as
the .Net API when reading a TeaFile. (Such a description class could easily be generated
by tools or even by the C preprocessor.)
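The remark that the C preprocessor could generate such a description can be illustrated with a hypothetical X-macro sketch. The field list is written once and expanded twice: once into the struct definition, and once into a description with offsets and sizes computed by the compiler. The macro names and FieldInfo are illustrative only. This is not the description class used by the TeaFiles C++ API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one field, as it could go into a file header.
struct FieldInfo {
    std::string name;    // field name
    std::string type;    // type name as spelled in the source
    std::size_t offset;  // byte offset inside the item
    std::size_t size;    // field size in bytes
};

// The field list is written exactly once, as FIELD(type, member, "Name").
#define TICK_FIELDS(FIELD)           \
    FIELD(int64_t, time, "Time")     \
    FIELD(double, price, "Price")    \
    FIELD(int32_t, volume, "Volume")

// Expansion 1: the struct definition itself.
struct Tick {
#define DECLARE_FIELD(type, member, name) type member;
    TICK_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Expansion 2: the matching description, with layout taken from the compiler.
std::vector<FieldInfo> describeTick() {
    std::vector<FieldInfo> fields;
#define DESCRIBE_FIELD(type, member, name) \
    fields.push_back({name, #type, offsetof(Tick, member), sizeof(type)});
    TICK_FIELDS(DESCRIBE_FIELD)
#undef DESCRIBE_FIELD
    return fields;
}
```

The compiler evaluates offsetof and sizeof, so the description cannot drift from the actual struct layout. That is the property a reader needs when it compares the description with the layout stored in a file header.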

Installation

Source

Documentation

License

The TeaFile APIs available at these code repositories are licensed under the GNU General Public License v3.
In addition to the terms of this license, use and distribution of this code shall be attributed to discretelogics, referencing "discretelogics.com".