Introduction

High-volume numeric data may be generated by scientific simulations or
by engineering plant monitoring. The "Extreme Numeric Compression" (XNC)
algorithm has been developed for storing high-frequency measurements
made on manufacturing plant.

Using the XNC algorithm, data will typically be compressed to between 5%
and 15% of the size of a comparable CSV file.

In summary - Storing differences

Column   Field            Size
#1       Date             6 bytes
#2       Time             16 bytes
#3       Volt Amplitude   24 bytes
#4       Volt Angle       24 bytes
Total                     70 bytes

Storing differences as binary, rather than as absolute values, reduces
the size of the sample fragment from 110 bytes to 70 bytes.

Space saving technique #3. Store as integers of the minimum precision

The next space saving technique is to ensure numbers are always stored
as an integer type with the lowest possible precision (or range).

Consider the sample VOLT_ANGLE series:

Absolute value   x 1000000    Difference
145.487718       145487718
144.713594       144713594    -774124
143.929042       143929042    -784552
143.929042       143929042    0
143.933594       143933594    4552
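The scaling and differencing in the table above can be reproduced with a short Python sketch. It assumes every value has at most six decimal places, so multiplying by 1,000,000 and rounding gives an exact integer. The function names are illustrative and are not part of XNC.

```python
# Sketch: turn the VOLT_ANGLE sample into scaled integers and differences.
# Assumption: six decimal places, so a factor of 1,000,000 is lossless.
SCALE = 1_000_000  # the "x 1000000" column

def to_scaled_ints(values, scale=SCALE):
    """Multiply each float by `scale` and round to the nearest integer.
    round() rather than int() avoids truncating 145487717.9999999 to ...717."""
    return [round(v * scale) for v in values]

def differences(ints):
    """Change from each record to the next; the first record has none."""
    return [b - a for a, b in zip(ints, ints[1:])]

volt_angle = [145.487718, 144.713594, 143.929042, 143.929042, 143.933594]
scaled = to_scaled_ints(volt_angle)
print(scaled)               # [145487718, 144713594, 143929042, 143929042, 143933594]
print(differences(scaled))  # [-774124, -784552, 0, 4552]
```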

When storing differences using LongInt precision (signed 32-bit
integer), we managed to squeeze this down to 24 bytes:

First record (must store the absolute value): 8 bytes

Remaining four records: 4 x 4 bytes
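A minimal Python sketch of this fixed-width layout, using the standard `struct` module. The text does not say how the 8-byte first record is encoded. This sketch assumes a signed 64-bit integer (`"<q"`) and little-endian byte order, and the function names are hypothetical.

```python
import struct

def pack_fixed(values):
    """First value as a signed 64-bit integer (assumed encoding), then every
    difference as a signed 32-bit integer (Delphi LongInt).
    struct.pack raises struct.error if a difference overflows 32 bits."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return struct.pack("<q", values[0]) + struct.pack(f"<{len(diffs)}i", *diffs)

def unpack_fixed(data):
    """Rebuild the absolute values by accumulating the differences."""
    (first,) = struct.unpack_from("<q", data, 0)
    count = (len(data) - 8) // 4
    values = [first]
    for d in struct.unpack_from(f"<{count}i", data, 8):
        values.append(values[-1] + d)
    return values

scaled = [145487718, 144713594, 143929042, 143929042, 143933594]
blob = pack_fixed(scaled)
print(len(blob))                     # 24
print(unpack_fixed(blob) == scaled)  # True
```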

We can do better than this by mixing the integer types we use to store
the differences.

The first record is a positive integer that fits within the limits of
the LongWord data type: 4 bytes

The next two records still need the LongInt data type: 2 x 4 bytes

The fourth record fits within the limits of the Byte data type: 1 x 1
byte

The last record fits within the limits of the Word data type: 1 x 2
bytes

This gives a size reduction from 24 to 15 bytes.
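The type choices above can be automated by picking, for each record, the narrowest Delphi integer type whose range contains it. In this Python sketch, the type table mirrors the Delphi ranges. The tie-breaking rule (prefer the unsigned type for non-negative values) is an assumption chosen to match the text.

```python
# Delphi integer types: (name, size in bytes, minimum, maximum).
UNSIGNED = [("Byte", 1, 0, 2**8 - 1),
            ("Word", 2, 0, 2**16 - 1),
            ("LongWord", 4, 0, 2**32 - 1)]
SIGNED = [("ShortInt", 1, -2**7, 2**7 - 1),
          ("SmallInt", 2, -2**15, 2**15 - 1),
          ("LongInt", 4, -2**31, 2**31 - 1),
          ("Int64", 8, -2**63, 2**63 - 1)]

def smallest_type(value):
    """Narrowest type that holds value; unsigned types are listed first,
    so they win ties for non-negative values."""
    candidates = UNSIGNED + SIGNED if value >= 0 else SIGNED
    fitting = [t for t in candidates if t[2] <= value <= t[3]]
    if not fitting:
        raise ValueError(f"{value} does not fit in 64 bits")
    return min(fitting, key=lambda t: t[1])  # min() keeps the first of equal sizes

records = [145487718, -774124, -784552, 0, 4552]  # absolute value, then differences
chosen = [smallest_type(v) for v in records]
for v, (name, size, _, _) in zip(records, chosen):
    print(f"{v:>10}  {name:<8}  {size} byte(s)")
print(sum(t[1] for t in chosen))  # 15
```

A decoder cannot tell from the bytes alone which type each record used, so that information has to be stored as well. That is the overhead the next paragraph describes.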

The saving from this technique is smaller when the magnitude of the
differences is highly variable, because some space must be used to
store information about the size of the next record. This is discussed
further in the section on the "Control Byte".

For example, the sequence 0, 2, 3, 6, 9 will compress well using this
technique. The sequence 0, 9823, 3, 5, 10000 will not.
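The difference between the two sequences can be estimated with a rough cost model. The Control Byte format is not described in this section. This Python sketch therefore assumes a hypothetical one-byte marker written each time the storage width changes, and a real XNC stream may differ.

```python
from itertools import groupby

def width_in_bytes(value):
    """Smallest width (1, 2, 4 or 8 bytes) whose signed or unsigned range
    holds value, i.e. -2**(bits-1) <= value <= 2**bits - 1."""
    for width in (1, 2, 4, 8):
        bits = 8 * width
        if -2**(bits - 1) <= value <= 2**bits - 1:
            return width
    raise ValueError("value does not fit in 64 bits")

def estimated_size(diffs, marker_bytes=1):
    """Hypothetical model: one marker per run of equal widths, plus payload."""
    widths = [width_in_bytes(d) for d in diffs]
    runs = sum(1 for _ in groupby(widths))
    return runs * marker_bytes + sum(widths)

print(estimated_size([0, 2, 3, 6, 9]))         # 6  (1 run, 5 payload bytes)
print(estimated_size([0, 9823, 3, 5, 10000]))  # 11 (4 runs, 7 payload bytes)
```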

Space saving technique #4. Store multiple small integers in a single
byte

Where differences are very small, it may be possible to squeeze several
numbers into a single byte.

Consider the contrived series:

Absolute value   x 1000000    Difference
145.487718       145487718
145.487719       145487719    1
145.487719       145487719    0
145.487718       145487718    -1
145.487717       145487717    -1

The values 1, 0 and -1 can all be stored as ShortInt (1 byte each),
giving a space requirement of 4 x 1 byte for the four differences.
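Because 1, 0 and -1 fit in two bits, four of them can share a single byte instead of occupying four ShortInts. This Python sketch uses a layout of its own choosing: 2-bit two's-complement fields, with the first value in the lowest bits. The section does not specify how XNC arranges the bits, and the function names are hypothetical.

```python
def pack_2bit(values):
    """Pack values in the range -2..1 four to a byte (assumed layout:
    2-bit two's-complement fields, first value in the lowest bits)."""
    if any(not -2 <= v <= 1 for v in values):
        raise ValueError("2-bit fields only hold -2..1")
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for slot, v in enumerate(values[i:i + 4]):
            byte |= (v & 0b11) << (2 * slot)
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, count):
    """Recover `count` signed values from bytes produced by pack_2bit."""
    values = []
    for byte in data:
        for slot in range(4):
            if len(values) == count:
                break
            field = (byte >> (2 * slot)) & 0b11
            values.append(field - 4 if field >= 2 else field)  # sign-extend
    return values

packed = pack_2bit([1, 0, -1, -1])
print(packed.hex())            # f1
print(unpack_2bit(packed, 4))  # [1, 0, -1, -1]
```

The number of packed values must be stored alongside the bytes so that a decoder knows how many slots in the last byte are padding.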

Implementation in other languages

The XNC algorithm will be implemented in C#, but this work is not yet
complete.

Future enhancements

At times the differences between records in a series are the
same. In this
case, the Control Byte should be modified to store a count of the
number of records that are the same, rather than a value for each
record;

The differences between records in a series may fall within a finite
set. When this is the case, a list of the distinct differences could be
stored in the series header, with an index into this list stored for
each record rather than the actual difference;

There should be some XNC algorithm version info stored in the
file
header;

Add functionality to limit the resolution of the values being stored,
say to 16 bits. This will be useful when compressing data from an
analogue-to-digital converter where the precision of the conversion
hardware is known.
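The enhancement that stores a count of identical differences, rather than a value for each record, is essentially run-length encoding. This Python sketch uses `itertools.groupby`. The (value, count) pair representation, the function names and the sample differences are illustrative only.

```python
from itertools import groupby

def run_length_encode(diffs):
    """Collapse runs of identical differences into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(diffs)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original differences."""
    return [value for value, count in pairs for _ in range(count)]

diffs = [5, 5, 5, 5, 0, 0, -3, 5, 5]  # made-up sample
pairs = run_length_encode(diffs)
print(pairs)  # [(5, 4), (0, 2), (-3, 1), (5, 2)]
```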
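The idea of storing a list of distinct differences in the series header, with an index per record, is dictionary encoding. In this Python sketch, the sorted table, the function names and the sample differences are assumptions. `bits_per_index` shows how few bits each index needs when the table is small.

```python
def dictionary_encode(diffs):
    """Return (table, indices): the distinct differences for the series
    header, and one index into that table per record."""
    table = sorted(set(diffs))
    position = {value: i for i, value in enumerate(table)}
    return table, [position[d] for d in diffs]

def dictionary_decode(table, indices):
    """Look each index up in the table to recover the differences."""
    return [table[i] for i in indices]

def bits_per_index(table):
    """Minimum bits needed to address every table entry (at least 1)."""
    return max(1, (len(table) - 1).bit_length())

diffs = [4552, -774124, 4552, 4552, -774124]  # made-up sample
table, indices = dictionary_encode(diffs)
print(table)                  # [-774124, 4552]
print(indices)                # [1, 0, 1, 1, 0]
print(bits_per_index(table))  # 1
```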
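Limiting resolution to 16 bits amounts to quantisation. This Python sketch assumes that the converter's input range (`lo` to `hi`) is known. It maps each value onto one of 2**16 integer levels. The 0 to 10 V range in the example is hypothetical.

```python
def quantise(value, lo, hi, bits=16):
    """Clamp value to [lo, hi] and map it to an integer in 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)

def dequantise(code, lo, hi, bits=16):
    """Map an integer level back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

lo, hi = 0.0, 10.0  # hypothetical 0-10 V converter range
code = quantise(3.3, lo, hi)
print(code, dequantise(code, lo, hi))
```

Once the values are integer codes in 0..65535, the differencing and minimum-width techniques described earlier apply to them unchanged.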