bcolz: columnar and compressed data containers

bcolz provides columnar and compressed data containers. Column
storage allows for efficient querying of tables with a large number of
columns. It also allows for cheap addition and removal of columns. In
addition, bcolz objects are compressed by default to reduce
memory/disk I/O needs. Compression is carried out
internally by Blosc, a high-performance compressor that is optimized
for binary data.
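
To make the idea concrete, here is a minimal sketch of a columnar, compressed container. It is not the bcolz API: `ToyColumnTable` and its methods are hypothetical names invented for illustration, and the standard-library `zlib` stands in for Blosc as an assumption so that the example runs without extra dependencies.

```python
import zlib
from array import array


class ToyColumnTable:
    """Toy columnar table: every column lives in its own compressed buffer.

    Hypothetical illustration only; zlib is a stand-in for Blosc.
    """

    def __init__(self):
        # column name -> (array typecode, compressed bytes)
        self._cols = {}

    def add_column(self, name, typecode, values):
        # Adding a column touches no other column's storage.
        raw = array(typecode, values).tobytes()
        self._cols[name] = (typecode, zlib.compress(raw))

    def del_column(self, name):
        # Removing a column just drops its buffer.
        del self._cols[name]

    def column(self, name):
        # Decompress only the requested column.
        typecode, blob = self._cols[name]
        out = array(typecode)
        out.frombytes(zlib.decompress(blob))
        return out

    def compressed_size(self, name):
        return len(self._cols[name][1])

    def names(self):
        return sorted(self._cols)


n = 100_000
table = ToyColumnTable()
table.add_column("id", "q", range(n))                      # 64-bit integers
table.add_column("flag", "b", (i % 2 for i in range(n)))   # 8-bit integers

raw_size = n * array("q").itemsize
compressed_size = table.compressed_size("id")
print(f"id column: {raw_size} bytes raw, {compressed_size} bytes compressed")

# Reading one column never decompresses the others.
ids = table.column("id")

# Dropping a column leaves the remaining columns untouched.
table.del_column("flag")
```

Because each column is stored and compressed independently, a query that needs one column out of many only pays for that column, and adding or dropping a column does not rewrite the rest of the table.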

Rationale

By using compression, you can deal with more data using the same
amount of memory. You may wonder what price you pay in
terms of performance. Nowadays memory access is
the most common bottleneck in many computational scenarios: CPUs
spend most of their time waiting for data, so keeping data compressed in
memory can reduce the stress on the memory subsystem.

In other words, the ultimate goal for bcolz is not only to reduce the
memory needs of large arrays, but also to make operations on bcolz objects
faster than on a traditional ndarray object from NumPy. That is
already true in some special cases now (2011), and it should become
more generally true in the near future, when bcolz will be able to take
advantage of newer CPUs integrating more cores and wider vector units
(256 bits and more).