A Moo class providing read/write access to bit streams, including support for numerous variable-length codes. New codes are easily added as roles. An adaptive code (ARice) is included that will typically use fewer bits on most inputs than fixed codes.

Bit streams are often used in data compression and in embedded products where memory is at a premium. Using variable-length codes allows high-performance compression of integer data. Common codes such as fixed-bit-length, unary, gamma, delta, Golomb, and Rice codes are included, as well as many other interesting codes such as Levenshtein, Even-Rodeh, Fibonacci C1 and C2, generalized Fibonacci, and Goldbach codes, to name a few. Flexible codes such as Comma, Taboo, and Start-Stop codes are also implemented.

One common application is lossless image compression, where a predictor turns each pixel into a small error term that can then be encoded efficiently. Another is storing opcodes with a very uneven distribution (some opcodes are very common, others rare).

For higher performance, install the Data::BitStream::XS module, which greatly speeds up this module. It may also be used directly when the absolute best speed is required, although that bypasses Moo/Moose and hence does not allow custom roles.

This is a classic prediction-coding style compression method, used in many applications. Most lossless image compressors use this method, often with extra steps to further reduce the error term. JPEG-LS, for example, uses a very simple predictor and puts its effort into relatively complex bias estimation and adaptive determination of the parameter for Rice coding.
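The following is a minimal pure-Perl sketch of the prediction step. It does not use Data::BitStream. It makes three assumptions, each labelled in the code:

- a previous-pixel predictor (real compressors such as JPEG-LS use better ones);
- a zigzag mapping that turns each signed error into a non-negative integer suited to 0-based codes;
- a Rice code with k=1, whose bit cost the sketch counts.

The helpers zigzag, residuals and rice_len are hypothetical names.

```perl
use strict;
use warnings;

# Zigzag mapping (an assumption of this sketch): 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
sub zigzag { my ($e) = @_; return $e >= 0 ? 2 * $e : -2 * $e - 1; }

# Previous-pixel predictor (hypothetical): each pixel after the first is
# predicted to equal its left neighbour; return the zigzag-mapped errors.
sub residuals {
    my @pixels = @_;
    return map { zigzag($pixels[$_] - $pixels[$_ - 1]) } 1 .. $#pixels;
}

# Bits spent by a 0-based Rice code with parameter $k on value $v:
# the quotient in unary (q zeros plus a one), then $k low bits.
sub rice_len { my ($k, $v) = @_; return ($v >> $k) + 1 + $k; }

my @row  = (100, 101, 103, 102, 102, 104);
my @err  = residuals(@row);            # (2, 4, 1, 0, 4)
my $bits = 0;
$bits += rice_len(1, $_) for @err;     # 15 bits, versus 40 for five 8-bit values
print "errors: @err; Rice(1) bits: $bits\n";
```

The first pixel has no predecessor. This sketch assumes it would be stored separately, for example with a fixed-width code.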

The escape code is not included by default, so this shows how we can add it to the package. You can also use Moo::Role->apply_roles_to_object and give it a stream object as the first argument, which applies the role to that single stream only. Alternatively, if you have Moose, you can use Data::BitStream::Code::Escape->meta->apply($stream) or other MOP operations.

Note that if we use the text interface, we don't need to do this, as the Escape module includes code info that Data::BitStream will find by default. This involves an extra lookup to find the method, but is convenient:

Creates a new object. By default it has no associated file and is in read/write (RW) mode. An optional hash of arguments may be supplied. Examples:

$stream = Data::BitStream->new( mode => 'ro' );

The stream is opened as a read-only stream. Attempts to open it for write will fail, hence all write / put methods will also fail. This is most useful for opening a file for read, which will ensure no changes are made.

When a file is associated with a stream opened for writing, the stream will be written to the file upon closing the file, going out of scope, or otherwise being destroyed, with the given header string written first. While the current implementation writes at close time, later implementations may write as the stream is written to.

When a file is associated with a stream opened for reading, the contents of the file will be slurped into the stream, skipping the given number of header lines at the start. While the current implementation slurps the contents, later implementations may read from the file as the stream is read.

Returns undef if the code is not known, 0 if the code is non-universal, and a non-zero integer if it is universal.

The argument is a text name, such as 'Gamma', 'Rice(2)', etc.

A code is universal if, for any source whose value probabilities are non-increasing, its expected code length is within a constant factor C of the optimal expected length. In practical terms, non-universal codes are fine for small numbers, but their size increases rapidly, making them inappropriate when large values are possible (no matter how rare). A classic non-universal code is Unary coding, which takes k+1 bits to store value k. This is very good if most values are 0 or near zero. If we have rare values in the tens of thousands, it's not so great. It is likely to be fatal if we ever come across a value of 2 billion.
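To put rough numbers on the Unary example, the sketch below computes code lengths for 0-based Unary and for 0-based Elias Gamma, a standard universal code. It is pure Perl, and the helper names are hypothetical.

- Unary spends k+1 bits on value k.
- Gamma codes k+1, which takes 2*floor(log2(k+1))+1 bits.

```perl
use strict;
use warnings;

# Bits used by 0-based Unary: value k is k zeros followed by a one.
sub unary_len { my ($k) = @_; return $k + 1; }

# Bits used by 0-based Elias Gamma: value k is coded as gamma(k+1),
# which takes 2*floor(log2(k+1)) + 1 bits.
sub gamma_len {
    my ($k) = @_;
    my $nbits = length(sprintf("%b", $k + 1));   # floor(log2(k+1)) + 1
    return 2 * $nbits - 1;
}

for my $k (0, 1, 10, 1000, 30000) {
    printf "%6d: unary %6d bits, gamma %3d bits\n", $k, unary_len($k), gamma_len($k);
}
# The last line reports 30001 bits for unary against 29 for gamma.
```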

Used for the dispatch table methods code_put and code_get as well as other helper methods like code_is_universal and code_is_supported. This is typically handled internally, but can be used to register a new code or variant. An example of an Omega-Golomb code:

Writes $value to the stream using $bits bits. $bits must be between 1 and maxbits, unless $value is 0 or 1, in which case $bits may be larger than maxbits.

The stream length will be increased by $bits bits. Regardless of the contents of $value, exactly $bits bits will be used. If $value needs more than $bits bits, only its lowest $bits bits are written. In other words, $value is effectively masked before writing.
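The masking rule can be modelled with a plain bit string. In this sketch the stream is a Perl string of '0' and '1' characters. That is an assumption for illustration only and says nothing about how Data::BitStream stores bits. The hypothetical write_bits keeps only the low $bits bits of $value and zero-pads on the left.

```perl
use strict;
use warnings;

# Append exactly $bits bits holding the low bits of $value to the
# '0'/'1' string referenced by $stream_ref (hypothetical helper).
sub write_bits {
    my ($stream_ref, $bits, $value) = @_;
    my $bin = sprintf("%b", $value);
    $bin = substr($bin, -$bits) if length($bin) > $bits;   # mask to the low bits
    $$stream_ref .= ('0' x ($bits - length($bin))) . $bin; # zero-pad on the left
    return;
}

my $s = '';
write_bits(\$s, 4, 5);       # 0101
write_bits(\$s, 3, 255);     # 111: only the low 3 bits of 255 survive
write_bits(\$s, 8, 1);       # 00000001
print "$s\n";                # 010111100000001
```

The same padding also shows why values 0 and 1 can use more than maxbits bits: the extra bits are all zeros.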

Returns the stream as a scalar holding the data in an implementation-specific form. It may or may not be portable, but it can always be read back by the same implementation, and it may be more efficient than the raw format.

The stream is set to the packed big-endian vector $packed which has $bits bits of data. If $bits is not present, then length($packed) will be used as the byte-length. It is recommended that you include $bits.
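The reason to pass $bits is easiest to see with Perl's own pack and unpack. Their B template also uses big-endian bit order within each byte (most significant bit first). The helpers below are hypothetical. They only show that a byte vector cannot distinguish trailing padding from data.

```perl
use strict;
use warnings;

# Pack a '0'/'1' string into bytes, most significant bit first; the last
# byte is padded with zero bits on the right.
sub bits_to_packed { my ($bitstr) = @_; return pack("B*", $bitstr); }

# Unpack again. Without a bit count, every bit of every byte is assumed
# to be data, so the padding comes back too.
sub packed_to_bits {
    my ($packed, $nbits) = @_;
    $nbits = 8 * length($packed) unless defined $nbits;
    return substr(unpack("B*", $packed), 0, $nbits);
}

my $packed = bits_to_packed("10110");
print unpack("H*", $packed), "\n";        # b0
print packed_to_bits($packed, 5), "\n";   # 10110
print packed_to_bits($packed), "\n";      # 10110000
```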

A read-only boolean indicating whether the stream is open for writing (true) or reading (false). Methods for read such as read, get, skip, rewind, and exhausted are not allowed while writing. Methods for write such as write and put are not allowed while reading.

The write_open and erase_for_write methods will set writing to true. The write_close and rewind_for_read methods will set writing to false.

The read/write distinction gives implementations more freedom in caching data internally; for instance, they can gather writes into blocks. It can also help catch mistakes such as reading from a stream that is still being written.
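The toy class below sketches the writing flag described above. ToyStream is hypothetical and far simpler than Data::BitStream. It borrows the method names write, read, write_close, rewind_for_read and erase_for_write from the text only to show which transitions set and clear the flag.

```perl
use strict;
use warnings;

package ToyStream;    # hypothetical class for illustration only
use Carp qw(croak);

sub new     { my $class = shift; return bless { bits => '', pos => 0, writing => 1 }, $class; }
sub writing { my $self = shift; return $self->{writing}; }

# Append the low $nbits bits of $value (allowed only while writing).
sub write {
    my ($self, $nbits, $value) = @_;
    croak "write() called while reading" unless $self->{writing};
    $self->{bits} .= substr(sprintf("%0*b", $nbits, $value), -$nbits);
    return;
}

# Read $nbits bits as an unsigned integer (allowed only while reading).
sub read {
    my ($self, $nbits) = @_;
    croak "read() called while writing" if $self->{writing};
    my $chunk = substr($self->{bits}, $self->{pos}, $nbits);
    $self->{pos} += $nbits;
    return oct("0b$chunk");
}

# These set or clear the flag as described in the text.
sub write_close     { my $self = shift; $self->{writing} = 0; return; }
sub rewind_for_read { my $self = shift; $self->write_close; $self->{pos} = 0; return; }
sub erase_for_write { my $self = shift; @{$self}{qw(bits pos writing)} = ('', 0, 1); return; }

package main;

my $s = ToyStream->new;
$s->write(3, 5);
my $refused = !eval { $s->read(3); 1 };   # true: reading is refused while writing
$s->rewind_for_read;
my $value = $s->read(3);                  # 5
```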

All coding methods are biased to 0. This means values from 0 to 2^maxbits-1 (for universal codes) may be encoded, even if the original code as published starts with 1.
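The 0-bias works by shifting the value before applying the published code. Here is a minimal sketch for Elias Gamma, whose published form starts at 1. Value v is written as the published code of v+1, so 0 becomes the single bit 1. The sketch is pure Perl, the function names are hypothetical, and codes are returned as '0'/'1' strings.

```perl
use strict;
use warnings;

# 0-based Elias Gamma: value v is coded as the published gamma code of v+1,
# i.e. floor(log2(v+1)) zeros followed by the binary digits of v+1.
sub gamma_encode {
    my $bin = sprintf("%b", shift() + 1);
    return ('0' x (length($bin) - 1)) . $bin;
}

# Decode one value from the front of a '0'/'1' string.
# Returns the value and the number of bits consumed.
sub gamma_decode {
    my ($str) = @_;
    my ($zeros) = $str =~ /^(0*)/;
    my $n = length($zeros);
    my $bin = substr($str, $n, $n + 1);
    return (oct("0b$bin") - 1, 2 * $n + 1);
}

print join(" ", map { gamma_encode($_) } 0 .. 4), "\n";   # 1 010 011 00100 00101

my ($v, $used) = gamma_decode("00100" . "010");           # (3, 5)
```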

All get_ methods take an optional count as the last argument. If $count is 1 or not supplied, a single value will be read. If $count is positive, that many values will be read. If $count is negative, values are read until the end of the stream.

get_ methods called in list context will return a list of all values read. Called in scalar context they return the last value read.
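The count and context rules can be mimicked in plain Perl. In this sketch, get_values is a hypothetical stand-in for a get_ method. An array of already-decoded values stands in for the stream.

```perl
use strict;
use warnings;

# Hypothetical reader illustrating the count convention described above:
# @$queue stands in for the values still left in a stream.
sub get_values {
    my ($queue, $count) = @_;
    $count = 1 unless defined $count;          # default: a single value
    $count = scalar(@$queue) if $count < 0;    # negative: read to the end
    my @vals = splice(@$queue, 0, $count);
    # List context gets every value read; scalar context gets the last one.
    return wantarray ? @vals : $vals[-1];
}

my @q = (5, 7, 9, 11);
my $first = get_values(\@q);          # 5
my @pair  = get_values(\@q, 2);       # (7, 9)
my $last  = get_values(\@q, -1);      # 11
my $third = get_values([3, 4, 8], 3); # 8: scalar context keeps the last value
```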

put_ methods take one or more values as input after any optional parameters and write them to the stream. All values must be non-negative integers that do not exceed the maximum encodable value (typically ~0, but may be lower for some codes depending on parameter, and non-universal codes will be practically limited to smaller values).

Reads/writes one or more values from the stream in generalized Fibonacci coding. The order m should be between 2 and 16. These codes are described in Klein and Ben-Nissan (2004). For m=2 the results are identical to the standard C1 form.
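For reference, here is a sketch of the standard Fibonacci C1 code (the m=2 case above), 0-biased as described earlier. Value v is coded as follows:

1. Take the Zeckendorf representation of v+1 over the Fibonacci numbers 1, 2, 3, 5, ...
2. Write its bits least significant term first.
3. Append an extra 1, so every code ends in "11".

fib_encode is a hypothetical pure-Perl helper that returns a '0'/'1' string. It is not the module's implementation.

```perl
use strict;
use warnings;

# 0-based Fibonacci C1 (hypothetical helper): Zeckendorf representation
# of v+1, least significant term first, then a terminating 1.
sub fib_encode {
    my $n = shift() + 1;
    my @fib = (1, 2);
    push @fib, $fib[-1] + $fib[-2] while $fib[-1] + $fib[-2] <= $n;
    my @bits = (0) x @fib;
    my $top;
    # Greedy from the largest term gives the Zeckendorf form (no two
    # adjacent ones), so the only "11" is the one at the end.
    for my $i (reverse 0 .. $#fib) {
        next if $fib[$i] > $n;
        $bits[$i] = 1;
        $n -= $fib[$i];
        $top //= $i;    # index of the largest Fibonacci number used
    }
    return join('', @bits[0 .. $top]) . '1';
}

print join(" ", map { fib_encode($_) } 0 .. 7), "\n";
# 11 011 0011 1011 00011 10011 01011 000011
```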

Reads/writes one or more values from the stream in Fibonacci C2 coding, specifically the order m=2 C2 codes of Fraenkel and Klein. Note that these codes are not prefix-free, hence they will not mix well with other codes in the same stream.

Reads/writes one or more values from the stream in Comma coding. The parameter bits should be between 1 and 16. bits=1 implies Unary coding; bits=2 is the ternary comma code. No leading zeros are used.
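Here is a pure-Perl sketch of the comma construction for bits >= 2. The value is written in base 2^bits - 1, one bits-wide digit at a time, and the reserved all-ones symbol acts as the terminating comma. Two details are assumptions of this sketch and may not match the module's exact bit layout:

- digits are written most significant first;
- 0 is coded as the bare comma.

comma_encode is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical comma encoder for bits >= 2 (bits = 1 is plain unary).
sub comma_encode {
    my ($bits, $v) = @_;
    die "this sketch needs bits >= 2\n" if $bits < 2;
    my $comma = (1 << $bits) - 1;      # all-ones symbol, reserved as terminator
    my @digits;                        # base-$comma digits, most significant first
    while ($v > 0) {
        unshift @digits, $v % $comma;
        $v = int($v / $comma);
    }
    # Digits never equal $comma, so the terminator is unambiguous.
    return join('', map { sprintf("%0*b", $bits, $_) } @digits, $comma);
}

print join(" ", map { comma_encode(2, $_) } 0 .. 3), "\n";   # 11 0111 1011 010011
```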

Reads/writes one or more values from the stream in block-based Taboo coding. The parameter taboo is the binary string of the taboo code to use, such as '00'. taboo='1' implies Unary coding. taboo='0' implies Unary1 coding. No more than 16 bits of taboo code may be given. These codes are a more efficient version of comma codes, as they allow leading zeros.

Reads/writes one or more values from the stream in Golomb coding using the supplied subroutine instead of unary coding, which can make these codes robust to large outliers. For example, to use Fibonacci coding for the base:

Reads/writes one or more values from the stream in Rice coding using the supplied subroutine instead of unary coding, which can make these codes robust to large outliers. For example, to use Omega coding for the base:

Reads/writes one or more values from the stream in the Zeta coding of Paolo Boldi and Sebastiano Vigna. The parameter k must be between 1 and maxbits (32 or 64). Typical values for k are between 2 and 6.
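Here is a pure-Perl sketch of the Zeta code as Boldi and Vigna define it, 0-biased by coding x = v+1:

1. Find h with 2^(hk) <= x < 2^((h+1)k).
2. Write h+1 in unary (h zeros, then a one).
3. Write x - 2^(hk) in minimal binary over an interval of size 2^((h+1)k) - 2^(hk).

The helper names are hypothetical, and the output is a '0'/'1' string. The native shifts limit the sketch to small values. For k=1 it reduces to Elias Gamma.

```perl
use strict;
use warnings;

# Minimal binary code of $x in [0, $z): with s = ceil(log2 z), the first
# 2**s - z values use s-1 bits and the rest use s bits.
sub minimal_binary {
    my ($x, $z) = @_;
    my $s = 0;
    $s++ while (1 << $s) < $z;
    return '' if $s == 0;                  # z == 1: nothing needs writing
    my $short = (1 << $s) - $z;            # how many values get s-1 bits
    return $x < $short ? sprintf("%0*b", $s - 1, $x)
                       : sprintf("%0*b", $s, $x + $short);
}

# 0-based Zeta_k: code x = v + 1. Find h with 2**(h*k) <= x < 2**((h+1)*k),
# write h+1 in unary (h zeros, then a one), then x - 2**(h*k) in minimal
# binary over [0, 2**((h+1)*k) - 2**(h*k)). Small values only (native shifts).
sub zeta_encode {
    my ($k, $v) = @_;
    my $x = $v + 1;
    my $h = 0;
    $h++ while (1 << (($h + 1) * $k)) <= $x;
    my $lo = 1 << ($h * $k);
    my $hi = 1 << (($h + 1) * $k);
    return ('0' x $h) . '1' . minimal_binary($x - $lo, $hi - $lo);
}

print join(" ", map { zeta_encode(2, $_) } 0 .. 5), "\n";   # 10 110 111 01000 01001 01010
```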

Reads/writes one or more values from the stream in Adaptive Rice coding using the supplied subroutine instead of Elias Gamma coding to encode the base. The value of $k will adapt to better fit the values. This interface will likely change to make $k a reference.
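The Golomb, Rice and Adaptive Rice variants above all replace the quotient code with a caller-supplied subroutine. Below is a minimal sketch of that idea for plain (non-adaptive) Rice coding. The quotient v >> k goes through a code reference, and the low k bits follow verbatim. Passing a coderef that returns a '0'/'1' string is an assumption of this sketch. It is not the calling convention Data::BitStream uses.

```perl
use strict;
use warnings;

# Base coders (hypothetical) return '0'/'1' strings for a quotient >= 0.
sub unary { my ($q) = @_; return ('0' x $q) . '1'; }

sub gamma {
    my ($q) = @_;
    my $bin = sprintf("%b", $q + 1);      # 0-based: code q+1
    return ('0' x (length($bin) - 1)) . $bin;
}

# Rice code with parameter $k whose quotient is written by $base_coder.
sub rice_encode {
    my ($base_coder, $k, $v) = @_;
    my $low = $k ? sprintf("%0*b", $k, $v & ((1 << $k) - 1)) : '';
    return $base_coder->($v >> $k) . $low;
}

for my $v (5, 1000) {
    printf "%4d: unary base %3d bits, gamma base %2d bits\n", $v,
        length(rice_encode(\&unary, 2, $v)), length(rice_encode(\&gamma, 2, $v));
}
```

With k=2, the value 1000 costs 253 bits with a unary base but only 17 with a Gamma base. This is the kind of outlier protection a non-unary base provides.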

These methods wrap up all the previous encoding and decoding methods in an internal dispatch table. code is a text name of the code, such as 'Gamma', 'Fibonacci', etc. Codes with parameters are called as 'Rice(2)', 'StartStop(0-0-2-4-14)', etc.
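Text names follow a simple name(parameters) shape. The hypothetical parse_code_name below shows one way such names could be split into a base name and a parameter string. It is not the parser Data::BitStream uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::BitStream): split a text code name
# such as 'Rice(2)' into its base name and raw parameter string.
# The parameter is undef when the name has no parentheses.
sub parse_code_name {
    my ($name) = @_;
    my ($base, $param) = $name =~ /^\s*(\w+)\s*(?:\(\s*([^)]*?)\s*\))?\s*$/
        or die "Unparsable code name '$name'\n";
    return ($base, $param);
}

for my $code ('Gamma', 'Rice(2)', 'StartStop(0-0-2-4-14)') {
    my ($base, $param) = parse_code_name($code);
    # StartStop-style parameters are dash-separated lists of numbers.
    my @args = defined $param ? split(/-/, $param) : ();
    print "$base: (@args)\n";
}
```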

The Data::Buffer module has some similarities, and may be easier to use if your structure maps directly to typical C structs. The main feature it has that isn't replicated here is the template functionality. The primary difference is Data::BitStream allows arbitrary bit lengths (it isn't byte oriented), and of course all the different codes. It also allows direct storage of 64-bit integers, and bigints (using binary strings).