The BitStream and ConstBitStream classes contain a number of methods for reading the bitstring as if it were a file or stream. Depending on how it was constructed, the bitstream might actually be contained in a file rather than stored in memory, but these methods work in either case.

In order to behave like a file or stream, every bitstream has a property pos which is the current position from which reads occur. pos can range from zero (its value on construction) to the length of the bitstream, a position from which all reads will fail as it is past the last bit. Note that the pos property isn’t considered a part of the bitstream’s identity; this allows it to vary for immutable ConstBitStream objects and means that it doesn’t affect equality or hash values.
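One way to picture a position that is excluded from identity is a small stand-in class. This is only a sketch in plain Python: TinyConstStream is a hypothetical name, not part of bitstring, and it says nothing about how the library implements this internally.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class TinyConstStream:
    """Hypothetical stand-in: content defines identity, pos does not."""
    bits: str                                   # content as a string of '0'/'1'
    pos: int = field(default=0, compare=False)  # excluded from == and hash()


a = TinyConstStream('10110')
b = TinyConstStream('10110')
b.pos = 3  # move the read position of one copy only

same_value = (a == b)             # positions differ, content does not
same_hash = (hash(a) == hash(b))  # hash is computed from the content only
```

Because pos takes no part in comparison or hashing, moving it does not disturb an object that is being used as a dictionary key or set member.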

The property bytepos is also available, and is useful if you are only dealing with byte data and don’t want to always have to divide the bit position by eight. Note that if you try to use bytepos and the bitstring isn’t byte aligned (i.e. pos isn’t a multiple of 8) then a ByteAlignError exception will be raised.
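The relationship between the two positions can be sketched in a few lines of plain Python. The function name and the exception class below are hypothetical stand-ins chosen for illustration; the library raises its own ByteAlignError.

```python
class NotByteAligned(ValueError):
    """Hypothetical stand-in for the library's ByteAlignError."""


def bytepos_from_pos(pos: int) -> int:
    """Convert a bit position to a byte position, refusing unaligned values."""
    if pos % 8 != 0:
        raise NotByteAligned(f"bit position {pos} is not a multiple of 8")
    return pos // 8


aligned = bytepos_from_pos(24)   # 24 bits is exactly 3 bytes

try:
    bytepos_from_pos(13)         # 13 is not a multiple of 8
    unaligned_failed = False
except NotByteAligned:
    unaligned_failed = True
```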

For simple reading of a number of bits you can use read with an integer argument. A new bitstring object gets returned, which can be interpreted using one of its properties or used for further reads. The following example does some simple parsing of an MPEG-1 video stream (the stream is provided in the test directory if you downloaded the source archive).
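The idea of reading a fixed number of bits and advancing the position can be sketched without the library. TinyBitReader is a hypothetical class, and the seven header bytes are hand-built for illustration (a 32-bit start code followed by two 12-bit fields); they are not taken from the test file.

```python
class TinyBitReader:
    """Hypothetical reader over bytes; a stand-in, not the bitstring API."""

    def __init__(self, data: bytes) -> None:
        self._value = int.from_bytes(data, 'big')
        self.length = len(data) * 8
        self.pos = 0

    def read_uint(self, n: int) -> int:
        """Read n bits from pos as an unsigned integer and advance pos."""
        if self.pos + n > self.length:
            raise ValueError("attempt to read past the end")
        shift = self.length - self.pos - n
        value = (self._value >> shift) & ((1 << n) - 1)
        self.pos += n
        return value


# Hand-built bytes (an assumption for illustration, not real stream data).
header = bytes.fromhex('000001b3160120')
r = TinyBitReader(header)
start_code = format(r.read_uint(32), '08x')  # first 32 bits as hex digits
width = r.read_uint(12)                      # next 12 bits
height = r.read_uint(12)                     # next 12 bits
```

After the three reads the position has moved 32 + 12 + 12 = 56 bits, which is the end of this small buffer, so any further read fails.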

The read / readlist methods can also take a format string similar to that used in the auto initialiser. Only one token should be provided to read and a single value is returned. To read multiple tokens use readlist, which unsurprisingly returns a list.

The format string consists of comma-separated tokens that describe how to interpret the next bits in the bitstring. A token that gives only a length, with no interpretation named, is read using the default uint interpretation.
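Splitting such a format string into tokens might look roughly like this. parse_format is a hypothetical helper that only handles the simple name:length, bare-length and name-only forms; it is not the library's parser.

```python
def parse_format(fmt: str) -> list:
    """Split a format string into (name, length) pairs.

    Hypothetical helper; length is None when the token has no length.
    """
    tokens = []
    for raw in fmt.split(','):
        raw = raw.strip()
        if ':' in raw:
            name, _, length = raw.partition(':')
            tokens.append((name.strip(), int(length)))
        elif raw.isdigit():
            # Bare length: fall back to the default uint interpretation.
            tokens.append(('uint', int(raw)))
        else:
            # Name only, no length given.
            tokens.append((raw, None))
    return tokens


parsed = parse_format('hex:32, 12, 12')
```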

You are allowed to use one ‘stretchy’ token in a readlist. This is a token without a length specified, which will stretch to encompass as many bits as possible. This is often useful when you just want to assign something to ‘the rest’ of the bitstring:

a, b, everything_else = s.readlist('intle:16, intle:24, bits')

In this example the bits token will consist of everything left after the first two tokens are read, and could be empty.
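The arithmetic behind a stretchy token is simple: it receives whatever is left once the fixed-length tokens have been accounted for. The sketch below uses a hypothetical helper, resolve_lengths, in which None marks the stretchy token and total_bits stands for the number of bits still available to read.

```python
def resolve_lengths(lengths: list, total_bits: int) -> list:
    """Fill in the one missing (None) length so that all bits are used."""
    stretchy = [i for i, n in enumerate(lengths) if n is None]
    if len(stretchy) > 1:
        raise ValueError("only one stretchy token is allowed")
    fixed = sum(n for n in lengths if n is not None)
    if fixed > total_bits:
        raise ValueError("not enough bits for the fixed-length tokens")
    resolved = list(lengths)
    if stretchy:
        resolved[stretchy[0]] = total_bits - fixed  # may be zero
    return resolved


# Lengths for 'intle:16, intle:24, bits' with 64 and with 40 bits available
with_rest = resolve_lengths([16, 24, None], 64)
empty_rest = resolve_lengths([16, 24, None], 40)
```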

It is an error to use more than one stretchy token, or to use a ue, se, uie or sie token after a stretchy token (the reason you can’t use exponential-Golomb codes after a stretchy token is that the codes can only be read forwards; that is, you can’t ask “if this code ends here, where did it begin?” as there could be many possible answers).
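To see why these codes are read forwards, here is a minimal decoder for the unsigned exponential-Golomb code, working on a string of '0' and '1' characters. read_ue is a hypothetical function written for this illustration. The length of each code is only known after counting its leading zeros, so decoding has to begin at the start of the code.

```python
def read_ue(bits: str, pos: int) -> tuple:
    """Decode one unsigned exponential-Golomb code starting at pos.

    Returns (value, new_pos). bits is a string of '0' and '1'.
    """
    leading_zeros = 0
    while pos + leading_zeros < len(bits) and bits[pos + leading_zeros] == '0':
        leading_zeros += 1
    start = pos + leading_zeros              # index of the marker '1'
    end = start + leading_zeros + 1          # one past the last bit of the code
    if end > len(bits):
        raise ValueError("code runs past the end of the data")
    value = int(bits[start:end], 2) - 1
    return value, end


# Three codes back to back: '1' (0), '010' (1), '00100' (3)
stream = '1' + '010' + '00100'
first, p = read_ue(stream, 0)
second, p = read_ue(stream, p)
third, p = read_ue(stream, p)
```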

The pad token is a special case in that it just causes bits to be skipped over without anything being returned. This can be useful, for example, if parts of a binary format are uninteresting.
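The effect of padding can be sketched with a small stand-alone reader. read_fields is a hypothetical helper that understands only uint:n and pad:n tokens and works on a string of '0' and '1' characters; it is a model of the behaviour described above rather than library code.

```python
def read_fields(bits: str, fmt: str) -> list:
    """Read 'uint:n' and 'pad:n' tokens from a '0'/'1' string.

    Hypothetical helper: pad tokens advance the position but add no value.
    """
    values = []
    pos = 0
    for token in fmt.split(','):
        name, _, length = token.strip().partition(':')
        n = int(length)
        if pos + n > len(bits):
            raise ValueError("attempt to read past the end")
        if name == 'uint':
            values.append(int(bits[pos:pos + n], 2))
        elif name != 'pad':
            raise ValueError(f"unsupported token: {name}")
        pos += n
    return values


# 5 skipped bits, a 3-bit value, 1 skipped bit, a 3-bit value
a, b = read_fields('11111' + '101' + '0' + '011', 'pad:5, uint:3, pad:1, uint:3')
```

Four tokens are consumed but only two values come back, one for each uint token.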

The unpack method works in a very similar way to readlist. The major difference is that it interprets the whole bitstring from the start, and takes no account of the current pos. It’s a natural complement of the pack function.
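The difference between the two can be modelled with a toy class. TinyStream is hypothetical: its readlist and unpack borrow the library's method names but take a plain list of bit lengths and always return unsigned integers, which is a simplification made for this sketch.

```python
class TinyStream:
    """Hypothetical stand-in contrasting position-based and whole-string reads."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def _uints(self, start: int, lengths: list) -> tuple:
        values = []
        for n in lengths:
            if start + n > len(self.bits):
                raise ValueError("attempt to read past the end")
            values.append(int(self.bits[start:start + n], 2))
            start += n
        return values, start

    def readlist(self, lengths: list) -> list:
        """Read from the current pos and advance it."""
        values, self.pos = self._uints(self.pos, lengths)
        return values

    def unpack(self, lengths: list) -> list:
        """Interpret from bit 0; pos is neither used nor changed."""
        values, _ = self._uints(0, lengths)
        return values


s = TinyStream('0001' + '0010' + '0011')
skipped = s.readlist([4])          # consumes the first 4 bits
from_pos = s.readlist([4])         # continues at pos 4
from_start = s.unpack([4, 4, 4])   # ignores pos entirely
```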

To search for a sub-string use the find method. If the find succeeds it will set the position to the start of the next occurrence of the searched for string and return a tuple containing that position, otherwise it will return an empty tuple. By default the sub-string will be found at any bit position - to allow it to only be found on byte boundaries set bytealigned=True.

The reason for returning the bit position in a tuple is so that the return value is True in a boolean sense if the sub-string is found, and False if it is not (if just the bit position were returned there would be a problem with finding at position 0). The effect is that you can use if s.find(...): and have it behave as you’d expect.
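The tuple convention is easy to reproduce in plain Python. find_bits below is a hypothetical function that searches a string of '0' and '1' characters; it mirrors the return convention and the bytealigned option described above, not the library's implementation.

```python
def find_bits(haystack: str, needle: str, bytealigned: bool = False) -> tuple:
    """Return (position,) of the first match, or () if there is none."""
    step = 8 if bytealigned else 1
    for pos in range(0, len(haystack) - len(needle) + 1, step):
        if haystack.startswith(needle, pos):
            return (pos,)
    return ()


bits = '1011' + '0000'
at_zero = find_bits(bits, '1011')     # match at position 0
missing = find_bits(bits, '111')      # no match

zero_is_truthy = bool(at_zero)        # (0,) is a non-empty tuple
bare_zero_is_falsy = not bool(0)      # a bare 0 would look like a failure
```

A match at position 0 still passes an if test because the result is a one-element tuple, while a failed search gives the empty tuple.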

rfind does much the same as find, except that it will find the last occurrence, rather than the first.

>>> t = BitArray('0x0f231443e8')
>>> found = t.rfind('0xf')  # Search all bit positions in reverse
>>> print(found)
(31,)  # Found within the 0x3e near the end

For all of these finding functions you can optionally specify a start and / or end to narrow the search range. Note though that because it’s searching backwards rfind will start at end and end at start (so you always need start < end).
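A restricted reverse search can be modelled with Python's own str.rfind on a string of '0' and '1' characters. This is an analogy only: str.rfind returns -1 rather than an empty tuple when nothing is found, and it only reports matches lying entirely inside the given range, which is assumed here to be what start and end mean for the library too.

```python
# The value from the rfind example, written out as 40 bits.
bits = format(0x0f231443e8, '040b')
needle = '1111'                      # the four bits of 0xf

last_overall = bits.rfind(needle)            # search the whole string
last_in_range = bits.rfind(needle, 0, 16)    # only matches inside bits 0..15
none_in_range = bits.rfind(needle, 8, 24)    # -1 is str.rfind's "not found"
```

The unrestricted search finds the occurrence at bit 31, while limiting the range to the first sixteen bits finds the one at bit 4.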

To replace all occurrences of one BitArray with another use replace. The replacements are done in-place, and the number of replacements made is returned. This method changes the contents of the bitstring and so isn’t available for the Bits or ConstBitStream classes.

>>> s = BitArray('0b110000110110')
>>> s.replace('0b110', '0b1111')
3  # The number of replacements made
>>> s.bin
'111100011111111'

The emphasis with the bitstring module is always towards not worrying if things are a whole number of bytes long or are aligned on byte boundaries. Internally the module has to worry about this quite a lot, but the user shouldn’t have to care. To this end methods such as find, findall, split and replace by default aren’t concerned with looking for things only on byte boundaries and provide a parameter bytealigned which can be set to True to change this behaviour.

This works fine, but it’s not uncommon to be working only with whole-byte data and all the bytealigned=True can get a bit repetitive. To solve this it is possible to change the default throughout the module by setting bitstring.bytealigned. For example:

>>> import bitstring
>>> from bitstring import BitArray
>>> s = BitArray('0xabbb')
>>> s.find('0xbb')                    # look for the byte 0xbb
(4,)                                  # found, but not on byte boundary
>>> s.find('0xbb', bytealigned=True)  # try again...
(8,)                                  # found on a byte boundary
>>> bitstring.bytealigned = True      # change the default behaviour
>>> s.find('0xbb')
(8,)                                  # now only finds byte aligned