This version includes: (1) One additional constructor mode that
allows a bit vector to be constructed directly from the bytes
type objects in the memory. (2) A bugfix in the slice function
for the case when the upper and the lower bounds of the slice
range are identical. (3) A bugfix for the next_set_bit() method.

Version 3.2:

This version includes support for constructing bit vectors
directly from text strings and hex strings. This version also
includes a safety check on the sizes of the two argument bit
vectors when calculating Jaccard similarity between the two.

Version 3.1.1:

This version includes: (1) a fix to the module test code to
account for how string input is handled in the io.StringIO class
in Python 2.7; (2) some improvements to the documentation.

Version 3.1:

This version includes: (1) Correction for a documentation error;
(2) Fix for a bug in slice assignment when one or both of the
slice limits were left unspecified; (3) The non-circular bit
shift methods now return self so that they can be chained; (4) A
method for testing a bitvector for its primality; and (5) A
method that uses Python's 'random.getrandbits()' to generate
a bitvector that can serve as candidate for primes whose bitfield
size is specified.

Version 3.0:

This is a Python 3.x compliant version of the latest incarnation
of the BitVector module. This version should work with both
Python 2.x and Python 3.x.

Version 2.2:

Fixed a couple of bugs, the most important being in the
bitvector initialization code for the cases when the
user-specified value for size conflicts with the user-specified
int value for the vector. Version 2.2 also includes a new
method runs() that returns a list of strings of the consecutive
runs of 1's and 0's in the bitvector. The implementation of
the circular shift operators has also been improved in Version
2.2. This version allows for a chained invocation of these
operators. Additionally, the circular shift operators now
exhibit expected behavior if the user-specified shift value is
negative.

Version 2.1:

Includes enhanced support for folks who use this class for
computer security and cryptography work. You can now call on
the methods of the BitVector class to do Galois Field GF(2^n)
arithmetic on bit arrays. This should save the users of this
class the bother of having to write their own routines for
finding multiplicative inverses in GF(2^n) finite fields.

Version 2.0.1:

Fixed numerous typos and other errors in the documentation page
for the module. The implementation code remains unchanged.

Version 2.0:

To address the needs of the folks who are using the BitVector
class in data mining research, the new version of the class
includes several additional methods. Since the bitvectors used
by these folks can be extremely long, possibly involving
millions of bits, the new version of the class includes a much
faster method for counting the total number of set bits when a
bitvector is sparse. [But note that this new bit counting
method may perform poorly for dense bitvectors. So the old bit
counting method has been retained.] Also for data mining folks,
the new version of the class is provided with similarity and
distance calculation metrics such as the Jaccard similarity
coefficient, the Jaccard distance, and the Hamming distance.
Again for the same folks, the class now also has a
next_set_bit(from_index) method. Other enhancements to the
class include methods for folks who do research in cryptography.
Now you can directly calculate the greatest common divisor of
two bitvectors, or find the multiplicative inverse of one
bitvector modulo another bitvector.

Version 1.5.1:

Removed a bug from the implementation of the right circular
shift operator.

Version 1.5:

This version should prove to be much more efficient for long
bitvectors. Efficiency in BitVector construction when only its
size is specified was achieved by eliminating calls to
_setbit(). The application of logical operators to two
BitVectors of equal length was also made efficient by
eliminating calls to the padding function. Another feature of
this version is the count_bits() method that returns the total
number of bits set in a BitVector instance. Yet another feature
of this version is the setValue() method that alters the bit
pattern associated with a previously constructed BitVector.

Version 1.4.1:

The reset() method now returns 'self' to allow for cascaded
invocation with the slicing operator. Also removed the
discrepancy between the value of the __copyright__ variable in
the module and the value of license variable in setup.py.

Version 1.4:

This version includes the following two upgrades: 1) code for
slice assignment; and 2) A reset function to reinitialize a
previously constructed BitVector. Additionally, the code was
cleaned up with the help of pychecker.

Version 1.3.2:

Fixed a potentially misleading documentation issue for the
Windows users of the BitVector class. If you are writing an
internally generated BitVector to a disk file, you must open the
file in the binary mode. If you don't, the bit patterns that
correspond to line breaks will be misinterpreted. On a Windows
machine in the text mode, the bit pattern 000001010 ('\n') will
be written out to the disk as 0000110100001010 ('\r\n').

Version 1.3.1:

Removed the inconsistency in the internal representation of
bitvectors produced by logical bitwise operations vis-a-vis the
bitvectors created by the constructor. Previously, the logical
bitwise operations resulted in bitvectors that had their bits
packed into lists of ints, as opposed to arrays of unsigned
shorts.

Version 1.3:

(a) One more constructor mode included: When initializing a new
bitvector with an integer value, you can now also specify a size
for the bitvector. The constructor zero-pads the bitvector
from the left with zeros. (b) The BitVector class now supports
'if x in y' syntax to test if the bit pattern 'x' is contained
in the bit pattern 'y'. (c) Improved syntax to conform to
well-established Python idioms. (d) What used to be a comment
before the beginning of each method definition is now a
docstring.

Version 1.2:

(a) One more constructor mode included: You can now construct a
bitvector directly from a string of 1's and 0's. (b) The class
now constructs a shortest possible bit vector from an integer
value. So the bit vector for the integer value 0 is just one
bit of value 0, and so on. (c) All the rich comparison operators
are now overloaded. (d) The class now includes a new method
'intValue()' that returns the unsigned integer value of a bit
vector. This can also be done through '__int__'. (e) The
package now includes a unittest based framework for testing out
an installation. This is in a separate directory called
"TestBitVector".

Version 1.1.1:

The function that does block reads from a disk file now peeks
ahead at the end of each block to see if there is anything
remaining to be read in the file. If nothing remains, the
more_to_read attribute of the BitVectorobject is set to False.
This simplifies reading loops. This version also allows
BitVectors of size 0 to be constructed

Version 1.1:

I have changed the API significantly to provide more ways for
constructing a bit vector. As a result, it is now necessary to
supply a keyword argument to the constructor.

INSTALLATION:

The BitVector class was packaged using Distutils. For installation,
execute the following command-line in the source directory (this is
the directory that contains the setup.py file after you have
downloaded and uncompressed the tar archive):

python setup.py install

You have to have root privileges for this to work. On Linux
distributions, this will install the module file at a location that
looks like

/usr/lib/python2.7/dist-packages/

If you do not have root access, you have the option of working
directly off the directory in which you downloaded the software by
simply placing the following statements at the top of your scripts
that use the BitVector class

import sys
sys.path.append( "pathname_to_BitVector_directory" )

To uninstall the module, simply delete the source directory, locate
where BitVector was installed with "locate BitVector" and delete
those files. As mentioned above, the full pathname to the installed
version is likely to look like
/usr/lib/python2.6/site-packages/BitVector*

If you want to carry out a non-standard install of BitVector, look
up the on-line information on Disutils by pointing your browser to

The bits stored now will correspond to the binary
representation of the integer. The resulting bit vector is
the shortest possible bit vector for the integer value
supplied. For example, when intVal is 0, the bit vector
constructed will consist of just the bit 0.

(C3) When initializing a bit vector with an intVal as shown above,
you can also specify a size for the bit vector:

will return the bit vector consisting of the bit pattern
00000000. The zero padding needed for meeting the size
requirement is always on the left. If the size supplied is
smaller than what it takes to create the shortest possible
bit vector for intVal, an exception is thrown.

This bit vector itself is incapable of holding the bits. To
now create bit vectors that actually hold the bits, you need
to make the following sort of a call on the above variable
bv:

bv1 = bv.read_bits_from_file( 64 )

bv1 will be a regular bit vector containing 64 bits from the
disk file. If you want to re-read a file from the beginning
for some reason, you must obviously first close the fileobject that was acquired with a call to the BitVector
constructor with a filename argument. This can be
accomplished by

(C10) You can also construct a bit vector directly from a bytes type
object you previously created in your script. This can be
useful when you are trying to recover the integer parameters
stored in public and private keys. A typical usage scenario:

for accessing (and setting) the bit at the position that is
indexed M. You can retrieve the bit at position M by bv[M].
Note that the index 0 corresponds to the first bit at the left
end of a bit pattern. This is made possible by the
implementation of the __getitem__ and __setitem__ methods.

The first slice assignment will set the 6th, 7th, and the 8th
bits of the bit vector bv1 according to the first three bits of
bv2. The second slice assignment will set the first three bits
of bv1 according to the three bits in bv3. This is made
possible by the slice setting code in the __setitem__ method.

is legal assuming that the index range is not violated. A
negative index carries the usual Python interpretation: The
last element of a bit vector is indexed -1 and the first
element -(n+1) if n is the total number of bits in the bit
vector. Negative subscripts are made possible by
special-casing such access in the implementation of the
__getitem__ method (actually it is the _getbit method).

(7) You can reset a previously constructed bit vector to either the
all-zeros state or the all-ones state by

IMPORTANT: The size of a bit vector must be a multiple of of 8
for this write function to work. If this
condition is not met, the function will throw an
exception.

IMPORTANT FOR WINDOWS USERS: When writing an internally
generated bit vector out to a disk file, it is
important to open the file in the binary mode as
shown. Otherwise, the bit pattern 00001010
('\n') in your bitstring will be written out as
0000110100001010 ('\r\n'), which is the
linebreak on Windows machines.

(20) You can also write a bit vector directly to a stream object, as
illustrated by

(21) You can pad a bit vector from the left or from the right with a
designated number of zeros

bitvec.pad_from_left( n )

bitvec.pad_from_right( n )

In the first case, the new bit vector will be the same as the
old bit vector except for the additional n zeros on the left.
The same thing happens in the second case except that now the
additional n zeros will be on the right.

(22) You can test if a bit vector x is contained in another bit
vector y by using the syntax 'if x in y'. This is made
possible by the override definition for the special
__contains__ method.

(23) You can change the bit pattern associated with a previously
constructed BitVector instance:

(26) You can calculate the similarity and the distance between two
bit vectors using the Jaccard similarity coefficient and the
Jaccard distance. Also, you can calculate the Hamming distance
between two bit vectors:

(29) You can test whether the integer value of a bit vector is a
power of two. The sparse version of this method will work much
faster for very long bit vectors. However, the regular version
may work faster for small bit vectors.

(40) You can call getTextFromBitVector() to directly convert a bit
vector into a text string (this is a useful thing to do only if
the length of the vector is an integral multiple of 8 and every
byte in your bitvector has a print representation):

The bits of a bit vector are stored in 16-bit unsigned ints
following Josiah Carlson's recommendation to that effect on the
Pyrex mailing list. As you can see in the code for `__init__()',
after resolving the argument with which the constructor is called,
the very first thing the constructor does is to figure out how many
of those 2-byte ints it needs for the bits (see how the value is
assigned to the variable `two_byte_ints_needed' toward the end of
`__init__()'). For example, if you wanted to store a 64-bit array,
the variable 'two_byte_ints_needed' would be set to 4. (This does
not mean that the size of a bit vector must be a multiple of 16.
Any sized bit vectors can be constructed --- the constructor will
choose the minimum number of two-byte ints needed.) Subsequently,
the constructor acquires an array of zero-initialized 2-byte ints.
The last thing that is done in the code for `__init__()' is to
shift the bits into the array of two-byte ints.

As mentioned above, note that it is not necessary for the size of a
bit vector to be a multiple of 16 even though we are using C's
unsigned short as as a basic unit for storing the bit arrays. The
class BitVector keeps track of the actual number of bits in the bit
vector through the "size" instance variable.

Note that, except for one case, the constructor must be called with
a single keyword argument, which determines how the bit vector will
be constructed. The single exception to this rule is for the
keyword argument `intVal' which can be used along with the `size'
keyword argument. When `intVal' is used without the `size' option,
the bit vector constructed for the integer is the shortest possible
bit vector. On the other hand, when `size' is also specified, the
bit vector is padded with zeroes from the left so that it has the
specified size. The code for `__init__()' begins by making sure
your constructor call only uses the acceptable keywords. The
constraints on how many keywords can be used together in a
constructor call are enforced when we process each keyword option
separately in the rest of the code for `__init__()'.

The first keyword option processed by `__init__()' is for
`filename'. When the constructor is called with the `filename'
keyword, as in

the call returns a bit vector on which you must subsequently invoke
the `read_bits_from_file()' method to actually obtain a bit vector
consisting of the bits that constitute the information stored in
the file.

The next keyword option considered in `__init__()' is for `fp',
which is for constructing a bit vector by reading off the bits from
a file-like object, as in

The bits stored in the bit vector thus created correspond to the
big-endian binary representation of the integer argument provided
through `intVal' (meaning that the most significant bit will be at
the leftmost position in the bit vector.) THE BIT VECTOR
CONSTRUCTED WITH THE ABOVE CALL IS THE SHORTEST POSSIBLE BIT VECTOR
FOR THE INTEGER SUPPLIED. As a case in point, when `intVal' is set
to 0, the bit vector consists of a single bit is 0 also. When
constructing a bit vector with the `intVal' option, if you also
want to impose a size condition on the bit vector, you can make a
call like

Now, as you would expect, the bits in the bit vector will
correspond directly to the hex digits in your hex string.

ACKNOWLEDGMENTS:

The author is grateful to Oleg Broytmann for suggesting many
improvements that were incorporated in Version 1.1 of this package.
The author would like to thank Kurt Schwehr whose email resulted in
the creation of Version 1.2. Kurt also caught an error in my
earlier version of 'setup.py' and suggested a unittest based
approach to the testing of the package. Kurt also supplied the
Makefile that is included in this distribution. The author would
also like to thank all (Scott Daniels, Blair Houghton, and Steven
D'Aprano) for their responses to my comp.lang.python query
concerning how to make a Python input stream peekable. This
feature was included in Version 1.1.1.

With regard to the changes incorporated in Version 1.3, thanks are
owed to Kurt Schwehr and Gabriel Ricardo for bringing to my
attention the bug related to the intVal method of initializing a
bit vector when the value of intVal exceeded sys.maxint. This
problem is fixed in Version 1.3. Version 1.3 also includes many
other improvements that make the syntax better conform to the
standard idioms of Python. These changes and the addition of the
new constructor mode (that allows a bit vector of a given size to
be constructed from an integer value) are also owing to Kurt's
suggestions.

With regard to the changes incorporated in Version 1.3.1, I would
like to thank Michael Haggerty for noticing that the bitwise
logical operators resulted in bit vectors that had their bits
packed into lists of ints, as opposed to arrays of unsigned shorts.
This inconsistency in representation has been removed in version
1.3.1. Michael has also suggested that since BitVector is mutable,
I should be overloading __iand__(), __ior__(), etc., for in-place
modifications of bit vectors. Michael certainly makes a good
point. But I am afraid that this change will break the code for the
existing users of the BitVector class.

I thank Mathieu Roy for bringing to my attention the problem with
writing bitstrings out to a disk files on Windows machines. This
turned out to be a problem more with the documentation than with
the BitVector class itself. On a Windows machine, it is
particularly important that a file you are writing a bitstring into
be opened in binary mode since otherwise the bit pattern 00001010
('\n') will be written out as 0000110100001010 ('\r\n'). This
documentation fix resulted in Version 1.3.2.

With regard to Version 1.4, the suggestions/bug reports made by
John Kominek, Bob Morse, and Steve Ward contributed to this
version. I wish to thank all three. John wanted me to equip the
class with a reset() method so that a previously constructed class
could be reset to either all 0's or all 1's. Bob spotted loose
local variables in the implementation --- presumably left over from
a debugging phase of the code. Bob recommended that I clean up the
code with pychecker. That has been done. Steve noticed that slice
assignment was not working. It should work now.

Version 1.4.1 was prompted by John Kominek suggesting that if
reset() returned self, then the slice operation could be combined
with the reset operation. Thanks John! Another reason for 1.4.1
was to remove the discrepancy between the value of the
__copyright__ variable in the module and the value of license
variable in setup.py. This discrepancy was brought to my attention
by David Eyk. Thanks David!

Version 1.5 has benefited greatly by the suggestions made by Ryan
Cox. By examining the BitVector execution with cProfile, Ryan
observed that my implementation was making unnecessary method calls
to _setbit() when just the size option is used for constructing aBitVector instance. Since Python allocates cleaned up memory, it
is unnecessary to set the individual bits of a vector if it is
known in advance that they are all zero. Ryan made a similar
observation for the logical operations applied to two BitVector
instances of equal length. He noticed that I was making
unnecessary calls to _resize_pad_from_left() for the case of equal
arguments to logical operations. Ryan also recommended that I
include a method that returns the total number of bits set in aBitVector instance. The new method count_bits() does exactly
that. Thanks Ryan for all your suggestions. Version 1.5 also
includes the method setValue() that allows the internally stored
bit pattern associated with a previously constructed BitVector to
be changed. A need for this method was expressed by Aleix
Conchillo. Thanks Aleix.

Version 1.5.1 is a quick release to fix a bug in the right circular
shift operator. This bug was discovered by Jasper Spaans. Thanks
very much Jasper.

Version 2.0 was prompted mostly by the needs of the folks who play
with very long bit vectors that may contain millions of bits. I
believe such bit vectors are encountered in data mining research
and development. Towards that end, among the new methods in
Version 2.0, the count_bits_sparse() was provided by Rhiannon Weaver.
She says when a bit vector contains over 2 million bits and only,
say, five bits are set, her method is faster than the older
count_bits() method by a factor of roughly 18. Thanks
Rhiannon. [The logic of the new implementation works best for very
sparse bit vectors. For very dense vectors, it may perform more
slowly than the regular count_bits() method. For that reason, I
have retained the original method.] Rhiannon's implementation is
based on what has been called the Kernighan way at the web sitehttp://graphics.stanford.edu/~seander/bithacks.html. Version 2.0
also includes a few additional functions posted at this web site
for extracting information from bit fields. Also included in this
new version is the next_set_bit() method supplied by Jason Allum.
I believe this method is also useful for data mining folks. Thanks
Jason. Additional methods in Version 2.0 include the similarity and
the distance metrics for comparing two bit vectors, method for
finding the greatest common divisor of two bit vectors, and a
method that determines the multiplicative inverse of a bit vector
vis-a-vis a modulus. The last two methods should prove useful to
folks in cryptography.

With regard to Version 2.2, I would like to thank Ethan Price for
bringing to my attention a bug in the BitVector initialization code
for the case when both the int value and the size are user-
specified and the two values happen to be inconsistent. Ethan also
discovered that the circular shift operators did not respond to
negative values for the shift. These and some other shortcomings
discovered by Ethan have been fixed in Version 2.2. Thanks Ethan!

For two of the changes included in Version 3.1, I'd like to thank
Libor Wagner and C. David Stahl. Libor discovered a documentation
error in the listing of the 'count_bits_sparse()' method and David
discovered a bug in slice assignment when one or both of the slice
limits are left unspecified. These errors in Version 3.0 have been
fixed in Version 3.1.

Version 3.1.1 was triggered by two emails, one from John-Mark
Gurney and the other from Nessim Kisserli, both related to the
issue of compilation of the module. John-Mark mentioned that since
this module did not work with Python 2.4.3, the statement that the
module was appropriate for all Python 2.x was not correct, and
Nessim reported that he had run into a problem with the compilation
of the test portion of the code with Python 2.7 where a string of
1's and 0's is supplied to io.StringIO() for the construction of a
memory file. Both these issues have been resolved in 3.1.1.

Version 3.2 was triggered by my own desire to include additional
functionality in the module to make it more useful for
experimenting with hashing functions. While I was at it, I also
included in it a couple of safety checks on the lengths of the two
arguments bit vectors when computing their Jaccard similarity. I
could see the need for these checks after receiving an email from
Patrick Nisch about the error messages he was receiving during
Jaccard similarity calculations. Thanks Patrick!

Version 3.3 includes a correction by John Gleeson for a bug in the
next_set_bit() method. Thanks, John!

ABOUT THE AUTHOR:

Avi Kak is the author of "Programming with Objects: A Comparative
Presentation of Object-Oriented Programming with C++ and Java",
published by John-Wiley in 2003. This book presents a new approach
to the combined learning of two large object-oriented languages,
C++ and Java. It is being used as a text in a number of
educational programs around the world. This book has also been
translated into Chinese. Avi Kak is also the author of "Scripting
with Objects: A Comparative Presentation of Object-Oriented
Scripting with Perl and Python," published in 2008 by John-Wiley.

Take a bitwise 'AND' of the bit vector on which the method is
invoked with the argument bit vector. Return the result as a new
bit vector. If the two bit vectors are not of the same size, pad
the shorter one with zeros from the left.

Take a bitwise 'OR' of the bit vector on which the method is
invoked with the argument bit vector. Return the result as a new
bit vector. If the two bit vectors are not of the same size, pad
the shorter one with zero's from the left.

This is needed for both slice assignments and for index
assignments. It checks the types of pos and item to see if the
call is for slice assignment. For slice assignment, pos must be of
type 'slice' and item of type BitVector. For index assignment, the
argument types are checked in the _setbit() method.

Take a bitwise 'XOR' of the bit vector on which the method is
invoked with the argument bit vector. Return the result as a new
bit vector. If the two bit vectors are not of the same size, pad
the shorter one with zeros from the left.

This is merely another implementation of the methodcircular_rotate_left_by_one() shown above. This one does NOT use
map functions. This method carries out a one-bit left circular
shift of a bit vector.

For sparse bit vectors, this method, contributed by Rhiannon, will
be much faster. She estimates that if a bit vector with over 2
millions bits has only five bits set, this will return the answer
in 1/18 of the time taken by the count_bits() method. Note
however, that count_bits() may work much faster for dense-packed
bit vectors. Rhianon's implementation is based on an algorithm
generally known as the Brian Kernighan's way, although its
antecedents predate its mention by Kernighan and Ritchie.

The bulk of the work here is done by calling random.getrandbits(
width) which returns an integer whose binary code representation
will not be larger than the argument 'width'. However, when random
numbers are generated as candidates for primes, you often want to
make sure that the random number thus created spans the full width
specified by 'width' and that the number is odd. This we do by
setting the two most significant bits and the least significant
bit. If you only want to set the most significant bit, comment out
the statement in line (pr29).

Return a string of hex digits by scanning the bits from the left
and replacing each sequence of 4 bits by its corresponding hex
digit (this is a useful thing to do only if the length of the
vector is an integral multiple of 4)

Return the text string formed by dividing the bitvector into bytes
from the left and replacing each byte by its ASCII character (this
is a useful thing to do only if the length of the vector is an
integral multiple of 8 and every byte in your bitvector has a print
representation)

This method, contributed originally by Jason Allum and updated
subsequently by John Gleeson, calculates the position of the next
set bit at or after the current position index. It returns -1 if
there is no next set bit.

For a bit that is set at the argument 'position', this method
returns how many bits are set to the left of that bit. For
example, in the bit pattern 000101100100, a call to this method
with position set to 9 will return 4.

Read blocksize bits from a disk file and return a BitVectorobject
containing the bits. If the file contains fewer bits than
blocksize, construct the BitVectorobject from however many bits
there are in the file. If the file contains zero bits, return aBitVectorobject of size attribute set to 0.

For a one-bit in-place left non-circular shift. Note that
bitvector size does not change. The leftmost bit that moves
past the first element of the bitvector is discarded and
rightmost bit of the returned vector is set to zero.

For a one-bit in-place right non-circular shift. Note that
bitvector size does not change. The rightmost bit that moves
past the last element of the bitvector is discarded and
leftmost bit of the returned vector is set to zero.

Check if the integer value of the bitvector is a prime through the
Miller-Rabin probabilistic test of primality. If not found to be a
composite, estimate the probability of the bitvector being a prime
using this test.

Unpermute the bit vector according to the permutation list supplied
as the second argument. If you first permute a bit vector by usingpermute() and then unpermute() it using the same permutation list,
you will get back the original bit vector.

This function is meant to write a bit vector directly to a file
like object. Note that whereas 'write_to_file' method creates a
memory footprint that corresponds exactly to the bit vector, the
'write_bits_to_fileobject' actually writes out the 1's and 0's as
individual items to the file object. That makes this method
convenient for creating a string representation of a bit vector,
especially if you use the StringIO class, as shown in the test
code.

Write the bitvector to the file object file_out. (A file object is
returned by a call to open()). Since all file I/O is byte oriented,
the bitvector must be multiple of 8 bits. Each byte treated as MSB
first (0th index).