Here is what these tables look like: binomial table, log2binomial table, short_bitmaps, offset_class
You get the log2binomial table by taking the ceiling of the log2 of each value in the binomial table.
You get the short_bitmaps array by iterating over all possible numbers with the given number of 1s in their binary form.
The offset_class array contains the offsets (in short_bitmaps) at which each class (number of ones) starts.

Note: the short_bitmaps and offset_class tables were generated with BLOCK_SIZE set to 4 for readability.

I have no idea what the purpose of _List is after these tables have been created (it’s saved as rev_offset). The algorithm that fills the short_bitmaps and offset_class arrays is REALLY complicated: it uses deep recursion and the names are written in Spanish, so I skip describing its inner workings.
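For small block sizes the same four tables can be produced by brute force, which makes their content easier to see. The sketch below is not the libcds algorithm: the Tables struct, build_tables and the helper names are hypothetical, and it assumes that the values inside each class are stored in increasing numeric order and that block_size is below 32.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the four tables; these are not the libcds types.
struct Tables {
    std::vector<std::vector<uint32_t>> binomial;      // binomial[n][k] = C(n, k)
    std::vector<std::vector<uint32_t>> log2binomial;  // ceil(log2(C(n, k)))
    std::vector<uint32_t> short_bitmaps;              // block values grouped by class
    std::vector<std::size_t> offset_class;            // offset_class[k] = first index of class k
};

// Smallest b with 2^b >= x, i.e. ceil(log2(x)); callers pass x >= 1.
static uint32_t ceil_log2(uint32_t x) {
    uint32_t bits = 0;
    while ((uint64_t(1) << bits) < x) ++bits;
    return bits;
}

// Number of set bits (clears the lowest set bit on every round).
static uint32_t count_ones(uint32_t v) {
    uint32_t ones = 0;
    for (; v != 0; v &= v - 1) ++ones;
    return ones;
}

Tables build_tables(uint32_t block_size) {
    Tables t;
    t.binomial.assign(block_size + 1, std::vector<uint32_t>(block_size + 1, 0));
    t.log2binomial = t.binomial;
    for (uint32_t n = 0; n <= block_size; ++n) {
        t.binomial[n][0] = 1;
        for (uint32_t k = 1; k <= n; ++k)  // Pascal's rule
            t.binomial[n][k] = t.binomial[n - 1][k - 1] + t.binomial[n - 1][k];
        for (uint32_t k = 0; k <= n; ++k)
            t.log2binomial[n][k] = ceil_log2(t.binomial[n][k]);
    }
    // Class k holds every block value that has exactly k ones.
    for (uint32_t k = 0; k <= block_size; ++k) {
        t.offset_class.push_back(t.short_bitmaps.size());
        for (uint32_t v = 0; v < (1u << block_size); ++v)
            if (count_ones(v) == k) t.short_bitmaps.push_back(v);
    }
    return t;
}
```

With block_size set to 4 this gives offset_class = 0, 1, 5, 11, 15, and class 1 of short_bitmaps is 1, 2, 4, 8.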

todo

It iterates through the source data and counts the number of ones in each 15-bit group (see BLOCK_SIZE). In each iteration it stores this count in the next 4 bits (see C_field_bits) of the array “C”.

todo

While iterating the total number of ones is summed and … todo

Reading the 15-bit parts and writing the 4-bit parts in uint arrays is done by treating the uint arrays as packed arrays.

Note: These are the functions for handling arrays as packed arrays (libcdsBasics.h):
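The listing that follows is not copied from libcdsBasics.h. It is a minimal sketch of the packed-array idea and of the counting loop described above. The names packed_get, packed_set and build_class_array are hypothetical, and the sketch assumes that bit i of the sequence is stored in bit (i mod 32) of word i/32 and that the bits after n_bits are zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t W = 32;  // bits per uint32_t word

// Read field number `index` of width `len` bits (1..32) from a packed array.
uint32_t packed_get(const std::vector<uint32_t>& a, std::size_t len, std::size_t index) {
    assert(len >= 1 && len <= W);
    const std::size_t first = index * len, word = first / W, shift = first % W;
    uint64_t v = a[word] >> shift;
    if (shift + len > W)  // the field continues in the next word
        v |= uint64_t(a[word + 1]) << (W - shift);
    return uint32_t(v & ((uint64_t(1) << len) - 1));
}

// Write the low `len` bits of x into field number `index`.
void packed_set(std::vector<uint32_t>& a, std::size_t len, std::size_t index, uint32_t x) {
    assert(len >= 1 && len <= W);
    const std::size_t first = index * len, word = first / W, shift = first % W;
    const uint64_t mask = ((uint64_t(1) << len) - 1) << shift;
    const uint64_t value = (uint64_t(x) << shift) & mask;
    a[word] = (a[word] & ~uint32_t(mask)) | uint32_t(value);
    if (shift + len > W)  // spill the upper part into the next word
        a[word + 1] = (a[word + 1] & ~uint32_t(mask >> W)) | uint32_t(value >> W);
}

static uint32_t count_ones(uint32_t v) {
    uint32_t ones = 0;
    for (; v != 0; v &= v - 1) ++ones;
    return ones;
}

// Stores the number of ones of every block_size-bit group of `data` in a
// packed array of c_field_bits-bit fields and sums them into total_ones.
std::vector<uint32_t> build_class_array(std::vector<uint32_t> data, std::size_t n_bits,
                                        std::size_t& total_ones,
                                        std::size_t block_size = 15,
                                        std::size_t c_field_bits = 4) {
    const std::size_t blocks = (n_bits + block_size - 1) / block_size;
    data.resize((blocks * block_size + W - 1) / W, 0);  // zero padding for the last group
    std::vector<uint32_t> C((blocks * c_field_bits + W - 1) / W, 0);
    total_ones = 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        const uint32_t ones = count_ones(packed_get(data, block_size, b));
        packed_set(C, c_field_bits, b, ones);
        total_ones += ones;
    }
    return C;
}
```

A 4-bit field is enough here because a 15-bit group contains at most 15 ones.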

About libcds

Libcds is a library which implements low-level succinct data structures to provide the building blocks of most compressed/succinct solutions. Libcds is written in C++.

There are 4 kinds of succinct data structures in libcds that are based on bitmaps:

DArray

RG

RRR

SDArray

What is common in these structures?

Each structure can be built from a BitString. BitString is the data container of libcds which consists of the following:

size_t length;
size_t uintLength;
uint * data;

“length” is the number of bits (in the whole data).
“uintLength” is the number of uints in the memory that the “data” pointer points to.
“data” is a pointer to the uints which hold the data.

size_t is typically 32 bits wide on 32-bit systems and 64 bits wide on 64-bit systems (it’s the type of the size parameter of malloc). Since “length” counts bits, this structure can hold up to 2^64 - 1 bits on a 64-bit system.

How does RG work?

Basics

RG uses a superblock table where each element tells how many ones are present before its corresponding data group. When you create this structure you have to define a factor which tells how many data blocks are grouped together.

Building the RG structure

When you build an RG structure you have to define the bitstring and a factor.

BitSequenceRG::BitSequenceRG(const BitString & bs, uint _factor);

bs is the source data
_factor is the number of uints of bs that make up a superblock

s is the number of bits in a superblock.
Rs is the array of superblocks.

The constructor creates a uint array named “data” with length “integers”. The uints from bs are copied here and the remaining space is filled with zeros.

Then it builds the array of superblocks by calling “void BitSequenceRG::BuildRank()”.

The superblock array (Rs) is filled up in this way:

The first element is 0 because rank is obviously 0 before the first bit of data.

Each further element is calculated as: the previous element + the number of ones in the previous block of “data” (with a block size of “s” bits).
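The filling rule above can be sketched as follows. This is not the BuildRank code from libcds: build_superblocks and count_ones are hypothetical names, and the sketch assumes 32-bit words, so a superblock covers s = 32 * factor bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::size_t count_ones(uint32_t v) {
    std::size_t ones = 0;
    for (; v != 0; v &= v - 1) ++ones;
    return ones;
}

// Rs[k] = number of ones in `data` before word k * factor,
// that is, before bit k * s with s = 32 * factor.
std::vector<std::size_t> build_superblocks(const std::vector<uint32_t>& data,
                                           std::size_t factor) {
    assert(factor >= 1);
    const std::size_t num_sblocks = data.size() / factor + 1;
    std::vector<std::size_t> Rs(num_sblocks, 0);  // Rs[0] stays 0
    for (std::size_t i = 1; i < num_sblocks; ++i) {
        std::size_t ones = 0;
        for (std::size_t w = (i - 1) * factor; w < i * factor; ++w)
            ones += count_ones(data[w]);
        Rs[i] = Rs[i - 1] + ones;  // previous element + ones in the previous block
    }
    return Rs;
}
```

For the words 0xF, 0x0, 0xFF, 0x1 and a factor of 2 this gives Rs = 0, 4, 13.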

How does rank work on RG?

size_t BitSequenceRG::rank1(const size_t i1);

i1 is the index where we want the rank.

First it gets the element with index i1/s from the superblock array (Rs).
This gives the number of ones before the start of the superblock that contains the index.

Second it iterates through the uints in data from the superblock’s start up to, but not including, the uint that contains the index. It counts the ones in these uints and sums them. The uint that contains the index has to be masked before its ones can be counted (because the bits after the index must be ignored). Then the number of ones in it is counted and added to the previous result.

This result is the rank.
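A sketch of these two steps, under stated assumptions: it is not the libcds source, it takes the word array and the superblock array as plain parameters, it counts the ones in positions 0..i inclusive, and it assumes that bit i is stored in bit (i mod 32) of word i/32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::size_t count_ones(uint32_t v) {
    std::size_t ones = 0;
    for (; v != 0; v &= v - 1) ++ones;
    return ones;
}

// Number of ones in positions 0..i (inclusive).
// Rs[k] must hold the number of ones before word k * factor.
std::size_t rank1(const std::vector<uint32_t>& data, const std::vector<std::size_t>& Rs,
                  std::size_t factor, std::size_t i) {
    const std::size_t W = 32;
    assert(i < data.size() * W);
    const std::size_t sb = i / (W * factor);  // superblock that contains i
    std::size_t result = Rs[sb];              // ones before that superblock
    const std::size_t last = i / W;           // word that contains i
    for (std::size_t w = sb * factor; w < last; ++w)
        result += count_ones(data[w]);        // whole words before the index
    const std::size_t keep = i % W + 1;       // low bits to keep, 1..32
    const uint32_t mask = keep == W ? 0xFFFFFFFFu : ((1u << keep) - 1u);
    return result + count_ones(data[last] & mask);
}
```

The loop visits at most factor words, which is where the O(factor) query time comes from.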

How does select work on RG?

size_t BitSequenceRG::select1(const size_t x1);

We are looking for the index of the x1-th one in data.
The x1 parameter is stored in the variable x in this function.

First it does a binary search in the superblock array (Rs) until it finds the greatest element which is less than x. This means that there are fewer than x ones before the place this superblock maps to (the start of a block of uints in “data”).

Second it subtracts the found superblock element from x, so x becomes the number of ones that still need to be counted from the start of this block.

(In the example the found element is 0, so x does not change.)

Third it iterates over the uints of this block and subtracts the number of ones in each from x. This goes on until it reaches a uint that has at least as many ones as the current x, which means that the select result is inside this uint. The value of this uint is saved and the remaining work is done on it.

Fourth it iterates through the 8-bit parts of this uint (by shifting, and subtracting the number of ones in each part from x) until it finds an 8-bit part that has at least as many ones as the current x.

Fifth it iterates through this 8-bit part bit by bit (by shifting) and subtracts the current bit value (obtained by masking) from x.

It returns the bit index where the last one has been found.
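The five steps can be sketched like this. It is an illustration under assumptions, not the libcds source: the arrays are passed in as parameters, ones are counted from 1, bit i lives in bit (i mod 32) of word i/32, std::lower_bound stands in for the hand-written binary search, and returning the total number of bits when there is no x-th one is my own convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::size_t count_ones(uint32_t v) {
    std::size_t ones = 0;
    for (; v != 0; v &= v - 1) ++ones;
    return ones;
}

// Index of the x-th one (x >= 1), or data.size() * 32 if there is none.
// Rs[k] must hold the number of ones before word k * factor, with Rs[0] == 0.
std::size_t select1(const std::vector<uint32_t>& data, const std::vector<std::size_t>& Rs,
                    std::size_t factor, std::size_t x) {
    const std::size_t W = 32;
    assert(x >= 1 && !Rs.empty() && Rs[0] == 0);
    // 1. Greatest superblock element that is less than x.
    const auto it = std::lower_bound(Rs.begin(), Rs.end(), x);
    const std::size_t sb = std::size_t(it - Rs.begin()) - 1;
    // 2. Ones that still have to be found after the superblock's start.
    x -= Rs[sb];
    // 3. Skip whole words until one holds at least x ones.
    std::size_t w = sb * factor;
    while (w < data.size() && count_ones(data[w]) < x) {
        x -= count_ones(data[w]);
        ++w;
    }
    if (w == data.size()) return data.size() * W;  // fewer ones than requested
    uint32_t word = data[w];
    std::size_t pos = w * W;
    // 4. Skip 8-bit parts until one holds at least x ones.
    while (count_ones(word & 0xFFu) < x) {
        x -= count_ones(word & 0xFFu);
        word >>= 8;
        pos += 8;
    }
    // 5. Walk the remaining bits one by one.
    for (;; word >>= 1, ++pos)
        if ((word & 1u) != 0 && --x == 0) return pos;
}
```

Step 1 costs O(log(n/factor)) and step 3 visits at most factor words, which matches the select time given for RG.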

Note: RG uses the popcount and popcount8 functions to count the number of ones in a 32-bit or an 8-bit variable. These functions use the array “const unsigned char __popcount_tab[]”, which is a lookup table with 256 elements.

For an 8-bit value it gets the number of ones in one step (by using the value as the index into the table). So this table contains the number of ones of every possible 8-bit value.

For a 32-bit value it gets the number of ones by splitting it into 4 x 8 bits (by masking) and summing their values from the lookup table.
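A minimal sketch of the lookup-table technique. The table here is generated at start-up instead of being written out as a literal, and the names popcount_tab, popcount8 and popcount32 are mine, not necessarily the ones used in libcds.

```cpp
#include <array>
#include <cstdint>

// Builds the 256-entry table: entry v holds the number of ones in v.
static std::array<unsigned char, 256> make_popcount_table() {
    std::array<unsigned char, 256> t{};
    for (unsigned v = 1; v < 256; ++v)
        t[v] = static_cast<unsigned char>((v & 1u) + t[v >> 1]);
    return t;
}

static const std::array<unsigned char, 256> popcount_tab = make_popcount_table();

// Ones in the low 8 bits: a single table lookup.
unsigned popcount8(uint32_t x) {
    return popcount_tab[x & 0xFFu];
}

// Ones in a 32-bit value: four lookups, one per byte.
unsigned popcount32(uint32_t x) {
    return popcount_tab[x & 0xFFu] + popcount_tab[(x >> 8) & 0xFFu] +
           popcount_tab[(x >> 16) & 0xFFu] + popcount_tab[x >> 24];
}
```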

Time and space requirement

n – data length in bits
factor – how many blocks to group together (for superblocks)

* The space is O(n) in the current implementation because it makes a copy of the original data, but this could easily be fixed.

Rank:
time(n, factor) = O(factor)

Select:
time(n, factor) = O(log(n/factor)) + O(factor)

Problems and inefficiencies in RG

Every array element is copied one by one. Zeroing is also done one element at a time. A standard library call should be used instead!

There are two elements which have the same meaning: the constant “W” in the cds-utils namespace and “b”, a class variable in BitSequenceRG. Both are set to 32 (the bit length of uint). It is bad practice to use two variables for the same purpose (it is easy to forget to change both). In my notes I changed them to the constant 32.

uint is used everywhere in the code. Because uint is not a standard C++ data type, the code will not work consistently (or at all) across different platforms. On my 64-bit machine with Linux and GCC 4.8.2/Clang 3.3, uint is defined as unsigned int and it is a 32-bit data type.

On 64-bit platforms a 64-bit storage type (like uint64_t) should be used for performance.

The popcount functions are great; however, on modern processors each of them could be reduced to a single instruction by using assembly.

Example 1

Example 2

What is select?

Example 1

Example 2

In a programming sense it is a function with 2 arguments:

int select(void *bitmap, int numofones)
{
// ...
return index;
}

What is the use of these queries?

They are used in implementations of succinct data structures. With these structures data can be compressed in such a way that it can be used in place, without decompressing it first. By using these queries on these structures you can solve a lot of problems.

In these examples the bitmap should of course be stored in a succinct data structure (like RRR), both to save space and to get an efficient rank/select implementation.

Example 1

You have a text which consists of words.

You want to get the nth word of this text.

You want to know how many words are in a range.

How can you accomplish this?

Make a bitmap which maps the words’ start position.
In this bitmap:
– 1 means a word’s start
– 0 means other

With select you can solve the first problem:

index = select(bitmap, n);

Now you can read the nth word from this index.

With rank you can determine the number of words in a range (a,b):

numofwords = rank(bitmap, b) - rank(bitmap, a);

Note: There should be a check here for the end of the last word (it can be out of range).
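To make the word example concrete, here is a small sketch with a plain uncompressed bitmap and linear-time queries. All names (word_starts, rank1, select1, nth_word) are hypothetical, words are assumed to be separated by single spaces, rank1 counts the ones in positions 0..i inclusive, and select1 counts the ones from 1. A real implementation would put the bitmap into a succinct structure instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// bitmap[i] is true when a word starts at character i of the text.
std::vector<bool> word_starts(const std::string& text) {
    std::vector<bool> bitmap(text.size(), false);
    for (std::size_t i = 0; i < text.size(); ++i)
        bitmap[i] = text[i] != ' ' && (i == 0 || text[i - 1] == ' ');
    return bitmap;
}

// Number of ones in positions 0..i (inclusive); naive linear scan.
std::size_t rank1(const std::vector<bool>& bitmap, std::size_t i) {
    std::size_t ones = 0;
    for (std::size_t p = 0; p <= i && p < bitmap.size(); ++p)
        if (bitmap[p]) ++ones;
    return ones;
}

// Position of the n-th one (n >= 1), or bitmap.size() if there is none.
std::size_t select1(const std::vector<bool>& bitmap, std::size_t n) {
    for (std::size_t p = 0; p < bitmap.size(); ++p)
        if (bitmap[p] && --n == 0) return p;
    return bitmap.size();
}

// Reads the n-th word, starting at the index returned by select1.
std::string nth_word(const std::string& text, const std::vector<bool>& bitmap,
                     std::size_t n) {
    const std::size_t start = select1(bitmap, n);
    const std::size_t end = text.find(' ', start);  // npos for the last word
    return text.substr(start, end - start);
}
```

For the text "the quick brown fox" the third word starts at index 10, and rank1(bitmap, 15) - rank1(bitmap, 3) is 2 because two words start in positions 4..15.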

Example 2

There is a long line of containers at customs authority. Some are empty, others are filled with goods.

There are not enough officers to check all the containers (verifying that each contains what is written on it). Check every nth occupied container!

You have a bitmap which maps containers.
In this bitmap:
– 1 means a container is filled with goods
– 0 means a container is empty

With select you can solve this problem:

index1 = select(bitmap, n);
index2 = select(bitmap, 2*n);
// ..

Example 3

There is a large prison with a lot of cells. A cell is either empty or occupied by one prisoner. The cells are indexed in lines. The security system failed on one passage and all cells on this line were accidentally opened.

How many prisoners escaped?

You have a bitmap which maps cells.
In this bitmap:
– 1 means a cell is occupied by a prisoner
– 0 means a cell is empty
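This is a rank problem: the number of escaped prisoners is the number of ones between the first and the last cell of the opened line. A minimal sketch, assuming a plain uncompressed bitmap, a naive rank1 that counts the ones in positions 0..i inclusive, and hypothetical names throughout:

```cpp
#include <cstddef>
#include <vector>

// Number of ones in positions 0..i (inclusive); naive linear scan.
std::size_t rank1(const std::vector<bool>& bitmap, std::size_t i) {
    std::size_t ones = 0;
    for (std::size_t p = 0; p <= i && p < bitmap.size(); ++p)
        if (bitmap[p]) ++ones;
    return ones;
}

// Occupied cells in the range first..last (both inclusive).
std::size_t prisoners_in_range(const std::vector<bool>& cells,
                               std::size_t first, std::size_t last) {
    const std::size_t before = first == 0 ? 0 : rank1(cells, first - 1);
    return rank1(cells, last) - before;
}
```

For the cells 1 0 1 1 0 1 1 0, opening cells 2 to 5 lets 3 prisoners out.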

PBM P4 structure

The file starts with the ASCII section and ends with the binary section.

ASCII section

“P4” is the magic word which identifies the file format
A line that starts with “#” is a comment (optional).
“8 4” is the dimension of the binary data
The ASCII section ends with exactly one whitespace after dimension

A character is a whitespace if isspace() returns true for it.
A comment line must end with CR or LF.
Any amount of whitespace may appear between the tokens.

Binary section

Contains the image data, where every bit corresponds to one pixel.
The example file consists of “@0@0” which is 4 bytes => 32 bits total.
Character code of “@” is 64, which is 01000000 in binary.
Character code of “0” is 48 which is 00110000 in binary.

“8 4” dimension means 4 lines of 8 bit binary data.

“16 2” dimension means 2 lines of 16 bit binary data.

BUT!

“5 4” dimension means 4 lines of 8 bit binary data, where only the most significant 5 bits are used.

“15 2” dimension means 2 lines of 16 bit binary data, where only the most significant 15 bits are used.
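The padding rule means that every row takes ceil(width / 8) bytes. The sketch below computes that and assembles a P4 file in memory; pbm_row_bytes and make_p4 are hypothetical helper names, and the header is written without a comment line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes used by one row: the width rounded up to a whole number of bytes.
std::size_t pbm_row_bytes(std::size_t width) {
    return (width + 7) / 8;
}

// Builds the contents of a P4 file: ASCII header, one whitespace, binary rows.
// `rows` must already hold height * pbm_row_bytes(width) bytes, each row
// padded on its least significant side.
std::string make_p4(std::size_t width, std::size_t height, const std::string& rows) {
    assert(rows.size() == height * pbm_row_bytes(width));
    std::string out = "P4\n" + std::to_string(width) + " " + std::to_string(height) + "\n";
    out += rows;
    return out;
}
```

For example, make_p4(8, 4, "@0@0") produces the header "P4", the dimensions "8 4" and then the four data bytes of the example file.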

bitmapdd

This program creates a bitmap from a file (or device). It’s mainly used for creating a usage map of input but it can also do conversions.

You can download bitmapdd from GitHub:

$ git clone https://github.com/andmaj/bitmapdd.git

Scenario 1:

You have a file which consists of blocks of data. If a block is full of zeros then it’s free, otherwise it’s used. You want to make a bitmap from it where a bit is 0 if the corresponding block is all zeros, otherwise it’s 1.

For example with block size set to 4:

$ bitmapdd --bs 4 --if input.dat --of output.dat

Scenario 2:

Converting a text of zeros and ones to a binary file where every bit corresponds to one character in the original file.

$ bitmapdd --bs 1 --null 48 --if usagemap.txt --of usagemap.dat

Note: null byte has been set to 48 which is the code of character “0” in the ASCII character table.

usagemap.txt contains text:
001100000100000001000001

usagemap.dat will contain text:
0@A

Character   Decimal code   Binary code
0           48             00110000
@           64             01000000
A           65             01000001
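The mapping used in both scenarios can be sketched in a few lines. This is not the bitmapdd source: blocks_to_bitmap is a hypothetical function that works on an in-memory string, and it assumes that the first block maps to the most significant bit of the first output byte (which is what the 0@A example implies) and that an unfinished last byte is padded with zero bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// One output bit per input block: 0 if every byte of the block equals
// null_byte, 1 otherwise. Bits are packed most significant bit first.
std::string blocks_to_bitmap(const std::string& input, std::size_t block_size,
                             unsigned char null_byte = 0) {
    assert(block_size >= 1);
    const std::size_t blocks = (input.size() + block_size - 1) / block_size;
    std::string out((blocks + 7) / 8, '\0');
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, input.size());
        bool used = false;
        for (std::size_t i = begin; i < end && !used; ++i)
            used = static_cast<unsigned char>(input[i]) != null_byte;
        if (used)  // set bit b, counting from the top bit of each byte
            out[b / 8] = static_cast<char>(static_cast<unsigned char>(out[b / 8]) |
                                           (0x80u >> (b % 8)));
    }
    return out;
}
```

Feeding it the text 001100000100000001000001 with a block size of 1 and a null byte of 48 yields the three bytes 0@A, the same as in Scenario 2.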

bitmap2pbm

Creates a P4 type PBM image from a binary file. With this program you can visualize your binary data (for example a usage map).

You can download bitmap2pbm from GitHub:

$ git clone https://github.com/andmaj/bitmap2pbm.git

How to use

For example creating an image of the first 10000 bytes of memtest binary:

$ head -c 10000 /boot/memtest86+-4.20 | bitmap2pbm --of memtest.pbm

You can view the image in GIMP.

fat2bitmap

Creates a bitmap from the free/used clusters of a FAT file system. The bitmap is in text format, so it contains zero (character 48) and one (character 49) bytes.

A zero means that the cluster is free, a one means that the cluster is used.

How to create the image file

[UPDATE: 2018.12.11.]
I’m sorry for the late correction, the comment approval requests landed in the spam folder. Stefan Naumann and Wojciech Franczyk correctly pointed out:

“Hi. In the point 3 you are creating FAT filesystem on the disk image, but you should have it created only on the partition. This is corrupting the image. You can check it trying fdisk -l test.img after performing the point 3 – you will get no partitions.

To fix it we first need to map the partition to /dev:
sudo losetup --offset 1048576 -f test.img
offset value is the start sector of the partition [2048] multiplied by sector size [512] to get bytes.

And create FAT filesystem on the partition, not disk:
sudo mkfs.vfat /dev/loop0

I’m leaving the solution here because I had exactly this problem as I needed valid whole disk image (to boot it), not only the partition 🙂
Nice tutorial thought, thanks for that, It helped me. Cheers.”

Create a file filled with zeros:

$ dd if=/dev/zero of=test.img count=50 bs=1M

This command makes a 50 MB image file. Change the “count” argument for a different size.

Preparations

Read the warning message and click “If you have read the above notice, please click here to find hardware version.“

My router is V1 because “Model: TL-WR1043ND Ver: 1.6” is written on its back
Click “TL-WR1043ND V1”

Download the firmware by clicking “TL-WR1043ND_v1_130428”
Warning: Do NOT download another firmware version because in the case of TP-Link there may be different steps to take.
If it doesn’t work, here is a copy of it on my site: TL-WR1043ND_v1_130428.zip

Decompress the firmware

$ unzip TL-WR1043ND_v1_130428.zip

Check firmware checksum

$ sha1sum wr1043nv1_en_3_13_13_up_boot\(130428\).bin

It should return:
25c3c2bd86dba8bd4c68489489e28e580560bf6c wr1043nv1_en_3_13_13_up_boot(130428).bin

This firmware version also contains the boot code, so we have to strip it first.

Wait until “Rebooting …” appears
Note: If you have configured your router as an OpenFlow switch before using my guide, then you have to pull the LAN cable out of the WAN port now and plug it into a LAN port (e.g. LAN1).