Fast and Compact Web Graph Representations

Datasets

We tested our proposals on the Web graphs provided by the WebGraph project. We do not provide downloads for all the crawls; instead, we encourage you to cite their original source if you use them.

In particular, we tested the five crawls shown in the following table. The "bpe fast" column gives the bits per edge required by the variant presented as Re-Pair CDict NoPtrs, and the "bpe slow" column gives the bits per edge required by the variant that combines Re-Pair with Wavelet Trees. For more details about these variants, see the documents section.

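| Crawl                      | Nodes       | Edges         | Plain size (MB) | bpe fast | bpe slow |
|----------------------------|-------------|---------------|-----------------|----------|----------|
| Eu-2005                    | 862,664     | 19,235,140    | 77              | 4.47     | 3.49     |
| Indochina-2004             | 7,414,866   | 194,109,311   | 769             | 2.53     | 1.84     |
| Uk-2002                    | 18,520,486  | 298,113,762   | 1,208           | 4.23     | 3.16     |
| Arabic-2005                | 22,744,080  | 639,999,458   | 2,528           | 3.16     | 2.42     |
| Uk-union (2006-06-2007-05) | 133,633,040 | 5,507,679,822 | 22,564          | 2.91     | 2.17     |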
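To relate the bits-per-edge figures to the plain sizes in the table, the following minimal sketch converts a bpe value into an approximate total size. The class name BpeSize and the method compressedMB are hypothetical names chosen for this illustration and are not part of our software. The sketch also assumes that 1 MB means 2^20 bytes, which this page does not state, and it counts only the bits attributed to edges.

```java
import java.util.Locale;

// Illustrative sketch only: BpeSize and compressedMB are hypothetical names,
// not part of the distributed software.
class BpeSize {
    // Assumption: 1 MB = 2^20 bytes (the page does not state the convention).
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Approximate size in MB of a graph stored at the given bits per edge. */
    static double compressedMB(long edges, double bitsPerEdge) {
        double totalBits = edges * bitsPerEdge;   // bits needed for all edges
        return totalBits / 8.0 / BYTES_PER_MB;    // bits -> bytes -> MB
    }

    public static void main(String[] args) {
        // Edge counts and bpe values copied from the table of crawls.
        String[] names = {"Eu-2005", "Indochina-2004", "Uk-2002",
                          "Arabic-2005", "Uk-union (2006-06-2007-05)"};
        long[] edges = {19_235_140L, 194_109_311L, 298_113_762L,
                        639_999_458L, 5_507_679_822L};
        double[] bpeFast = {4.47, 2.53, 4.23, 3.16, 2.91};
        double[] bpeSlow = {3.49, 1.84, 3.16, 2.42, 2.17};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-28s fast: %9.2f MB   slow: %9.2f MB%n",
                    names[i],
                    compressedMB(edges[i], bpeFast[i]),
                    compressedMB(edges[i], bpeSlow[i]));
        }
    }
}
```

Comparing each printed value with the "Plain size (MB)" column gives a rough idea of the space saved by each variant for a given crawl.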

To let you test our software independently, we provide the file for eu-2005 in our input format. The remaining crawls can be obtained from the WebGraph project and converted using the file Transform.java.