BitPostingIndexInputFormat needs a unit test

Description

BitPostingIndexInputFormat is responsible for splitting a bit posting structure across various map tasks. This is used in various scenarios:
* Re-inverting an inverted index into a direct index
* Inverting a link index
* Calculating lots of things on direct files very quickly.

However, the code to determine the split is very complex. It is very easy to get correct-looking but incorrect results, e.g. splits overlap, or splits do not overlap, the last split is incomplete, the first split misses the first entry, etc.
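The failure modes listed above can be turned into assertions. The following is only a sketch of one possible checker, not code from Terrier: the class name `SplitCoverageCheck`, the method `findProblems`, and the representation of a split as a `{start, endExclusive}` pair of entry ids are all assumptions made for illustration. It also assumes the splits are passed in order of their start entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper (not part of Terrier). */
class SplitCoverageCheck {

    /**
     * Checks that the splits tile the entry range [0, numEntries) exactly.
     * Each split is {start, endExclusive}; splits must be sorted by start.
     * Returns a description of every problem found (empty list if none).
     */
    static List<String> findProblems(int[][] splits, int numEntries) {
        List<String> problems = new ArrayList<>();
        int expectedStart = 0; // first entry id not yet covered
        for (int i = 0; i < splits.length; i++) {
            int start = splits[i][0];
            int end = splits[i][1];
            if (end <= start) {
                problems.add("split " + i + " is empty");
            }
            if (start < expectedStart) {
                problems.add("split " + i + " overlaps the previous split");
            } else if (start > expectedStart) {
                problems.add(i == 0
                        ? "first split misses the first entry"
                        : "gap before split " + i);
            }
            expectedStart = Math.max(expectedStart, end);
        }
        if (expectedStart < numEntries) {
            problems.add("last split is incomplete");
        } else if (expectedStart > numEntries) {
            problems.add("last split runs past the end");
        }
        return problems;
    }
}
```

A unit test could map each real split to such a pair and assert that the returned list is empty, so that a failure message names the specific defect rather than just a count mismatch.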

We need some way of testing this code. Here are the cases that should be tested:
* Split a single file into a single split
* Split a single file into multiple splits with a trailing split
* Split a single file into multiple splits without a trailing split
* Split multiple files into one split each
* Split multiple files into multiple splits each, with trailing splits
* Split multiple files into multiple splits each, without trailing splits
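To make the cases above concrete, here is a toy splitter that can generate the expected ranges for a test fixture. It is a sketch under stated assumptions, not the logic of BitPostingIndexInputFormat: it assumes that a "trailing split" means a final, shorter split left over when the number of entries in a file is not a multiple of the split size, and the names `ToySplitter`, `split`, and `entriesPerSplit` are hypothetical. For the multiple-file cases, the same method would simply be applied to each file independently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixture generator (not part of Terrier). */
class ToySplitter {

    /**
     * Divides the entry range [0, numEntries) into consecutive
     * {start, endExclusive} ranges of at most entriesPerSplit entries.
     * The final range is shorter (a "trailing split") whenever
     * numEntries is not a multiple of entriesPerSplit.
     */
    static List<int[]> split(int numEntries, int entriesPerSplit) {
        if (entriesPerSplit <= 0) {
            throw new IllegalArgumentException("entriesPerSplit must be positive");
        }
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < numEntries; start += entriesPerSplit) {
            int end = Math.min(start + entriesPerSplit, numEntries);
            splits.add(new int[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // 10 entries, 4 per split: a trailing split of 2 entries.
        for (int[] s : split(10, 4)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

Under this reading, 10 entries at 10 per split gives the single-split case, 10 entries at 4 per split gives multiple splits with a trailing split, and 12 entries at 4 per split gives multiple splits without one.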

Craig Macdonald added a comment - 18/Feb/11 12:41 PM
Tagging for 3.1. I have made some initial progress on this:
* Split a single file into a single split - DONE
* Split a single file into multiple splits with a trailing split - IN PROGRESS

Craig Macdonald added a comment - 12/Mar/11 2:39 PM
I can't get this unit test to pass. The issue is in BitPostingIndexInputStream's skipping ability. I have reproduced this within the new test for TREC-166.