Lab

Abstract: We have seen hashing as a technique for obtaining
expected constant-time operations on finding or inserting elements in a set.
Hashing is also used to compute digital signatures for confirming the authenticity of messages.

In this assignment, you will apply a hashing scheme in pursuit of finding
occurences of one string in another,
using the Rabin–Karp algorithm.

That algorithm is discussed in Section 32.2 of our text, on page 990.

Introduction

Suppose we have two strings of interest:

target

is a relatively long string of length n.

query

is a relatively short string of length m.

We are interested in finding all occurrences of query in target.
For example, consider the queryaaba and
the targetaabaacaadaabaabaa.
The last character of the query occurs within the
target as shown below:

a a b a a c a a d a a b a a b a a
| | |
a a b a | |
a a b a |
a a b a

Notice that the second and third matches overlap. As discused in class,
the most straightforward algorithm is as follows:

for each i in the interval [0, n)
match = true
for each j in the interval [0, m)
if target[i+j] ≠ query[j]
match = false
end if
end for
if match
// definite match of query starting at index i of target
end if
end for

The worst-case complexity of this approach is
Θ(mn).

Rabin–Karp Algorithm

Instead of trying to match the query at each location in the target, the
Rabin–Karp Algorithm computes a rolling hash of a window
of m characters of the target. Near the beginning of the target, where
there may not be m characters in the window, the value 0 is
used for those
entries outside of the target.

We will use in particular the following hash function for a window
of size m in a character array, with the first character of
the window at index start:

The above function is similar to how Java computes .hashCode() for
a String, except that we confine the results to being strictly less
than 511 using mod (the % operator). The following
identities related to mod should be useful in this assignment:

(a + b) mod p = ( (a mod p) + (b mod p) ) mod p

(a × b) mod p = ( (a mod p) × (b mod p) ) mod p

You should investigate how large the intermediate results of addition
and multiplication can get before the final mod p is applied.

Applied to our running example, the rolling hash values at the end of each window within our target string are as follows:

a

a

b

a

a

c

a

a

d

a

a

b

a

a

b

a

a

97

38

254

306

214

430

337

153

73

368

92

224

306

214

429

306

214

Note that at the end of each aaba substring, the hash value
306 is produced.

Thus, each time the rolling hash produces 306, we
probably have a match of aaba in the target.

No other window produces the 306 hash value,
so we were lucky in this example: the hash value 306
produced matches only for our query.

The Rabin–Karp (RK) algorithm is essentially as follows:

Compute the RK hash for the query string of interest.

In our running
example, that hash value is 306, which can be computed by
invoking
hashFor("aaba".toCharArray(), 0, 4)

For each index i of the target string,
compute the rolling hash of the target that begins at index
i-m+1 and has length m.

In the example, the value for the first a is computed at indices
-3, -2, -1, and 0, for a window of length 4 that ends at the first index
of the target. The numerical values for characters outside the window are 0,
and the numerical value for a is its ASCII encoding of 97.

Thus, the window for a (the first character in our
target string) has hash value 97.

Moving one character to the right encounters another a. Its
hash value is at indices -2, -1, 0, and 1, which produces 31×97+97,
which is 3104, but taken mod 511 is 38.

Thus the window for a a (the first two characters in
our target string) has value 38.

However, if we compute the rolling hash by calling the hashFor method above,
the complexity is still
Θ(mn).

Your task is to compute that rolling hash incrementally,
as described below.

Your assignment

Update your repository as usual.

In the rabinkarp package of the labs source
folder, find and open the RK class.

The constructor accepts the value of m, which is the length
of the query string in this assignment.

The method int nextCh(char c) accepts a character supplied
from outside, and returns the rolling KR hash of it and the previous
m-1 characters.

You must complete the constructor and the nextCh method,
subject to the constraints stated below.

In terms of our running example, the following calls after instantiation of
a KR(4) object would return values as follows:

nextCh('a') → 97

nextCh('a') → 38

nextCh('b') → 254

nextCh('a') → 306

nextCh('a') → 214

nextCh('c') → 430

To receive full credit for this lab, your solution must:

Produce the correct hash results using an incremental approach,
taking constant (Θ(1)) time. This means your approach must
take less than Ω(m) time.

Suppose h is the hash value of a window of m characters.

Suppose character c is about to leave that window, and
character d is the next to be incorporated into the window.

Having trouble?

How to submit your work

Make sure all of your work is commited in your repository.

Note that we will run your solution against the JUnit test as it was
given to you. If you modify that test in some way, the results you see may
not correspond with the testing necessary to receive credit.

You can always
revert any changes you make to the testing code back to what we gave you in
the repository using Team…Revert, but be sure not to do that
to your source code as it will wipe away work you may not have yet committed.

Your work will be graded as of the due date for this assignment.

Submitting your work (read carefully)

Make sure your code passes the tests as we originally gave them to you.
It's those tests we will run to see if your code functions properly.

Make sure you have placed ticker.tick() calls appropriately
in your solution. Failure to do so will cost you points on this assignment.

Make sure your solution's ticks behave in a manner consistent with the
asymptotic complexity you should see for your solution.

You must commit and push all of your work to your repository. It's best to do this
from the top-most level of your repository, which bears your name and student ID.

You can tell that your code was pushed by logging into
bitbucket.org, clicking on
your repository, and seeing the
push in the associated log traffic.

Because of this, you have no excuse for failing to push your code. Check
and make sure it was pushed so your work is counted.