CompSci 261, Winter 2018, Homework 3

Suppose we are doing tabulation hashing of four-character strings,
using four tables of random numbers indexed by single characters.
Given a set of keys, define a "uniquely occurring character" to be a character that is in one of the strings at a certain position, but is not in any of the other strings in the same position.

Prove that, in every set of three four-character strings, at least two of the strings have a uniquely occurring character.

Find a set of four four-character strings none of which has a uniquely occurring character.

(This is the main insight needed to prove that tabulation hashing is 3-independent but not 4-independent.)

Suppose we have a cuckoo filter with $N$ cells and $b$ bits per fingerprint. In it, each key $x$ is represented by a $b$-bit fingerprint $f(x)$, which may be stored in either of two locations $h_1(x)$ and $h_1(x)\oplus h_2(f(x))$ (where "$\oplus$" represents the bitwise binary exclusive or operation, written as "^" in most programming languages). Define two keys $x$ and $y$ to be indistinguishable if they have the same fingerprint as each other and their fingerprints would be stored in the same two locations. Given $x$ and $y$, and assuming that $f$, $h_1$, and $h_2$ are all random functions, what is the probability that $x$ and $y$ are indistinguishable?

One way to make a 2-independent hash function is to choose a large prime number $p$ (larger than the possible range of key values), choose two random coefficients $a$ and $b$ modulo $p$, and define the hash function to be the function $h(x) = ((ax+b) \bmod p) \bmod N$.
Suppose we try shortcutting this step, and compute a simpler function $f(x) =(ax+b) \bmod N$.
Would $f$ be a good choice hash function? Explain why or why not.

Suppose that you are inserting a sequence of keys, one at a time, into a Bloom filter with 1000 cells, and that each key is mapped to four of these cells. Before adding each key $x$, you use the Bloom filter to test whether $x$ is already a member of the set, and if it says that $x$ is a member you stop the process (without inserting $x$ again).
What is the maximum possible number of keys that you can insert before you stop?