Computational Complexity and other fun stuff in math and computer science from Lance Fortnow and Bill Gasarch

Friday, March 04, 2005

Finding Duplicates

Here is an interesting problem given by Muthukrishnan during his
talk in the New Horizons workshop.

Start with an array A of n+1 entries each consisting of an integer
between 1 and n. By the pigeonhole principle there must be some i≠j
and a w such that A(i)=A(j)=w. The goal is to find w. Depending on A
there may be several such w; we want to find any one of them. The
catch is that you only get to use O(log n) bits of memory.

First a warm-up puzzle: Find w using only O(n) queries to entries of
A (remember you only get O(log n) space). Hint: Use pointer chasing.

Now suppose A is streamed, that is, we see A(1),A(2),…,A(n+1) in
order and can then make further passes over the same sequence. How
many passes do you need to find a w?

You can find w in n+1 passes just by trying each possible value for
w. With a little work you can use O(log n) passes doing a binary
search.
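
The binary search over passes can be sketched roughly as follows, in Python. This is only an illustration under some assumptions: the names find_duplicate_by_passes and make_pass are hypothetical, make_pass() is assumed to return a fresh iterator over A(1),…,A(n+1) each time it is called (one call is one pass), and n is assumed to be known in advance. The state kept between passes is just lo, hi, mid and a counter, each O(log n) bits.

```python
def find_duplicate_by_passes(make_pass, n):
    """Return (w, passes) where w occurs at least twice in the stream.

    make_pass: hypothetical callable returning a fresh iterator over
               the n+1 stream entries, each an integer in 1..n.
    """
    lo, hi = 1, n          # invariant: more than hi-lo+1 entries lie in [lo, hi]
    passes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        count = 0
        for x in make_pass():          # one pass over the stream
            if lo <= x <= mid:
                count += 1
        passes += 1
        if count > mid - lo + 1:
            hi = mid                   # pigeonhole: a duplicate lies in [lo, mid]
        else:
            lo = mid + 1               # otherwise [mid+1, hi] is overfull
    return lo, passes


if __name__ == "__main__":
    A = [1, 3, 4, 2, 2]                # n = 4, n+1 = 5 entries
    print(find_duplicate_by_passes(lambda: iter(A), 4))
```

Each pass halves the candidate range of values, so about log2(n) passes are used.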

Muthukrishnan asks whether the number of passes needed is
Ω(log n) or O(1) or something in between.

12 comments:

"First a warm-up puzzle: Find w using only O(n) queries to entries of A (remember you only get O(log n) space). Hint: Use pointer chasing." Ans: Create a linked-list-like structure with n+1 nodes, where a) the value at node i is v[i]=A[i], and b) there is a pointer from node i to node v[i]. Check for loops in the linked list using pointer chasing, starting from node n+1.
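
One way to make the loop check above concrete is Floyd's tortoise-and-hare cycle detection; this is a swapped-in standard technique, not necessarily what the commenter or Muthukrishnan had in mind. The sketch below assumes the array is given as a Python list a with a[0] holding A(1), and the function name find_duplicate is hypothetical. Since no entry equals n+1, node n+1 lies off the loop, and the node where the path enters the loop has two distinct predecessors i≠j, so it is a valid w.

```python
def find_duplicate(a):
    """a has n+1 entries, each in 1..n; a[0] holds A(1). Returns some repeated value."""
    n = len(a) - 1

    def f(i):
        return a[i - 1]        # follow the pointer from node i to node A(i)

    start = n + 1              # no pointer leads here, so it is not on the loop

    # Phase 1: move at speeds 1 and 2 until the two pointers meet inside the loop.
    slow, fast = f(start), f(f(start))
    while slow != fast:
        slow = f(slow)
        fast = f(f(fast))

    # Phase 2: restart one pointer at the start; they now meet at the loop entry.
    slow = start
    while slow != fast:
        slow = f(slow)
        fast = f(fast)
    return slow


if __name__ == "__main__":
    print(find_duplicate([1, 3, 4, 2, 2]))
```

Only two indices are stored at any time, and both phases take O(n) queries to A.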

The following is a partial answer to the problem above. [I was attending the New Horizons workshop in Kyoto, Japan last week; after Muthu's talk I was discussing the following with Muthu. Jun Tarui (tarui+at-mark+ice.uec.ac.jp)]

Assume n is odd. Consider a restricted class of inputs such that A(1), A(2), ..., A((n+1)/2) are distinct and A((n+1)/2 + 1), ..., A(n+1) are distinct. A solution for this class of inputs with s bits of memory and r stream passes implies a two-party protocol for the following communication game with r rounds and s communication bits in each round.

Alice and Bob get size-(n+1)/2 subsets A and B of {1,...,n} respectively, and the task is to find and agree on some w that is in the intersection of A and B.

This task is equivalent to the monotone Karchmer-Wigderson game for the Majority function of n variables (Alice and Bob get a max term and a min term). A protocol with r rounds and s bits per round for this game corresponds to a monotone (i.e. AND/OR) circuit computing the Majority function with depth at most r and fan-in at most 2^s. Hastad's size lower bound for such circuits implies the claimed bound on the number of passes above for s=O(log n).

When n is even, consider the complexity of any function that outputs 1 if the number of 1's is more than n/2 and outputs 0 if it is less than n/2. (end-of-proof-sketch)