List Difference

November 20, 2012

I receive several emails from my readers each week, and I try to answer every one of them (except those that begin “I really admire your blog. You are obviously a great programmer. It has nothing to do with your recent exercise, but I am a student just learning to program, and I need help with my Java program that doesn’t work. Can you look at this program and tell me what is wrong? And they attach 400 lines of code without even bothering to tell me what is the task, or providing sample input and output, or even their name). I love to hear from my readers; your emails sustain me, and keep me writhing.

I had an email from a reader yesterday asking why I didn’t assign list differencing as a task when I assigned list intersection and union. The answer is: because I forgot!

So today we look at list differencing. Given two lists, their difference is the items in the first list that are not in the second list. For our sample lists A containing 4, 7, 12, 6, 17, 5, and 13 and B containing 7, 19, 4, 11, 13, 2, and 15, the difference A − B is 12, 6, 17, and 5 and the difference B − A is 19, 11, 2, and 15.

Your task is to write three list differencing functions, one each with time complexities O(n2), O(n log n), and O(n), and perform timing tests to confirm the expected time complexities. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution in the comments below.

Like this:

Related

5 Responses to “List Difference”

A Python version. After last weeks exercise, this was not too difficult. A full version can be found here.

def diff1(a, b):
"""Loop over a and check if ai not in b"""
return [ai for ai in a if ai not in b]
def diff4(a, b):
"""Make heaps of a and b. Copy to output elements that are i na but not in b"""
ha = list(a)
hb = list(b)
res = []
heapq.heapify(ha)
heapq.heapify(hb)
while ha and hb:
if ha[0] < hb[0]:
res.append(heapq.heappop(ha))
elif ha[0] > hb[0]:
heapq.heappop(hb)
else:
heapq.heappop(ha)
heapq.heappop(hb)
return res + ha # ha can contain a tail
def diff6(a, b):
"""Make set of b and loop over a and check that ai not in set(b)"""
setb = set(b)
return [ai for ai in a if ai not in setb]
def diff99(a, b):
"""Make sets of a and b and subtract"""
return list(set(a) - set(b))
"""
diff1 O(n2) coeff: 4.01e-06 ratio: 27 alt: O(n3) Loop over a and check if ai not in b
diff4 O(nlogn) coeff: 6.06e-07 ratio: 5 alt: O(n) Make heaps of a and b. Copy to output elements that are i na but not in b
diff6 O(n) coeff: 1.38e-08 ratio: 4 alt: O(nlogn) Make set of b and loop over a and check that ai not in set(b)
diff99 O(nlogn) coeff: 2.07e-08 ratio: 2 alt: O(n) Make sets of a and b and subtract
"""

Hi,
the algorithm complexity is measured in “list operations”, I gather. For example, sets in Python are implemented as (open-addressed) hashes. This
may make their instantiation much more complex than the creation of a linked list.
Interesting problem, though.