Duplicates in arbitrary instances for collections

Tom Murphy <amindfv <at> gmail.com>
2013-01-20 20:12:17 GMT

Every QuickCheck tutorial makes the point that, given a property on
some type, the Arbitrary instance for that type is written in such a
way that nearly all of the interesting edge-cases of that type are
tried. So e.g. for numeric types, positive, negative, and zero numbers
are tried. For collections, I think the case of duplicate elements is
insufficiently covered.
Say we have a list difference function:
badDiff :: Eq a => [a] -> [a] -> [a]
badDiff l r = filter (\e -> e `notElem` r) l
And a test for it:
prop_wrong :: [Int] -> [Int] -> Bool
prop_wrong a b = (a `badDiff` b) == (a \\ b)
`quickCheck prop_wrong` passes about 80% of the time!
Obviously it shouldn't:
> badDiff [1,2,3,3,3] [2,3]
[1]
> (\\) [1,2,3,3,3] [2,3]
[1,3,3]
But even with maxSuccess at 1000, this property passes at least 2/3 of
the times it's run (on my machine, where (maxBound :: Int) is large).
The reason is that `arbitrary` for [a] is just
`sized $ \n -> choose (0,n) >>= \k -> sequence [arbitrary | _ <- [1..k]]`
It would be trivial to duplicate values in the collection a certain
percentage of the time (possibly sometimes adjacent). Is there a good