I have a DataFrame that is built by concatenating multiple CSV files. First, I filter out all values in the Name column that contain a certain string. The result looks something like this (shortened for brevity; there are actually more columns):

Name ID
'A' 1
'B' 2
'C' 3
'C' 3
'E' 4
'F' 4
... ...

Now my issue is that I want to remove a special case of 'duplicate' values. I want to remove every row whose ID is shared by rows with differing Name values. In the example above, I would like to keep the rows with ID 1, 2 and 3. For ID=4 the Name values are unequal, so I want to remove those rows.

Instead of using apply() I have also tried transform(), but that gives me the error: AttributeError: 'int' object has no attribute 'ndim'. An explanation of why the error differs per function would be very much appreciated!
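
For reference, here is a minimal sketch of the filtering described above, assuming the two columns are literally named `Name` and `ID` and that "not similar" means "not all identical". It is one possible approach built on `groupby` plus `transform`, not necessarily the code from the original answer; the sample frame just reproduces the rows shown in the question.

```python
import pandas as pd

# Sample data copied from the question (the real frame has more columns).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "C", "E", "F"],
    "ID": [1, 2, 3, 3, 4, 4],
})

# For every row, count the distinct Name values within that row's ID group.
# transform returns a Series aligned with df, one value per original row.
names_per_id = df.groupby("ID")["Name"].transform("nunique")

# Keep only the rows whose ID maps to exactly one distinct Name.
result = df[names_per_id == 1]
print(result)
```

Rows with ID 4 are dropped because that ID maps to two different names, while the repeated `C`/3 rows survive because their names agree.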

Haha. People often do not discover it -- I think I answer transform questions at least weekly.
– Dan Allan, Jul 30 '13 at 15:44

I've seen several before :) and still I forget...
– Andy Hayden, Jul 30 '13 at 15:46

Hey Dan, thanks! I'll have to try your code tomorrow but so far it makes sense to me. I didn't know about nunique() so thanks for pointing that out as well. Actually, could you briefly describe why you're using transform() as opposed to apply()?
– Matthijs, Jul 30 '13 at 17:28

apply would return a shorter Series, with one entry per group. Instead, we want a Series of the same length as the original one, with each group's entire contents mapped to True or False as a block. Then we can use that boolean Series to mask the original Series. See the documentation for more.
– Dan Allan, Jul 30 '13 at 20:39
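
To illustrate the difference described in the comment above, here is a small sketch comparing the two calls on the question's sample data. The column names and the `nunique() == 1` check are assumptions made for the illustration; the `astype(bool)` call is a defensive step in case the transformed result does not come back with a boolean dtype.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "C", "E", "F"],
    "ID": [1, 2, 3, 3, 4, 4],
})
grouped = df.groupby("ID")["Name"]

# apply: one result per group, indexed by ID (4 entries here).
per_group = grouped.apply(lambda s: s.nunique() == 1)

# transform: the group's result is broadcast back to every row (6 entries),
# so the output lines up with df and can be used directly as a mask.
per_row = grouped.transform(lambda s: s.nunique() == 1).astype(bool)

print(len(per_group), len(per_row))
filtered = df[per_row]
```

Because `per_group` is indexed by ID rather than by the original row labels, it cannot be used as a row mask without first mapping it back onto the rows, which is the step `transform` does for you.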