Top Posts & Pages

Twitter

Blog Stats

7,557 hits

Grouping Problem (Part 1)

This problem actually intrigued me on a certain conference a couple weeks ago, where all the participants have to queue in groups to get their room key. As usual, in order to distributed people as even as possible, the event organizer provide seven different tables based on alphabetical order. Alas, my first name, R, happen to be grouped with S and T. There were a chaotic since it turns out that a lot people swarming the table, where as the other lucky letter not. Does the organizer didn’t pay enough attention to do this mundane grouping? How do we solve it using a simple spreadsheet trick?

First of all, we have to embrace that names in any culture might have different distribution in terms of their first letter. For instance, when a certain population would probably have a large number people named Muhammad, and it is as rare as winning a lottery to find a girl named Xena, or Qony. So that to make an evenly group names would be a bit pain.

However, there’s a rule of thumb that is the distribution would follow about 64% A-M and 36% N-Z. According to this data, if you want an even split, the best choice is A-K (49.4%) and L-Z (50.6%). In addition, this problem also similar to partition problem which the task of deciding whether a given multi-set S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2.

I tried to use spreadsheet tool to explain this problem. I collected a sample of names in Google, and run a simple grouping. Pivot table were utilized to quickly count the number of suffix and group them orderly. After that, we build a parameter to divide it, which is to come in sense was the average value. So if we would like to divide into two tables for example, there will be around 40-sized group each.

Then it will flag the suffix using this logic.The result is when you tried to group it with only two tables, it will give you the lowest deviation, whilst serving four tables would be chaotic.

Of course this is not the optimal solution since it won’t be able to see the distribution of suffix in the first place, so that in 4-group case, it forced to be pooled in the last group. How do we achieve near optimal solution then? See you in part 2.