Efficient sorting into categories with choices

I have 8 categories and about a hundred items. Each item can fit into 1, 2, 3 or 4 of these categories (and we know which ones each item can fit into). The task is to get 'the most even distribution' of the items among the categories, i.e. every category should have something in it before any category gets a second item. Once an item has been placed in a category, it cannot also be placed in another.

Example 1, with eight items
Categories are labelled A to H. Items are represented by X's.

In the above case (with eight items), arrangement 1 is better than 2 or 3. Basically, the task is to fill a row as quickly as possible, given that some items will only fit into some categories and that the same item can't end up in two categories.

Example 2, with ten items
Categories are labelled A to H. Items are represented by X's.

With 10 items the best possible result is obviously 1 full row and 2 left over. This can be determined directly from the number of items, so the best result for 20 is 2 full rows and 4 left over (20 = 2*8 + 4) and with 100 it's 12 full rows and 4 left over (100 = 12*8 + 4).
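That arithmetic is just integer division with remainder. A tiny sketch (the function name `ideal_shape` is made up for illustration) that gives the target shape for any item count, assuming the constraints allow it to be reached at all:

```python
from typing import Tuple

def ideal_shape(n_items: int, n_categories: int = 8) -> Tuple[int, int]:
    """Return (full_rows, left_over) for the most even split one could hope for."""
    # divmod gives the quotient (full rows) and remainder (left-over items) in one step.
    return divmod(n_items, n_categories)

for n in (10, 20, 100):
    rows, over = ideal_shape(n)
    print(f"{n} items: {rows} full row(s), {over} left over")
```

Whether a particular data set can actually reach that shape depends on which categories each item is allowed in.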

Now only one item (3) can go in C. Then only one item (8) is left for B, and then only one (9) for D, so this leaves:
A:4 B:8 C:3 D:9 E:5 F:6 G: H:1
Item000 Categories: FG
Item002 Categories: EF
Item007 Categories: FG

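To achieve the ideal result (1 full row and 2 left over) we just have to drop 0 or 7 into G and be careful not to put both of the remaining two into F, which already holds 6. One final result is A:4 B:8 C:3 D:9 E:2,5 F:6,7 G:0 H:1. Picking 0 or 7 for G and then EF, EG or FG for the remaining two gives 6 combinations, but the two that end with both 0 and 7 in G are reached either way, so there are 4 distinct ideal assignments.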
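With only three items left, the remaining choices are few enough to enumerate directly. A small sketch, assuming the partial result above (one item in every category except G) and treating "ideal" as every category ending up with one or two items; the name `ideal_completions` is made up here:

```python
from itertools import product

# Loads after the forced placements A:4 B:8 C:3 D:9 E:5 F:6 H:1 (G is still empty).
load = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 0, "H": 1}
# Allowed categories of the three remaining items, as listed above.
remaining = {"0": "FG", "2": "EF", "7": "FG"}

def ideal_completions(load, remaining):
    """Yield each placement of the remaining items that leaves 1 or 2 items per category."""
    items = list(remaining)
    # Try every combination of one allowed category per remaining item.
    for cats in product(*(remaining[item] for item in items)):
        final = dict(load)
        for cat in cats:
            final[cat] += 1
        if all(1 <= n <= 2 for n in final.values()):
            yield dict(zip(items, cats))

for placement in ideal_completions(load, remaining):
    print(placement)
```

Every placement it prints has 0 or 7 (or both) in G, and none of them puts two of the remaining items in F.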

So now what do we complete the first row with? The choices for C are 2,9,10,12,16; for D 2,6,9,10,13; for G 4,10,12,13,17. This could be where the search starts.

On the other hand, given the speed of modern hardware, we could just loop through the possibilities asking a simple question: can we pick 2 A's, 2 B's, 2 C's ... 2 H's in such a way as to leave four singletons at the end? With just 100 items and at most 4 categories per item, the time to perform the search could be considerably less than the time it would take a highly paid engineer to determine the "ideal" solution, particularly as we can bail out of the loop on the first occurrence of an ideal solution (2*8+4 in the 20-item case).
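A rough sketch of that kind of search, written as a backtracking loop rather than a literal enumeration of every combination (a swapped-in refinement, not something taken from the question). It places the most constrained items first, never lets a category exceed the ceiling, gives up on a branch when too few items are left to fill the empty slots, and stops at the first ideal solution. The function name and the sample data are hypothetical; only items 0, 2 and 7 use the category lists given above, the rest are simplified to a single category each. The worst case is still exponential, so treat it as a starting point only.

```python
from typing import Dict, Optional

def balanced_assignment(choices: Dict[str, str],
                        categories: str = "ABCDEFGH") -> Optional[Dict[str, str]]:
    """Return one item -> category mapping whose loads differ by at most one, or None."""
    lo, extra = divmod(len(choices), len(categories))
    hi = lo + (1 if extra else 0)
    load = {c: 0 for c in categories}
    # Most constrained items first, like the forced moves in the worked example.
    order = sorted(choices, key=lambda item: len(choices[item]))
    result: Dict[str, str] = {}

    def place(i: int) -> bool:
        remaining = len(order) - i
        deficit = sum(max(0, lo - n) for n in load.values())
        if deficit > remaining:
            return False  # too few items left to bring every category up to lo
        if i == len(order):
            return True  # deficit is 0 here, so every load is between lo and hi
        item = order[i]
        for cat in choices[item]:
            if load[cat] < hi:
                load[cat] += 1
                result[item] = cat
                if place(i + 1):
                    return True  # bail out on the first ideal solution
                load[cat] -= 1
                del result[item]
        return False

    return result if place(0) else None

# Hypothetical 10-item data set, simplified from the example above.
choices = {"1": "H", "3": "C", "4": "A", "5": "E", "6": "F", "8": "B", "9": "D",
           "0": "FG", "2": "EF", "7": "FG"}
print(balanced_assignment(choices))
```

If the function returns None, no ideal split exists for that data and some fallback rule (for example, allowing one category to run a row ahead) would be needed.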

So in part the answer depends on the type of course you're on (if any). If it's a purely academic environment where the determination of the ideal solution is the goal regardless of how long it takes (undergrads are a free infinite resource we can waste as much of as we like) then that determines one solution. On the other hand if it's a business environment where time equals money then of all the results that are "good enough" the cheapest (i.e. quickest) could be best.

If the test data is horribly skewed, so that, say, 50% of the items can only go in A, B or C, the ideal split cannot be reached, and the program may need to be able to cope with that kind of possibility.
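One cheap way to spot that kind of skew before searching is to check, for every group of categories, whether more items are confined to that group than an ideal split could hold there. The sketch below reads "appear in A,B,C" as "can only go in A, B or C" (an assumption), and `overfull_subsets` is a made-up name. With 8 categories there are only 254 groups to test. Passing the check does not guarantee that an ideal split exists; failing it proves that none does.

```python
from itertools import combinations

def overfull_subsets(choices, categories="ABCDEFGH"):
    """List (subset, confined_items, capacity) for category groups that must overflow."""
    lo, extra = divmod(len(choices), len(categories))
    problems = []
    for size in range(1, len(categories)):
        for subset in combinations(categories, size):
            allowed = set(subset)
            # Items whose every allowed category lies inside this subset.
            confined = sum(1 for cats in choices.values() if set(cats) <= allowed)
            # In an ideal split each category holds lo items, and at most `extra`
            # categories hold one more.
            capacity = lo * size + min(size, extra)
            if confined > capacity:
                problems.append(("".join(subset), confined, capacity))
    return problems

# Hypothetical skewed data: half of 100 items fit only in A, B or C.
skewed = {f"item{i:03d}": ("ABC" if i < 50 else "DEFGH") for i in range(100)}
print(overfull_subsets(skewed))
```

For this made-up data the group ABC has 50 confined items but room for only 12*3 + 3 = 39 in an ideal split, so the 12-rows-plus-4 target is unreachable.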