There are as many boxes as documents, i.e. each document goes into its own separate box (of the smallest possible size so as to minimize waste of cardboard).

What is the best way to decide on the sizes of the boxes, so as to minimize the total amount (surface area) of cardboard used?

I'm not necessarily looking for the analytical solution to this problem.
Two ideas I've had:

a) Pick $l_1$ and $l_2$ values at random and calculate the total surface area of cardboard. Guess the values again, see if the total surface area is smaller, and so on.

b) A more exhaustive approach where I compute $l_1$ and $l_2$ values in, say, $1$ mm increments and calculate the total surface area for each combination of box lengths between, say, ($150$ mm, $151$ mm, $860$ mm) and ($858$ mm, $859$ mm, $860$ mm).
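For what it's worth, idea (a) can be sketched in a few lines of Ruby. Everything here beyond the question itself is an assumption for illustration: a common box height `H`, three box sizes with the largest fixed at 860 mm, and made-up document data in place of the real 1200 sizes.

```ruby
# Sketch of idea (a): random search over candidate box sizes.
# Assumptions (not from the question): documents are square, sizes are
# in mm, all boxes share the same height H, and the largest of the
# three box sizes is fixed at 860 mm. `doc_sizes` below is fake data.

H = 100.0        # common box height (assumed)
LARGEST = 860    # the biggest document forces one box of this size

# Surface area of a square box of side `side` and height H.
def area(side)
  2.0 * side * side + 4.0 * side * H
end

# Each document goes into the smallest of the chosen boxes that fits.
def total_area(doc_sizes, box_sizes)
  sorted = box_sizes.sort
  doc_sizes.sum { |d| area(sorted.find { |s| s >= d }) }
end

doc_sizes = Array.new(1200) { rand(150..860) }  # stand-in for real data

best_cost = Float::INFINITY
best_boxes = nil
100_000.times do
  s1, s2 = [rand(150...LARGEST), rand(150...LARGEST)].sort
  cost = total_area(doc_sizes, [s1, s2, LARGEST])
  best_cost, best_boxes = cost, [s1, s2, LARGEST] if cost < best_cost
end
puts "best sizes: #{best_boxes.inspect}, total area: #{best_cost.round}"
```

With real data, restricting the random candidates to document sizes that actually occur would shrink the search space considerably.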

What would you suggest is the most practical way of going about solving this?

BTW, I'm great with Excel, less so with MATLAB, etc. I can program well in Ruby if that helps in any way.

Do all the boxes have the same height? Since you have to have one box of the largest size in any event, why doesn't having one box of this size minimize the amount of cardboard? There must be information you're not telling us.
– saulspatz Jul 7 '18 at 23:57

Yes, we can assume all boxes have the same height. I'm not entirely sure I've understood your second question... I'm assuming there will need to be at least one box of the largest size, i.e. 860 mm length, so that the largest square document can fit inside it.
– pizu Jul 8 '18 at 9:59

Why can't we simply put all the documents in the largest box?
– saulspatz Jul 8 '18 at 13:10

We're trying to minimize the waste. If a document can be put in a small or medium-sized box rather than the biggest one, that's a lot less waste.
– pizu Jul 9 '18 at 16:18

FYI, if someone is having a similar challenge: I found a couple of solutions... 1. My favourite one is to pick three box size values at random, e.g. (160, 350, 860), then compute the total area of cardboard used for the entire set of 1200 documents. I repeat the above step and note down the total area if it's smaller than for the previous guess. I find that 0.5–1 million guesses get me to the right answer. The Ruby code is actually quite simple.
– pizu Jul 9 '18 at 16:19

3 Answers

As Erwin points out in his blog post, you can model this as a network. I would take that approach, in part because it requires no specialized software. Per Erwin's post, you have 384 distinct paper sizes. Create one node for each, and let $s_i$ be the paper size for node $i$ and $n_i$ the count for size $i$. For each pair of nodes $i < j$, draw an arc from node $i$ to node $j$ whose cost is $s_j^2 \sum_{k=i+1}^j n_k$. This arc represents the cost (surface area) of putting all pages with sizes between $s_{i+1}$ and $s_j$ into boxes of size $s_j$.

You can now iterate over the graph using either two or three nested loops (since you limited yourself to three or four box sizes). Start at node 1 and look at each possible successor node (outer loop), each possible successor to that node (inner loop), and each possible successor to that node (a third loop if you are allowing four sizes), recognizing that you must take the arc from the node in the innermost loop to node 384. Sum the lengths of the selected arcs and compare the sum to the best solution so far; if it's shorter, update the best solution. Finally, note that you can break out of any inner loop once its cumulative sum equals or exceeds the best sum so far, since adding more (positive) arc costs cannot reduce the sum.

I would not advocate brute force in general, but with a maximum of four box sizes, and given the speed of a contemporary PC, this should be rather doable (and, again, requires no special software, other than a compiler/interpreter for some programming language).
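A minimal sketch of this enumeration in Ruby, for three box sizes. The data shape (ascending distinct document sizes plus a count per size) follows the description above; the cost of a box of side $s$ is taken as $s^2$, matching the arc costs, and a constant height term could be folded in without changing the structure. The sample inputs in the test are made up.

```ruby
# Brute-force enumeration of two "cut" sizes; the largest size is
# forced. prefix[i] holds the number of documents in the i smallest
# size classes, so a box of side sizes[j] covering classes (a, j]
# costs sizes[j]**2 * (prefix[j + 1] - prefix[a + 1]).
def best_three_boxes(sizes, counts)
  n = sizes.length
  prefix = [0]
  counts.each { |c| prefix << prefix.last + c }

  best = Float::INFINITY
  best_cut = nil
  (0...n - 2).each do |a|               # smallest box = sizes[a]
    cost_a = sizes[a]**2 * prefix[a + 1]
    next if cost_a >= best              # prune: partial sum already too big
    (a + 1...n - 1).each do |b|         # middle box = sizes[b]
      cost_ab = cost_a + sizes[b]**2 * (prefix[b + 1] - prefix[a + 1])
      next if cost_ab >= best           # prune again before closing the path
      total = cost_ab + sizes[n - 1]**2 * (prefix[n] - prefix[b + 1])
      if total < best
        best = total
        best_cut = [sizes[a], sizes[b], sizes[n - 1]]
      end
    end
  end
  [best_cut, best]
end
```

The two `next if ... >= best` lines implement the early break described above: once a partial sum matches the incumbent, no completion of that path can improve on it.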

I'm going to amend this, because I think it can be more efficient. Assume a limit of four box sizes. Create a layered network with six layers. The first and last layers contain just start and end nodes. The second layer has 384 nodes, one for each size. The third layer has 383 nodes (omits the smallest size), the fourth has 382 nodes (omits the two smallest sizes), and the fifth has 381 nodes. Arcs and arc weights are as above, with arcs from any node to all nodes in the next layer that are bigger sizes. The shortest path from start to end is the winner. For three box sizes, drop one layer.
– prubin Jul 17 '18 at 14:51

Sorry -- even what I wrote in the comment is a little overblown. You know you have to use the largest box size (to hold the largest documents), so the sixth layer and the end node are redundant. The fifth layer will contain just one node, for the largest document size, and that will be the terminus. Also, layer four will omit the largest size (coming in layer five), layer three will omit the two largest sizes, etc.
– prubin Jul 17 '18 at 17:45

Last comment to myself (I hope): I coded this in Java and ran it. My results confirm Erwin's answers as far as total size of boxes used. On a decent PC (Intel quad-core processor, 12 GB of RAM), the case of 3 box sizes took 318 ms (including reading the data in), while 4 box sizes took 518 ms. This is without using parallel threads.
– prubin Jul 17 '18 at 18:59
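The layered-network idea in these comments amounts to a shortest path in a DAG, which can equally be written as dynamic programming. A sketch in Ruby, under the same assumptions as before (names and data shape are mine, not the commenter's): `dp[k][j]` is the minimum cost of covering the `j + 1` smallest size classes with `k` box sizes, the largest of which is `sizes[j]`, and the cost of a box of side $s$ is again $s^2$.

```ruby
# Dynamic programming over the layered DAG: since the largest box size
# is forced, the answer is dp[k_boxes][n - 1].
def min_cost_boxes(sizes, counts, k_boxes)
  n = sizes.length
  prefix = [0]
  counts.each { |c| prefix << prefix.last + c }
  seg = ->(i, j) { sizes[j]**2 * (prefix[j + 1] - prefix[i + 1]) }  # docs in (i, j]

  inf = Float::INFINITY
  dp = Array.new(k_boxes + 1) { Array.new(n, inf) }
  (0...n).each { |j| dp[1][j] = sizes[j]**2 * prefix[j + 1] }  # one box covers 0..j
  (2..k_boxes).each do |k|
    (k - 1...n).each do |j|              # need at least k size classes so far
      (k - 2...j).each do |i|            # previous box ends at class i
        cand = dp[k - 1][i] + seg.call(i, j)
        dp[k][j] = cand if cand < dp[k][j]
      end
    end
  end
  dp[k_boxes][n - 1]
end
```

Unlike the nested-loop enumeration, this runs in roughly `k * n * n` steps regardless of how many box sizes are allowed, which is the efficiency gain the comment is after.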

As expected, the objective deteriorates: we waste more space. Interestingly, the smallest boxes are the same as in the previous case.

Alternative methods include a network approach (see the answer by Paul Rubin) or a dynamic-programming algorithm. Somehow I like the set partitioning model: the constraints can be written very compactly (and actually make intuitive sense).