I am a full-time consultant providing services related to the design, implementation, and deployment of mathematical programming, optimization, and data-science applications. I also teach courses and workshops. Usually I cannot blog about the projects I am working on, but there are many technical notes I'd like to share, not least so that I have an easy way to search for and find them again myself. You can reach me at erwin@amsterdamoptimization.com.

Monday, August 22, 2011

What if you go bigger? Here are some results with two of the leading commercial LP solvers (using default settings), compared to GAMS:

LP solution times are a bit less predictable (in this case in a positive sense: somewhat quicker than one would expect by extrapolating from the first picture in more-on-network-models.html), but GAMS execution time (including generation time) is still nicely linear.

I am starting to work on a problem that looks like a shipping problem I worked on earlier: garments (shirts) are sent from China to stores in the US, either directly in a number of packages with standard contents or through a US warehouse. The cheapest option is to send the standard boxes from China, but as there are just a few standard configurations, stores will also need to be supplied with additional shirts from the US warehouse. Shirts come in different sizes.

The basic model looked like:

All quantities with bars are parameters (constants). Even though this is a simple model it was not completely trivial to get good solutions:

The model as presented is an MINLP. We linearized it to obtain a MIP.
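The nonlinearity presumably comes from multiplying the integer number of standard boxes sent to a store by the integer contents of a box. One standard way to linearize such a product (a sketch in generic notation; the symbols n, c, δ, y and the bounds N, C are not the model's actual identifiers) is a binary expansion of one of the factors:

```latex
% expand the integer factor $0 \le n \le N$ in binary:
n = \sum_{k=0}^{K} 2^k\,\delta_k, \qquad \delta_k \in \{0,1\},
\qquad K = \lceil \log_2(N+1) \rceil - 1
% the product $n\,c$ then becomes linear in new variables $y_k = \delta_k\,c$:
n\,c = \sum_{k=0}^{K} 2^k\,y_k
% where $y_k = \delta_k\,c$ is enforced with big-M constraints
% ($C$ is an upper bound on $c$):
y_k \le C\,\delta_k, \qquad y_k \le c, \qquad
y_k \ge c - C\,(1-\delta_k), \qquad y_k \ge 0
```

With δ fixed the constraints force y to 0 (δ=0) or to c (δ=1), so the product drops out exactly at the cost of a handful of extra binaries and continuous variables per product term.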

The data set was rather large because of a large number of stores.

We could preprocess the demand data, as some stores have the same demand; this reduced the size by about a third. We added weights to the objective to represent how many times each demand pattern was used.

In practice over-supplying a store was expensive, so we could fix OverShipped(Store,Size) = 0.
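In GAMS, fixing these variables is a one-liner using the .fx variable attribute (assuming OverShipped is declared as above; the set names are placeholders):

```gams
* over-supplying a store is too expensive in practice: fix to zero
OverShipped.fx(Store,Size) = 0;
```

If the model's holdfixed attribute is set to 1, GAMS will substitute the fixed variables out of the generated model entirely, so they do not even reach the solver.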

It was not practical to solve this model in one swoop. We found good solutions by first finding good contents for a single standard box and then, after fixing the contents of this first box, finding a good proposal for a second standard box. Adding more standard boxes did not reduce the objective by much.
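This sequential scheme is just a series of solve statements with .fx assignments in between; a sketch in GAMS, where shipmodel, Content, TotalCost, and maxContent are assumed placeholder names, not the actual model's identifiers:

```gams
* pass 1: design a single standard box (box 2 not yet available)
Content.fx('box2',Size) = 0;
solve shipmodel using mip minimizing TotalCost;

* pass 2: freeze box 1 at its solution and let the solver design box 2
Content.fx('box1',Size) = Content.l('box1',Size);
Content.lo('box2',Size) = 0;
Content.up('box2',Size) = maxContent;
solve shipmodel using mip minimizing TotalCost;
```

Note that .fx sets both bounds at once, so releasing box 2 in pass 2 means resetting its .lo and .up explicitly. One would stop adding boxes when the objective improvement between passes becomes marginal.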

Of course, as discussed in http://yetanothermathprogrammingconsultant.blogspot.com/2011/07/scheduling-of-tv-advertisement-theory.html, my real models are almost always much more complicated than these stylized bin-packing examples. In those cases a MIP-based solution algorithm can be even more worthwhile than a (simple) heuristic, as there is more room for improvement. In the real case we dealt with a complicated multi-objective model, where our approach outperformed the heuristic on almost all objectives. This remained the case even after the heuristic was fine-tuned based on our solutions.

The performance of GAMS looks very good. GAMS execution time scales slightly better than the solution time of the LP solver used to solve the problem:

GAMS scales almost linearly with the number of nonzero elements to be generated. In this case that means linearly with the number of nodes (the number of nonzeroes is 10*nodes in these models).

The main difference with the earlier model is that we replaced:

x.up(i,j) = capacity(i,j);

by

x.up(arcs) = capacity(arcs);

This actually saves time and memory. In the first case we set many x.up's to zero even for variables that are not used in the model; variables with non-default bounds must still be created and use up memory. In the second case we only touch variables that are actually used.

This small change makes a lot of difference for larger models. For n=5k we see GAMS execution time decrease from 3.775 seconds to 0.156 seconds. If things get large one needs to pay attention to detail!

AMPL Version

A direct translation to AMPL could look like:

set nodes;
set source within nodes;
set sink within nodes;
param capacity {(i,j) in nodes cross nodes} default 0;
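The derived arc set itself is not shown above, but it was presumably computed from the dense capacity data along these lines (a guess at the original formulation):

```ampl
# computing arcs this way inspects all n^2 (i,j) pairs
set arcs := {(i,j) in nodes cross nodes: capacity[i,j] > 0};
```

Evaluating this set comprehension touches every pair of nodes, so its cost grows quadratically with the number of nodes even when the network itself is sparse.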

When we run a few different instances, we see that something is causing non-linear scaling:

It is clear this is not a good approach. We need to do something about the set arcs. In AMPL we don't really need to compute this set: we can populate it directly from the data by replacing, in the data file:

param: capacity :=
n1 n32  87
n1 n37  44
n1 n124 38

…

by

param: arcs : capacity :=
n1 n32  87
n1 n37  44
n1 n124 38
…

Now we get the set arcs basically for free, and we get linear scaling:
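On the model side, reading arcs from the data file in this way assumes declarations along these lines (a sketch):

```ampl
set arcs dimen 2;            # populated directly by the data statement
param capacity {arcs} >= 0;  # defined only on existing arcs
var x {arcs} >= 0;
```

The set and the parameter are now filled in a single pass over the sparse data, with no n^2 enumeration anywhere.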

The total AMPL generation time vs. LP solution time is here:

This is very close to the GAMS timings given that this was on a different machine.

GLPK

The open-source tool GLPK can handle many linear AMPL models. However, in some cases it is substantially slower. This model demonstrates that behavior. We run the model as: