I am a full-time consultant and provide services related to the design, implementation and deployment of mathematical programming, optimization and data-science applications. I also teach courses and workshops. Usually I cannot blog about projects I am doing, but there are many technical notes I'd like to share. Not in the least so I have an easy way to search and find them again myself. You can reach me at erwin@amsterdamoptimization.com.

The first approach to find the permutation with the longest schedule is to enumerate all permutations. GAMS has a built-in option to generate permutations of a set. It is not completely intuitive, in part because the way GAMS stores sets. All sets have fixed ordering with respect to a pool of set elements (the universe). To create a different ordering, an extra index is used. E.g. for 3 jobs we have:

The option > creates the permutations. Of course the syntax is not very readable: one would expect a function with a name somehow related to ‘permutation’. One of the major goals when using a modeling system is to produce readable models, and that is certainly not achieved with this syntax.

The result is:

---- 13 SET allp store all permutations

INDEX 1 = p1

job1 job2 job3

job1 YES job2 YES job3 YES

INDEX 1 = p2

job1 job2 job3

job1 YES job2 YES job3 YES

INDEX 1 = p3

job1 job2 job3

job1 YES job2 YES job3 YES

INDEX 1 = p4

job1 job2 job3

job1 YES job2 YES job3 YES

INDEX 1 = p5

job1 job2 job3

job1 YES job2 YES job3 YES

INDEX 1 = p6

job1 job2 job3

job1 YES job2 YES job3 YES

Although I don’t like the syntax, it is actually quite fast. If we set n to 10 and disable the display statement – it would take a long time to print all these permutations – we can generate all 10!=3,628,800 permutations in less than 2.5 seconds. This compares quite good to the following R session:

would hold exactly (for the places where it matters – if there is slack in some machine this relation may not hold).

This model would look much simpler with a constraint programming formulation. There we could have used constructs like all-different and max to simplify things. Many modern modeling languages support constraint programming (unfortunately GAMS does not). Also note that AMPL using Cplex indicator variables allows this to be formulated quite elegantly (see http://yetanothermathprogrammingconsultant.blogspot.com/2009/07/formulation-ciminaibi.html). GAMS does not directly support indicator variables in the language, but instead some option files must be used.

Monday, April 23, 2012

I was trying to demonstrate some simple scheduling models. The permutation flow shop has an obvious first solution: run jobs in the order given in the problem data: job1,job2,job3,…. We compared that with the optimal solution:

The model I presented is fairly simple:

Someone came up with the question: “what would the picture look like for the worst permutation?”.

I think this is actually more difficult to model. We assume in several places that jobs are automatically pushed to the left and if we just start to maximize this won’t work anymore.

Of course just trying all solutions would be feasible in this case. We have 10 jobs so 10! = 3,628,800 permutations (just type 10! in google).

Thursday, April 19, 2012

Come on guys, you really can do better than this! OK, what to do as a user? My simplest suggestion – using GAMS – is to add a bound on the objective variable and solve again:

z.lo=−1e10; solve m minimizing z using mip;

Now at least we see:

MIP status(3): Model was proven to be infeasible.

If the model was actually unbounded you would have seen large numbers in the solution, making it easy to find out what was wrong.

Note: Please don’t tell me “dual infeasible” is a good return code. That is just another name for the same thing, and just as useless for a modeler. If you are working on a model you really want to know if the model is infeasible or unbounded. I understand it is an easy way out for the algorithm-developer to return something like “dual infeasible” (I have heard some users saying that is just laziness, and I understand that sentiment). The return code should really relate to the model and not what is the easiest for the algorithm-developer to implement.

Not everyone agrees with these opinions (see the comments).

The model was fairly small. If we compare the time both the user needed to email me the question and the model, for me to look at it and respond (say in total half an hour) and the time the solver would need to spend to get a better diagnostic (let’s say 0.1 seconds) then we can say a huge efficiency gain is possible here.

For a different client we needed to run a randomized algorithm that solves many small MIP models. They are so small that using multiple threads inside the MIP solver does not give much performance boost (much of the time is spent outside the pure Branch & Bound part – such as preprocessing etc.). However as the MIP problems are independent of each other we could generate all the necessary data in advance and then call the scenario solver (http://www.gams.com/modlib/adddocs/gusspaper.pdf). This will keep the generated problem in memory, and does in-core updates, so we don’t regenerate the model all the time.

When running the algorithm with n=100,000 MIP models we see the following performance. Note that besides the MIP models there is also a substantial piece of GAMS code that implements other parts of the algorithm.

Implementation

number of MIP models

solve time

rest of algorithm

total time

Traditional GAMS loop (call solver as DLL)

100,000

1068 sec

169 sec

1237 sec

Scenario Solver

100,000

293 sec

166 sec

459 sec

To get more performance I tried to run the scenario solver in parallel. That is not completely trivial as the solver has a number glitches (e.g. scratch files with fixed, hard coded names). I also run parts of the GAMS algorithm in parallel, but some parts had to be done in the master model after merging the results.

Implementation

number of MIP models

Worker threads

parallel sub-problem time

rest of algorithm (serial)

total time

Parallel + Scenario Solver

100,000

4

116 sec

67 sec

183 sec

The implementation does not win the beauty contest, but it could be developed quickly. For these larger problems you always have to watch out for performance bottlenecks. One wrong line in the GAMS model and the performance drops to hours of computation time. In addition the GAMS tools here are not very refined, but if implemented correctly quite effective. As an example consider how I merge the results of the sub jobs. Each sub job writes a results file (GDX file) into its own directory and the master model will read those and merge them together so we have a single solution set for further processing:

Sunday, April 1, 2012

Although GAMS can be used from its IDE, it still has a command line interface. This can be used when calling GAMS from a batch (.cmd) file.

Recently I was asked to speed up running a number of scenarios by exploiting an 8 core machine. As the NLP (actually MCP) models used serial solvers we cannot just use THREADS=8 like when we use MIP solvers such as Cplex or Gurobi. One way to handle this is to write some batch files:

As we can see here the GAMS syntax around its PUT facility is very ugly (this is one of the parts of GAMS that really urgently needs some redesign). However, ignoring the ugly syntax, it actually generates batch files of the form: