I am a full-time consultant providing services related to the design, implementation, and deployment of mathematical programming, optimization, and data-science applications. I also teach courses and workshops. Usually I cannot blog about the projects I am doing, but there are many technical notes I'd like to share, not least so that I have an easy way to search for and find them again myself. You can reach me at erwin@amsterdamoptimization.com.

Sunday, June 28, 2015

There is a very tight limit on the length of these labels: 63 characters! This sounds like a strange limit, and it surely is. It is also far too small to handle many datasets from sources like databases and spreadsheets. These labels are really data, and as they come from different data sources we do not always have control over their format. That means we sometimes cannot read otherwise correct data, and have to spend time and effort devising workarounds. In some cases we can truncate strings (e.g. using LEFT(column_name,63) in an SQL query), although that makes it more difficult to put the solution back into the database. In other cases truncation will not even yield unique names: this happens when some names only differ after the first 63 characters, and then we have a real problem. This is not just a theoretical possibility: just last week I received a spreadsheet that showed this issue. In another application, based on a MySQL database, I am forced to use ugly short codes instead of long descriptive names for the same reason.
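A tiny sketch of the collision problem (the label strings here are made up for illustration; the slice mimics the LEFT(column_name,63) truncation mentioned above):

```python
# Two hypothetical labels that only differ after the 63rd character.
labels = [
    "Total electricity demand in the northern region under the high-growth scenario",
    "Total electricity demand in the northern region under the high-price scenario",
]

truncated = [s[:63] for s in labels]  # mimics LEFT(column_name, 63)

print(len(set(labels)))      # the originals are distinct
print(len(set(truncated)))   # after truncation they collide
```

The truncated labels are identical, so they can no longer serve as unique set elements.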

So we can conclude: the 63-character limit is simply inadequate. A tool like GAMS is supposed to make a modeler more productive, and this limit is really not helping. What should the limit be? I have heard that the GAMS people are considering raising it to 255 characters. That is much better, but I would argue it is still the wrong approach. With software we typically face a trade-off between development cost and effort on one side and productivity gains for the end user on the other. COTS (Commercial Off-the-Shelf) software often tips the scale in favor of the user: there are just many users, so making things easy for the programmer should be a secondary concern. I would therefore argue for spending a little more development effort to implement really long labels (e.g. up to 2^31-1 characters).

If we look at the table below, we see what kind of limits are imposed by other software (often used as data source for GAMS models):

I was worried. But luckily it looks like this was misinformation. The new version of the article says:

“Correction: An earlier version of this article incorrectly assumed that Microsoft acquired the rights to R Project from Revolution Analytics. As Revolution Analytics chief community officer David Smith has since stressed, "R is owned by the R Foundation," not Microsoft.”

Somehow I find these pictures not very helpful in my modeling work. One reason is that problems that do not display this structure may very well be decomposable: the ordering of rows and columns is very important for detecting the structure visually. E.g. I often use a time index t as the last index in my variables and equations. As the last index runs fastest, that ordering will not produce these nice pictures. Also, modeling systems will likely export all variables x() before y(), i.e. ordered by variable (equation) name. In general you will need to reorder rows and columns to make these pictures meaningful.
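A small sketch of this ordering effect. The toy model below (variables x(t), y(t); an equation e(t) linking x(t) and y(t), and a dynamic equation d(t) linking x(t) to x(t-1)) is made up for illustration; the bandwidth of the nonzero pattern changes dramatically depending on whether rows and columns are sorted by name or by period:

```python
# Build the nonzero pattern of a hypothetical time-indexed model.
T = 10
nonzeros = []                                   # (equation, variable) pairs
for t in range(T):
    nonzeros.append((("e", t), ("x", t)))
    nonzeros.append((("e", t), ("y", t)))
    nonzeros.append((("d", t), ("x", t)))
    if t > 0:
        nonzeros.append((("d", t), ("x", t - 1)))

def bandwidth(row_order, col_order):
    """Max |i - j| over all nonzeros under the given row/column orderings."""
    rpos = {r: i for i, r in enumerate(row_order)}
    cpos = {c: j for j, c in enumerate(col_order)}
    return max(abs(rpos[r] - cpos[c]) for r, c in nonzeros)

# Ordered by name (all e's, then all d's; all x's, then all y's),
# as a modeling system would typically export them:
by_name_rows = [("e", t) for t in range(T)] + [("d", t) for t in range(T)]
by_name_cols = [("x", t) for t in range(T)] + [("y", t) for t in range(T)]

# Reordered by period t (interleaved):
by_time_rows = [(n, t) for t in range(T) for n in ("e", "d")]
by_time_cols = [(n, t) for t in range(T) for n in ("x", "y")]

print(bandwidth(by_name_rows, by_name_cols))    # scattered pattern
print(bandwidth(by_time_rows, by_time_cols))    # narrow band
```

Same matrix, same nonzeros; only the permutation changed, and only the second ordering would show the nice staircase picture.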

Wednesday, June 24, 2015

has an interesting test function for a large system of nonlinear equations:

This function originates from:

Here it has a fixed starting point:

Solving a triangular system

This system of nonlinear equations actually forms a triangular system. This means a good preprocessor can solve it just by solving small 1x1 problems: first solve for x1, then for x2, etc. Note that these tiny 1x1 problems may themselves be nonlinear; in this case they certainly are. We write the model as:

I.e. 10k equations and 10k variables. Note that in GAMS, x(i-1) evaluates to zero for the first i.
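The sequential 1x1 solves can be sketched as follows. The system below is made up for illustration (it is not the test function from the reference): x(i)^3 + x(i) - x(i-1) = 1 with x(0) taken as 0, so equation i involves only the single unknown x(i) once x(i-1) is known, and each 1x1 subproblem is solved by plain bisection:

```python
def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if (f(mid) < 0) == (flo < 0):
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 10
x = [0.0]                           # x[0] plays the role of x(i-1) = 0
for i in range(1, n + 1):
    prev = x[i - 1]
    # solve the 1x1 subproblem v**3 + v - prev - 1 = 0 for v = x(i)
    x.append(bisect(lambda v: v**3 + v - prev - 1.0, 0.0, 2.0))

# all residuals of the triangular system should be ~0
residuals = [abs(x[i]**3 + x[i] - x[i - 1] - 1.0) for i in range(1, n + 1)]
print(max(residuals))
```

No simultaneous nonlinear solve is needed; the triangular structure reduces the whole system to a forward sweep of scalar root-finding problems.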

Wednesday, June 17, 2015

It is well known that large big-M values can cause big headaches for MIP solvers. There are many examples where something like M=9999999999 is used, leading to very wrong results. In some cases we can use a MIP model to calculate tight values for the big-M constants. Here, however, I have a model where the MIP model that calculates the tightest bound actually uses the big-M itself:

This is of course not a useful approach. Luckily there are a few alternatives:

Keep (1) and most of (2), and we end up with a case where it is possible to find a bound by just looking at the data and doing some simple calculations (no MIP is needed). From the problem (a design problem) we know this is a really good bound.
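To illustrate what "looking at the data" can mean, here is a hedged sketch. Suppose, hypothetically, the bounded quantity looks like y(k) =L= sum(j, a(k,j)*x(j)) with 0 <= x(j) <= u(j); then simple interval arithmetic on the coefficients already gives a valid big-M (all names and numbers below are made up):

```python
# Hypothetical coefficients a(k,j) and variable upper bounds u(j).
a = {("k1", "j1"): 3.0, ("k1", "j2"): -2.0, ("k1", "j3"): 5.0}
u = {"j1": 10.0, "j2": 4.0, "j3": 2.0}

def bigM(k):
    # With x >= 0, only positive coefficients can push the sum up,
    # so sum over max(a,0)*u is a valid upper bound on the expression.
    return sum(max(coef, 0.0) * u[j] for (kk, j), coef in a.items() if kk == k)

print(bigM("k1"))   # 3*10 + 5*2 = 40, versus a blanket 9999999999
```

A bound like this is computed instantly from the data and is typically orders of magnitude tighter than a generic huge constant.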

A more sophisticated approach is to relax only equations (3) and solve a MIP to find a slightly tighter bound. We need to do this for several yk's, but the models are small and solve fast (we can even solve them in parallel). That is the approach I am going to take.

In GAMS we can write something like:

eq3(k)$(not relax(k)).. y(k) =L= <expr>;

to turn an equation on or off based on the set relax. This allows us to reuse the equation unaltered in this preprocessing step, in the real model, and in a reporting step.

Thursday, June 4, 2015

On some models symmetry poses a real problem for solvers. Here is an example of a sports-related scheduling problem. We achieve a 35x speedup by adding constraints that break a number of symmetries in the model. The results here are with Gurobi.
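A toy illustration of where such speedups come from (this is not the actual scheduling model, just a made-up pairing problem): when 6 teams are paired into 3 games, every permutation of the teams encodes a schedule, but many permutations encode the same set of games. Lexicographic symmetry-breaking constraints keep exactly one representative of each:

```python
from itertools import permutations

teams = range(6)

naive = 0       # every ordered encoding counts as a distinct "solution"
broken = 0      # with symmetry-breaking constraints added
for p in permutations(teams):
    games = [(p[0], p[1]), (p[2], p[3]), (p[4], p[5])]
    naive += 1
    # break symmetry: order the teams within each game,
    # and order the games among themselves
    if all(a < b for a, b in games) and games == sorted(games):
        broken += 1

print(naive, broken, naive // broken)   # 720 15 48
```

The search space collapses by a factor of 2^3 * 3! = 48 without losing any genuinely distinct schedule, which is exactly the effect symmetry-breaking constraints aim for in the real model.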

Modern solvers also have built-in facilities to detect and do something about symmetry. E.g. Gurobi has the option Symmetry, which can be set to 2 (aggressive). I have never had much luck with this. Indeed, the results are:

Cplex shows very similar behavior:

Note that Cplex provides a large number of settings for the symmetry option:

That is probably just an educational tool to teach the user a lesson about combinatorial explosion: finding the right combination of settings for your model is itself a combinatorial problem.

Often solver suppliers say this is a solved problem:

This example shows that the fixes are, at the least, insufficient: the algorithms do not always handle this problem in a satisfactory way.