
A well-known example of symmetric data is the portfolio optimization problem. If the variance-covariance matrix is large and dense, it certainly helps to consider only half of it. Here is a way to exploit a symmetric Q matrix that can be used in many QP models with x’Qx in the objective:
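A minimal sketch of the trick in GAMS (the names i, q, x, and z are illustrative; this assumes q is symmetric):

    set i 'instruments' /i1*i5/;
    alias (i,j);
    parameter q(i,j) 'variance-covariance matrix (data assumed given)';
    variable x(i) 'holdings', z 'objective value';
    equation qobj 'quadratic objective, triangular form';

    * full form would be: z =e= sum((i,j), x(i)*q(i,j)*x(j));
    * triangular form: strict upper triangle doubled, plus the diagonal
    qobj.. z =e= sum((i,j)$(ord(i) < ord(j)), 2*q(i,j)*x(i)*x(j))
               + sum(i, q(i,i)*sqr(x(i)));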

Even if Q is not symmetric, we can make it symmetric by something like q(i,j) := 0.5·(q(i,j)+q(j,i)); this leaves the value of x’Qx unchanged.
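Continuing with the q(i,j) from the previous fragment, in GAMS this symmetrization could look as follows (qsym is an illustrative name; a fresh parameter is used deliberately, since an in-place assignment q(i,j) = 0.5*(q(i,j)+q(j,i)) would read elements that have already been updated):

    alias (i,j);
    parameter qsym(i,j) 'symmetric version of q';
    qsym(i,j) = 0.5*(q(i,j) + q(j,i));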

Of course one could argue that solvers and their presolvers are getting smarter and we don’t need this anymore. That is indeed sometimes the case for the better solvers. However, for larger instances we should consider not only the solver but also the time spent generating the model. Often these tricks can have a significant impact on the overall performance and total turnaround time of a job.

Here is some computational evidence for a large portfolio QP model (a synthetic data set with 2K instruments):

As you can see, Cplex does not really care whether Q is full or triangular (the small difference in time is related to reading the Q data: there is more of it when Q is full). However, GAMS is twice as fast when we use a triangular Q. As an aside, GAMS itself is not very fast compared to Cplex on these models: the processing of the nonlinear terms is relatively expensive.

For completeness I want to mention that for this portfolio optimization problem there is an alternative formulation that uses the mean adjusted returns directly. This formulation leads to a QP with a diagonal Q matrix (a sketch is shown below). The timings for the same data set are:

GAMS benefits greatly from this easier formulation, but Cplex is faster as well.
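For reference, here is a hedged sketch of that alternative formulation in GAMS (all names are illustrative and the data is random for the sake of the example; note that the only quadratic terms left are the diagonal sqr(d(t)) terms):

    set t 'time periods' /t1*t10/, i 'instruments' /i1*i5/;
    parameter r(t,i) 'returns (random data for illustration)';
    r(t,i) = uniform(-0.1, 0.1);
    parameter rmean(i) 'mean returns';
    rmean(i) = sum(t, r(t,i))/card(t);

    positive variable x(i) 'holdings';
    variable d(t) 'mean adjusted portfolio return', z 'portfolio variance';
    equation defd(t) 'deviations from the mean', defz 'diagonal quadratic objective';

    defd(t).. d(t) =e= sum(i, (r(t,i) - rmean(i))*x(i));
    defz..    z =e= sum(t, sqr(d(t)))/(card(t) - 1);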

Abstract: In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

The first step exports the parameter from a GDX file to a tab-delimited text file. The gdx2txt tool is a little more convenient than using GAMS PUT statements, and it is also faster (at the same precision, more than 5 times as fast).

The second step calls the BCP utility to import this text file. The hints setting the batch size and requesting a table lock improve performance.
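A hedged sketch of the two steps, written as GAMS execute statements (the gdx2txt command-line syntax is an assumption on my part; the bcp options -c, -T, -b and -h "TABLOCK" are the standard character-mode, trusted-connection, batch-size and hint flags):

    execute_unload 'result.gdx', result;

    * step 1: convert GDX to tab-delimited text (exact gdx2txt syntax assumed)
    execute 'gdx2txt result.gdx result.txt';

    * step 2: bulk import into SQL Server with a large batch size and a table lock
    execute 'bcp testdb.dbo.result in result.txt -c -T -b 100000 -h "TABLOCK"';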

The total turnaround time for this test data was 35 seconds: about 11 seconds for gdx2txt and 24 seconds for BCP. So the database import is only a little over twice as slow as generating the text file. That is very fast indeed. The table had four columns: three GAMS indices (converted to type NCHAR(4)) and a value (type REAL). No (SQL) indices or keys were defined on the table, so the inserts could be done at optimal speed, and the database files were pre-extended so they did not need to grow during the inserts. As a result, in practice the BCP timings may be a bit slower than shown here.

Note: dropping all rows by “DELETE FROM testdb.dbo.result” can take a long time (1.5 minutes on my machine). Much faster is “TRUNCATE TABLE testdb.dbo.result”.

A further improvement is to fix guest 1 to table 1. This reduces the symmetry in the model.
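In GAMS this is a one-line addition, assuming a binary variable x(g,t) that indicates guest g sits at table t (the names are illustrative):

    * break symmetry: seat the first guest at the first table
    x.fx('guest1','table1') = 1;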

So which model performs better? Of course this is only one data set, so we need to be careful not to read too much into this, but here are the results with GAMS and Cplex 12.3:

                     original model                           reformulation
equations            278                                      1536
variables            1531                                     902
binary variables     1530                                     84
optimal objective    226                                      226
solution time        216 seconds (paper reports 2 seconds)    0.4 seconds
iterations/nodes     3,502,605 iterations, 93,385 nodes       6,943 iterations, 180 nodes

It looks like the performance of the original model is much slower than reported in the paper. I don’t know the reason for that; maybe the GAMS model contained a few tricks not mentioned in the description of the mathematical model (possibly related to symmetry when comparing guest j against guest k). Another reason may be that the paper mentions some serious server hardware was used, while I am running on a single thread on a laptop.

Of course this model can also be formulated as a Constraint Programming model. In http://hakank.org/minizinc/wedding_optimal_chart.mzn a single integer variable Tables[guest] is used for each guest, indicating that guest’s table number. Furthermore, using CP constructs one can reduce the model to essentially one block of constraints (see the linked model).

Note that the CP formulation is probably incorrect w.r.t. the minimum number of known people at a table (parameter b).

Update: the CP model has been fixed. An advantage of using modeling languages is that other people can read the model; if this were some C++ code I would not have bothered to try to understand it. It is also often argued that a modeling language helps in maintaining models, as fixes are easier to identify and implement than in a traditional programming language.

Wednesday, February 8, 2012

I received a new question. Since AMPL does not allow commands like "if ... then ...", my question is:

how to convert the following statement into a constraint that is acceptable to AMPL.

For example,

set I;
set J;
param a {I, J};
var b {I, J};

# constraint
if b[i,j] > 0 then a[i,j] < 30

If you are using a modeling system such as AMPL, one of the first concepts you need to understand is the role of parameters and variables. As long as you don’t understand this, there is probably very little you can actually write down correctly.
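In this particular case a is a parameter, i.e. data, so no if-then machinery is needed at all: the implication can be enforced at model generation time by fixing b(i,j) to zero wherever a(i,j) >= 30. A hedged sketch in GAMS syntax (the same idea carries over to AMPL as an indexed constraint with a condition on a; names follow the question, and b is assumed to be nonnegative):

    set i /i1*i3/, j /j1*j3/;
    parameter a(i,j) 'data (random here, for illustration)';
    a(i,j) = uniform(0, 60);
    positive variable b(i,j);

    * "if b(i,j) > 0 then a(i,j) < 30" is, by contraposition, the same as
    * "if a(i,j) >= 30 then b(i,j) = 0", which is just a bound on b
    b.up(i,j)$(a(i,j) >= 30) = 0;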