I am a full-time consultant and provide services related to the design, implementation and deployment of mathematical programming, optimization and data-science applications. I also teach courses and workshops. Usually I cannot blog about projects I am doing, but there are many technical notes I'd like to share. Not in the least so I have an easy way to search and find them again myself. You can reach me at erwin@amsterdamoptimization.com.

Wednesday, January 27, 2016

In http://pyeasyga.readthedocs.org/en/latest/examples.html a small example of a knapsack problem is shown. Here it has two constraints instead of the usual one (hence multi-dimensional). Note that we are maximizing the objective. The code to solve this with a genetic algorithm is:

When I run it I actually get a somewhat worse objective:

When solving the same problem as a MIP:

we get as solution:

Notes:

Even for some really small examples from the documentation, meta-heuristics find sub-optimal solutions.

I often see something like: “problem is NP-hard so we must use a meta-heuristic.” I don’t think this is always true.

MIP solvers have the advantage that they provide optimal solutions for small instances. For large difficult problems they can give a good solution and in addition a measure of how good this solution is (the gap). Meta-heuristics do not have this property.

Some commercial MIP solvers actually use heuristics extensively inside their algorithms to help find good solutions quickly.

I have implemented a number of heuristic algorithms for some nasty problems, but I always try to start with a MIP implementation to get a baseline (both wrt performance and solution qualities).

Wednesday, January 20, 2016

In Pivot a table: GAMS, SQLite, R and Python we described a pivoting operation on a small example. In the comments it was mentioned that for larger tables or dataframes, especially tydir is very fast. Here we try to confirm this using an artificial example:

df2<- dbGetQuery(db,"select tj1.i as i, tj1.value as j1, tj2.value as j2, + tj3.value as j3, tj4.value as j4, tj5.value as j5,+ tj6.value as j6, tj7.value as j7, tj8.value as j8,
+ tj9.value as j9, tj10.value as j10
+ from (select i,value from p where j='j1') as tj1,
+ (select i,value from p where j='j2') as tj2,
+ (select i,value from p where j='j3') as tj3,
+ (select i,value from p where j='j4') as tj4,
+ (select i,value from p where j='j5') as tj5,
+ (select i,value from p where j='j6') as tj6,
+ (select i,value from p where j='j7') as tj7,
+ (select i,value from p where j='j8') as tj8,
+ (select i,value from p where j='j9') as tj9,
+ (select i,value from p where j='j10') as tj10
+ where tj1.i=tj2.i and tj1.i=tj3.i and tj1.i=tj4.i and tj1.i=tj5.i
+ and tj1.i=tj6.i and tj1.i=tj7.i and tj1.i=tj8.i
+ and tj1.i=tj9.i and tj1.i=tj10.i")

Conclusion

The methods that work on in-memory datastructures are faster than SQL but depending on the SQL formulation we can come close. The big advantage of the pivot methods in R and Python is that they require much less typing (and less thought).

Sunday, January 17, 2016

We want to cut it so we can create final product. This final product consists of combos of wood types A,B,C and D of the same length.

The final product can not be shorter than 30.

In the post it was requested to produce a cutting scheme such that left over raw material of length less than 30 (waste) is minimized.

If we would just use this objective (minimize waste) the optimal strategy would be “don’t cut anything”. Therefore our first approach is to use a slightly different objective: create as much final product as we can. Later on we will discuss some refinements.

As usual there is more than one way to skin a cat, here is just one way of doing that. First introduce our basic “assignment” variables and the index sets they operate upon:

The assignment structure follows from these fundamentals.

The first equation is used to make sure that exactly one raw piece is used for each wood type in a final product. We introduce a binary variable yj to keep track which j’s are actually created (typically we have more j’s in the model than needed: we don’t know this number in advance).

Equation (2) deals with the length of the final product combo. Note that Lj does not depend on the wood type w. So all pieces in the combo have the same length. Note that this final product combo length is actually a decision variable.

Inequality (3) simply makes sure δw,i,j=0 ⇒ xw,i,j=0.

Finally inequality (4) deals with raw material lengths. This Li is a constant.

The data set we will use is:

This means in the model we do not want to use all combinations (w,i) but rather a subset wi(w,i). A complete model setup can look like:

The model equations are:

Notes:

The variables Lj are declared integer, just to get nicer solutions.

The order equation helps producing nicer reports but also helps in speeding up things.

We solved with Cplex with some extra options:

Threads to allow parallel B&B

Mipemphasis to indicate we want to go to optimality

Priorities on the discrete variables to help Cplex a bit

The solution time was < 2 minutes (to proven optimality).

The final results look like:

The rows represent the raw material and the columns are the final product combos. Each column should have exactly four entries (one from each color or wood type). The columns are ordered by length and the smallest length is 30. On the right we verify that we have never exceeded the raw material length. In this solution we have used up all the yellow (wood type C) raw material.

Objective

The proposed objective in the original post is:

(1) Minimize Waste

where waste is left over cuts with a length < 30. The problem with this objective is that “doing nothing” would be an optimal solution.

In our model we used the objective:

(2) Maximize total length of final product

Actually this is not such a bad choice. We can see we use up the yellow wood type C (this has a smallest amount of raw material) without any waste.

If we know the demand for the final product we can use the following model:

(3) Minimize Waste Subject to meeting demand

If we don’t know demand in advance and the final product is made for inventory, we can use a different approach. We assign dollar values to both raw material lengths and final product lengths. Of course a raw material length length of < 30 gets a zero value (or may be a small scrap value). Now we can do:

(4) Maximize Total Value

We can refine this further by assigning different values to different lengths, so we can favor longer lengths.

Here we just use two different display styles to show the same data. Later on we will see that data can also be actually stored in these two formats. This applies to database tables and R or Python dataframes.

To store the loc variable in an SQLite database we can use the following in the GAMS model:

Note that the table and resulting dataframe are in “long” format. To select the x and y coordinates we need to do something like:

> x <- dbGetQuery(db,"select k,level as x

+ from loc

+ where c='x'")

> y <- dbGetQuery(db,"select k,level as y

+ from loc

+ where c='y'")

> x

kx

1 k1 0.8934478

2 k2 0.6683234

3 k3 0.1372328

4 k4 0.1388778

> y

ky

1 k1 0.8934479

2 k2 0.3176426

3 k3 0.7863548

4 k4 0.3156270

In the following sections we will try to get x and y as columns in one single dataframe.

3. Pivot in SQL

A big advantage of SQL is that we can do some data manipulation steps before creating and populating the dataframe. In this case we want to do a pivot operation where the level column becomes two columns x and y. Of course we can do that in SQL. There are many ways to implement this, and here is an example:

Note: There is an SQL PIVOT construct. As of now it seems supported only by SQL Server and Oracle.

4. Pivot in R: reshape2

The pivot operation can also be performed in R. Here we use the dcast function of the reshape2 package:

> library(reshape2)

> df1<-dbGetQuery(db,"select k,c,level from loc")

> df1

k clevel

1 k1 x 0.8934478

2 k1 y 0.8934479

3 k2 x 0.6683234

4 k2 y 0.3176426

5 k3 x 0.1372328

6 k3 y 0.7863548

7 k4 x 0.1388778

8 k4 y 0.3156270

> df2<-dcast(df1,k~c,value.var="level")

> df2

kxy

1 k1 0.8934478 0.8934479

2 k2 0.6683234 0.3176426

3 k3 0.1372328 0.7863548

4 k4 0.1388778 0.3156270

5. Pivot in R: tidyr

The tidyr package is a successor to reshape2. The function we use to pivot our data is called spread:

> library(tidyr)

> df1<-dbGetQuery(db,"select k,c,level from loc")

> df1

k clevel

1 k1 x 0.8934478

2 k1 y 0.8934479

3 k2 x 0.6683234

4 k2 y 0.3176426

5 k3 x 0.1372328

6 k3 y 0.7863548

7 k4 x 0.1388778

8 k4 y 0.3156270

> spread(df1,c,level)

kxy

1 k1 0.8934478 0.8934479

2 k2 0.6683234 0.3176426

3 k3 0.1372328 0.7863548

4 k4 0.1388778 0.3156270

6. SQLite in Python

In Python we use packages SQLite3 and of course pandas.

There is no built-in “list tables” function so we use the master table of the SQLite database. The table loc looks like:

The SQL “pivot” query is long so we can use a multi-line string:

7. Pivot in Python: pandas.pivot

Pandas has a sophisticated pivot method:

The output is not exactly the same as with the SQL approach, but it is close. We just need to do one more step to make the index labels k a normal data column:

Conclusion

When we move data from GAMS –> SQLite –> dataframe, we often need to make some data manipulation steps, some of which may not be totally trivial. But we have powerful and flexible tools at our disposal: SQL and R/Python. In this case we showed that a pivot operation to transform a “long” table into a “wide” one can be done either at the SQL level when importing the data into R/Python or directly in R/Python.