Optimize a Data
Center
is a challenging programming problem presented in the Qualification Round
of the Hash Code 2015.
Among 230 teams, What’s in a name? from École normale
supérieure ranked first in the qualification, but third
in the final round. By challenging, I mean it is not possible to come up with
a deterministic polynomial-time optimal answer. I am not in a position to
either provide a rigorous proof of its complexity or its reduction to a known
NP-hard problem. But in this blog post I will investigate the following
question: Can we provide an optimal solution using integer
programming? In practice,
that would allow us to come up with an optimal solution to small-sized
problems. Without further ado, let’s start with the problem definition.

The Problem

Here, I will present a brief summary of the actual
problem.
A data center is modeled as rows​ of slots ​in which servers can be
placed. And some of the slots are known to be unavailable.

Each server is characterized by its size and capacity​. Size is
the number of consecutive slots occupied by the machine. Capacity is the total
amount of CPU resources of the machine (an integer value).

Servers in a data center are also logically divided into pools. ​Each
server belongs to exactly one pool. The capacity of a pool is the sum of the
capacities of the available ​servers in that pool.

The guaranteed capacity ​of a pool is the minimum capacity it will have
when at most one data center row goes down. Given a schema of a data center
and a list of available servers, the goal is to assign servers to slots within
the rows​ and to logical pools ​so that the lowest guaranteed capacity​ of all
pools is maximized.

Consider the following data center schema and a list of available servers. For
simplicity, it is assumed that server capacities are equal to server sizes.

Following layout is a solution to the above given problem. Here different
pools are denoted in distinct colors.

Preliminaries

Before modeling the IP (Integer Program), I will start with stating the
problem input.

​denotes the number of rows in the data center

denotes the number of slots in each row of the data center

denotes the number of unavailable slots

denotes the number of pools to be created

denotes the number of servers to be allocated

and denote th server’s respectively size and capacity

and denote th unavailable slot’s respectively row and
slot indices

While modeling the formulation, I will need to provide constraints to avoid
placing servers to unavailable slots. Rather than doing that, Atabey
Kaygun hinted me to represent the data in
blocks. That is, instead of , , and , I will transform
this data into a single lookup table called that denotes the size
of the available slots at th row and th block. For instance,
consider the following layout:

Here blocks will look as follows:

The Integer Programming Model

For simplicity, I will adopt the following index notation:

denotes row indices ()

denotes block (not slot!) indices (varies per row)

denotes server indices ()

denotes pool indices ()

Given that denotes the available blocks, the IP model can be
defined as follows:

Avoiding Minimax Constraint

The presented IP model contains a minimax objective, which to the best of my
knowledge is not tractable by popular linear programming optimizers, such as
CPLEX
or lpsolve. But I have a trick in my pocket
to tackle that. Let’s assume that we know the optimal objective, say .
Then we can model the entire IP as follows:

What this model states is this: I am not interested in the optimization
objective, return me the first found feasible solution. That is, the optimizer
will return us the first variable set the moment it finds
a feasible solution satisfying
constraints.

Now things are getting interesting. If we can find bounds to , than we
can use these bounds to bisect the optimal ! For the lower bound, we
know that . The upper bound is a little bit tricky, but we
can come up with a quite loose bound: . (I will not go into details of how to come up with a stricter upper
bound.) So by picking we can bisect the optimal
guaranteed capacity.

The Solver

For testing purposes, I wrote a simple Python
script that reads an input
problem file and calls CPLEX iteratively. A sample output of the script is as
follows: