Sunday, May 30, 2010

Simulating a Queue in R

In the GCaP class earlier this month, we talked about the meaning of the load average (in Unix and Linux) and about simulating a grocery store checkout lane, but I didn't actually run the simulation. So, I decided to take a shot at constructing a discrete-event simulation (as opposed to a Monte Carlo simulation) of a simple M/M/1 queue in R.
We can make use of a lot of conveniences in R to accomplish such a simulation. For example, we don't have to worry about random number generation; we can simply use the rexp() function to draw the exponentially distributed times that an M/M/1 queue requires. It may not be the fastest code on the planet, but it is guaranteed to be reliable. We also have the ease of integrating PDQ (Pretty Damn Quick) for analytic comparison, as well as the nice statistical analysis and plotting capabilities available in R.
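As a minimal sketch of that convenience, the snippet below draws exponential inter-arrival and service times with rexp(). The rates, the sample size, and the variable names are illustrative assumptions, not values taken from the class or from the simulation described in this post.

```r
set.seed(42)      # fixed seed so the sketch is repeatable

lambda <- 0.75    # assumed arrival rate (jobs per unit time)
mu     <- 1.00    # assumed service rate (jobs per unit time)
n      <- 10000   # number of variates to draw

# rexp() is parameterized by the rate, so the mean is 1/rate
inter.arrivals <- rexp(n, rate = lambda)
services       <- rexp(n, rate = mu)

# Sample means should be close to 1/lambda and 1/mu respectively
print(c(mean.inter.arrival = mean(inter.arrivals),
        mean.service       = mean(services)))
```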

Simulation Variables
As usual, we start with a list of the necessary variables for the simulation and its instrumentation.

t.end

Next, we need to write the R code to perform the actual M/M/1 simulation of arrivals into and departures from the queue.

Simulation Loop
This code is meant to be pedagogic, so I haven't bothered to do anything spiffy like pre-allocating the Exp variates, for example. I based it on the example in Mac MacDougall's book Simulating Computer Systems (an oldie but a goodie), rather than the example in the more recent Introduction to Scientific Programming and Simulation Using R book, because I think there's a bug in their R code, but I didn't spend any time trying to find it. Also, that code is not instrumented.
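For readers who want something to run, here is a minimal sketch of a next-event M/M/1 loop with simple instrumentation. It is not the code from this post or from MacDougall's book. The function name mm1.sim, every variable other than t.end, and the parameter values are assumptions made for illustration.

```r
# Hypothetical next-event M/M/1 simulation (illustrative names throughout)
mm1.sim <- function(lambda, mu, t.end, seed = 1) {
  set.seed(seed)
  t.clock  <- 0                  # simulation clock
  t.arrive <- rexp(1, lambda)    # time of the next arrival
  t.depart <- Inf                # time of the next departure (Inf = idle)
  n        <- 0                  # jobs in system (waiting + in service)
  area.n   <- 0                  # integral of n(t) over time
  t.busy   <- 0                  # accumulated server busy time
  n.done   <- 0                  # number of completed jobs

  while (t.clock < t.end) {
    # Advance the clock to the next event, or to the end of the run
    t.next <- min(t.arrive, t.depart, t.end)
    dt     <- t.next - t.clock
    area.n <- area.n + n * dt
    if (n > 0) t.busy <- t.busy + dt
    t.clock <- t.next
    if (t.clock >= t.end) break

    if (t.arrive <= t.depart) {  # arrival event
      n <- n + 1
      t.arrive <- t.clock + rexp(1, lambda)
      if (n == 1) t.depart <- t.clock + rexp(1, mu)  # server was idle
    } else {                     # departure event
      n <- n - 1
      n.done <- n.done + 1
      t.depart <- if (n > 0) t.clock + rexp(1, mu) else Inf
    }
  }
  list(t.end = t.end, t.busy = t.busy, area.n = area.n, n.done = n.done)
}

# Example run with assumed rates; results vary with the seed
sim <- mm1.sim(lambda = 0.75, mu = 1.0, t.end = 1e4)
print(c(U = sim$t.busy / sim$t.end, Q = sim$area.n / sim$t.end))
```

Drawing one variate per event keeps the loop easy to follow, at the cost of speed. Pre-allocating the variates would be the obvious optimization.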

Instrumented Metrics
Here, we collect the instrumentation data to form some well-known performance metrics. They correspond to the definitions given in class.

u
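As a hedged illustration of how such metrics can be formed from raw instrumentation counters, the sketch below uses the standard operational definitions together with Little's law. The function name mm1.metrics, its arguments, and the input numbers are made up for this example; they are not measurements from the simulation in this post.

```r
# Hypothetical helper: turn raw counters into performance metrics
mm1.metrics <- function(t.end, t.busy, area.n, n.done) {
  u <- t.busy / t.end    # utilization: fraction of time the server is busy
  x <- n.done / t.end    # throughput: completions per unit time
  s <- t.busy / n.done   # mean service time per completed job
  q <- area.n / t.end    # time-averaged queue length
  r <- q / x             # mean residence time via Little's law, Q = X * R
  c(U = u, X = x, S = s, Q = q, R = r)
}

# Illustrative counter values only
m <- mm1.metrics(t.end = 1000, t.busy = 750, area.n = 3000, n.done = 750)
print(m)   # U = 0.75, X = 0.75, S = 1, Q = 3, R = 4
```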

Queue Length
This is a plot of instantaneous queue length à la load average data. This is what queueing fluctuations look like. As I point out in class, they're responsible for the usually complicated math seen in queueing-theory textbooks that can make your head hurt.

The box shown as the red dashed line has the same area as that under the stair-step curve. Since the box has the same width (in time) as the curve, its height of 1.7845 represents the time-averaged queue length based on a sample of 100 time steps out of the total of 100,000 steps. As you'll see in the next section, the height of the box approaches the steady-state value of Q = 3.00 predicted by PDQ as the simulation time is increased.
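The equal-area argument can be sketched in a few lines: sum the area under the stair-step curve, then divide by the width of the time window to get the height of the box. The event times and queue lengths below are toy values chosen for easy arithmetic, not data from the plot.

```r
# Toy stair-step data (illustrative values only)
t.event <- c(0, 2, 3, 5, 8, 10)   # event times; the last one closes the window
n.queue <- c(0, 1, 2, 1, 0)       # queue length held during each interval

dt    <- diff(t.event)            # interval widths: 2, 1, 2, 3, 2
area  <- sum(n.queue * dt)        # area under the stair-step curve = 8
width <- t.event[length(t.event)] - t.event[1]   # window width = 10

q.avg <- area / width             # height of the equal-area box
print(q.avg)                      # 0.8
```

A stair-step plot of such data can be drawn with plot(..., type = "s"), and the box height added with abline(h = q.avg, lty = 2, col = "red").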

PDQ Model
For analytic comparison, we also include the corresponding PDQ-R model in the same script using the online manual for reference.

Yes, these few lines are equivalent to the above simulation code with instrumentation, and it's guaranteed to be in steady state. Running PDQ, even in R, is essentially instantaneous. The simulation will take longer, but given the plethora of MIPS/core available today, especially on laptops, running simulations in R is entirely feasible.
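For a single M/M/1 node, the steady-state values can also be checked by hand from the closed-form results. The sketch below is not the PDQ model; it only evaluates the textbook M/M/1 formulas. The arrival rate and service time are assumptions, chosen so that the utilization is 0.75, which is consistent with the Q = 3.00 quoted above.

```r
# Closed-form M/M/1 steady-state results (assumed inputs)
lambda <- 0.75             # assumed arrival rate
S      <- 1.00             # assumed mean service time

rho <- lambda * S          # utilization; must be < 1 for steady state
stopifnot(rho < 1)

Q <- rho / (1 - rho)       # mean number in system (waiting + in service)
R <- S / (1 - rho)         # mean residence time

print(c(rho = rho, Q = Q, R = R))   # rho = 0.75, Q = 3, R = 4
```

Little's law provides a quick consistency check here, since Q should equal lambda * R.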

Results
Finally, we can compare the simulated M/M/1 queue with the corresponding PDQ results. As usual, it's best to break them into inputs and outputs.

Within the expected limits of precision, we can conclude that the simulation reached steady state during the specified 10^5 time-steps.
No doubt, I'll go into more detail about doing simulations in R during the upcoming GDAT class in August.

Roughly, for any m servers: a new arrival first checks whether there's already a waiting line, in which case it joins the tail (assuming FCFS priority). If not, it checks whether a server is available. If so, it starts service; otherwise, it forms the head of the line (HOL).
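That arrival rule can be sketched as a small state-update function. The function name arrive and the two-field state list (busy servers, jobs waiting) are assumptions made for illustration; departures are not handled here.

```r
# Hypothetical arrival handler for an m-server FCFS queue
# state: list(busy = number of busy servers, waiting = jobs in the waiting line)
arrive <- function(state, m) {
  if (state$waiting > 0) {
    state$waiting <- state$waiting + 1   # a line exists: join the tail
  } else if (state$busy < m) {
    state$busy <- state$busy + 1         # a server is free: start service
  } else {
    state$waiting <- 1                   # all servers busy: become the HOL
  }
  state
}

# Three arrivals into an empty 2-server system
s <- list(busy = 0, waiting = 0)
for (i in 1:3) s <- arrive(s, m = 2)
print(unlist(s))   # busy = 2, waiting = 1
```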