Thursday, May 13, 2010

Security Screening: Discrete Event Simulation with Arena

Simulation is a powerful tool in the hands of Operations Research practitioners. In this article I demonstrate a discrete event process simulation, extending the bottleneck analysis I wrote about previously.

A few days ago I wrote an article demonstrating how you could use bottleneck analysis to compare two different configurations of the security screening process at London Gatwick Airport. Bottleneck analysis is a simple process analysis tool that sits in the toolbox of Operations Research practitioners. I showed that a resource-pooled, queue-merged process might screen as many as 20% more passengers per hour and that the poor as-is configuration was probably costing the system something like 10% of its potential capacity.

The previous article would be good to read before continuing, but to summarize briefly: security screening happens in two steps, beginning with a check of the passenger's boarding pass, followed by the x-ray machines. Four people checking boarding passes and 6 teams working x-ray machines were organized into 4 sub-systems, each with one checker and one or two x-ray teams. The imbalance in each sub-system was forcing a resource to be underutilised, and Dawen quite rightly pointed out that a more efficient result could be achieved by joining the sub-systems into a single system in which all 6 x-ray machines serve one queue fed by all 4 checkers. We will look at these two key scenarios, comparing the As-Is system with the What-If system.

The bottleneck analysis was able to quantify the capacity that is being lost due to this inefficiency, but as I alluded to then, this was not the entire story. The other big impact is on passenger experience, that is, the time passengers spend waiting in queues in the system. In order to study queuing times, we turn to another Operations Research tool: Simulation, specifically Process-Driven Discrete Event Simulation. Note: There may be an opportunity to apply Queuing Theory, another Operations Research discipline, but we won't be doing that here today.

Discrete Event Simulation

Discrete Event Simulation is a computer simulation paradigm in which a model is made of the real-world process, with the key focus on the entities (passengers) and resources (boarding pass checkers and x-ray teams) in the system. "Discrete" because the focus is on discrete, indivisible things like people and machines. "Event" because the driving mechanism of the model is a list of events that are processed in chronological order, and processing an event typically schedules new events. The alternative driving mechanism is to advance the clock in fixed timesteps, as in continuous simulation approaches such as system dynamics. Using a DES model allows you to go beyond the simple mathematics of bottleneck analysis. By explicitly tracking individual passengers as they go through the process, important statistics like utilisation rates and waiting times can be collected.
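To make the event-list idea concrete, here is a minimal sketch in Python of a single server with one queue. It is not the Arena model from this article; the function name, the event names and the rates in the usage note are my own illustrative assumptions. The point is only to show events being popped in chronological order and scheduling new events as they are processed.

```python
import heapq
import random
from collections import deque


def simulate_single_server(arrival_rate, service_rate, horizon, seed=0):
    """Event-list simulation of one server with a FIFO queue.

    Rates are per minute and horizon is in minutes (illustrative units).
    Returns the queueing time of every passenger who started service.
    """
    rng = random.Random(seed)
    events = []   # the event list: a heap of (time, tie_breaker, kind)
    counter = 0   # tie-breaker so simultaneous events pop in insertion order

    def schedule(time, kind):
        nonlocal counter
        counter += 1
        heapq.heappush(events, (time, counter, kind))

    schedule(rng.expovariate(arrival_rate), "arrival")
    waiting = deque()   # arrival times of passengers standing in the queue
    busy = False
    waits = []

    while events:
        now, _, kind = heapq.heappop(events)   # always the earliest event
        if now > horizon:
            break
        if kind == "arrival":
            # Processing an arrival spawns the next arrival event.
            schedule(now + rng.expovariate(arrival_rate), "arrival")
            if busy:
                waiting.append(now)
            else:
                busy = True
                waits.append(0.0)
                schedule(now + rng.expovariate(service_rate), "service_done")
        else:  # "service_done"
            if waiting:
                waits.append(now - waiting.popleft())
                schedule(now + rng.expovariate(service_rate), "service_done")
            else:
                busy = False
    return waits
```

For example, simulate_single_server(0.8, 1.0, 960.0) would run one 16-hour day with hypothetical rates of 0.8 arrivals and 1 service completion per minute. Note that the clock jumps from event to event rather than ticking forward in fixed steps.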

During my master's degree, the simulation tool at the heart of our simulation courses was Arena from Rockwell Automation, so I tend to go to it without even thinking. I have previously used Arena in my work for Vancouver Coastal Health, simulating Ultrasound departments, and there are plenty of others associated with the Sauder School of Business using Arena. Arena is an excellent tool and I've used it here for this article. I hope to test other products on this same problem in the future and publish a comparison.

In the Arena GUI you put logical blocks together to build the simulation in the same way that you might build a process map. Intuitively, at the high level, an Arena simulation reads like a process map when in actuality the blocks are building SIMAN code that does the heavy lifting for you.

The Simulation

Here's a snapshot of the as-is model of the Gatwick screening process that I built for this article:

Passengers decide to go through screening on the left, select the boarding pass checker with the shortest queue, are checked, proceed to the dedicated x-ray team(s) and eventually all end up in the departures hall.

An x-ray team is assumed to take a minute on average to screen each passenger. This is very different from taking exactly a minute to screen each passenger. Stochastic (random) processing times are an important source of dynamic complexity in queuing systems, and without modelling that randomness you can draw totally wrong conclusions. For our purposes we have assumed an exponentially distributed processing time with a mean of 1 minute. In practice we would grab our stop-watches and collect the data, but we would probably get arrested for doing that as an outsider. Suffice it to say that this is a very reasonable assumption and that exponential distributions are often used to express service times.
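The difference between "a minute on average" and "exactly a minute" can be illustrated with a small Python sketch. It uses the Lindley recursion for a single first-come-first-served server instead of a full event list, and the arrival rate of 0.9 passengers per minute is an assumption of mine chosen to load the server heavily; it is not a figure from the Gatwick model.

```python
import random


def mean_wait(service_time, arrival_rate=0.9, n=100_000, seed=1):
    """Average queueing time at one FIFO server via the Lindley recursion.

    service_time: function taking a random.Random and returning minutes.
    arrival_rate: passengers per minute, Poisson arrivals (assumed value).
    """
    rng = random.Random(seed)
    wait = 0.0     # queueing time of the current passenger
    total = 0.0
    for _ in range(n):
        total += wait
        service = service_time(rng)
        gap = rng.expovariate(arrival_rate)   # time until the next arrival
        # The next passenger waits for whatever work is left when they arrive.
        wait = max(0.0, wait + service - gap)
    return total / n


def constant(rng):
    return 1.0                    # exactly one minute, every time


def exponential(rng):
    return rng.expovariate(1.0)   # one minute on average


if __name__ == "__main__":
    print("constant service:   ", round(mean_wait(constant), 2), "min")
    print("exponential service:", round(mean_wait(exponential), 2), "min")
```

Both servers have the same average speed, yet the exponential case should show a noticeably longer average wait. Queueing theory predicts roughly double for this single-server setup, which is why treating service times as fixed can badly understate queues.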

As in the previous article, we were uncertain as to the relationship between throughput of boarding pass checkers and throughput of x-ray teams. We will consider three possibilities where processing time for the boarding pass checker is exponentially distributed with an average of: 60 seconds (S-slow), 40 seconds (M-medium), 30 seconds (F-fast) (These are alpha = 1, 1.5 and 2 from the previous article). In the fast F scenario, our bottleneck analysis says there should be no increased throughput What-If vs. As-Is because all x-ray machines are fully utilised in the As-Is system. In the slow S scenario there would similarly be no throughput benefit because all boarding pass checkers would be fully utilised in the As-Is system. Thus the medium M scenario is our focus, but our analysis may reveal some interesting results for F and S.
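The bottleneck predictions for the three scenarios can be reproduced in a few lines of Python. This is a sketch under two assumptions: each x-ray team screens one passenger per minute, and the six x-ray teams are split across the four As-Is lanes as 1, 1, 2 and 2 (the only split consistent with "one or two x-ray teams" per checker). The function names are mine.

```python
def as_is_capacity(alpha, xray_rate=1.0):
    """Passengers per minute for four separate lanes.

    alpha: passengers per minute one boarding pass checker can process.
    Lane layout (x-ray teams behind each checker) is inferred: 1, 1, 2, 2.
    """
    xray_teams_per_lane = [1, 1, 2, 2]
    # Each lane runs at the pace of its slower stage.
    return sum(min(alpha, teams * xray_rate) for teams in xray_teams_per_lane)


def what_if_capacity(alpha, checkers=4, xray_teams=6, xray_rate=1.0):
    """Passengers per minute when all checkers feed one pooled x-ray queue."""
    return min(checkers * alpha, xray_teams * xray_rate)


if __name__ == "__main__":
    for name, alpha in [("S", 1.0), ("M", 1.5), ("F", 2.0)]:
        a, w = as_is_capacity(alpha), what_if_capacity(alpha)
        print(f"{name}: as-is {a:.1f}/min, what-if {w:.1f}/min, ratio {a / w:.1%}")
```

Under these assumptions only the M scenario shows a gap: 5 passengers per minute As-Is against 6 What-If, which is the 83.3% figure from the bottleneck analysis.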

We're focused here on system resources and configuration and how they determine throughput, but we can't forget about passenger arrivals. The number of passengers actually requiring screening is the most significant limitation on the throughput of the system. I fed the system with six passengers per minute, the capacity of the x-ray teams. This ensured both that the x-ray teams had the potential to be 100% utilised and that they were never overwhelmed, which keeps x-ray queuing times comparable between configurations.

I ran 28 replications of the simulation (four weeks of days) and let each replication run for 16 hours (a working day). We need to run the simulation many times because of the stochastic element. Since the events are random, a different set of random outcomes will lead to a different result, so we must run many replications to study the range of possible results.

Also note that I implemented a rule in the as-is system, that if more than 10 passengers were waiting for an x-ray team the boarding pass checker would stop processing passengers for them.

Results

Scenario M - Throughput Statistics

First let's look at throughput. On average, over 16 hours the what-if system screened 18.9% more passengers than as-is. The statistics in the table are important. Stochastic simulations don't give a single, simple answer, but rather a range of possibilities described statistically. The average for 4 weeks is given in the table, but we can't be certain that would be the average over an entire year. The half-width tells us our 90% confidence range: the actual average is probably between one half-width below the reported average and one half-width above it.
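For readers who want to see where a half-width comes from, here is a short Python sketch. It assumes the usual Student-t construction across independent replications; I have not checked that Arena computes its half-width in exactly this way. The critical values are standard t-table entries, and the sample numbers in the usage note are made up for illustration, not simulation output.

```python
import math
import statistics

# Two-sided 90% critical values of Student's t, keyed by degrees of freedom.
# 27 corresponds to 28 replications; 3 is included only for the toy example.
T_CRIT_90 = {3: 2.353, 27: 1.703}


def half_width(samples, t_crit):
    """Half-width of a confidence interval for the mean of the samples."""
    spread = statistics.stdev(samples)   # sample standard deviation
    return t_crit * spread / math.sqrt(len(samples))


def confidence_interval(samples, t_crit):
    """(low, high) bounds: the mean minus and plus one half-width."""
    mean = statistics.mean(samples)
    hw = half_width(samples, t_crit)
    return mean - hw, mean + hw
```

With four made-up daily totals such as [10, 12, 11, 13], confidence_interval(samples, T_CRIT_90[3]) gives the range around their mean of 11.5. For the 28 replications in this article one would use T_CRIT_90[27]. More replications shrink the half-width roughly with the square root of the number of runs.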

Note: I would like to point out that this is almost exactly the result predicted analytically with the bottleneck analysis. We predicted that in this case the system was running at 83.3% capacity and here we show As-Is throughput is 4728.43/5621.57 of What-If throughput = 84.1%. The small discrepancy is probably due to random variation and the warm-up time from the simulation start.

But what has happened to waiting times?

The above graph is a cumulative frequency graph. It reads as follows: The what-if value for 2 minutes is 0.29. This means that 29% of passengers wait less than 2 minutes. The as-is value for 5 minutes is 0.65. This means that 65% of passengers wait less than 5 minutes.
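A cumulative frequency curve like this can be built from the individual waiting times that the simulation writes out. The Python sketch below shows one way to do it; the function names and the sample waits in the usage note are hypothetical and are not the data behind the graph.

```python
from bisect import bisect_left


def fraction_waiting_less_than(waits, minutes):
    """Share of passengers whose wait was strictly below the threshold."""
    ordered = sorted(waits)
    # bisect_left counts how many values are strictly less than `minutes`.
    return bisect_left(ordered, minutes) / len(ordered)


def cumulative_curve(waits, thresholds):
    """List of (threshold, fraction) points, ready to plot."""
    ordered = sorted(waits)
    n = len(ordered)
    return [(t, bisect_left(ordered, t) / n) for t in thresholds]
```

With hypothetical waits of [0.5, 1.5, 2.5, 6.0] minutes, fraction_waiting_less_than(waits, 2) is 0.5: half of those passengers waited less than 2 minutes. A curve that sits higher and further to the left means shorter waits.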

Comparing the two lines we can see that, while we have achieved higher throughput, customers will now have a higher waiting time. Management would have to consider this when making the change. Note that the waiting time increased because the load on the system also increased. What happens if we hold the load on the system constant? I adjusted the supply of passengers so that the throughput in both scenarios is the same, and re-ran the simulation:

Now we can see a huge difference! Not only does the new configuration outperform the old in terms of throughput, it is significantly better for customer waiting times.

What about our slow and fast scenarios? We know from our bottleneck analysis that throughput will not increase, but what will happen to waiting times?

Above is a comparison between as-is and what-if for the fast scenario. The boarding pass checkers are fast compared to the x-ray machines, so in both cases the x-ray machines are nearly overwhelmed and the waiting time is long. Why do the curves cross? The passengers that are fortunate enough to pick a checker with two x-ray machines behind them will experience better waiting times due to the pooling and the others experience worse.

This is a bit subtle, but an interesting result. In this scenario there is no throughput benefit from changing and no average waiting time benefit from changing, but waiting times are less variable.

Finally, we can take a quick glance at our slow S scenario. We know again from our bottleneck analysis that there is no benefit to be had in terms of throughput, but what about waiting times? Clearly a huge difference. The slow checkers are able to provide plenty of customers for the single x-ray teams, but are unable to keep the double teams busy. If you're unlucky you end up in a queue for a single x-ray machine, but if you're lucky you are served immediately by one of the double teams.

Summary

To an Operations Research practitioner with experience doing discrete event simulation, this example will seem a bit Mickey Mouse. However, it's an excellent and easily accessible demonstration of the benefits one can realize with this tool. A manager whose bottleneck analysis has determined that no large throughput increase could be achieved with a reconfiguration might change their mind after seeing this analysis. The second order benefits, improved customer waiting times, are substantial.

In order to build the model for this article in a professional setting you would probably require Arena Basic Edition Plus, as I used the advanced feature of output to file that is not available in Basic. Arena Basic goes for $1,895 USD. You could easily accomplish what we have done today with much cheaper products, but it is not simple examples like this that demonstrate the power of products like Arena.