Website Scaling, Part 1: What It Means and How to Get There

By Martin Heller
01/17/12 5:00 AM PT

The stories of websites that fell over and died when they got unexpected traffic are legion. A recent example of what not to do would be Target's introduction of a low-priced Missoni collection. The site attracted so much traffic that Target.com was swamped for several hours on Sept. 13, 2011, leading would-be shoppers to face a cute but frustrating site-down page.

Some might argue that having so much traffic that the website went down was actually good for business. That might be the case if you're Target. If you want to make your own website fall over for publicity reasons, that's easy. Keeping it running during an onslaught of traffic to, you know, take orders and make money, is much harder.

In this two-part series on website scaling, I'll be exploring what it means and how to accomplish it. Part 1 discusses the metrics that apply to all websites, clarifies the true meaning of a "concurrent user," and covers the performance-testing basics you need to find out whether a site really scales.

The '-ilities'

If you were a fly on the wall at a planning meeting for a large website, you might hear the participants talking about the "-ilities." This is shorthand for scalability, availability, reliability, and a bunch of other performance components that might not actually end in "ility." For example, there's also capacity, throughput and latency.

The point of worrying about these aspects is that you want to build a website that can handle your expected usage patterns and then some. That's because nobody expects the Spanish Inquisition (inside joke for Monty Python fans). That is, nobody can anticipate their peak load under unforeseen conditions. Examples include having your tech article slashdotted, having your website recommended in a national newspaper, or having the geniuses in marketing make a shrewd buy and run a great TV ad about it in prime time -- without bothering to warn you so that you can prepare for a bump in visits, logins and purchases.

An availability of 99.9 percent means that the website functions properly for end users 99.9 percent of the time. This number typically does not include planned maintenance periods. To guarantee the specified availability at the required load, you need to simulate the target concurrency and throughput, as well as verify that you consistently obtain correct responses for at least 999 out of 1,000 requests within the target response time window.
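To make the arithmetic concrete, here is a minimal Python sketch of what a 99.9 percent target implies, both as a yearly downtime budget and as a pass/fail check on a batch of test requests. The function names are my own illustrations, not part of any testing tool, and the check treats the target as "at least 99.9 percent" of requests answered correctly and on time.

```python
from fractions import Fraction


def allowed_downtime_minutes(availability_pct, period_days=365):
    """Unplanned-downtime budget, in minutes, over the given period."""
    # Fraction("99.9") keeps the percentage exact instead of a rounded float.
    unavailable = 1 - Fraction(availability_pct) / 100
    return float(period_days * 24 * 60 * unavailable)


def meets_target(good_responses, total_requests, availability_pct):
    """True if correct, on-time responses meet the availability target."""
    target = Fraction(availability_pct) / 100
    return Fraction(good_responses, total_requests) >= target


print(allowed_downtime_minutes("99.9"))  # minutes per 365-day year
print(meets_target(999, 1000, "99.9"))   # True
print(meets_target(998, 1000, "99.9"))   # False
```

The percentage is passed as a string so that the boundary case (exactly 999 good responses out of 1,000) is not decided by floating-point rounding.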

For purposes of actual site operation, concurrency refers to the number of users simultaneously logged in and active on the website.

Throughput is easier to understand. If you have a static or stateless page, you want it to be able to serve up some number of hits or page views per second. Scalability is a measure of how the website responds to increased load, and it is typically measured by adding virtual users (or increasing hits per second if the website is stateless) and measuring response times and error rates.

Response time is the time it takes the website to respond to the request. For most interactive applications, this will include the time it takes to send the request (initial latency), process it at the site (processing time), return the response (final latency), and paint the screen in the Web browser (display time). When the full response time is likely to be long enough to annoy the user, a site can appear more responsive by displaying progress indicators.

Processing time might be broken down further into components from the Web server, application server, Web service, and data layer, should the Web application include these architectural elements. On a SQL database, the data processing time can usually be broken down even further.

Some applications care most about the completion of a state other than the full repainting of the client screen. For example, a stock trading application might care most about the time to complete a buy or sell transaction, which can be critical in a quick-moving market. Notification of the trader can be done later.

Response time almost always increases with concurrency and throughput. The question when you test a website is often this: How high can you take the load without making the response time (or responsiveness, depending on which matters most) high enough to annoy the users? In the end, you often have to balance capital and operational costs against performance under load.
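One way to answer that question from test results is sketched below in Python. It assumes you have already recorded one response time and one error rate per load level. The function name, the thresholds and the sample numbers are all invented for illustration.

```python
def highest_acceptable_load(results, max_response_s, max_error_rate):
    """Return the largest tested load that stayed within both limits.

    results: iterable of (virtual_users, response_time_s, error_rate).
    Stops at the first load level that breaks a limit, on the assumption
    that higher loads are not safe to run at either.
    """
    acceptable = None
    for users, response_s, error_rate in sorted(results):
        if response_s > max_response_s or error_rate > max_error_rate:
            break
        acceptable = users
    return acceptable


# Made-up measurements for illustration only: (users, seconds, error rate).
sample_results = [
    (100, 0.4, 0.000),
    (200, 0.5, 0.000),
    (400, 0.9, 0.000),
    (800, 2.7, 0.002),
    (1600, 9.5, 0.080),
]

print(highest_acceptable_load(sample_results, max_response_s=2.0, max_error_rate=0.001))  # 400
```

Where you set the thresholds is the business decision; the code only reports where the measurements cross them.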

Network utilization includes both local-area network (LAN) and wide-area network (WAN) bandwidth utilization. In a typical server farm, the private, internal LAN runs 100BASE-T (100 Mbps over twisted pair cables) or Gigabit Ethernet, and the WAN has multiple redundant high-speed telephone carrier line connections that might range from T1 (1.5 Mbps over copper) or T3 (45 Mbps) all the way up to OC-768 (40 Gbps over optical fiber).
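A quick back-of-the-envelope calculation shows why the carrier line matters. The Python sketch below estimates an upper bound on page views per second for a given link speed. The 500 KB average page weight is purely an assumption, and the function ignores protocol overhead, caching and compression, so real figures will be lower.

```python
def max_page_views_per_second(link_mbps, page_kilobytes):
    """Upper bound on page views per second that one link can carry."""
    link_bits_per_second = link_mbps * 1_000_000
    page_bits = page_kilobytes * 1000 * 8
    return link_bits_per_second / page_bits


PAGE_KB = 500  # assumed average page weight, purely illustrative

for name, mbps in [("T1", 1.5), ("T3", 45), ("Gigabit Ethernet", 1000)]:
    rate = max_page_views_per_second(mbps, PAGE_KB)
    print(f"{name}: {rate:.3f} page views/s")
```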

Most serious servers have at least two network cards -- one for the private LAN and one for the public Internet. Depending on the configuration and the other traffic at the facility, bandwidth limitations can occur at the network card, at routers and firewalls, at the carrier lines, or elsewhere in the Internet route to the facility. When measuring network utilization, look at the data volume, data throughput and data error rate.

Server utilization includes CPU, RAM, disk I/O, and disk space utilization. Depending on the hardware and software architecture, utilization can be limited by any of the four factors, although in some cases one will affect the others. For example, a server with overcommitted RAM will incur a large number of page faults per second, which in turn will cause high CPU and disk loads and create a large swap file which might fill a disk.

Concurrent Users

People doing load testing for the first time often become confused about the definition of "concurrent users." I can hear them thinking: The specification says the site needs to handle 1,000 concurrent users. What does that really mean? Hint: It doesn't mean 1,000 automatons blindly submitting pages as fast as possible.

Actual users of your application typically log in, look at a few things, go for coffee, come back, type a few things, take a phone call, read a few more things, head out to lunch, and eventually log off or their session times out. You get the picture. You can create realistic profiles for different kinds of users that include think time between Web actions. A manager will have a completely different usage profile from a technical support representative.
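Think time makes an enormous difference to the load a given number of users generates. As a rough steady-state approximation (a form of Little's law), N users who each wait R seconds for a response and then think for Z seconds produce about N / (R + Z) requests per second. The Python sketch below uses made-up response and think times to compare automatons with realistic users.

```python
def requests_per_second(users, response_time_s, think_time_s):
    """Approximate steady-state request rate generated by a group of users.

    Each user completes one request every (response time + think time)
    seconds, so N users produce N / (R + Z) requests per second.
    """
    return users / (response_time_s + think_time_s)


# 1,000 automatons with no think time and a 1-second response time.
print(requests_per_second(1000, 1.0, 0.0))             # 1000.0

# 1,000 users who pause 29 seconds between actions (assumed value).
print(round(requests_per_second(1000, 1.0, 29.0), 1))  # 33.3
```

Same head count, a thirtyfold difference in load. That is why the usage profile matters as much as the user count.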

Also, not every registered user of your site will be logged in 24/7. On a business intranet site, certain classes of users may log in at 9:00 a.m. local time and log out at 5:00 p.m. local time. (The difference between local time and server time will, of course, vary with time zone.) Other classes of users might log in intermittently throughout the day when their devices come into a WiFi coverage area.

The number of concurrent virtual users is the number of virtual users from the point of view of your performance testing tool. For example, you might have a load test that ramps up to 1,000 virtual users. This is typically a larger number than the concurrent application users, especially if login and logout are part of the usage profile and sessions time out during the load test. In any case, a stateful application only sees users with active sessions, whether or not they are logged in.

If you have a stateless application, you should not even try to test concurrency. Instead, test throughput -- typically in terms of the number of page views per hour the application can handle without errors, and without excessively or annoyingly high response time.

Performance Testing Basics

There are at least six types of performance tests: baseline, load, stress, soak, smoke and isolation. The first three should always be done.

A baseline test measures the performance of an unloaded system. A load test loads the application up to the specified concurrency or throughput, or slightly higher, to demonstrate conformance with the performance requirement. A stress test loads the application until it breaks to determine the capacity of the system, the headroom available, and the rate-limiting component of the system.
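The logic of a stress test can be sketched in a few lines of Python. This is only an outline under stated assumptions: run_at_load and is_broken are hypothetical callables you would wire up to your own load-injection tool, the load-doubling strategy is one choice among many, and the toy system below stands in for a real site.

```python
def find_breaking_point(run_at_load, is_broken, start_users, max_users=1_000_000):
    """Double the load until the system breaks.

    run_at_load(users) runs one test step and returns a measurement;
    is_broken(measurement) decides whether that step failed.
    Returns (last_good_users, first_broken_users); either may be None.
    """
    last_good = None
    users = start_users
    while users <= max_users:
        if is_broken(run_at_load(users)):
            return last_good, users
        last_good = users
        users *= 2
    return last_good, None


# Stand-in for a real test run: a toy system whose response time grows with load.
def fake_response_time(users):
    return 0.2 + users / 1000


required_users = 1000  # assumed requirement from the specification
last_good, first_broken = find_breaking_point(
    fake_response_time, lambda seconds: seconds > 2.0, start_users=100
)
print(last_good, first_broken)     # 1600 3200
print(last_good - required_users)  # 600 users of headroom
```

In practice you would follow up with smaller steps between the last good and first broken levels to pin down the capacity more precisely.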

A soak test runs the system at load for an extended time to determine whether performance degrades over time, as sometimes happens when the application leaks resources. A smoke test briefly exercises the portion of the application that has changed. An isolation test exercises a suspect part of the application to determine the cause of a problem.
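For a soak test, one simple check is to compare response times at the end of the run with those at the start. The Python sketch below does that with a ratio of window averages. The function name, the window size and the sample data are assumptions for illustration, and a real analysis would probably also look at memory and handle counts.

```python
from statistics import mean


def degradation_ratio(response_times, window_fraction=0.25):
    """Mean response time of the final window divided by that of the first.

    response_times must be in chronological order. A ratio well above 1.0
    suggests performance degraded during the run.
    """
    window = max(1, int(len(response_times) * window_fraction))
    return mean(response_times[-window:]) / mean(response_times[:window])


steady = [1.0] * 16
leaking = [1.0] * 8 + [1.25] * 4 + [1.5] * 4  # made-up samples, illustration only

print(degradation_ratio(steady))   # 1.0
print(degradation_ratio(leaking))  # 1.5
```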

As mentioned earlier, different kinds of users perform different kinds of activities at different paces. You need to understand your user profiles before you can create test scripts.

You will need to simulate your production environment. It's rare that you get to use the production hardware for performance testing except prior to site acceptance and launch. After launch, however, the application will still need periodic testing to validate the performance as the application and environment evolve.

Using your user profiles, you can construct and verify your scripts with your load-injection tool. You also need to set up your instrumentation for key performance indicators, based on the metrics described in the "-ilities" section. Some unified performance-testing tools include their own integrated instrumentation, as well as load injection.

On a Windows system, adequate monitoring and reporting can be done with the system Performance Monitor and Resource Monitor tools. On a Linux system, you can use the built-in top, vmstat, ps, free, iostat, sar, mpstat and pmap commands, or add a monitoring tool such as Nagios, FAN or Cacti. Depending on your Linux desktop choice, you may be able to run KDE System Guard or GNOME System Monitor.