Random Samples and Statistical Accuracy

Random Sampling Overview

This article explains how random sampling works.

If you are collecting data on a large group of employees or customers (called a "population"), you might want to minimize the impact that the survey will have on the group that you are surveying. It is often not necessary to survey the entire population. Instead, you can select a random sample of employees or customers and survey just them. You can then draw conclusions about how the entire population would respond based on the responses from this randomly selected group of people. This is exactly what political pollsters do: they ask a group of people a list of questions and, based on the results, draw conclusions about the population as a whole, with those often-heard disclaimers of "plus or minus 5%."

If you are simply looking at one large group of people as a whole, the process of determining a random sample is pretty straightforward. You will need to know how many people are in the entire group (e.g. the total number of employees) and how "accurate" you want your results to be (see "Statistical Accuracy - Confidence and Error" below). When you survey only a portion of a population, there will be some margin of error in the results, but once the margin of error is reduced to just a few percentage points, it is often of little concern.

If your population consists of just a few hundred people, you might find that you need to survey almost all of them in order to achieve the level of accuracy that you desire. As the population size increases, the percentage of people needed to achieve a high level of accuracy decreases rapidly.

In other words, to achieve the same level of accuracy:

Larger population = Smaller percentage of people surveyed

Smaller population = Larger percentage of people surveyed
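To make this relationship concrete, here is a minimal Python sketch. The article does not say which formula its calculator uses, so this is an assumption: it applies a common textbook approach (Cochran's formula with a finite population correction) and assumes the worst case in which answers are split 50/50. The function name sample_size and its parameters are made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, error=0.05, confidence=0.95, proportion=0.5):
    """Estimate the responses needed from a simple random sample.

    Assumption: Cochran's formula with a finite population correction.
    proportion=0.5 is the most conservative (largest sample) choice.
    """
    # z-score for a two-sided confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population
    n0 = (z ** 2) * proportion * (1 - proportion) / error ** 2
    # Reduce it to account for the finite population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Hypothetical population sizes, 95% confidence with a 5% error level
for population in (300, 1_000, 10_000, 100_000):
    n = sample_size(population)
    print(f"{population:>7} people: survey {n} ({n / population:.1%})")
```

Under these assumptions, a population of 300 needs 169 responses (more than half of the group), while a population of 100,000 needs only 383 (less than 1%).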

Employee Surveys: Should I use a random sample?

For employee surveys, most organizations are too small for random sampling to be useful. For large companies (e.g. tens of thousands of employees), random sampling can be an option to consider when conducting an employee survey. Keep in mind, however, that many of the most critical employee engagement or employee satisfaction problems are often found in small subgroups within the organization. Random sampling can make it difficult or impossible to identify these hidden pockets of discontent since there won't be enough employees selected within those small groups to measure local employee attitudes.

Stratified Random Sampling

More often than not, you will not only want to examine the results from the overall population, but also understand the differences between key demographic subgroups within the population. For example, you might want to understand the differences between different groups of employees, like senior managers vs. regular employees. If you plan to look at distinct subgroups such as these, you should perform a stratified random sample. In a nutshell, this means you will need to select a separate random sample from each of the subgroups rather than just taking a single random sample from the entire group. The process is slightly more time consuming and will require you to survey a greater number of people overall, but this technique can be very valuable.

If you want to conduct a stratified random sample, think carefully about the single most relevant demographic division that can be made between people within your population. It is probably not practical to conduct a stratified random sample on more than one demographic category, as the process becomes much more complex and you will ultimately end up needing to survey almost the entire population if any of the subgroups are very small. For example, if you wanted to look at employee survey results by level and by job function, you would need to look at each level/function combination, and you might find very small numbers of employees within some of these areas.

Statistical Accuracy - Confidence and Error

In order to understand random sampling, you need to become familiar with a couple of basic statistical concepts.

1. Error - This is that "plus or minus X%" that you hear about. It means that the result you measure in your sample is expected to be within X percentage points of the result you would get by surveying the entire population.

2. Confidence - This is how confident you can be that your results really do fall within that error level. Expressed as a percentage, it tells you how often you would expect the results to fall within the error level if you were to conduct the survey many times, each time with a different random sample.

These two concepts work together to determine how accurate your survey results are. For example, if you have 90% confidence with an error of 4%, you are saying that if you were to conduct the same survey 100 times with a fresh random sample each time, you would expect the results to be within +/- 4% of the true value for the entire population about 90 times out of 100.

If you are not sure what sort of error you can tolerate and what level of confidence you need, a good rule of thumb is to aim for 95% confidence with a 5% error level.

Error is more formally called the "margin of error." It is half the width of the "confidence interval," a term that is sometimes used loosely for the error itself. Confidence is also known as the "confidence level." In order to avoid confusion, these concepts will simply be referred to as "Error" and "Confidence" in this article.
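As a rough illustration of how Error and Confidence interact, the following Python sketch estimates the error level for a given number of responses. It is not taken from this article: it assumes the standard normal approximation for a proportion with a finite population correction, and a worst-case 50/50 split in answers. The function name margin_of_error is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(responses, population, confidence=0.95, proportion=0.5):
    """Approximate the error level for a simple random sample.

    Assumption: normal approximation with a finite population correction.
    """
    # z-score for a two-sided confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion for this many responses
    standard_error = math.sqrt(proportion * (1 - proportion) / responses)
    # Correction for sampling a large share of a finite population
    correction = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * correction


# Hypothetical example: 370 responses from a population of 10,000
for confidence in (0.90, 0.95, 0.99):
    error = margin_of_error(370, 10_000, confidence)
    print(f"{confidence:.0%} confidence: +/- {error:.1%}")
```

With the same number of responses, asking for more confidence always means accepting a larger error, and vice versa.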

Performing a Stratified Random Sample

If you are performing a stratified random sample, there are a couple of additional steps that you need to take.

1. Determine the size of the smallest subgroup in your population. For example, if you want to compare males vs. females and there are fewer females, then females are the subgroup to start with.
2. Calculate the number of people required to achieve your desired error level and level of confidence for this subgroup.
3. Calculate what percentage of people that you will need to survey within this subgroup (number of people to survey divided by total subgroup size).
4. Finally, calculate the number of people in each of the other subgroups that are needed to achieve this same ratio (multiply the percentage from step 3 by the size of each of the other subgroups). This is how many people you will need to survey within each group.
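The four steps above can be sketched in Python as follows. The subgroup names and sizes are invented for illustration, and the sample_size helper is an assumption (Cochran's formula with a finite population correction and a worst-case 50/50 split), not necessarily the method behind any particular calculator.

```python
import math
from statistics import NormalDist


def sample_size(population, error=0.05, confidence=0.95):
    """Assumed helper: responses needed for one group (worst case p = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * 0.25 / error ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def stratified_allocation(subgroups, error=0.05, confidence=0.95):
    """Return how many people to survey in each subgroup."""
    # Step 1: find the size of the smallest subgroup
    smallest = min(subgroups.values())
    # Step 2: responses needed for that subgroup
    needed = sample_size(smallest, error, confidence)
    # Steps 3 and 4: apply the ratio needed / smallest to every subgroup,
    # rounding up with integer arithmetic to avoid floating-point surprises
    return {
        name: (size * needed + smallest - 1) // smallest
        for name, size in subgroups.items()
    }


# Hypothetical subgroup sizes
groups = {"senior managers": 400, "regular employees": 5_000}
print(stratified_allocation(groups))
```

For these made-up numbers, the smallest group needs 197 of its 400 people (about 49%), so the same ratio applied to the 5,000 regular employees gives 2,463 people.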

Remember, a larger group means a smaller percentage required to get the same level of accuracy. That is why we start with the smallest group and work our way up. The results you get from the larger groups should actually be even more accurate than the results from the smallest group, but you can at least be sure that each group meets your minimum accuracy requirements.

Do not calculate the number of people required to achieve the desired error level and level of confidence separately for each subgroup. While this might seem tempting since it would mean surveying fewer people from the larger groups, it will distort your overall results. It is important that each subgroup is proportionately represented. If you survey 75% of the people from a smaller group and only 25% of the people from a larger group, then the overall results for the entire population will be skewed in favor of the smaller group since they will be disproportionately represented. You might find this rather restrictive, especially if your subgroups vary greatly in size. While it might be OK to fudge a little around the edges, do not disregard this requirement. Alternatively, if the groups are not proportionately represented, weight each group's results by its share of the total population when you calculate the overall results.
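The skew described above, and the weighting that corrects it, can be illustrated with a short Python sketch. All of the numbers are made up: a small group of 200 people surveyed at 75% and a large group of 1,800 people surveyed at 25%, with invented "favorable" scores for each group.

```python
def weighted_average(group_results, weights):
    """Average the group results, weighting each group by the given counts."""
    total = sum(weights.values())
    return sum(group_results[g] * weights[g] for g in group_results) / total


# Hypothetical data, for illustration only
group_sizes = {"small group": 200, "large group": 1_800}   # population
responses = {"small group": 150, "large group": 450}       # 75% vs. 25% surveyed
favorable = {"small group": 0.60, "large group": 0.80}     # share answering favorably

# Simply pooling all responses over-represents the small group
pooled = weighted_average(favorable, responses)
# Weighting by population size restores each group's true share
adjusted = weighted_average(favorable, group_sizes)

print(f"Pooled responses:      {pooled:.0%}")
print(f"Weighted by group size: {adjusted:.0%}")
```

In this invented example the pooled figure is 75%, while the population-weighted figure is 78%, because the small group makes up a quarter of the responses but only a tenth of the population.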

Final Steps - Putting it All Together

Once you have determined how many people you need from either your population as a whole or from each subgroup within your population, you simply need to determine a way to randomly select the specified number of people from each group. There are many wrong ways to go about this. Whatever technique you use, be sure that you really are selecting people at random and not accidentally giving preference to anybody for any reason. An easy and fast way to randomly select people is to use MS Excel. The steps to make the random selection are as follows:

1. Copy and paste a list of every person in the group into a single column. You can use names, email addresses, employee numbers, or any other unique identifier.
2. In a second column, enter Excel's RAND function next to each person. The exact value of each cell should be "=RAND()" (do not include the quotation marks). Only fill the cells next to where you pasted the group info in step #1.
3. Sort both columns by the random number column. It does not matter whether you sort them in ascending or descending order. Excel recalculates RAND whenever the sheet changes, so the numbers will look unsorted again after the sort; the order of the people is still random.
4. Scroll down to the row number that matches the number of people you need to select (assuming your list starts in row 1). Everybody from this row up is a part of your sample (see important note below regarding response rates).

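If you prefer a script to a spreadsheet, the same idea can be sketched in Python: give every person a random number, sort by it, and keep the top of the list. The function name pick_random_sample and the employee identifiers are hypothetical. The optional seed is only there to make a selection repeatable.

```python
import random


def pick_random_sample(people, sample_size, seed=None):
    """Select sample_size people at random, mirroring the spreadsheet steps."""
    rng = random.Random(seed)
    # Step 2: attach a random number to every person
    keyed = [(rng.random(), person) for person in people]
    # Step 3: sort by the random number
    keyed.sort(key=lambda pair: pair[0])
    # Step 4: everybody above the cutoff row is in the sample
    return [person for _, person in keyed[:sample_size]]


# Hypothetical list of 1,000 employee numbers
employees = [f"E{number:04d}" for number in range(1, 1001)]
selected = pick_random_sample(employees, 278)
print(len(selected), "people selected, for example:", selected[:5])
```

Python's built-in random.sample(employees, 278) achieves the same result in one call; the longer version is shown only because it follows the spreadsheet steps one by one.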

Adjusting for Estimated Response Rate

This last and very important step might require a bit of guesswork. At this point, you have figured out how many responses you need from your population or from each subgroup within your population. If every one of those people were to respond to your survey, then you would be all set; in reality, however, many of the people you have randomly selected will not complete your survey. You will need to estimate what percentage of people you expect to respond. Response rates can vary widely depending on the population and the nature of the survey. You can use past experience, your knowledge of the population, and the nature of the survey itself (longer surveys will have lower response rates) to come up with your best estimate. You will then need to figure out how many people you need to ask to complete the survey in order to get your desired number of responses. For employee surveys, response rates are typically around 70% to 80%. For customer surveys, response rates are usually much lower, often 5% to 10% or even less.

Once you have come up with your best estimate of the response rate, divide the number of responses needed by the response rate (expressed as a decimal) and round up to figure out how many people you need to ask to complete the survey. For example, if you determined that you need 500 people to respond to your survey and you estimate that 75% of people will complete the survey, you will need to ask 667 people to complete the survey in order to get 500 responses (500 / 0.75 = 666.67, rounded up to 667).
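The same arithmetic as a small Python sketch, in case you want to repeat it for several subgroups. The function name invitations_needed is made up for illustration.

```python
import math


def invitations_needed(responses_needed, response_rate):
    """Number of people to invite, given an estimated response rate (0 to 1)."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be greater than 0 and at most 1")
    # Always round up: a fraction of a person still means one more invitation
    return math.ceil(responses_needed / response_rate)


# The example from the text: 500 responses at a 75% response rate
print(invitations_needed(500, 0.75))  # 667
```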

It is worth noting that there might be some skewing of your results based on the fact that you are conducting an internet-based survey. Only people with access to the internet and who are comfortable filling out an online survey will respond. If you were conducting a survey of internet usage, this might be of particular importance. For most (non-academic) surveys, this is not a major concern. You will need to determine for yourself whether the survey medium might have an effect on your survey results.