5 users are ok in qualitative testing, but not sufficient for statistically significant results. But of course, what is the point of proving that the new task can be carried out significantly faster?
– giraff, Aug 22 '11 at 9:23

I don't have a reference for this, but I would think that these numbers would depend on the size of the user base.

For qualitative testing you need to have "typical" users. So if your users can take 3 roles, you need at least 3 users, one for each role. In reality you'd want more than one per role, but this is your absolute minimum.

For quantitative testing you need a significant proportion of your user base. I don't know what that number would be, but if you take a user base of 1,000 and 10% as a figure, you'll need 100 users. However, this might be unrealistic: there might be no way you can manage that many users, or, if you've got a small user base, it would produce a very small number.

For quantitative testing, it's possible to be more explicit about the effect of the sample size on your results, but the number of users you need depends on the particular tests or analyses you are considering. Examples include determining the proportion of participants who successfully complete a task, estimating the average time-on-task, or comparing two versions with a questionnaire like the SUMI or the SUS. It's therefore difficult to give a rule of thumb that would be useful for all situations, but there are techniques to find out the sample size you need in a given situation.
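As one illustration of such a technique, here is a minimal sketch for the simplest case: estimating a task completion rate to within a chosen margin of error. It assumes a simple random sample and the textbook normal approximation for a proportion, which is rough for very small samples. The function name and its default values are my own choices for illustration, not something taken from a particular tool.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, expected_p=0.5):
    """Participants needed to estimate a proportion within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    expected_p=0.5 is the most conservative guess (largest n).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_p * (1 - expected_p) / margin ** 2
    return math.ceil(n)

if __name__ == "__main__":
    for margin in (0.10, 0.05, 0.03):
        n = sample_size_for_proportion(margin)
        print(f"+/- {margin:.0%} at 95% confidence: {n} participants")
```

Other analyses (comparing two designs, time-on-task, questionnaire scores) need different formulas, so treat this as an example of the approach rather than a general answer.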

Now, if you don't want to go to all this trouble and actually estimate things like confidence intervals and statistical power, there are still two important conclusions to remember.

The first one is that the precision of the estimate and therefore the number of users you need to achieve a given level of precision do not depend on the size of your user base, at least as long as this user base is much larger than your test sample. The second one is that the bigger your sample size is, the smaller the improvement you can expect from additional test users will be. Thus, going from 10 to 110 is a huge improvement, going from 1000 to 1100 not so much.
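To make the diminishing returns concrete, here is a small sketch that computes the approximate 95% margin of error for a proportion at the sample sizes mentioned above. It assumes a simple random sample, the normal approximation, and a worst-case proportion of 50%; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate half-width of a confidence interval for a proportion.

    Assumes the population is much larger than the sample, so the
    population size does not appear in the formula at all.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (10, 110, 1000, 1100):
        print(f"n = {n:4d}: +/- {margin_of_error(n):.1%}")
```

Under these assumptions, going from 10 to 110 participants shrinks the margin from roughly 31 to roughly 9 percentage points, while going from 1000 to 1100 only moves it from about 3.1 to about 3.0.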

That's why opinion polls often have samples of about 1000 participants, even when the population of interest includes several million people. In fact, the sample size for a pre-election poll will typically be very similar in countries with 5, 80 or 200 million inhabitants. As long as your sample is random and the population is much larger, it does not matter if you are asking only 1%, 0.1% or 0.00001% of the total number of voters.
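If you want to check the polling claim yourself, a sketch like the following adds the standard finite population correction to the margin of error for a proportion. It assumes simple random sampling without replacement and a worst-case proportion of 50%; the function name and the population figures are only illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error_fpc(n, population, p=0.5, confidence=0.95):
    """Margin of error for a proportion with finite population correction.

    The correction factor sqrt((N - n) / (N - 1)) is very close to 1
    whenever the population N is much larger than the sample n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

if __name__ == "__main__":
    for population in (5_000_000, 80_000_000, 200_000_000, 2_000):
        moe = margin_of_error_fpc(1000, population)
        print(f"N = {population:>11,}: +/- {moe:.2%}")
```

With a sample of 1000, the three country-sized populations give practically the same margin. The correction only starts to matter in a case like the last one, where the sample is half of the population.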

Both of these conclusions also hold for things other than percentages: for example, comparisons between ratings on a satisfaction questionnaire or analyses of the time it takes to complete a task. If you want to go further, one good starting point is Jeff Sauro's website http://www.measuringusability.com/