Statistics: Power from Data!

Organizing data

Stem and leaf plots


A stem and leaf plot, or stem plot, is a technique used to classify either discrete or continuous variables. A stem and leaf plot is used to organize data as they are collected.

A stem and leaf plot looks something like a bar graph. Each number in the data is broken down into a stem and a leaf, thus the name. The stem of the number includes all but the last digit. The leaf of the number will always be a single digit.

Once you have decided that a stem and leaf plot is the best way to show your data, draw it as follows:

On the left-hand side of the page, write down the thousands, hundreds or tens (all digits but the last one). These will be your stems.

Draw a line to the right of these stems.

On the other side of the line, write down the ones (the last digit of a number). These will be your leaves.

For example, if the observed value is 25, then the stem is 2 and the leaf is 5. If the observed value is 369, then the stem is 36 and the leaf is 9. Where observations are accurate to one or more decimal places, such as 23.7, the stem is 23 and the leaf is 7. If the range of values is too great, the number 23.7 can be rounded up to 24 to limit the number of stems.
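The splitting rule can be sketched in a few lines of Python. This is only an illustration: the function name `split_stem_leaf` is made up for this example, and it assumes non-negative values written with the same number of decimal places.

```python
def split_stem_leaf(value):
    """Return (stem, leaf) for a non-negative number.

    The leaf is the last digit as written; the stem is every digit before it.
    """
    digits = str(value).replace(".", "")  # 23.7 -> "237", so the decimal digit becomes the leaf
    stem = int(digits[:-1] or "0")        # a single-digit value such as 6 is treated as 06
    leaf = int(digits[-1])
    return stem, leaf


for observed in (25, 369, 23.7, 6):
    print(observed, "->", split_stem_leaf(observed))
```

Rounding first, as in `split_stem_leaf(round(23.7))`, gives a stem of 2 and a leaf of 4, which is how rounding reduces the number of stems.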

In stem and leaf plots, tally marks are not required because the actual data are used.

Each morning, a teacher quizzed his class with 20 geography questions. The class marked them together and everyone kept a record of their personal scores. As the year passed, each student tried to improve his or her quiz marks. Every day, Elliot recorded his quiz marks on a stem and leaf plot. This is what his marks looked like plotted out:

Table 1. Elliot's scores on the basic facts quiz last year

Stem | Leaf
0 | 3 6 5
1 | 0 1 4 3 5 6 5 6 8 9 7 9
2 | 0 0 0 0

Analyse Elliot's stem and leaf plot. What is his most common score on the geography quizzes? What is his highest score? His lowest score? Rotate the stem and leaf plot onto its side so that it looks like a bar graph. Are most of Elliot's scores in the 10s, 20s or under 10? It is difficult to know from the plot whether Elliot has improved or not because we do not know the order of those scores.

Try making your own stem and leaf plot. Use the marks from something like all of your exam results last year or the points your sports team accumulated this season.

The main advantage of a stem and leaf plot is that the data are grouped and all the original data are shown, too. In Example 3 on battery life in the Frequency distribution tables section, the table shows that two observations occurred in the interval from 360 to 369 minutes. However, the table does not tell you what those actual observations are. A stem and leaf plot would show that information. Without a stem and leaf plot, the two values (363 and 369) can only be found by searching through all the original data—a tedious task when you have lots of data!

When looking at a data set, each observation may be considered as consisting of two parts: a stem and a leaf. To make a stem and leaf plot, each observed value must first be separated into these two parts, with the stem made up of all digits but the last and the leaf being the final digit.

A teacher asked 10 of her students how many books they had read in the last 12 months. Their answers were as follows:

12, 23, 19, 6, 10, 7, 15, 25, 21, 12

Prepare a stem and leaf plot for these data.

Tip: The number 6 can be written as 06, which means that it has a stem of 0 and a leaf of 6.

The stem and leaf plot should look like this:

Table 2. Books read in a year by 10 students

Stem | Leaf
0 | 6 7
1 | 2 9 0 5 2
2 | 3 5 1

In Table 2:

stem 0 represents the class interval 0 to 9;

stem 1 represents the class interval 10 to 19; and

stem 2 represents the class interval 20 to 29.

Usually, a stem and leaf plot is ordered, which simply means that the leaves are arranged in ascending order from left to right. Also, there is no need to separate the leaves (digits) with punctuation marks (commas or periods) since each leaf is always a single digit.
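As a rough sketch, the grouping and ordering steps could be written in Python as follows. The function name `stem_and_leaf` and its `ordered` parameter are invented for this illustration, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict


def stem_and_leaf(values, ordered=True):
    """Return plot rows as 'stem | leaves' strings for non-negative whole numbers."""
    leaves_by_stem = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # stem: all digits but the last; leaf: the last digit
        leaves_by_stem[stem].append(leaf)

    rows = []
    # List every stem from the smallest to the largest, even one that has no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = leaves_by_stem.get(stem, [])
        if ordered:
            leaves = sorted(leaves)  # ascending order from left to right
        rows.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    return rows


# Books read in the last 12 months by the 10 students
books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
for row in stem_and_leaf(books):
    print(row)
```

Calling `stem_and_leaf(books, ordered=False)` keeps the leaves in the order the answers were collected, which is the arrangement used in Table 2.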

Using the data from Table 2, we made the ordered stem and leaf plot shown below:

Stem | Leaf
0 | 6 7
1 | 0 2 2 5 9
2 | 1 3 5

When all of the values fall on a single stem, a stem and leaf plot does not give much information about the data because the leaves are overcrowded. If the leaves become too crowded, it might be useful to split each stem into two or more components. Thus, an interval 0–9 can be split into two intervals of 0–4 and 5–9. Similarly, a 0–9 stem could be split into five intervals: 0–1, 2–3, 4–5, 6–7 and 8–9.

The stem and leaf plot should then look like this:

Table 5. Number of drives to work in 10 days

Stem | Leaf
0(0) | 0 0 1 2 3 3 4
0(5) | 5 5 7 7 8 9 9 9

Note: The stem 0(0) means all the data within the interval 0–4. The stem 0(5) means all the data within the interval 5–9.

Prepare an ordered stem and leaf plot. Make a brief comment on what it shows.

Redraw the stem and leaf plot by splitting the stems into five-unit intervals. Make a brief comment on what the new plot shows.

Answers

The observations range in value from 10 to 39, so the stem and leaf plot should have stems of 1, 2 and 3. The ordered stem and leaf plot is shown below:

Table 6. Laps swum by Britney in 30 days

Stem | Leaf
1 | 0 8 9
2 | 0 1 2 2 4 4 4 5 5 6 6 6 7 7 7 7 8 8 8 8 8 9 9
3 | 1 1 2 9

The stem and leaf plot shows that Britney usually swims between 20 and 29 laps in training each day.

Splitting the stems into five-unit intervals gives the following stem and leaf plot:

Table 7. Laps swum by Britney in 30 days

Stem | Leaf
1(0) | 0
1(5) | 8 9
2(0) | 0 1 2 2 4 4 4
2(5) | 5 5 6 6 6 7 7 7 7 8 8 8 8 8 9 9
3(0) | 1 1 2
3(5) | 9

Note: The stem 1(0) means all data between 10 and 14, 1(5) means all data between 15 and 19, and so on.
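The splitting of stems into five-unit intervals can be sketched in Python. The function name `split_stem_plot` and its `width` parameter are hypothetical, and the list `laps` is rebuilt from the leaves of Table 6 rather than taken from the original observations. Sub-intervals with no leaves are simply left out of this sketch.

```python
def split_stem_plot(values, width=5):
    """Ordered plot rows with each stem split into sub-intervals of `width` units."""
    if 10 % width != 0:
        raise ValueError("width must divide 10 evenly, for example 5 or 2")
    groups = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        start = (leaf // width) * width  # first leaf of the sub-interval: 0 or 5 when width is 5
        groups.setdefault((stem, start), []).append(leaf)
    return [
        f"{stem}({start}) | {' '.join(map(str, leaves))}"
        for (stem, start), leaves in sorted(groups.items())
    ]


# Values rebuilt from the stems and leaves of Table 6
laps = [10, 18, 19,
        20, 21, 22, 22, 24, 24, 24, 25, 25, 26, 26, 26,
        27, 27, 27, 27, 28, 28, 28, 28, 28, 29, 29,
        31, 31, 32, 39]

for row in split_stem_plot(laps):
    print(row)
```

With `width=2`, the same function splits each stem into the five intervals 0–1, 2–3, 4–5, 6–7 and 8–9.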

The revised stem and leaf plot shows that Britney usually swims between 25 and 29 laps in training each day. The values 1(0) 0 = 10 and 3(5) 9 = 39 could be considered outliers—a concept that will be described in the next section.

Prepare an ordered stem and leaf plot for the data. Briefly comment on what the analysis shows.

Answer

In this case, the stems will be the whole number values and the leaves will be the decimal values. The data range from 56.3 to 65.7, so the stems should start at 56 and finish at 65.

Table 8. Weights of 30 students

Stem | Leaf
56 | 3
57 |
58 | 4 4 9
59 | 0 0 2 3 8
60 | 0 2 4 5 7 8 9
61 | 1 2 4 4 5 6 7 9 9
62 | 1 2 3 7
63 |
64 |
65 | 7

In this example, it was not necessary to split stems because the leaves are not crowded on too few stems; nor was it necessary to round the values, since the range of values is not large. This stem and leaf plot reveals that the group with the highest number of observations recorded is the 61.0 to 61.9 group.

An outlier is an extreme value of the data. It is an observation value that is significantly different from the rest of the data. There may be more than one outlier in a set of data.

Sometimes, outliers are significant pieces of information and should not be ignored. Other times, they occur because of an error or misinformation and should be ignored.

In the previous example, 56.3 and 65.7 could be considered outliers, since these two values are quite different from the other values.

By ignoring these two outliers, the previous example's stem and leaf plot could be redrawn as below:

Table 9. Weights of 30 students except for outliers

Stem | Leaf
58 | 4 4 9
59 | 0 0 2 3 8
60 | 0 2 4 5 7 8 9
61 | 1 2 4 4 5 6 7 9 9
62 | 1 2 3 7

When using a stem and leaf plot, spotting an outlier is often a matter of judgment. This is because, except when using box plots (explained in the section on box and whisker plots), there is no strict rule on how far removed a value must be from the rest of a data set to qualify as an outlier.
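For comparison, the sketch below applies one widely used box plot convention, which flags values lying more than 1.5 times the interquartile range beyond the quartiles. That convention is an assumption here, since this section does not state the rule, and quartiles can be computed in several slightly different ways. The function name `fence_outliers` is made up for the example, and `weights` is rebuilt from Table 8.

```python
import statistics


def fence_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3 (a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Values rebuilt from the stems and leaves of Table 8
weights = [56.3, 58.4, 58.4, 58.9, 59.0, 59.0, 59.2, 59.3, 59.8,
           60.0, 60.2, 60.4, 60.5, 60.7, 60.8, 60.9,
           61.1, 61.2, 61.4, 61.4, 61.5, 61.6, 61.7, 61.9, 61.9,
           62.1, 62.2, 62.3, 62.7, 65.7]

print(fence_outliers(weights))
```

Under this particular convention only 65.7 lies outside the limits, while 56.3 does not. A rule-based check and a judgment made by eye from a stem and leaf plot can therefore disagree.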

When you assess the overall pattern of any distribution (which is the pattern formed by all values of a particular variable), look for these features:

number of peaks

general shape (skewed or symmetric)

centre

spread

Number of peaks

Line graphs are useful because they readily reveal some characteristics of the data. (See the section on line graphs for details on this type of graph.)

The first characteristic that can be readily seen from a line graph is the number of high points or peaks the distribution has.

While most distributions that occur in statistical data have only one main peak (unimodal), other distributions may have two peaks (bimodal) or more than two peaks (multimodal).

Examples of unimodal, bimodal and multimodal line graphs are shown below:

General shape

The second main feature of a distribution is the extent to which it is symmetric.

A perfectly symmetric curve is one in which both sides of the distribution would exactly match the other if the figure were folded over its central point. An example is shown below:

A symmetric, unimodal, bell-shaped distribution—a relatively common occurrence—is called a normal distribution.

If the distribution is lop-sided, it is said to be skewed.

A distribution is said to be skewed to the right, or positively skewed, when most of the data are concentrated on the left of the distribution. Distributions with positive skews are more common than distributions with negative skews.

Income provides one example of a positively skewed distribution. Most people make under $40,000 a year, but some make quite a bit more, with a smaller number making many millions of dollars a year. Therefore, the positive (right) tail on the line graph for income extends out quite a long way, whereas the negative (left) tail stops at zero. The right tail clearly extends farther from the distribution's centre than the left tail, as shown below:

A distribution is said to be skewed to the left, or negatively skewed, if most of the data are concentrated on the right of the distribution. The left tail clearly extends farther from the distribution's centre than the right tail, as shown below:

Centre and spread

Locating the centre (median) of a distribution can be done by counting half the observations up from the smallest. Obviously, this method is impracticable for very large sets of data. A stem and leaf plot makes this easy, however, because the data are arranged in ascending order. The mean is another measure of central tendency. (See the chapter on central tendency for more detail.)

The amount of distribution spread and any large deviations from the general pattern (outliers) can be quickly spotted on a graph.

A stem and leaf plot is a simple kind of graph that is made out of the numbers themselves. It is a means of displaying the main features of a distribution. If a stem and leaf plot is turned on its side, it will resemble a bar graph or histogram and provide similar visual information.
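To make the idea of turning the plot on its side concrete, here is a small Python sketch that draws one column of marks per stem, using the leaves from Table 8. The names `weights_plot`, `rotated` and `peak_stem` are invented for this illustration.

```python
# Leaves copied from Table 8; stems without leaves are kept as empty strings.
weights_plot = {
    56: "3", 57: "", 58: "449", 59: "00238", 60: "0245789",
    61: "124456799", 62: "1237", 63: "", 64: "", 65: "7",
}


def rotated(plot, mark="*"):
    """Return text lines showing one column of marks per stem, tallest column = most leaves."""
    counts = [len(leaves) for leaves in plot.values()]
    width = max(len(str(stem)) for stem in plot)
    lines = []
    for level in range(max(counts), 0, -1):  # draw from the top row down
        cells = ((mark if count >= level else " ").ljust(width) for count in counts)
        lines.append(" ".join(cells).rstrip())
    lines.append(" ".join(str(stem).rjust(width) for stem in plot))  # stems along the bottom
    return lines


def peak_stem(plot):
    """Stem holding the most leaves (the first one found if there is a tie)."""
    return max(plot, key=lambda stem: len(plot[stem]))


for line in rotated(weights_plot):
    print(line)
print("Stem with the most leaves:", peak_stem(weights_plot))
```

The height of each column is simply the number of leaves on that stem, which is the same information a bar graph or histogram of the grouped data would show.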

Prepare an ordered stem and leaf plot for the data and briefly describe what it shows.

Are there any outliers? If so, which scores?

Look at the stem and leaf plot from the side. Describe the distribution's main features such as:

number of peaks

symmetry

value at the centre of the distribution

Answers

A test score is a discrete variable. For example, it is not possible to have a test score of 35.74542341....

The lowest value is 4 and the highest is 67. Therefore, the stem and leaf plot that covers this range of values looks like this:

Table 10. Math scores of 41 students

Stem | Leaf
0 | 4
1 | 8 9
2 | 3 4 6
3 | 1 2 4 5 5 7 9
4 | 0 1 2 3 4 5 5 8 9
5 | 0 0 0 1 1 2 3 4 4 5 5 6 7 7
6 | 0 2 3 5 7

Note: The notation 2|4 represents stem 2 and leaf 4.

The stem and leaf plot reveals that most students scored in the interval between 50 and 59. The large number of students who obtained high results could mean that the test was too easy, that most students knew the material well, or a combination of both.

The result of 4 could be an outlier, since there is a large gap between this and the next result, 18.

If the stem and leaf plot is turned on its side, it will look like the following:

The distribution has a single peak within the 50–59 interval.

Although there are only 41 observations, the distribution shows that most data are clustered at the right. The left tail extends farther from the data centre than the right tail. Therefore, the distribution is skewed to the left or negatively skewed.

Since there are 41 observations, the distribution centre (the median value) will occur at the 21st observation. Counting 21 observations up from the smallest, the centre is 48. (Note that the same value would have been obtained if 21 observations were counted down from the highest observation.)
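The counting method can be checked with a short Python sketch that reads the values back from Table 10. The names `scores_plot`, `values_from_plot` and `median_by_counting` are hypothetical, and the result is compared with the `statistics.median` function from the Python standard library.

```python
import statistics

# Leaves copied from Table 10 (math scores of 41 students)
scores_plot = {
    0: "4", 1: "89", 2: "346", 3: "1245579",
    4: "012345589", 5: "00011234455677", 6: "02357",
}


def values_from_plot(plot):
    """Rebuild the observations in ascending order from stems and leaves."""
    return [stem * 10 + int(leaf) for stem in sorted(plot) for leaf in sorted(plot[stem])]


def median_by_counting(plot):
    """Count halfway up from the smallest observation."""
    values = values_from_plot(plot)
    n = len(values)
    middle = (n + 1) // 2  # position of the middle observation, counting from 1
    if n % 2 == 1:
        return values[middle - 1]
    # With an even number of observations, average the two middle values.
    return (values[middle - 1] + values[middle]) / 2


values = values_from_plot(scores_plot)
print(len(values), median_by_counting(scores_plot), statistics.median(values))
```

Counting 21 observations down from the largest value, `values[-21]`, lands on the same observation.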