Frequency distributions, Characteristics of the main...

12.4. Frequency distributions

Let us now look more closely at the individual procedures that make up basic data analysis: calculating frequency distributions and building cross-tabulation tables. After that, we will show how these procedures are used to test statistical hypotheses about relationships and differences.

Let's start with the calculation of frequency distributions. It answers questions such as the following:

- What are the number and share of loyal consumers among all consumers of the brand?

- What are the number and proportion of members of the study population who are well, moderately, poorly, or not at all informed about the firm's new product?

- What are the shares of heavy, medium, and light users and of non-users of the product?

- Do these shares, as measured in the survey, differ significantly from target values set by the company's management?

- How is income distributed among consumers of a particular brand? Is the distribution skewed towards relatively low incomes?

In the SPSS software package, frequency distributions are calculated with the Frequencies command (menu Analyze → Descriptive Statistics → Frequencies).

Example 12.1

Distribution of answers from former clients of the fitness center

Consider how people who stopped attending a fitness center answered the question of how long they usually spent there on each visit (Table 12.5).

Table 12.5. Distribution of respondents' answers to the question: "How much time did you usually spend in the fitness center?", h

| | Response, h | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 0.50 | 1 | 0.5 | 0.5 | 0.5 |
| | 1.00 | 15 | 7.0 | 7.1 | 7.5 |
| | 1.50 | 34 | 15.9 | 16.0 | 23.6 |
| | 1.75 | 4 | 1.9 | 1.9 | 25.5 |
| | 2.00 | 75 | 35.0 | 35.4 | 60.8 |
| | 2.20 | 1 | 0.5 | 0.5 | 61.3 |
| | 2.25 | 1 | 0.5 | 0.5 | 61.8 |
| | 2.30 | 1 | 0.5 | 0.5 | 62.3 |
| | 2.50 | 26 | 12.1 | 12.3 | 74.5 |
| | 2.75 | 1 | 0.5 | 0.5 | 75.0 |
| | 3.00 | 39 | 18.2 | 18.4 | 93.4 |
| | 3.50 | 5 | 2.3 | 2.4 | 95.8 |
| | 4.00 | 8 | 3.7 | 3.8 | 99.5 |
| | 5.00 | 1 | 0.5 | 0.5 | 100.0 |
| | Total | 212 | 99.1 | 100.0 | |
| Missing | System | 2 | 0.9 | | |
| Total | | 214 | 100.0 | | |

Column meanings: Frequency is the number of times the value occurred; Percent is the percentage of all cases; Valid Percent is the percentage of valid (non-missing) values; Cumulative Percent is the running total of Valid Percent.

We see that 214 respondents were surveyed. Two of them did not estimate the typical length of their stay at the fitness center; they appear in the table in the Missing row labeled System (system-missing values). Seventy-five respondents usually spent two hours in the fitness center, which is 35.0% of all respondents, or 35.4% of those who answered the question.
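The three percentage columns of Table 12.5 can be recomputed from the frequencies alone. The following is a minimal Python sketch, not SPSS output; the names `freq`, `missing`, and `frequency_table` are illustrative and do not come from the text.

```python
# Frequencies from Table 12.5: answer (hours) -> number of respondents.
freq = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
        2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}
missing = 2  # respondents who gave no answer (system-missing)

n_valid = sum(freq.values())   # respondents who answered the question
n_total = n_valid + missing    # all respondents surveyed


def frequency_table(freq, n_valid, n_total):
    """Return rows of (value, frequency, percent, valid %, cumulative %)."""
    rows, running = [], 0
    for value in sorted(freq):
        count = freq[value]
        running += count  # cumulative count of valid answers so far
        rows.append((value, count,
                     round(100 * count / n_total, 1),     # Percent
                     round(100 * count / n_valid, 1),     # Valid Percent
                     round(100 * running / n_valid, 1)))  # Cumulative Percent
    return rows


table = frequency_table(freq, n_valid, n_total)
for row in table:
    print("{:5.2f} {:4d} {:6.1f} {:6.1f} {:6.1f}".format(*row))
```

Percent divides by all 214 respondents, while Valid Percent and Cumulative Percent divide by the 212 who answered. That is why the same 75 respondents appear as 35.0% in one column and 35.4% in the other.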

The data in the table are easier to grasp from a frequency chart (Figure 12.7), which can also be produced by the Frequencies command (Charts tab).

Fig. 12.7. Frequency distribution chart of respondents' answers to the question about the time they spent in the fitness center, h

Knowing the frequency distribution, we can calculate the statistical characteristics of the studied variable, i.e. answers to a certain question of the questionnaire. There are three types of these characteristics:

- characteristics of central tendency (the main trend in the values of the indicator): mode, median, mean;

- characteristics of the shape of the distribution of the indicator's values: skewness, kurtosis.

Characteristics of central tendency in the responses

To identify the central tendency (the main trend) in the answers to a question means to summarize how respondents answered on the whole, that is, what values the variable typically takes. Three characteristics can be used for this: the mode, the median, and the mean. SPSS will calculate any of them for any numeric variable; which of them is actually meaningful depends on the type of data we are dealing with: nominal, ordinal, interval, or ratio (Table 12.6).

Table 12.6. Measures that can serve as characteristics of central tendency, depending on the type of scale

| Data scale type | Mode | Median | Mean |
|---|---|---|---|
| Nominal | + | | |
| Ordinal | + | + | |
| Interval | + | + | + |
| Ratio | + | + | + |

Here are the results of calculating these values in SPSS (Table 12.7); they are requested in the Frequencies command, Statistics subcommand, with the Mean, Median, and Mode options.

Table 12.7. Characteristics of central tendency in respondents' answers to the question about their time at the fitness center

| | Mode | Median | Mean |
|---|---|---|---|
| How much time did you usually spend in the fitness center? (h) | 2.0 | 2.0 | 2.2 |

The mode is the answer that occurred more often than any other (the value that the variable takes most often). On the frequency chart it corresponds to the highest peak; in Fig. 12.7, for example, the mode is 2.00 (hours). The mode says nothing about how often the other answers were chosen, so it is not very informative. It can therefore be considered a good characteristic of central tendency only for nominal variables, for which the other, more informative characteristics are not applicable.
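As a small illustration, the mode can be read off the frequencies of Table 12.5 by picking the answer with the largest count. This is a sketch in Python; the name `freq` is illustrative. If several answers tied for the top frequency, this one-liner would return only the first of them.

```python
# Frequencies from Table 12.5: answer (hours) -> number of respondents.
freq = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
        2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}

# The mode is the answer whose frequency is the highest.
mode = max(freq, key=freq.get)
print(mode, freq[mode])
```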

The median is the value that divides the sample, ordered by increasing values of the studied variable, into two equal parts: half of the observations lie below the median and half above it. Suppose first that the number of observations is odd, for example 101. Then the median is the 51st value in the ordered series. If the number of observations is even, for example 100, the median is calculated as the average of two values of the ordered series, the 50th and the 51st. In the first case the median coincides with the value of the variable for the "middle" respondent (the 51st), and in the second with the average of the values for the "middle" pair of respondents (the 50th and the 51st).
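The odd and even cases can be sketched in a few lines of Python. The function name `median` and the test data (the integers 1 to 101 and 1 to 100, standing in for ordered answers) are illustrative assumptions, not data from the survey.

```python
def median(values):
    """Median of a list of numbers, following the odd/even rule above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n, e.g. 101: the 51st value (index 50 when counting from 0).
        return ordered[mid]
    # Even n, e.g. 100: the average of the 50th and 51st values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median(range(1, 102))    # 101 observations: 1, 2, ..., 101
even_case = median(range(1, 101))   # 100 observations: 1, 2, ..., 100
print(odd_case, even_case)
```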

In practice, there is no need to number all the respondents to calculate the median. It is enough to determine from the distribution of answers where the middle respondent, or the middle pair of respondents, falls. To do this, find the answer at which the Cumulative Percent column (the running total of Valid Percent, see Table 12.5) reaches 50%.

Let us illustrate the procedure with the table above. The number of respondents who answered the question is even (212). According to the last column of the table, 25.5% of them (the closest cumulative value below 50%) gave the answers 0.50, 1.00, 1.50, or 1.75, while 60.8% (the closest cumulative value above 50%) gave the answers 0.50, 1.00, 1.50, 1.75, or 2.00. It does not matter which of the 212 respondents make up the middle pair discussed above; it is clear that both of them chose the answer 2.00. The average of two "twos" is, of course, also "two", so the median is 2.00.
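The same reasoning can be sketched in Python by walking down the cumulative counts until the rank of the middle respondent is reached. The helper `value_at_rank` is a hypothetical name introduced for this illustration.

```python
# Frequencies from Table 12.5: answer (hours) -> number of respondents.
freq = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
        2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}


def value_at_rank(freq, rank):
    """Answer given by the respondent at a 1-based rank in the ordered series."""
    running = 0
    for value in sorted(freq):
        running += freq[value]  # cumulative count up to this answer
        if rank <= running:
            return value
    raise ValueError("rank exceeds the number of respondents")


n = sum(freq.values())                        # 212 valid answers
lower = value_at_rank(freq, (n + 1) // 2)     # 106th respondent
upper = value_at_rank(freq, n // 2 + 1)       # 107th respondent
median = (lower + upper) / 2
print(lower, upper, median)
```

For an odd number of respondents the two ranks coincide, so the same code returns the answer of the single middle respondent.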

One nuance of the concept of the median should be noted. When many coinciding values lie in the middle of the ordered series, i.e. the data are concentrated there, researchers sometimes prefer the so-called refined median to the ordinary one (in the Frequencies command, Statistics subcommand, the Values are group midpoints option; Figure 12.8).

In our example, 75 respondents answered "two hours".

The idea of the calculation is as follows. The 212 respondents answered the question about the length of their stay in the club as follows:

- 54 respondents said they had spent less than two hours at the club;

- 75 respondents, exactly two hours;

- 83 respondents, more than two hours.

Fig. 12.8. Selecting options for calculating the refined median

If all respondents are numbered in order of increasing length of stay in the club, the "middle" pair of respondents, who occupy the 106th and 107th places in the ordered series, falls closer to the end of the "two hours" group than to its beginning. Figure 12.9 illustrates this.

Fig. 12.9. A diagram illustrating the idea of calculating a refined median

There are 105 respondents ranked below the middle pair and 105 ranked above it. Of the 105 below, 54 answered less than two hours, so 105 - 54 = 51 members of the 75-strong "2 h" group precede the middle pair. Of the 105 above, 83 answered more than two hours, so only 105 - 83 = 22 members of the group follow the pair. In other words, the middle pair lies closer to the values above two hours than to the values below it, so the refined median is pulled towards the larger values and should be somewhat greater than two hours. In this case its value is 2.076 hours. We will not give the calculation algorithm, since it is rather complicated [2].
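The text does not give the algorithm, so the following Python sketch is only one possible interpolation scheme. It places each answer at the center of its block of cumulative frequencies and interpolates linearly between the two adjacent answers whose centers bracket half of the sample. For the data of Table 12.5 it reproduces the value 2.076 quoted above. That SPSS computes the refined median in exactly this way is an assumption, not something the text confirms. The name `grouped_median` is illustrative.

```python
# Frequencies from Table 12.5: answer (hours) -> number of respondents.
freq = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
        2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}


def grouped_median(freq):
    """Refined median by linear interpolation between block centers (assumed scheme)."""
    values = sorted(freq)
    half = sum(freq.values()) / 2          # 106 for 212 respondents
    centers, running = [], 0
    for value in values:
        # Center of this answer's block on the cumulative-frequency axis.
        centers.append(running + freq[value] / 2)
        running += freq[value]
    for k in range(len(values) - 1):
        if centers[k] <= half < centers[k + 1]:
            share = (half - centers[k]) / (centers[k + 1] - centers[k])
            return values[k] + share * (values[k + 1] - values[k])
    raise ValueError("half of the sample lies outside the block centers")


refined = grouped_median(freq)
print(round(refined, 3))
```

In this example the block of 75 "two hours" answers is centered at 54 + 37.5 = 91.5 and the next answer, 2.20, at 129.5. Half of the sample, 106, lies 14.5/38 of the way between them, which gives 2.00 + (14.5/38) × 0.20 ≈ 2.076.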

As already noted, the median is meaningless for a nominal variable. It is a good characteristic of central tendency when measurement is on an ordinal scale, where, for example, the difference between answer options No. 1 and No. 2 may be quite unlike the difference between options No. 2 and No. 3. Recall that on ordinal scales the magnitude of a value carries no meaning of its own; all that matters is whether one value is greater than, less than, or equal to another. Suppose, for example, that respondents are asked to rank kinds of candy by preference. One respondent may well be "monogamous", loving only one kind of chocolate, which he puts in first place; the candies he puts in second, third, and later places he may dislike almost equally and never eat, having ranked them only at the interviewer's request. For ordinal scales, therefore, the median has an undeniable advantage over the mean (which we will consider shortly): the median ignores the magnitudes of the variable for the respondents to the left and right of the "middle" respondent or "middle" pair in the ordered series. Only the number of values on each side is taken into account.

This property makes the median useful as a supplementary characteristic for interval and ratio scales as well, especially when the data contain outliers, i.e. values of the variable that lie far from the bulk of the data. (How outliers are identified is discussed in the next subsection.) For example, when income distribution is measured, it is useful to know the income of the respondent in the middle of the series ordered by wealth. It then does not matter that a few very rich people happened to fall into the sample, whose incomes, if the arithmetic mean were calculated, would create the illusion of greater prosperity in the study population as a whole.

The mean is calculated by the formula

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \tag{12.1}$$

where n is the number of respondents who answered the question and X_i is the answer given by the i-th respondent.

In our example, the average time respondents spent in the fitness center was 2.2 hours.
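This figure can be checked against Table 12.5: summing each answer multiplied by its frequency gives the same total as summing the individual answers of all 212 respondents. The following is a minimal Python sketch with illustrative variable names.

```python
# Frequencies from Table 12.5: answer (hours) -> number of respondents.
freq = {0.50: 1, 1.00: 15, 1.50: 34, 1.75: 4, 2.00: 75, 2.20: 1, 2.25: 1,
        2.30: 1, 2.50: 26, 2.75: 1, 3.00: 39, 3.50: 5, 4.00: 8, 5.00: 1}

n = sum(freq.values())  # respondents who answered the question
# Each answer contributes value * count to the sum of all individual answers.
total_hours = sum(value * count for value, count in freq.items())
mean = total_hours / n
print(round(mean, 1))
```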

Using the mean as a characteristic of central tendency makes sense only for interval or ratio scales, i.e. when the difference between the values 1 and 2 is the same as between 2 and 3, and so on.

Even for such scales, however, the mean is sometimes supplemented by the median. In the income example, the mean equals the income each respondent would receive if all respondents pooled their incomes and divided them equally, a rather fantastical situation. If an oligarch whose income is two or three orders of magnitude higher than that of every other respondent falls into the sample, the mean income of all respondents rises substantially. Yet this increase can hardly be called a reflection of the central tendency in the incomes of the study population.
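The effect is easy to see on a toy sample. The incomes below are invented purely for illustration (they are not survey data), and the sketch uses the `mean` and `median` functions from Python's standard `statistics` module.

```python
import statistics

# Hypothetical incomes (arbitrary units), invented for illustration only.
incomes = [30, 35, 40, 45, 50]
# The same sample with one "oligarch" about three orders of magnitude richer.
with_oligarch = incomes + [50000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_oligarch)
median_after = statistics.median(with_oligarch)

print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```

One extreme value multiplies the mean many times over, while the median moves only from the third value of the ordered series to the average of the third and fourth.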
