How to use Likert scales effectively

Questionnaire surveys using Likert scales are one of the most popular research designs in language education research. They are simple and straightforward, and when done properly, they can produce lots of very useful information about language teaching and learning. The problem is that, once you have administered the questionnaire, there’s no way of going back to the respondents and asking follow-up questions, or clarifying the wording of your questions.

To help avoid potential problems, in this post I discuss three topics, which the students I advise sometimes find challenging. Specifically, we will look at how we can elicit information using:

Questionnaire surveys are simple, straightforward, and efficient, provided you avoid common mistakes. (Photo by Lukas on Pexels.com)

What are Likert items?

Likert items (and scales; see below) are very good at measuring constructs like beliefs and attitudes. A Likert item consists of a statement followed by a list of possible responses. The list is bivalent and symmetrical, and the responses are often anchored to numerical descriptors. Here’s an example:

Example 1
The next Doctor Who should be cast as a female role.
1 = Strongly Agree, 2 = Agree, 3 = Not sure, 4 = Disagree, 5 = Strongly Disagree

Likert scales must be bivalent and symmetrical

In this example, the item consists of a statement and five response options. The list of options is bivalent: it extends in the directions of both ‘agreement’ and ‘disagreement’. The responses are symmetrically arranged around a neutral value (‘not sure’). In some Likert items, the neutral value might not be explicitly stated (scroll down to see why), but the list is still symmetrical. Note, however, that scales with percentages, or scales ranging from ‘never’ to ‘always’, are not Likert scales.

Some statisticians argue that response options such as the ones shown above are evenly spaced (equidistant), or at least that we can treat them as if they were. This makes intuitive sense, and it is necessary for running a number of useful statistical tests. For instance, we can then calculate the mean, or weighted average, of the responses.
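For illustration, this is what that interval-style calculation looks like in Python. The response counts below are invented, one count per option of the five-point scale in Example 1:

```python
# Sketch: the 'interval' treatment described above, computing a weighted
# mean from hypothetical response counts on a 1-5 scale.
counts = {1: 6, 2: 10, 3: 4, 4: 3, 5: 2}   # option -> number of respondents

total = sum(counts.values())
weighted_mean = sum(option * n for option, n in counts.items()) / total
# This figure is only meaningful if the options really are equidistant.
```

The calculation itself is trivial; the assumptions behind it are what deserve scrutiny.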

However, to do this, you would need to make three assumptions. Specifically, you must assume that:

psychological constructs can be measured with precision;

such precision can be linguistically mapped; and

all respondents will interpret the descriptors in a similar way.

All this seems a lot like wishful thinking to me, and I would rather adjust research methods to reality than vice versa.

It is much safer to treat the information that these scales produce as ordinal data for the purposes of analysis. This means that we can calculate the frequency of each response (e.g., how many people strongly agree), we can add responses (e.g., how many people express some form of agreement, whether it is strong or moderate), and we can calculate percentages. We can also calculate the central tendency using the median, and the spread of responses, using the total and interquartile ranges of responses (here’s how to do this). For most research in language education, this is quite enough.
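As a sketch, all of these ordinal calculations can be done with Python’s standard library. The responses below are invented, coded 1 = Strongly Agree to 5 = Strongly Disagree:

```python
# Sketch: treating Likert responses as ordinal data (hypothetical
# responses, coded 1 = Strongly Agree ... 5 = Strongly Disagree).
from collections import Counter
from statistics import median, quantiles

responses = [1, 2, 2, 2, 3, 3, 4, 2, 1, 5, 2, 3]

# Frequency of each response option
freqs = Counter(responses)
print(freqs[2])                      # how many people chose 'Agree'

# Combined agreement (strong or moderate), and as a percentage
agree = freqs[1] + freqs[2]
print(round(100 * agree / len(responses)))

# Central tendency and spread
print(median(responses))
q1, _, q3 = quantiles(responses, n=4)
print(q3 - q1)                       # interquartile range
```

Nothing here assumes that the distance between ‘Agree’ and ‘Not sure’ equals the distance between ‘Not sure’ and ‘Disagree’, which is the point of the ordinal treatment.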

Likert items produce data that are easy to both use and abuse (Photo by Pixabay on Pexels.com)

Likert items work best in groups

Likert items are very sensitive to wording

Like most quantitative methods, Likert items can efficiently generate lots of data. On the other hand, these data can be misleading, because responses are very sensitive to the wording of the items.

For example, there is strong empirical evidence showing that support for free speech in the US is much higher when the questions contain the word ‘forbid’ rather than ‘not allow’ (a phenomenon known as the ‘forbid/allow asymmetry’). Even though the words are logical opposites, they elicit different responses: Participants generally object to ‘forbidding’ free speech, but they are less strongly opposed to ‘not allowing’ some forms of expression.

To moderate the effect of item wording, it is best to use several variants of the same item in a questionnaire, and derive a composite score from the responses. Here’s one way to do this:

A cluster of such related items, which probe the same underlying construct, produces a Likert scale. The items that make up a Likert scale, by the way, don’t need to be presented together in your questionnaire. In fact, you might find it better to spread them out across the questionnaire, so that their sequencing does not influence responses. In Example 2, I have used random numbers before each item, to simulate spread in a larger questionnaire.

Preparing a scale for analysis

To ensure that all the items in the Likert scale measure the same construct, it is necessary to pilot the scale with a relatively large number of participants. We can then calculate the internal consistency of the scale (using a metric called Cronbach’s alpha), and eliminate any items that do not work well. Looking at Example 2 above, item 12 is suspect, because it seems to measure behaviour rather than attitudes. If piloting suggests that the scale works better without it, then we would have to remove the item from the final version of the questionnaire (or, depending on when we found out, we might have to remove the data it produced from our calculations).
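To make the piloting step concrete, here is a from-scratch sketch of Cronbach’s alpha in Python. The pilot data are invented: rows are respondents, columns are the (already reverse-coded) items of a four-item scale:

```python
# Sketch: Cronbach's alpha computed from scratch on invented pilot data.
def cronbach_alpha(data):
    """data: list of respondent rows, each a list of item scores."""
    k = len(data[0])                       # number of items
    n = len(data)                          # number of respondents

    def variance(xs):                      # sample variance (n - 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in data]) for i in range(k)]
    total_var = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

pilot = [
    [4, 3, 4, 4],
    [2, 2, 1, 2],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(pilot)
```

In practice you would recompute alpha with each suspect item removed, and drop items whose removal raises it; statistical packages report this as ‘alpha if item deleted’.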

Deriving a composite score

Deriving the score of a Likert scale involves three steps. First, we reverse any negatively worded items. Number 17, above, is negatively worded, so we will code ‘Strongly disagree’ as 4, and so on. Next, we remove from the scale any item that systematically generates different responses from the others (see previous paragraph). Finally, we add up the scores that remain – in this case, producing a number ranging from 4 to 16. Alternative techniques, like assigning each response a value from 0 to 3, or from -2 to +2, are also fine, but it is important to be transparent about what you did, so make sure you document every step of the process and report it in your ‘methods’ section.
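The three steps can be sketched in Python. The item numbers, the respondent’s answers, and the 1–4 coding below are all hypothetical (item 17 is treated as negatively worded and item 12 as dropped after piloting, following the discussion above):

```python
# A minimal sketch of the three scoring steps, on invented data.
def reverse(score, n_options=4):
    """Reverse-code a response on an n-point scale (4 -> 1, 3 -> 2, ...)."""
    return n_options + 1 - score

raw = {3: 4, 9: 3, 12: 2, 17: 1}   # item number -> raw response
negatively_worded = {17}           # step 1: items to reverse-code
dropped = {12}                     # step 2: removed after piloting

# Step 3: sum the retained responses, reversing where necessary
composite = sum(
    reverse(score) if item in negatively_worded else score
    for item, score in raw.items()
    if item not in dropped
)
```

With three items retained and four options each, this particular composite can range from 3 to 12; keeping all four items would give a 4-to-16 range.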

When designing a questionnaire, how many options should you give respondents? (Photo by rawpixel.com on Pexels.com)

Using forced choice

Most commonly, Likert items contain five (or seven) options, which are arranged around a neutral response such as ‘neither agree nor disagree’. This beautifully symmetric format can give rise to the ‘central tendency bias’, which is what happens when participants systematically select the uncontroversial middle option. This might happen because of respondent fatigue, or sometimes it is a deliberate strategy by respondents who want to avoid committing to an opinion. Either way, such responses give very few usable insights, so we may want to discourage them.

One of the simplest ways to counteract the central tendency bias is to use scales with an even number of responses. In Example 2, above, I used forced-choice (or ipsative) items. These are items from which the ‘neutral’ option has been removed, leaving an even number of options. This forces participants to either agree or disagree with the statement. Forced-choice items with a small number of responses are very effective in eliciting attitudes that participants might otherwise feel inclined to suppress.

Further reading

I have written extensively about Likert scaling in this blog. Some relevant posts are:

If you arrived here while preparing for a student project, I wish you good luck with your work. You may want to use the social sharing buttons at the end of the post to share this content with other students who might find it useful. If you have any other questions that I might be able to answer, feel free to ask by posting a comment or using this form.