I'm not sure what's to be gained from using negative numbers. What's your logic for considering this? Interesting that you've chosen to avoid the neutral middle... thus forcing a choice.
–
Roger Attrill, Jul 5 '11 at 15:15

The respondent will never see the negative numbers. In the survey form they will appear simply as options 1, 2, 3, 4. However, as I mentioned, each question is assigned to one of 8 impact areas, and each impact area gets a final calculated score based on the responses. For example, if everyone answers "strongly disagree" to all the questions in one area, the average score for that impact area will be -2 if I use negative numbers and 1 if I use 1, 2, 3, 4 as the scores. I would like to know which one is more correct mathematically and statistically.
–
Vishad, Jul 7 '11 at 6:49

4 Answers

Don't show the numbers to the test takers; it will only confuse them. But you may want to use them internally.

Balanced keying (using an approximately equal number of positively and negatively keyed items) is often used in psychometrics and is considered good practice. It allows you to approach a topic from different perspectives.

Some sample items (measuring extraversion):
+ keyed: Feel comfortable around people; Make friends easily; Am skilled in handling social situations.
– keyed: Have little to say; Keep in the background; Don't like to draw attention to myself.

With a four-point scale, you'll probably find that option 3 collects most of the 'neutral' or 'don't know/don't want to tell/don't understand' responses that would otherwise go into the middle option. For analysis, you can simply assign scores from 1 (strongly disagree) to 4 (strongly agree) for positively keyed items and reverse them for negatively keyed items (4 for strongly disagree and 1 for strongly agree). If you allow respondents to skip items, it makes sense to use a scale centered on 0, so that you can count any unanswered question as 0 while still using simple addition to calculate the total score.
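As a minimal sketch of the scoring described above (assuming responses are stored as integers 1 to 4, with `None` for a skipped item; the item keys, answers, and function names are hypothetical), reverse keying and a zero-centered total could look like this in Python:

```python
# Sketch only: responses coded 1 (strongly disagree) .. 4 (strongly agree),
# None = skipped. keys: True = positively keyed, False = negatively keyed.

def item_score(response: int, positive: bool) -> int:
    """1-4 score; negatively keyed items are reversed (4 <-> 1, 3 <-> 2)."""
    return response if positive else 5 - response

def centered_total(responses, keys) -> float:
    """Total on a scale centered on 0 (item scores -1.5, -0.5, +0.5, +1.5).

    Skipped items count as 0, the scale midpoint, so the total
    is still a plain sum over the answered items.
    """
    total = 0.0
    for response, positive in zip(responses, keys):
        if response is not None:
            total += item_score(response, positive) - 2.5
    return total

# Hypothetical extraversion items: three + keyed, then three - keyed.
keys = [True, True, True, False, False, False]
answers = [4, 3, 4, 1, 2, None]  # last item skipped
print(centered_total(answers, keys))  # 5.5
```

Centering on 2.5 is what lets a skipped item contribute nothing; with raw 1-to-4 scores, a missing item would have to be imputed or the total rescaled instead.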

As for the number of categories in your Likert scale: fewer categories allow for faster decisions and shorter completion times; more categories are slower, but people will (subjectively) feel that they can express their feelings more precisely.

Addendum:

Something to keep in mind with Likert scales is that they are ordinal scales, not interval scales. Still, assigning numerical values to the categories and using them to calculate a total score often works quite well. There are alternative methods for working with polytomous items and calculating totals, such as item response theory, which takes individual item characteristics into account, but they're not as clear and practical as simple addition.

The main difference between a scale scored from 1 to 4 (or its shifted, zero-centered version: -1.5, -0.5, +0.5, +1.5) and your proposed -2/-1/+1/+2 scheme is the width of the middle interval, between disagree and agree. With equal spacing (a middle interval of 1), it takes 3 disagrees to offset 1 strongly agree (3 × 2 + 4 = 10; mean 10 / 4 = 2.5, the midpoint of the scale); with the wider middle interval of 2, it takes only 2 (2 × (-1) + 2 = 0).
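The arithmetic in the previous paragraph can be checked with a small Python sketch. The category labels and the helper name are illustrative; the three codings are the ones discussed here. It counts how many 'disagree' answers it takes to pull a single 'strongly agree' down to the midpoint of each coding:

```python
CODINGS = {
    "1 to 4": {"SD": 1, "D": 2, "A": 3, "SA": 4},
    "shifted (-1.5 to +1.5)": {"SD": -1.5, "D": -0.5, "A": 0.5, "SA": 1.5},
    "-2/-1/+1/+2": {"SD": -2, "D": -1, "A": 1, "SA": 2},
}

def disagrees_to_offset(coding):
    """Smallest number of 'disagree' answers that, together with one
    'strongly agree', brings the mean down to the scale midpoint."""
    midpoint = (coding["SD"] + coding["SA"]) / 2
    n = 1
    while (n * coding["D"] + coding["SA"]) / (n + 1) > midpoint:
        n += 1
    return n

for name, coding in CODINGS.items():
    print(f"{name}: {disagrees_to_offset(coding)}")
# 1 to 4: 3
# shifted (-1.5 to +1.5): 3
# -2/-1/+1/+2: 2
```

Shifting the 1-to-4 scores so they are centered on zero doesn't change the answer; only widening the gap between disagree and agree does.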

With Likert scales, people (especially introverts) tend to avoid the extreme categories. I did a research project on this, and it was striking how often people would answer 'yes, absolutely!' to a question in natural language, while in the Likert-scale questionnaire they didn't choose 'strongly agree' for the same question. If you give the extreme categories less weight, that could influence the totals.

Here's an example: total scores for a 5-question extraversion scale calculated with both methods and plotted against each other (for 2700 people):

The scores don't fit perfectly and the differences are statistically significant, but for most of the scores it doesn't make a large difference. The maximum difference on the total score here is only 2.5 (and if you adjust for the different ranges only 2). Looking at the score distribution (range adjusted again) you see the same pattern:

So in short: if you want to use a coding scheme -2/-1/+1/+2, go ahead; it probably won't matter for the end results. The extreme response categories will count a little less, and the range of possible scores differs from the other scheme. Results will differ, but you can expect the difference to be small. (However, I'd go with the simplest scheme.)

Short answer (I can't post pictures in comments): it could be quite useful in practice, but I'm not sure I like it. It basically compares the mean score from your results with another normal distribution (combining the standard deviation from your results with a chosen cut-off point).
–
Marielle, Jul 20 '11 at 14:20

Yeah, I don't understand how and why they chose 80% as a benchmark. Any ideas or links on how to choose an appropriate benchmark in case I wish to apply this method? Also, why don't you like it?
–
Vishad, Jul 20 '11 at 14:43

The 80% cut-off makes sense to me for most kinds of satisfaction surveys. If you ask people to agree or disagree with 'I'm satisfied with this product', an average score of 3 on a 5-point scale is not a good result; you'll need at least 4 (80% of the maximum of 5).
–
Marielle, Jul 20 '11 at 23:00
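For illustration only, here is one simple way to compare a mean satisfaction score with an 80% benchmark in Python. It is a one-sample z-test under a normal approximation, which is an assumption on my part and not necessarily the method discussed in these comments; the function name and the sample scores are hypothetical. With few respondents, a t-test would be the more careful choice.

```python
import math
from statistics import NormalDist, mean, stdev

def compare_to_benchmark(scores, scale_max=5, benchmark=0.8):
    """Compare the mean score with a cut-off at benchmark * scale_max.

    Assumption: one-sample z-test (normal approximation) with
    H0: true mean <= cut-off.
    """
    cutoff = benchmark * scale_max               # 0.8 * 5 -> 4.0
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    z = (m - cutoff) / se
    p_one_sided = 1 - NormalDist().cdf(z)        # small p: mean likely above cut-off
    return cutoff, m, z, p_one_sided

# Hypothetical answers to "I'm satisfied with this product" (1-5).
scores = [5, 4, 4, 5, 3, 5, 4, 5]
cutoff, m, z, p = compare_to_benchmark(scores)
print(f"cut-off {cutoff:.1f}, mean {m:.3f}, z {z:.2f}, one-sided p {p:.3f}")
```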

Building on Marielle's excellent answer, I just want to add that you should be careful when assigning numerical values to a scale that is still basically qualitative in nature. That doesn't mean you shouldn't; you should just be careful how you use those values. Once you have values, you can calculate things like percent changes between two measurement periods, but these percent changes, and many other statistics, are sensitive to which numbers you arbitrarily assign to the scale.

As an example, an insurance company was considering awarding a special bonus to each agent for an x% improvement in customer satisfaction scores. They ranked the employees by their mean satisfaction score on a 1-to-5 scale, until somebody pointed out that going from a 1 to a 2 (from utterly dismal to less dismal) was a 100% increase, while going from a 4 to a 5 (from great to flawless) was only a 25% increase. Arguably, the second employee was more deserving of a bonus, but they looked only 1/4 as good on paper. Basically, their use of numerical scales and means wasn't working for what they were trying to do.
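A short Python sketch of the insurance example shows how the percent change depends on the arbitrary numbers assigned to the categories. The 11-to-15 coding is a hypothetical relabelling of the same five categories:

```python
def percent_change(before, after):
    """Percent change from before to after; depends on where the scale's zero is."""
    return (after - before) / before * 100

# The example above, coded 1..5: a one-step improvement at each end.
print(percent_change(1, 2))  # 100.0
print(percent_change(4, 5))  # 25.0

# Hypothetical coding of the same five categories as 11..15.
print(round(percent_change(11, 12), 2))  # 9.09
print(round(percent_change(14, 15), 2))  # 7.14
```

Under the 1-to-5 coding, the low-end improvement looks four times as large as the high-end one; under the 11-to-15 coding, it looks only about 1.27 times as large, even though the answers are identical. The raw point difference is 1 in every case.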

Because of these difficulties, I prefer to avoid assigning numerical values to responses if I can get away with it - but I can't always.

+1. Lies, damned lies, and statistics... wish I was the first to say that.
–
Ray Mitchell, Jul 6 '11 at 20:43

Thanks for your response. Please see my reply to @Roger Attrill above. I won't really be calculating improvement or change, just a score. The problem with the negative scores is that the distance between options 2 and 3 is two units, while the distance between the others is one unit. But my logic for considering the negative scores is that I wanted the "disagree" responses to subtract from the score, so that the final average score can be negative if most people answer this way. I'm not sure if it's mathematically correct, though. @Marielle @Gary Franceschini
–
Vishad, Jul 7 '11 at 7:10

Neither is more "correct" from a mathematical perspective. I see your point that you are varying the difference in scores between each verbal anchor, and that probably would be more likely to raise questions than the other method. But "agreement" is not a fundamentally numeric measure and any numerical assignment is arbitrary. People don't necessarily see these verbal anchors as equally spaced from each other, and they don't necessarily see their mental "0" point in the same place as others would. Have you thought about reporting top-2-box (Agree, Strongly Agree) percentage?
–
Jonathan, Jul 7 '11 at 17:51

This is all just a question about reporting. While it's best to decide on a reporting method before you run the survey, you can always try all three methods once you have your data (the 1-to-4 mean, the -2-to-+2 mean, and the top-2-box percentage) and see how they vary.
–
Jonathan, Jul 7 '11 at 17:56
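To try the three reporting methods side by side, a minimal Python sketch could store the answers as labels and compute each summary from them. The responses and function names below are made up for illustration:

```python
# Hypothetical answers for one impact area, recorded as labels.
responses = ["SA", "A", "A", "D", "SD", "A", "SA", "D"]

CODINGS = {
    "1-to-4 mean": {"SD": 1, "D": 2, "A": 3, "SA": 4},
    "-2-to-+2 mean": {"SD": -2, "D": -1, "A": 1, "SA": 2},
}

def mean_score(answers, coding):
    """Mean of the numeric codes assigned to the answer labels."""
    return sum(coding[a] for a in answers) / len(answers)

def top2box(answers):
    """Percentage of answers that are Agree or Strongly agree."""
    return 100 * sum(a in ("A", "SA") for a in answers) / len(answers)

for name, coding in CODINGS.items():
    print(f"{name}: {mean_score(responses, coding):.3f}")
print(f"T2B: {top2box(responses):.1f}%")
# 1-to-4 mean: 2.750
# -2-to-+2 mean: 0.375
# T2B: 62.5%
```

Storing labels rather than numbers keeps the coding decision separate from data collection, so every summary can be recomputed under any scheme later.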

A Likert scale gains nothing from the use of negative numbers. In fact, you are likely introducing a bias into your responses, because the negative numbers will create an emotional response, driving users to select the most moderate of the positive responses (partly due to central tendency bias).

A couple of things:
1/ Your scale is missing the neutral value (Neither agree nor disagree); most Likert scales are based on five points.
2/ You can assign values to each scale item, but you should not display them to your users / respondents at all, as the numbers could bias their answers.
Often, such values can be negative in survey methodologies (market research, opinion polls, ...):
Strongly disagree (-2)
Disagree (-1)
Neither agree nor disagree (0)
Agree (1)
Strongly agree (2)

Such a coding is generally used to analyse the data as a numerical variable rather than as a closed question, which allows the analyst to compute the mean and other statistics.