Measuring SLOs: The Importance of Scaling

A while ago, I was assisting a career center team with one of their assessment projects and the topic of scaling came up. We had a really fun and important conversation about the design of the scales (OK, call me an assessment nut!). We wrestled with questions such as:

What specifically are we trying to measure?

What would be the appropriate scale to use?

What about the intervals on the scale?

What about the anchor points, and how are they defined?

Does the question match the scale?

Will the scale deliver interpretable information?

The team recognized how important it is to answer these questions before launching the project and collecting the data on the student learning outcome (SLO). This highlighted the importance of scaling.

The scale(s) used for an SLO project will directly influence how data is reported; this is why it is important to get it dialed in. The scale essentially standardizes the multiple levels of the construct being measured into units. Careful scale construction allows you to make important interpretations from the data. Poorly designed scales can jeopardize your efforts. What I share here are just some ideas; there are courses on this topic, and some academicians make this their specialty!

Let’s say we want to measure “career confidence.” Our hypothetical SLO is the career confidence of students after engaging in career counseling and learning how to make career decisions and plans. (Note: We have defined career confidence as being confident of one’s career decisions and career plans.) We could develop a self-report rating scale that might look like this:

Using the scale below, rate your level of career confidence as of today:

0 ------------- 1 ------------- 2 ------------- 3 ------------- 4

0 = I don’t feel confident with my career decisions and/or career plans. I feel lost and I need help in deciding what to do with my major and my career.

1 = I feel slightly confident of my career decisions and/or career plans. I feel confused between several options, and I need help in deciding what to do with my major and my career.

2 = I feel confident of my career decisions and career plans. I know what I want to do, but I have questions about how to make it all happen. I could use some help.

3 = I feel very confident of my career decisions and career plans. I know what I want to do and how to make it all happen. I may need some assistance with some skills to help me out.

4 = I feel extremely confident of my career decisions and career plans. I know what I want to do and how to make it all happen. At this time, I do not feel like I need any assistance.

Notice how at each level there is sufficient description that defines (operationalizes) that level. This is done to help the rater (in this case, a student who has participated in career counseling) to answer the question.
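If you keep scale data electronically, it can help to store each level together with its operational definition so the anchor text travels with the number. Here is a minimal sketch in Python; the dictionary name, the shortened definitions, and the helper function are all illustrative, not part of any survey tool.

```python
# Hypothetical lookup table for the 0-4 "career confidence" scale.
# Each numeric level carries a shortened operational definition,
# paraphrased from the full anchors above.
CAREER_CONFIDENCE_SCALE = {
    0: "Not confident; feels lost and needs help deciding.",
    1: "Slightly confident; confused between several options.",
    2: "Confident; knows the goal but has questions about how to get there.",
    3: "Very confident; knows goal and plan, may need help with some skills.",
    4: "Extremely confident; knows goal and plan, needs no assistance.",
}

def describe(rating: int) -> str:
    """Return the operational definition for a rating, or raise if off-scale."""
    if rating not in CAREER_CONFIDENCE_SCALE:
        raise ValueError(f"Rating {rating} is outside the 0-4 scale")
    return CAREER_CONFIDENCE_SCALE[rating]
```

Keeping the definitions attached to the numbers also makes it harder for an off-scale value (say, a stray “5”) to slip into the data unnoticed.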

Also notice that the scale starts at zero: this is done with the assumption that when one feels no confidence, that state should be measured as zero confidence. Frequently, scale developers start scales at “1,” but mathematically, the scale actually starts at zero because the construct of interest theoretically also has a “zero point.” Any value above zero and below 1 is still interpretable. This allows one to interpret the data more accurately because the construct is anchored at zero, “no confidence.” What happens if you run this scale in a pre-post design, assessing career confidence when students initially arrive for their first appointment of career counseling and then again, after three sessions? The change in scores would then reflect student learning (they feel more confident with their career decisions and plans).
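The pre-post design can be summarized with a few lines of code. This sketch uses made-up student IDs and ratings purely for illustration; the point is that, because the scale is anchored at zero, the per-student differences and their mean are directly interpretable as gains on the 0–4 confidence scale.

```python
# Sketch of a pre-post summary: each student is rated on the 0-4
# confidence scale at intake and again after three sessions.
# All IDs and ratings below are hypothetical illustration data.
from statistics import mean

pre  = {"s01": 1, "s02": 0, "s03": 2, "s04": 1}   # first appointment
post = {"s01": 3, "s02": 2, "s03": 3, "s04": 2}   # after three sessions

changes = [post[s] - pre[s] for s in pre]          # per-student gain
mean_change = mean(changes)                        # average gain, in scale points
print(f"Mean change: {mean_change:.2f} scale points")  # prints "Mean change: 1.50 scale points"
```

A positive mean change here would be evidence of the learning the SLO targets: students feel more confident with their career decisions and plans.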

Scales whose intervals are close together at some levels but far apart at others introduce interpretation “noise.” Additionally, the mid-point of the scale ought to represent a theoretical mid-point of the construct. For example, the scale on the left, below, has three potential issues:

It starts at “1” and therefore may not account for the construct’s actual zero state.

The second and third levels (“not very important” and “somewhat important”) measure virtually the same thing.

The mid-point of the scale (No. 3) is conceptually very close to No. 2 but very far from No. 4.

All of these issues make interpreting data with this scale a potential challenge! The scale on the right attempts to resolve these concerns. Small differences in scale development can yield big interpretative gains.

What if we want to measure “appropriate professional attire” at a career fair? Since we are measuring an SLO, we are observing the students with whom we’ve worked. Let’s assume we have run a two-session “Dress for Success” workshop, complete with a fashion consultant and a practice “Dress for Success” event. We know the student participants and, as they check in for the actual career fair, we rate their professional attire. Notice the deliberate levels of the scale below, and the attempt to create “equal” intervals along the scale.

Rate each “Dress for Success” workshop participant’s career fair attire, using the scale below:

One last point: By using scales without definitions to operationalize each level, you run the risk of missing out on interpretable data. For example, researchers frequently use seven-point scales because of the greater variability they produce in the data. Variation is great, for sure, but what often happens is that only the upper and lower ends of the scale are defined, and the rater (student or observer) has to figure out for themselves what the other levels might mean. A rating of “4” for you may mean something entirely different from my rating of “4,” and therefore the data is not the same! Even with a mid-point definition, the other non-defined values have sufficient variability in meaning that the data will be difficult to adequately interpret.

To remedy this, define each level carefully. Test out the levels. Have colleagues try the scale and then run the data: Does it do what you want? Refine the scale and try it again! There’s nothing wrong with letting your rater know what each level means; this is why rubrics are such good assessment tools! I’ll have more about rubrics in a future column.
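One simple way to “run the data” when colleagues try out a draft scale is to check how often two raters give the identical rating to the same students. This is a minimal sketch under hypothetical rater data; low exact agreement is a signal that some level definitions are blurring together and need refinement.

```python
# Sketch of a quick inter-rater check for a draft scale.
# Both rating lists below are hypothetical: two colleagues rating
# the same eight students with the same draft rubric.
def exact_agreement(rater_a, rater_b):
    """Proportion of cases where the two raters give the identical rating."""
    assert len(rater_a) == len(rater_b), "raters must score the same students"
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

a = [4, 3, 2, 4, 1, 3, 2, 0]
b = [4, 2, 2, 4, 1, 3, 3, 0]
print(f"Exact agreement: {exact_agreement(a, b):.0%}")  # prints "Exact agreement: 75%"
```

If disagreements cluster around particular levels, that is exactly where the anchor definitions should be sharpened before the real data collection begins.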