Researchers have an ethical responsibility to conduct high-quality research, and high-quality research demands a professionally developed instrument. Developing a questionnaire is not an easy task; it takes much time, thought, and effort by the researcher. I recently collaborated with two other senior researchers to develop a 23-item questionnaire. We worked intensively on it for two months, conducted three separate pilot studies, and produced nine drafts before we finally settled on a final copy. If you develop your own questionnaire, be sure to allow plenty of time for planning, reviewing literature, and revising the instrument. Allow at least a month for developing a high-quality instrument.

The researcher has the responsibility to create the instrument in such a manner that it will be understandable to whoever will be taking it. Anything that causes a participant to give a wrong answer on the instrument is called error. Researchers must be very careful to make the instrument as clear as possible to avoid error. This means that the researcher must spend considerable time carefully developing, revising, pilot testing, and correcting the instrument.

It is important to make the instrument so clear that no additional explanations, directions, or comments are necessary beyond what is present on the questionnaire itself. This means the directions are clear, the items are well written, and the formatting is professional so that participants understand where to respond.

The first step in developing an instrument is to consider the population under study. A questionnaire for children in Primary 5 will be different than a questionnaire for university students, even if it is measuring the same variable, such as intrinsic motivation. To ensure that the instrument is understandable to the population, the instrument must have appropriate directions, clear and understandable language, and concepts that the participant can understand. For example, a researcher might be interested in whether a primary school teacher gives lessons on phonemic awareness. However, "phonemic awareness" is a technical term that a teacher might not understand. Because of this, a questionnaire item cannot read "Do you teach phonemic awareness?" Instead, the item should use language that the teacher will understand, such as "Do you teach lessons about the individual sounds within words?"

It takes time and effort for the participant to complete the instrument. When completing it, participants get tired and rushed, and this can also be a source of error, particularly on the last items of the instrument. Therefore, keep the questionnaire as short as possible while still including enough items to adequately measure the variables of interest. The best way to keep the questionnaire short is to write only items that measure the variables from the Research Questions and Research Hypotheses. Any irrelevant questions should be canceled.

Oftentimes, researchers try to measure their participants' perceptions of a phenomenon, which does NOT result in a valid research study. People can report on their own demographic characteristics (also called biodata), attitudes, beliefs, knowledge, feelings, and behavior. However, people can rarely report other people's attitudes, beliefs, knowledge, feelings, and behavior with 100% accuracy because it is impossible for us to really know somebody else's thoughts. For example, a teacher might think she knows how well her students enjoy a lesson, but it is much better to ask the students themselves how they feel about the lesson. Because a teacher does not really know what the students are thinking, asking teachers to report on students is a huge source of error. Therefore, researchers should always ask a person directly about their own thoughts. If a person is asked to report on something they have not directly experienced, such as a teacher reporting on student interest, then the topic should be rephrased to include either the word belief or perception: "Teachers' beliefs of student interest" or "Teachers' perceptions of student interest", not "Student interest." A researcher must find the most direct way to measure a variable to avoid error.

To illustrate the point that key variables must be directly measured, I conducted a small research study in which I administered a questionnaire to 42 students who were working on their master's degree in education. I asked the master's students about their beliefs regarding a number of well-established psychological findings. For example, high-quality research has strongly demonstrated that giving material rewards to students actually decreases their interest in a subject (see the Self-Determination Theory website for a story that illustrates this point). In the questionnaire, the master's students were to respond Yes or No to five statements about whether giving a reward to students would increase or decrease their interest in an activity. For example, one statement read: "A child will enjoy an activity more if they receive money for doing it."

Were the master's students accurate in their beliefs? On the statement above, the correct answer is NO: children do not enjoy an activity more if they receive money for doing it. (A child may enjoy the money, but their enjoyment of the activity will not improve. Instead, their motivational focus is on the money, not on the activity.) However, only 18% of the master's students correctly answered this question! If most master's students (82%) do not hold accurate beliefs about educational phenomena, then students, parents, teachers, and other educational stakeholders cannot be expected to report accurately on educational phenomena either!

Asking participants about their beliefs of educational variables has so much error that any conclusions drawn from these types of studies are not valid! Instead of asking people to report their beliefs or to report about somebody else, find a way to directly measure each variable. To do this, each item must reflect the construct definition of the variable that you developed in Developing Instruments. More information on how to directly measure a variable will be described below.

Summary

Allow plenty of time to revise, pilot test, re-revise, and re-re-revise the instrument.

Make every aspect of the instrument as clear as possible to the particular populations of individuals who will complete the instrument. This will help participants avoid mistakes when completing the instrument.

Keep the instrument as short as possible to avoid participant fatigue, but long enough to ensure adequate measurement of the variables. To achieve this, only include items that measure variables from the Research Questions or Research Hypotheses as developed in Writing Research Questions and Hypotheses.

Find the most direct way to measure the key variables.

Steps in Developing an Instrument

When developing a questionnaire, follow these guidelines.

Identify the key variables. Examine the key variables from the research questions or research hypotheses that were developed in Identifying Key Variables.

Define the key variables. A good definition of each variable is necessary in order to measure that variable. This was done in Developing Instruments.

Review the literature, as described in Reviewing Literature. Read the Instruments section of each study to determine how those researchers measured the variable, and try to find the actual items if possible. Use this information as a guide in developing your own items.

Consider the population, which was developed in Writing Sample. The researcher must keep in mind who will be taking the instrument. Children in Primary 5 will need a different instrument than adults, even if the variable is the same (e.g., motivation).

Write a draft. More guidelines on writing the draft are given below.

Revise the draft. Read the draft from the perspective of the participant. What might be confusing to them? What might they misunderstand? Give the draft to somebody who is unfamiliar with your study. Ask them to read over the instrument and point out items or directions that are confusing.

Pilot the draft. Give the instrument to a handful of people who are similar to the population. Also ask these individuals to inform you of any part of the instrument that they are confused about.

Revise the draft, Revise the draft, Revise the draft.

The following guidelines are specific to developing a questionnaire. Most of these suggestions also apply to checklists and interviews, and a few comments specific to developing a checklist are given below.

Developing Personal Information Items

Questionnaires typically start with questions about the participant. Personal information can also be called demographic characteristics or biodata. Include only those items that are essential to getting a good understanding of your sample and that answer your research questions. In the interest of keeping the questionnaire as short as possible, extra questions that are unrelated to the research questions or hypotheses should typically be canceled.

Do not simply copy the personal information items from another questionnaire. Oftentimes, this information will be completely irrelevant to the purposes of your study. Instead, consider which variables are relevant and capture information important to your study. For example, I recently reviewed a questionnaire that was going to be given to a sample of university students. The questionnaire asked participants to tick whether they were a student or staff at the university. Obviously, if the questionnaire is designed for students, this item is unnecessary! I have also seen questionnaires designed for Junior Secondary School (JSS) students in which the Age item gave the options 15-20, 21-25, and 26 and above. Some JSS students are under 15, and virtually all others will fall in the 15-20 range. These items do not reflect thoughtful preparation by the researcher.

When developing personal information items, keep the following considerations in mind.

All response options must be exhaustive. This means that every participant will be able to find a category that fits their response. For example, the age options listed above are not exhaustive because a 14-year-old has nowhere to tick. A better list would include: Under 10, 10-11, 12-13, 14-15, 16-17, 18 or older.

All response options must be mutually exclusive. With only a few exceptions, all categories should be mutually exclusive, meaning that no option overlaps with another. With the following options for age: 10-15, 15-20, 20-25, where would somebody aged 15 tick? Would they tick 10-15 or 15-20? Make sure that there is only one space for a person to tick.

All response options must have equal intervals. With numerical items such as age, ensure that the intervals are equal. With the following options for age: 10-12, 13-18, 19-21, the first option covers three ages, the second covers six, and the third covers three. Unequal intervals like these will bias the results. Also ensure that the intervals are small enough to give valuable information. For example, the options 11-15 and 16-20 for JSS students would not provide valuable information. Instead, for this sample, use smaller intervals: Under 10, 10-11, 12-13, 14-15, 16-17, 18 or older. However, these small intervals would not work for a population of adults. Therefore, if the sample covers a wide range of ages, use larger intervals: 20 or under, 21-25, 26-30, 31-35, 36-40, etc. You can always lump categories together once you have the data, but you can never recover more specific information once the questionnaire has been administered. Therefore, smaller intervals are preferable to larger intervals.
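The three rules above (exhaustive, mutually exclusive, equal intervals) can be checked mechanically. The sketch below is my own illustration, not from the original text; the bracket values are the hypothetical age options discussed in this section.

```python
def check_brackets(brackets):
    """brackets: list of (low, high) age ranges, sorted by the low end."""
    problems = []
    widths = [hi - lo for lo, hi in brackets]
    # Equal intervals: every bracket should span the same number of years.
    if len(set(widths)) > 1:
        problems.append("unequal intervals: widths %s" % widths)
    for (lo1, hi1), (lo2, hi2) in zip(brackets, brackets[1:]):
        # Mutually exclusive: no age may fall into two brackets.
        if lo2 <= hi1:
            problems.append("overlap: %s-%s and %s-%s" % (lo1, hi1, lo2, hi2))
        # Exhaustive (within the covered range): no age may fall in a gap.
        elif lo2 > hi1 + 1:
            problems.append("gap: ages %s-%s have no bracket" % (hi1 + 1, lo2 - 1))
    return problems

# The faulty options from the text: 10-15, 15-20, 20-25
# (a 15-year-old and a 20-year-old could each tick two boxes).
print(check_brackets([(10, 15), (15, 20), (20, 25)]))
# The corrected JSS-style options have no overlaps, gaps, or unequal widths.
print(check_brackets([(10, 11), (12, 13), (14, 15), (16, 17)]))
```

Running such a check on every draft of the personal information items is a quick way to catch the errors described above before piloting.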

Response categories must be specific. Oftentimes, I will see an item that reads: What is your socioeconomic status? Low, Medium, High. How is a respondent to know whether they have low, medium, or high socioeconomic status? A participant needs clear information about how to respond. Variables like socioeconomic status require thoughtful consideration by the researcher. One way to measure socioeconomic status is by the type of house a person lives in: What is your living situation? I live with relatives; I rent; I own land; I have a personal home made of mud; I have a personal home with 1-2 bedrooms; I have a personal home with 3 or more bedrooms. This might not be the best measure of socioeconomic status for your purpose, but participants are clear on how to answer the question.

Avoid the social desirability problem. To measure parents' educational level, some questionnaires ask, What is your mother's educational level? Illiterate, Literate. This item has two problems. First, it is ambiguous. What does 'illiterate' mean exactly? The mother cannot read a single word, or she cannot read a textbook? Second, this item is subject to the 'social desirability problem.' Participants want to put themselves and their family in a positive light when responding. This is called social desirability: they want to appear desirable to others. Because of this, participants might not give an accurate answer to avoid appearing undesirable. This is sometimes done on purpose, but other times done unconsciously. Who wants to admit that their parents are illiterate? However, rephrasing the question both makes the response clear and avoids the social desirability problem: What is the last level of schooling that your mother completed? None, Primary, Secondary, Tertiary.

Writing Questionnaire Items

This is the real substance of the questionnaire. Examine the list of variables with their accompanying construct definitions and operational definitions from the beginning of Developing Instruments. Each variable requires a number of items (typically between 4 and 10, though this can vary based on the variable and logistical considerations) that measure the variable directly and are directly related to the construct definition of the variable. For example, imagine a research study measuring classroom attendance. What is the most direct way to measure classroom attendance? Is it items on a questionnaire that ask participants whether they agree that people should attend class regularly? Or is it the record of classroom attendance from the school? As the construct definition of attendance would likely be "number of days that the student attends class," the latter method of measurement is most direct, and therefore most desirable.

As a second example, consider the variable intrinsic motivation, defined as "The motivation for a behavior that is based on the inherent enjoyment in the behavior itself." A researcher might be tempted to write an item that says, "I work hard in school" to measure intrinsic motivation. However, this item does not directly relate to the construct definition presented earlier. The researcher might argue that if a pupil enjoys going to school, then they will work harder. However, there is a difference between "working hard" and "enjoying" something. A student might work hard in school because his father will beat him if he performs poorly, yet thoroughly hate school. This student would respond "Strongly Agree" to the item but have low intrinsic motivation in reality. A participant who ticks Strongly Agree should be high on the construct definition, and a participant who ticks Strongly Disagree should be low on it. Therefore, all items must center on the construct definition (enjoyment, in the case of intrinsic motivation). A good item for intrinsic motivation might read: "I enjoy going to school."

Keep in mind the analogy of an examination: people who score higher on an exam should have a higher knowledge of the subject, whereas people who score lower on the exam should have less knowledge of the subject. Likewise, people who tick "strongly agree" should be higher on the variable of interest than those who tick "strongly disagree." If it is possible for a participant to be high on the variable, yet tick "disagree," then it is a bad item.

Remember that good questionnaire items must:

Measure the variable directly

Directly relate to the construct definition

When developing questionnaire items, AVOID:

Causes of the variable. For example, consider attitude toward educational psychology. A cause of attitude toward educational psychology might be how one feels about the course lecturer. You may be tempted to write an item: "I enjoy the educational psychology lecturer." It is possible for somebody to dislike educational psychology but like the lecturer, so this is a BAD item. All items must reflect the variable itself

Effects of the variable. Because a student who has a positive attitude is likely to attend class more, you may be tempted to write an item that says, "I always attend educational psychology class." It is possible for somebody to strongly agree that they always attend educational psychology class, yet still not like educational psychology. Again, this is a BAD item

Double-barreled items. Keep every item focused on only one point. You may be tempted to write an item, "I enjoy educational psychology because it is relevant to teaching." There are two points to this item: I enjoy educational psychology and it is relevant to teaching. What would somebody tick if they like educational psychology, but not think it is relevant to teaching? This is a BAD item. Each item should only have one point

It is also possible to write items that are reverse coded. This simply means that the items are the reverse, or the opposite, of what is truly desired. If questions continually read, "I enjoy going to school," "I like going to school," "Going to school is fun for me," etc., then participants might start to tick SA, SA, SA without thinking about their responses. Therefore, in many cases, it is good to also write items that mean the opposite: "I do NOT like going to school." This keeps participants alert when responding, and it also prevents the problem of acquiescence bias, the tendency to agree with every statement.
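When scoring, reverse-coded items must be flipped before summing so that every item points the same direction. The following sketch is my own illustration (the responses are made up, not from the text): on a six-point scale, a response r on a reverse-coded item becomes (6 + 1 - r), so Strongly Agree (6) on "I do NOT like going to school" contributes 1 toward the intrinsic-motivation total.

```python
SCALE_MAX = 6  # six-point Likert scale, as used in this document's examples

def score_item(response, reverse_coded=False):
    """Return the scoring value of one tick, flipping reverse-coded items."""
    if reverse_coded:
        return SCALE_MAX + 1 - response
    return response

# One hypothetical participant's raw ticks (1-6); item 4 is reverse coded.
responses = [5, 6, 5, 2]
reversed_flags = [False, False, False, True]
total = sum(score_item(r, f) for r, f in zip(responses, reversed_flags))
print(total)  # 5 + 6 + 5 + (7 - 2) = 21
```

Keeping a simple flag per item like this, decided when the item is written, prevents scoring mistakes later during analysis.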

As an illustration of developing questionnaire items, consider the variable meaningful reading from the teachers' beliefs of literacy development study. The construct definition of meaningful reading was: "The belief that young children should both read meaningful literature and have meaningful literature read out loud to them (books that are enjoyable, relevant, and are not necessarily part of the curriculum) because it helps improve their literacy skills." The operational definition was: "Meaningful reading will be measured by teachers' agreement to four self-report statements of their beliefs on meaningful reading on a six-point Likert scale." The four items based on this definition were:

Teachers should read texts to their pupils that are not in the textbook or curriculum.

Teachers should give pupils class time to read books that are not part of the curriculum.

Parents can help their young children learn to read before entering school by reading out loud to them.

A teacher should read a story out loud to their class every day.

Notice how these questions are directly related to the construct definition. Indeed, some of the questions might seem a little repetitive. This is good. When questions vary too much, they oftentimes get off-topic and introduce error.

Practice

Below, I present some bad questionnaire items. Read over each item and consider what might be wrong with it.

Where are you from?

Does this mean where you grew up? Where your tribal home is? Where you are living now? The item must be clear so as to not be misunderstood. This item should be rephrased to be more specific, such as, What is your geographical zone of origin?

What are the problems associated with the 6-3-3-4 system of education?

This item is biased because it assumes that there are problems with the 6-3-3-4 system. What if the person thinks the system is ok? Instead, the item should read: What do you think about the 6-3-3-4 system of education?

How many times did you revise your thesis? a) Very many b) Many c) Few d) Very few

This item is too vague. What does "very many" mean? What is "very few" to me, such as revising my thesis five times, might be "very many" to you. Ensure that response categories are specific so that you get comparable responses across participants. This item should be revised to: How many times did you revise your thesis? a) 0 b) 1-2 c) 3-4 d) 5-6 e) 7-8 f) 9-10 g) 11-12 h) 13-14 i) 15+

I have a high level of introjected motivation in maths.

Would the participants know what introjected motivation is? The words are too complex for the typical participant. Ensure that the question is worded in a way that is understandable to the population who will complete the questionnaire. This item should be rephrased to something like: I study maths because I want to make my parents proud.

I teach all of my required lectures.

How many people would admit that they do NOT teach all of their required lectures? If you want to measure whether a lecturer attends all of their lectures, there are two better options. First, you could ask the students, who would probably give a less biased judgment of how often a lecturer attends class. Second, you could ask the lecturer to specifically indicate how many hours they spend lecturing over the entire semester. For each item, consider whether a participant would answer the question honestly. Sometimes participants may blatantly lie; other times they might unconsciously lie. For example, a lecturer might honestly think they give all of their required lectures, but in actuality they may not.

I enjoy school because the teachers are nice.

This item is called a double-barreled item, meaning that the same item makes multiple points. What should a student tick if she enjoys school, but the teachers are not nice? What if the teachers are nice, but the student still does not enjoy school? This should really be two separate items: I enjoy school. The teachers at my school are nice.

What level are you in university? a) 100 b) 300

What will a 200-level student tick? A 400-level student? Item options must contain all possible responses.

Tips for Writing Questionnaire Items

When writing the items on a questionnaire, keep the following tips in mind:

The item must be necessary by directly measuring a key variable in a Research Question or Research Hypothesis.

Oftentimes, participants will read the items quickly and may skip over key words. Therefore, I always try to highlight any negative word, such as NO or NOT by typing them in all capital letters, underlining the word, and/or bolding the word. Negative words are important because they completely change the meaning of the item, and skipping over this word will make the participant respond the opposite. By highlighting the negative word, it is more difficult for it to be skipped, reducing potential error.

Consider whether the participant can and/or will answer the question honestly.

Ensure that the question is worded in a way that is understandable to individuals in the population under study.

The item must be clear so as to not be misunderstood and every participant must have the same understanding of the item.

Avoid double-barreled questions that have multiple points contained within the same item.

Ensure the item is not biased or leading toward a specific response.

Item options must contain all possible responses.

Ensure that response categories are specific to get similar responses across participants.

Write twice as many items per variable as you think are necessary. Many items might be canceled, so it is good to have extras to fill the gap.

Tips for Writing Questionnaire Options

Oftentimes, educational researchers use a Likert scale format for responding, which asks participants to indicate their level of agreement with statements. Typically, the options are various levels of agreement: Strongly Agree, Agree, Agree Somewhat, Disagree Somewhat, Disagree, and Strongly Disagree. Some people also use a Neutral category, which is good for participants who do not have an opinion about a particular statement. The researcher's personal preference and/or the variable under study determine whether a Neutral or No Opinion category is used. I personally prefer not to use a Neutral category because I want my participants to think about the statement and make a choice. I feel that giving a Neutral option allows participants to be lazy and not think about an item.

The Likert Scale is not the only scale that is available for responding, though, and many research studies require that other continuous options are used. For example, I have given the following directions:

Please read each statement carefully and indicate how similar the statement is to you on the following scale from 1 to 6: 1=Very Different, 2=Different, 3=Somewhat Different, 4=Somewhat Like, 5=Like, 6=A Lot Like. A sample item was: When I have free time at home, I read for fun.

On a typical day, how much time do you spend reading the following materials? Use this as your guide: A=Never, B=1-15 min, C=16-30 min, D=31-45 min, E=46-60 min, F=1-2 hours, G=3 hours+. Items included Lecture notes, Newspaper/Magazine, etc.

How often do you use the following religious media? a) Every Week, b) Every Month, c) Occasionally, d) Never. A sample item was: Watching religious television.

During the examination for one of your education classes, the person sitting next to you asks for an answer. For each situation listed below, indicate how willing you would be to help that person by sharing your answer: 1=Definitely NOT Help, 2=Might Help, 3=Will Likely Help, 4=Definitely Will Help. A sample item was: The student was sick throughout the term.

Rate the importance of each influence on your choice of becoming a teacher. Use the following scale: 1=Not at All, 2, 3, 4, 5, 6, 7=Extremely. A sample item was: I am interested in teaching.

The point here is that you should not just limit yourself to Strongly Agree, Strongly Disagree. There are plenty of options available for developing questionnaire items. Think about what the best response options will be, and the best format for presenting those options. Notice that in the last directions, I did not label any numbers except the first (Not at all) and the last (Extremely). However, participants had seven different numbers they could circle. Not all numbers in the response scale must have a label assigned to them.

A questionnaire can also use a "Pick your top choices" option. For example, read the following item:
Please circle the four (4) biggest reasons why reading is important.

To pass exams

For entertainment

For personal development

To improve my vocabulary

To pass the time

To earn a certificate

To learn about new things

Learn about different people

To relax

To stimulate my mind

To be well educated

To read a good story

To increase my knowledge of God

The number of people who tick each option can be tallied and then a table can present these reasons in order from most frequent to least frequent. This type of item is good for a descriptive study.
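The tallying described above is straightforward to carry out. The sketch below is my own illustration with made-up data: count how many participants ticked each reason, then present the reasons from most frequent to least frequent.

```python
from collections import Counter

# Each inner list is one hypothetical participant's four ticked reasons.
ticks = [
    ["To pass exams", "For entertainment", "To relax", "To be well educated"],
    ["To pass exams", "To learn about new things", "To relax", "For entertainment"],
    ["To pass exams", "To improve my vocabulary", "For entertainment", "To read a good story"],
]

# Tally every tick across all participants.
counts = Counter(reason for participant in ticks for reason in participant)

# most_common() orders the reasons from most frequent to least frequent,
# ready to be presented in a descriptive table.
for reason, n in counts.most_common():
    print(f"{reason}: {n}")
```

The resulting frequency table is exactly the kind of summary a descriptive study reports.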

Checklists like this can also be used to calculate the frequency of behaviors. For example, checklists are often used to determine whether somebody has a mental illness. A list of symptoms of depression, such as fatigue, loss of appetite, and feelings of sadness, are listed. Participants tick all symptoms they have experienced within a specific timeframe (perhaps six weeks, six months, etc.). If a person has experienced a certain number of symptoms, then they are classified as being depressed.
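The classification step can be sketched in a few lines. This is my own illustration: the symptom list paraphrases the examples above, and the cutoff of three symptoms is a made-up threshold, not a clinical criterion.

```python
# Hypothetical checklist of depression symptoms (from the examples above).
SYMPTOMS = ["fatigue", "loss of appetite", "feelings of sadness",
            "trouble sleeping", "loss of interest"]
CUTOFF = 3  # assumed threshold for illustration only

def classify(ticked):
    """Count valid ticked symptoms and compare against the cutoff."""
    count = sum(1 for s in ticked if s in SYMPTOMS)
    return "classified" if count >= CUTOFF else "not classified"

print(classify(["fatigue", "trouble sleeping"]))                         # below cutoff
print(classify(["fatigue", "loss of appetite", "feelings of sadness"]))  # meets cutoff
```

The same count-against-a-cutoff logic applies to any checklist that classifies participants by the number of behaviors or symptoms they report.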

Likewise, participants could tick the behaviors they have done in other domains. A researcher interested in cheating might present a list of various types of examination malpractice (e.g., copying an assignment, bringing a cheat sheet into an exam, writing answers on their body) and ask participants to tick which malpractices they have engaged in within a specific period of time (e.g., the past year, or while in university). Instead of ticking, participants could circle either "Yes" or "No" for each behavior. Checklists are also useful when researchers are observing participant behavior. I explain a bit more about collecting observations at the end of this page.

Revising Questionnaire Items

Once you have finished drafting the questionnaire items, you need to spend substantial time revising them even before giving the questionnaire to somebody else to review. Here are some tips for revising the items:

Read the questionnaire through once. For every item, write which key variable the item is related to. When you are finished, cancel each item that is not related to a key variable.

Read the questionnaire through again. Compare each item to the construct definition. Does the item content directly relate to the construct definition? If the item does not relate to the construct definition, if it measures a cause of the variable, or if it measures an outcome of the variable, cancel the item.

Read the questionnaire through a third time. This time, imagine you are a participant from the population. Read each item from the participant's point of view. Consider the following questions:

Does the item contain difficult or unclear terminology?

Is there enough information in the item to give an appropriate, clear response?

Is the wording loaded or biased?

Read the questionnaire through a fourth time. Correct spelling and grammar mistakes. (If you get a typist to type the questionnaire, you will need to do this again when you receive the typed copy.)

Collecting Observational Data

Oftentimes, educational researchers are interested in actual participant behaviors. A major topic of interest to educational researchers is teaching methods in the classroom. For example, does a teacher use good questioning techniques? Does a teacher appropriately engage all students? What is a teacher's classroom management style? These types of variables are most directly measured by watching a teacher in the classroom and recording observations of the teacher's behavior.

Just a reminder that observation is best used when measuring behavior. Using observation to measure feelings and attitudes can be tricky. For example, a person might be tempted to observe whether a person drinks a Maltina to determine their attitude toward Maltina, the logic being that they have a positive attitude toward Maltina if they drink it and a negative attitude if they do not. However, there are many reasons why a person might drink Maltina yet have a negative attitude toward it: they were thirsty and it was the only beverage available, they wanted to be polite, etc. Likewise, there are many reasons why a person might not drink a Maltina but still like it: they were not thirsty, it was too expensive, they were about to go on a long car trip, etc. In general, behaviors are best measured by observation, while feelings and attitudes are best measured by a self-report questionnaire or checklist.

Just as variables measured by questionnaire must first be operationally defined, so too must variables measured by observation. Once the operational definition is given, make a list of behaviors to be observed. If the variable to be observed is the type of questions that teachers ask, then the researcher must consider which types of questions are of interest. Perhaps a list of relevant question types would include:

Yes/No questions

Asking students to repeat a statement that was just said by the teacher

Questions requiring a one- or two-word answer

Open-ended questions requiring a longer answer

Make the categories as clear and specific as possible so each behavior can clearly be classified. For example, the following categories are not clear enough:

Good Question

Bad Question

What makes a question good or bad? These categories need to be clarified so it will be obvious in which category a given question should be placed.

Once the list of behaviors has been developed, then the researcher needs to determine how the behaviors will be observed. Consider the following questions:

Where and when will the behaviors be observed? To ensure that you are comparing mangoes to mangoes, participants need to be observed in the same context. It would be unfair to compare one teacher who is giving exams with another teacher who is giving a full lesson. Likewise, if observing children's social interactions, it would be unfair to compare one child playing football, another child doing chores around the house, and another child who just woke up from a nap. Therefore, parameters need to be set on where and when behaviors will be observed.

How long will participants be observed? Will each teacher be observed once or multiple times? For twenty minutes? Ten minutes? A teacher who is observed for ten minutes will obviously ask fewer questions than a teacher who is observed for twenty minutes.

How will behaviors be recorded? There are three options for recording behaviors:

Duration. The researcher records the length of time that a behavior occurs. For example, if a researcher is interested in social interactions, they might record the length of time that the child spends interacting with a specific child. To illustrate, a stopwatch is started when the participant approaches Child A. When the participant leaves Child A, the stopwatch is stopped and the length of time is recorded. When the participant approaches Child B, the stopwatch is started again, and so on. Likewise, a researcher might record the amount of time that a teacher spends assisting individual students with their assignments.

Frequency Count. The researcher tallies the number of times a behavior occurs. For the teacher-question example given above, it would be most logical to tick the number of times that the teacher asks each type of question. Likewise, a researcher might tick the number of times that a teacher beats a student in a particular school.

Interval. In interval recording, a researcher sets a stopwatch for a specific interval of time, perhaps 2 minutes. Then every time the stopwatch goes off, the researcher records what behavior the participant is engaged in. For example, every 2 minutes the researcher may tick what a teacher is doing: asking a question, answering a question, lecturing, writing on the chalkboard, disciplining a student, etc. Likewise, a researcher may also tick what a football player is doing on the pitch every two minutes: dribbling, passing, playing defense, resting, etc.
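The three recording options can be illustrated with a short sketch. This is a minimal, hypothetical illustration, not part of any standard observation toolkit: the function names, the category labels, and the twenty-minute lesson length are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical categories for the teacher-question example above.
FREQUENCY_CATEGORIES = [
    "yes/no",
    "repeat statement",
    "one/two-word answer",
    "open-ended",
]

def tally_questions(observed):
    """Frequency count: tally how many times each question type occurred."""
    counts = Counter({category: 0 for category in FREQUENCY_CATEGORIES})
    for question_type in observed:
        counts[question_type] += 1
    return counts

def total_duration(stopwatch_pairs):
    """Duration: sum (start, stop) stopwatch readings, in seconds."""
    return sum(stop - start for start, stop in stopwatch_pairs)

def interval_sample(behavior_at, every_seconds=120, lesson_seconds=20 * 60):
    """Interval: record whatever behavior is occurring every `every_seconds`.

    `behavior_at` maps a second offset into the lesson to the behavior
    occurring at that moment (supplied as a function here for simplicity).
    """
    return [behavior_at(t) for t in range(0, lesson_seconds, every_seconds)]
```

For instance, `total_duration([(0, 90), (200, 260)])` sums two stopwatch episodes into 150 seconds of observed behavior, and `interval_sample` with the default settings yields one recorded behavior every 2 minutes across a 20-minute lesson.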

Once these questions have been answered, the researcher should develop an observation schedule, a form on which the observer records behaviors. Observation schedules are for the researcher's and research assistants' use, so they do not have to be quite as detailed as questionnaires. Example observation schedules are given below.

Duration Observation Schedule

Frequency Count Observation Schedule

Interval Observation Schedule

Once the observation schedule has been developed, it needs to be piloted a few times to ensure that the categories are clear, the categories are sufficient (i.e., no further behaviors need to be added), and the observational guidelines are effective.

When the final observation schedule is finished, then all research assistants who will be observing in the study need to be thoroughly trained on using the schedule. First, the observation schedule should be discussed so the research assistants understand how to use the observation schedule and understand each of the categories. Then, each research assistant should do practice observations. It is best to ask all research assistants to observe the exact same behavior so disagreements on how a specific behavior should be classified can be discussed.
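One simple way to check how well research assistants agree after their practice observations is to compute percent agreement between two observers' parallel records. The sketch below is hypothetical (the function name and data format are assumptions); more rigorous agreement indices, such as Cohen's kappa, also exist.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of parallel observations that two observers classified identically.

    Each argument is a list of category labels, one per observed behavior,
    recorded in the same order by both observers.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must record the same behaviors")
    matches = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return 100.0 * matches / len(observer_a)

# Two assistants classify the same four teacher questions:
a = ["yes/no", "open-ended", "yes/no", "one/two-word answer"]
b = ["yes/no", "open-ended", "one/two-word answer", "one/two-word answer"]
# They agree on 3 of the 4 questions, so percent_agreement(a, b) is 75.0.
```

The question the assistants disagreed on (the third) is exactly the kind of case that should be discussed during training so the category definitions can be sharpened.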

Final Considerations in Item Development

Keep the following additional considerations in mind when developing an instrument.

Ensure that the instrument is suitable for your population. If your population will consist largely of individuals who are illiterate, it is unrealistic to expect them to complete a questionnaire. If only one or two people in the sample are illiterate, then it is best to have a research assistant read the questionnaire word-for-word to the person and record the participant's responses. It is also unrealistic to expect young children to self-report on their own behavior. In this case, it would probably be best to have a parent or guardian report on their behavior.

How will this information be used in data analysis? If an item does not relate to a variable in a research question or hypothesis, then it needs to be deleted. If an item does not directly relate to the construct definition, then it also needs to be deleted. Participants have only limited time and attention, so make judicious use of both by including only items that are relevant.

All participants must provide the same sources of information. Sometimes a researcher will propose that both questionnaires and interviews will be conducted. Oftentimes the interviews are with illiterate participants. However, for quantitative data analysis, all participants must provide the exact same information. Therefore, the items on the questionnaire should be identical in every way to the items on the interview, except that the interview is oral whereas the questionnaire is paper-and-pencil. If different participants provide different types of information, then it is impossible to conduct quantitative analysis across both types of data collection. (However, this point does not apply to qualitative research.)

Ensure that all items answer a research question or hypothesis. Oftentimes, I will read student projects that ask both teachers and students to complete a questionnaire or interview. In most cases, one of these instruments is completely unnecessary and wastes the researcher's time and money, as well as the participants' time. Every single item on every single instrument must be directly related to a research question or hypothesis. If the data collected will not help answer one of these, then it will not be analyzed and is useless.

Translation of instruments requires a back-translation process. Sometimes an instrument will need to be translated into multiple languages. An instrument should always be administered in a language in which the participant is comfortable, to avoid errors in responses caused by the participant not understanding the language of the questionnaire. If an instrument needs to be translated into multiple languages, the researcher must take special care to ensure that the questionnaire in English is identical to the questionnaire in Yoruba, Hausa, Igbo, etc. Bilingual people know well that a word in English might have slightly different meanings than the word it is typically translated into in Hausa.

To ensure that the English version of the questionnaire is as close as possible to the Hausa version, back-translation is necessary. First, get an expert speaker of Hausa to translate the instrument from English into Hausa; this is called forward-translation. Once the forward-translation is finished, a completely different Hausa expert who has never seen the English version should translate the Hausa version back into English. Then the two English versions are compared. If the original English instrument matches the back-translated English version, then the Hausa translation is adequate. However, this is rarely the case on the first attempt; instead, the back-translated English version oftentimes has slightly different meanings on some items. The two Hausa translators should be brought together to discuss the nuances of those difficult items in the Hausa version, and a revision to the Hausa version should be made. Then a third independent Hausa expert should translate the revised Hausa instrument back into English. Again, the English original and the English back-translation should be compared, problem items discussed by the translators, revisions made, and so on. The back-translation process is finished when the English original and the English back-translation match.
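The comparison step, in which the original English and back-translated English versions are checked item by item, can be sketched in code. This is a hypothetical helper, not a substitute for the translators' judgment: exact string matching only flags items for discussion, since two wordings can differ slightly yet still carry the same meaning.

```python
def flag_items_for_discussion(original_items, back_translated_items):
    """Return the 1-based numbers of items whose back-translated English
    wording does not exactly match the original English wording."""
    flagged = []
    numbered_pairs = enumerate(zip(original_items, back_translated_items), start=1)
    for number, (original, back_translated) in numbered_pairs:
        # Ignore capitalization and surrounding whitespace differences.
        if original.strip().lower() != back_translated.strip().lower():
            flagged.append(number)
    return flagged

# Item 2 comes back with different wording, so it is flagged for the
# translators to discuss:
original = ["Do you teach lessons about the individual sounds within words?",
            "Do you ask pupils yes/no questions?"]
back = ["Do you teach lessons about the individual sounds within words?",
        "Do you ask pupils questions answered with yes or no?"]
```

Here `flag_items_for_discussion(original, back)` returns `[2]`, and the translators would then judge whether the two wordings of item 2 truly mean the same thing.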