Measurement in Evaluation
By Moira Ragan
“…not everything that can be counted counts, and not everything that counts can be counted.” – William Bruce Cameron
Assessment before improvement
My background is in scale development, and much of the attention given to rigor in item writing comes from achievement and cognitive testing. But item writing and instrument development are just as essential in evaluation. If you can’t measure something, you can’t assess programs and their effectiveness, answer client questions, meet funder needs or inform program improvement. Isn’t that, after all, why we evaluate?
Item writing and instrument development require planning and expertise; the process is often iterative and time consuming. It might seem obvious, but writing good items takes real work. So, when should evaluators use existing instruments, adapt them or start from scratch?
Like a true statistician, I have to answer: it depends.
Find, adapt or start over?
First, what are you measuring—attitudes about something, feedback on program components, knowledge? Are you building your understanding of the theoretical foundations of a particular program and its intended outcomes?
Second, how are you collecting data—self-report surveys, cognitive tests, interviews, observations?
Third, how confident do you feel writing your own items and protocols?
If you don’t have experience developing instruments, it is often advisable to use established, psychometrically sound instruments to collect data on constructs of interest when possible (e.g., self-efficacy, attitudes toward a topic). The literature often offers plenty of options you can use or adapt. (Adapting instruments is a conversation for another day; just remember, psychometric evidence doesn’t transfer to instruments “based” on others!)
Starting from scratch can be daunting, and it isn’t always necessary or appropriate, but sometimes you need to. Even if you find existing instruments to use as starting points, you will have to develop your own questions for anything specific to the program you are evaluating. Good items are fundamental to meaningful interpretation of data, sound inferences, actionable recommendations and, of course, validity.
Rules of thumb
Be clear. Avoid ambiguity, redundancies and overly lengthy questions. Write clear directions.
Check your readability. Be sure to write at an appropriate reading level (even for adults, a good rule of thumb is 6th grade per Flesch-Kincaid) and avoid jargon or unusual, unnecessarily difficult vocabulary. Not everyone knows abstruse, recondite vernacular… (If you want to compute readability yourself, see the sketch after this list.)
Check for bias. Be culturally responsive.
Write clear objectives and items to measure them.
When applicable, pick appropriate response options. Match them to the type of question (e.g., magnitude, frequency) and try to remain consistent across items.
Draw a blueprint and get a review. Make sure your items are measuring what you intend to measure. Back it up with a review by peers, experts and/or clients.
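The readability check above is easy to automate. The Flesch-Kincaid Grade Level is just a formula built from average sentence length and average syllables per word, so a short script can flag items that read above your target level. Below is a minimal Python sketch, assuming the standard published formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59) and a crude vowel-group heuristic for counting syllables; the readability tools built into word processors or survey platforms will be more accurate.

    import re

    def count_syllables(word):
        # Crude heuristic: each run of consecutive vowels counts as one syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        # Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

    # Example: check a draft survey item before sending it out for review.
    item = "How confident do you feel about writing your own survey items?"
    print(round(flesch_kincaid_grade(item), 1))

A score around 6 or below keeps you near the 6th-grade rule of thumb; anything much higher is a cue to shorten sentences and swap out multisyllabic words.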