As assessment theory and practice have evolved in the last 15 years, assessment designers and psychometricians have developed measurement models and practices that help us illuminate examinee processing of test items and other assessment activities. These developments include cognitive-diagnostic modeling (e.g., Rupp, Templin, & Henson, 2010), models of examinee thinking (e.g., Leighton & Gierl, 2007), and principled approaches to assessment design, development, and implementation (e.g., Ferrara, Lai, Reilly, & Nichols, 2016). The common goal of illuminating examinee thinking is to enhance test score interpretation by uncovering why examinees responded successfully or unsuccessfully to items on an assessment, in addition to whether they responded successfully. Test item development is as much art as it is science, maybe a “dark art,” as John Bormuth (1970) called it long ago. Principled approaches to design, development, and implementation shed light on examinee thinking and bring more science into the art of test item development. How so?
The most widely known principled approaches—Evidence-Centered Design, Cognitive Design Systems, Assessment Engineering, the BEAR Assessment System, and Principled Design for Efficacy—share common, foundational elements. These elements focus sharply on examinee thinking, with the goal of enhancing the validity of inferences we make from test scores about what examinees know and can do as well as their degree of achievement and competence. Table 1 from our chapter, Principled Approaches to Assessment Design, Development, and Implementation: Cognition in Score Interpretation and Use, makes the common elements plain.
Table 1. Foundation and Organizing Elements of Principled Approaches to Assessment Design, Development, and Implementation and their Relationship to the Assessment Triangle
| Framework Elements | Assessment Triangle Alignment |
|---|---|
| Ongoing accumulation of evidence to support validity arguments | Overall evidentiary reasoning goal |
| Clearly defined assessment targets | Cognition |
| Statement of intended score interpretations and uses | Cognition |
| Model of cognition, learning, or performance | Cognition |
| Aligned measurement models and reporting scales | Interpretation |
| Manipulation of assessment activities to align with assessment targets and intended score interpretations and uses | Observation |
From Ferrara et al. (2016). Used with permission.
The dual focus on cognition and gathering evidence to support intended score interpretations and uses enables assessment designers to illuminate examinee thinking. For example, specifying a model of cognition, learning, or performance helps guide design and development of assessment activities and ensures that assessment activities are aligned with the model of thinking and measurement models that undergird a test’s score reporting scale.
Test development organizations are integrating these principles into standard practice, probably incrementally rather than by tearing down current practices and retooling to implement new ones. Some tools of principled approaches, such as task models and templates, may be in wide use as a way of providing more detailed item specifications than in the past. My co-authors and I have provided a framework for integrating principled practices into existing practices. We describe the framework, Principled Design for Efficacy (PDE), in our chapter. PDE is not intended as a competitor to or replacement for more widely known approaches like Evidence-Centered Design (ECD). Rather, we offer it as a framework for inserting principled practices into current practices incrementally. Figure 3 illustrates how assessment designers and program managers can build principled practices into existing ones.
Figure 3. Conventional processes (white boxes) and processes based on principled approaches (the three grey boxes, in which the foundational elements are numbered) for assessment design, development, and implementation, showing overlap and differences. From Ferrara et al. (2016). Used with permission.
There is general agreement in the testing field that illuminating examinee cognition improves assessment design and development, raises the quality of assessments and of the information about examinees that they provide, and enhances evidence for validity arguments to support intended score interpretations and uses. The Handbook on Cognition and Assessment shows us how to get there.
Bormuth, J. R. (1970). On the theory of achievement test items. Chicago: The University of Chicago Press.
Ferrara, S., Lai, E., Reilly, A., & Nichols. (2016). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp and J. P. Leighton (Eds.), The Handbook on cognition and assessment: Frameworks, methodologies, & applications (pp. 41-74). Malden, MA: Wiley.
Leighton, J. P., & Gierl, M. J. (2007). Defining and evaluating models of cognition used in educational measurement to make inferences about examinees’ thinking processes. Educational Measurement: Issues and Practice, 26(2), 3-16.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.