The Purpose of Cognitive Diagnostic Assessment
“A wise fish never goes anywhere without a porpoise.”
Mock Turtle to Alice, Alice’s Adventures in Wonderland
I use this quote to start off my class on test construction because
the purpose of an assessment is the very first and very last thing you
should think about when building a test. It is important to think
about it in the beginning, because unless the design team is all on
the same page about the purpose of the assessment, they will produce
pieces that do not fit together well. The last part of the test
construction process should be a validity study which ensures that the
newly created assessment is actually suitable for the purpose to which
it is to be put. Failing to nail down the purpose early in the design
process invites the
creep to be a part of your design team.
The problem for most assessment design projects is that when the team
gets partway through the design process, somebody will have a
brilliant idea about a second purpose the assessment can be used for.
After all, as long as you are going to all the effort and expense of
building a new assessment, why not …? Stop! This is
potential trouble. At the very least the new purpose will require an
extra validity study as now both purposes have to be validated. At
worst, it can seriously dilute the collection of tasks on the test. A
biblical quote comes to mind:
No one can serve two masters. Either
you will hate the one and love the other, or you will be devoted to
the one and despise the other. (Matthew 6:24, NIV).
A similar principle holds for assessments with multiple purposes: one
of the purposes will benefit at the expense of all the others.
The problem for cognitively diagnostic assessment is that it is often
the second purpose, grafted on to an assessment whose primary purpose
is a high-stakes selection or placement decision. It is natural that
people who do poorly in such an exam would want additional diagnostic
information about where they fell short. So it seems like
retrofitting a diagnostic report onto the high-stakes assessment would
be a natural benefit to examinees.
Here is where the two purposes come into play. If the primary purpose
is a high-stakes selection or placement decision, then the
overwhelming need of the test is high reliability. Usually this is
accomplished by doing some kind of pretest and then looking at the
biserial correlation between the item score and the total test score.
Items with a low correlation are deemed to have low reliability and do
not make it to the final test form.
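
The screening step described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: it uses
the point-biserial (ordinary Pearson) correlation as a stand-in for the
biserial, the 0.2 cutoff is a made-up threshold, and the tiny pretest
matrix is invented toy data, not results from any real test.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def item_total_correlations(responses):
    """responses: one row per examinee, one 0/1 score per item."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]

def screen_items(responses, threshold=0.2):
    """Indices of items whose item-total correlation meets the cutoff.

    The threshold is a hypothetical value chosen for illustration.
    """
    correlations = item_total_correlations(responses)
    return [j for j, r in enumerate(correlations) if r >= threshold]

# Invented toy pretest: 4 examinees, 3 items.
pretest = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
kept = screen_items(pretest)  # item 2 runs against the total and is dropped
```

Note that the total score here includes the item being screened, as in
the description above; operational programs often correlate each item
with the score on the remaining items instead.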
But these low-reliability items might be exactly the items that
provide good differential diagnosis. In particular, if the
purpose of diagnosis is to determine whether an examinee is lacking
in Skill 1 or Skill 2, and the presence
of Skill 1 and the presence of Skill 2 are
moderately to strongly correlated in the population, items that
provide good differential diagnosis between Skill 1
and Skill 2 are likely to have lower biserial
correlations with the total score. Therefore, the best items for
diagnostic assessment get purged from the assessment by the test
construction process.
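
A noise-free toy calculation can make this effect concrete. Everything
in the sketch below is an assumption made for illustration: the
population proportions, the rule that six items require both skills,
and the single item that requires only Skill 1. It uses the
point-biserial (Pearson) correlation in place of the biserial.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical population: (has Skill 1, has Skill 2) -> examinee count.
# Most examinees have both skills or neither, so the skills are correlated.
population = {(1, 1): 40, (0, 0): 40, (1, 0): 10, (0, 1): 10}

def needs_both(s1, s2):
    """Item answered correctly only with both skills (no slips or guesses)."""
    return int(s1 and s2)

def needs_skill1(s1, s2):
    """Item answered correctly with Skill 1 alone; separates the two skills."""
    return int(s1)

# Hypothetical form: six both-skill items plus one Skill 1 item.
items = [needs_both] * 6 + [needs_skill1]

rows = []
for (s1, s2), count in population.items():
    row = [item(s1, s2) for item in items]
    rows.extend([row] * count)

totals = [sum(row) for row in rows]
r_both = pearson([row[0] for row in rows], totals)    # a both-skill item
r_skill1 = pearson([row[6] for row in rows], totals)  # the diagnostic item
```

In this toy population the Skill 1 item, the only one that tells an
examinee which skill is missing, has the lowest item-total correlation
(about 0.86 against about 0.996), so a screen based on that statistic
would flag it first.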
High-stakes assessment also puts demands on test security, which is not
such a stringent requirement for a low-stakes diagnostic assessment. In
particular, for a high-stakes
testing program, the items need to be periodically replaced and new
forms created to keep examinees from studying the specific test items
instead of the general construct measured by the assessment. This
brings about the need for equating the forms. In this case adding
diagnostic feedback to each of the items is very expensive, because the
feedback needs to be reauthored for each new form of the assessment.
An alternative would be a two-stage procedure. The first stage is
the original high-stakes exam. Examinees who are not happy with their
scores at the first stage can then take the second stage diagnostic
exam. As the diagnostic exam is low-stakes (reported only back to the
examinees and possibly the examinees’ instructors), there is no need
to change the items (unless the test specifications change). Also,
the diagnostic exam can be given online without proctoring, so the
test can be made long enough to get enough information about each
aspect of proficiency that is important. Finally, if the high-stakes
stage can be linked to the
diagnostic stage, then the scores from the first stage can be used as
a starting point for the diagnostic analysis (this is straightforward
to do with Bayesian scoring algorithms).
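
As a minimal sketch of that last idea, suppose the first-stage score
has already been translated into a prior probability that the examinee
has mastered a given skill (how to do that translation is not covered
here and is simply assumed). Bayes' rule then updates that prior with
each second-stage response. The slip and guess probabilities below are
hypothetical values, and a single skill per item is a simplifying
assumption.

```python
def update_mastery(prior, responses, slip=0.1, guess=0.2):
    """Posterior probability of mastery after a sequence of 0/1 responses.

    prior:     P(mastery) carried over from the first stage (assumed given).
    responses: scores on diagnostic items that tap this one skill.
    slip:      hypothetical P(wrong answer | mastery).
    guess:     hypothetical P(right answer | no mastery).
    """
    p = prior
    for x in responses:
        like_master = (1 - slip) if x else slip
        like_nonmaster = guess if x else (1 - guess)
        p = p * like_master / (p * like_master + (1 - p) * like_nonmaster)
    return p

# Same diagnostic responses, different first-stage starting points.
weak_start = update_mastery(0.2, [1, 0, 1])
strong_start = update_mastery(0.8, [1, 0, 1])
```

With the same three responses, the examinee who did better on the
first stage ends with the higher posterior, which is the sense in which
the first-stage score serves as a starting point.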