Moving from a Craft to a Science in Assessment Design

Guest post by Paul Nichols, ACT

This is one of a series of blog posts from chapter authors from the new Handbook of Cognition and Assessment.  See Beginning of a Series: Cognition and Assessment Handbook for more details.

In our chapter in the Handbook, we present and illustrate criteria for evaluating the extent to which theories of learning and cognition, and the associated research, when used within a principled assessment design (PAD) approach, support explicitly connecting and coordinating the three elements comprising the assessment triangle: a theory or set of beliefs about how students think and develop competence in a domain (cognition), the content used to elicit evidence about those aspects of learning and cognition (observation), and the methods used to analyze and make inferences from the evidence (interpretation). Some writers have cautioned that the three elements comprising the assessment triangle must be explicitly connected and coordinated during assessment design and development or the validity of the inferences drawn from the assessment results will be compromised.

We claimed that a criterion for evaluating the fitness of theories of learning and cognition to inform assessment design and development was the extent to which theories facilitate the identification of content features that accurately and consistently elicit the targeted knowledge and skills at the targeted levels of complexity. PAD approaches attempt to engineer intended interpretations and uses of assessment results through the explicit manipulation of the features of content that tend to effectively elicit the targeted knowledge and skills at the targeted complexity levels. From the perspective of PAD, theories of learning and cognition, along with the empirical research associated with the theories, should inform the identification of those important content features.

The claim from a PAD perspective is that training item writers to intentionally manipulate characteristic and variable content features enables them to manipulate those features systematically when creating items and tasks. Subsequently, items and tasks with these different features will elicit the kind of thinking and problem solving, at the levels of complexity, intended by the item assignment. But I have no scientific evidence supporting this claim. I have only rational arguments (e.g., if item writers understand the critical content features, then they will use them) and anecdotes (e.g., item writers told me they found the training helpful) to support it.

An approach that might help me and other researchers gather evidence with regard to such claims is called design science. Design science is fundamentally a problem-solving paradigm: the scientific study and creation of artefacts as they are developed and used by people, with the goal of solving problems and improving practices in people’s lives. In contrast to natural entities, artefacts are objects (such as tests), conceptual entities (such as growth models, scoring algorithms, or PAD), or processes (such as standard setting methods) created by people to solve practical problems. The goal of design science is to generate and test hypotheses about generic artefacts as solutions to practical problems. Design science research deals with planning, demonstrating, and empirically evaluating those generic artefacts. A main purpose of design science is to support designers and researchers in making knowledge of artefact creation explicit, thereby moving design from craft to science.

ACT is hosting a conference on educational and psychological assessment and design science this summer at their Iowa City, IA, headquarters. A small group of innovators in assessment are coming together to consider the potential of design science to aid assessment designers in designing and developing the next generation of assessments. Look for the findings from that conference at AERA or NCME in 2017.

 

Beginning of a Series: Cognition and Assessment Handbook

Dear SIG colleagues,

We are incredibly excited to share with you information regarding a new book that we have co-edited with contributions from several members of our SIG as well as other colleagues in the field. It is called the Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications and will appear in Britain and the US in the fall. It is close to 600 pages long and includes a total of 22 chapters divided into 3 sections (Frameworks – 9 chapters, Methodologies – 6 chapters, and Applications – 7 chapters). In addition to the core invited chapters we wrote thoughtful introductory and synthesis chapters that situate the book within current developments in the field of cognitively-grounded assessment. We also created a glossary of key terms, including both those that are used repeatedly across chapters and those that are of particular importance to specific chapters, and worked with all authors to arrive at common consensus definitions. Throughout the book, these terms are highlighted in boldface at first mention in each chapter.

We probably do not have to convince you that the publication of such a handbook is timely and we sincerely hope that it will be considered a useful reference for a diverse audience. To us, this audience includes professors, students, and postdocs, assessment specialists working at testing companies or for governmental agencies, and really anyone who wants to update their knowledge on recent thinking in this area. In order to help reach those audiences we worked closely with authors to keep the conceptual and technical complexity at a comparable level across chapters, so that interested colleagues can read individual chapters, entire sections, or even the full book. The list of contributors in this book is really quite amazing and, by itself, presents a great resource of expertise that you may want to consider the next time you are looking for advice on a project or for someone on your advisory panel.

In order to get you all even more excited about the handbook we have invited our contributing authors to share their current thoughts via upcoming blog entries for the SIG, so look out for those in the coming months! It also goes without saying that our book is only one of many artifacts that help to synthesize ideas and build conceptual and practical bridges across communities. Therefore, if you have additional suggestions for us regarding what kinds of efforts we could initiate around the handbook, please let us know – we would love to hear from you! Finally, if you like the Handbook, we would certainly appreciate it if you could share information about it on social media, perhaps for now by sharing the cover image below. Many thanks in advance and thank you very much for your interest!

André A. Rupp and Jacqueline P. Leighton

(Co-editors, Handbook of Cognition and Assessment)

Paper Rubric for 2017 AERA

As a measurement specialist, I’ve always found the AERA evaluation rubric to be a bit minimal. AERA provides the names of the scales, but little information about what goes into them. Some of that is a function of the fact that different divisions and SIGs have very different ideas of what constitutes research (qualitative, quantitative, methodological, literature synthesis). We as a SIG can do better. So please help me out with an experiment along these lines.

AERA defines six scales for us (see below). The goal of this post is to provide a first draft of a rubric for those six areas. I’m roughly following a methodology from Mark Wilson’s BEAR system, particularly the construct maps (Wilson, 2004). As I’ve been teaching the method, I take the scale, divide it up into High, Medium, and Low areas, and then think about what kind of evidence I might see that (in this case) a paper was at that level on the indicated criterion. I only define three anchor points, with the idea that a five-point scale can interpolate between them.
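
To make that construct-map layout concrete, here is a minimal sketch in Python. The dictionary contents and the function are purely illustrative (my own paraphrase of the first criterion, not part of the BEAR system software): the three anchor descriptions sit at scores 5, 3, and 1, and the even scores are read as falling between adjacent anchors.

```python
# Hypothetical sketch: three anchor points on a five-point reviewer scale.
# Scores 5, 3, and 1 carry the High/Medium/Low anchor descriptions;
# scores 4 and 2 are interpreted as falling between adjacent anchors.

objectives_anchors = {
    5: "High: clearly stated, involve both cognition and assessment",
    3: "Medium: can be inferred, involve either cognition or assessment",
    1: "Low: unclear, only tangentially related to the field",
}

def describe(score: int) -> str:
    """Return the anchor text for a score, or locate it between two anchors."""
    if score in objectives_anchors:
        return objectives_anchors[score]
    below, above = objectives_anchors[score - 1], objectives_anchors[score + 1]
    return f"Between anchors: above ({below}) but below ({above})"

for s in range(5, 0, -1):
    print(s, "-", describe(s))
```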

In all such cases, it is usually easier to edit a draft than to create such a scale from scratch. So I’ve drafted rubrics for the six criteria that AERA gives us. These are very much drafts, and I hope to get lots of feedback in the comments about things that I left out, should not have included, or put in the wrong place. In many cases the AERA scale labels are deliberately vague so as to not exclude particular kinds of research. In these cases, I’ve often picked the label that would most often apply to Cognition and Assessment papers, with the idea that it would be interpreted liberally in cases where it didn’t quite fit.

Here they are:

1. Objectives and Purposes
High (Critically Significant) Objectives and purposes are clearly stated. Objectives and purposes involve both cognition and assessment. Objectives touch on issues central to the field of Cognition and Assessment.
Medium Objectives and purposes can be inferred from the paper. Objectives and purposes involve either cognition or assessment. Objectives touch on issues somewhat related to the field of Cognition and Assessment.
Low (Insignificant) Objectives and purposes are unclear. Objectives and purposes are only tangentially related to cognition or assessment. Objectives touch on issues unrelated to the field of Cognition and Assessment.
2. Perspective(s) or Theoretical Framework
High (Well Articulated) Both cognitive and measurement frameworks are clearly stated and free from major errors. The cognitive and measurement frameworks are complementary. Framework is supported by appropriate review of the literature.
Medium Only one of the cognitive and measurement perspectives is clearly stated, or both are implicit. If errors are present, they are minor and easily corrected. Fit between cognitive and measurement frameworks is not well justified. Framework is not well supported by the literature review.
Low (Not Articulated) Cognitive and measurement frameworks are unclear or have major substantive errors. There is a lack of fit between the cognitive and measurement models. Literature review clearly misses key references.

Note that implicit in this rubric is the idea that a Cognition and Assessment paper should both have a cognitive framework and a measurement framework.

3. Methods, Techniques or Modes of Inquiry
High (Well Executed) Techniques are clearly and correctly described. Techniques are appropriate for the framework. Appropriate evaluation of the methods is included in the paper.
Medium Techniques are described, but possibly only implicitly, or there are only easily corrected errors in the techniques. Techniques are mostly appropriate for the framework. Minimal evaluation of the methods is included in the paper.
Low (Not Well Executed) Techniques are not clearly described, implicitly or explicitly; or there are significant errors in the methods. Techniques are not appropriate for the framework. No evaluation of the methods is included in the paper.

I’ve used techniques and methods more or less interchangeably to stand for Methods, Techniques or Modes of Inquiry.

4. Data Sources, Evidence Objects, or Materials
High (Appropriate) Data are clearly described. Data are appropriate for the framework and methods. Limitations of the data (e.g., small sample, convenience sample) are clearly acknowledged.
Medium Data source only partially described. Data not clearly aligned with the purposes/methods. Limitations of the data incompletely acknowledged.
Low (Inappropriate) Data description is unclear. Data are clearly inappropriate given purpose, framework and/or methods. Data have obvious, unacknowledged limitations.

Here data (note that data is a plural noun) have to be interpreted liberally to incorporate traditional assessment data, simulation results, participant observations, literature reviews, and other evidence to support the claims of the paper.

5. Results and/or substantiated conclusions or warrants for arguments/points of view
High (Well Grounded) Results are clearly presented with appropriate standard errors, or the description of expected results is clear. Success criteria are clearly stated. It would be possible for the results to falsify the claims. If results are available, conclusions are appropriate given the results.
Medium Results are somewhat clearly presented or expected results are likely to be appropriate. Success criteria are implicit. There are many researcher degrees of freedom in the analysis, so the results presented are likely to be the ones that most strongly support the claims. If results are available, the conclusions are mostly appropriate.
Low (Ungrounded) Results are unclear, or the expected results are unclear. Success criteria are not stated and are (or could be) determined after the fact. The results are not capable of refuting the claim, or there are so many researcher degrees of freedom that the results can always be made to support the claim. If results are present, the conclusion is inappropriate.

I’ve tried to word this carefully so that it is clear that both papers in which the results are present and papers in which the results are anticipated are appropriate. There are also two new issues which are not often explicitly stated, but should be. First, the standard of evidence should be fair in that it should be possible to either accept or reject the main claims of the paper on the basis of the evidence. Second, there are often many analytical decisions that an author can use to make the results look better, for example, choosing which covariates to adjust for. Andrew Gelman refers to this as the Garden of Forking Paths. I’m trying to encourage both reviewers to look for this and authors to be honest about the data-dependent analysis decisions they made, and the corresponding limitations of the results.

6. Scientific or Scholarly Significance of the study or work
High (Highly Original) Objectives and purposes offer either an expansion of the state of the art or important confirming or disconfirming evidence for commonly held beliefs. The proposed framework represents an extension of either the cognitive or measurement part or a novel combination of the two. Methods are either novel in themselves or offer novelty in their application. Results could have a significant impact in practice in the field.
Medium Objectives and purposes are similar to those of many other projects in the field. The proposed framework has been commonly used in other projects. Paper presents a routine application of the method. Results would mostly provide continued support for existing practices.
Low (Routine) Objectives and purposes offer only a routine application of well understood procedures to a problem without much novelty. The framework offers no novelty, or the novelty is a function of inappropriate use. If novelty in methodology is present, it is because the application is inappropriate. Results would mostly be ignored because of flaws in the study or lack of relevance to practice.

When I’m asked to review papers without a specific set of criteria, I always look for the following four elements:

  1. Novelty
  2. Technical Soundness
  3. Appropriateness for the venue
  4. Readability

These don’t map neatly onto the six criteria that AERA uses. I tried to build appropriateness into AERA’s criteria about Objectives and purposes, and to build novelty into AERA’s criteria about Significance. Almost all of AERA’s criteria deal with some aspect of technical soundness.

Readability somehow seems left out. Maybe I need another scale for this one. On the other hand, it has an inhibitor relationship with the other scales. If the paper is not sufficiently readable, then it fails to make its case for the other criteria.

It is also hard to figure out how to weight the six subscales in the overall accept/reject decision. This is the old problem of collapsing multiple scales onto a single scale. It is a bit harder here because the relationship is an interesting one: part conjunctive and part disjunctive.

The conjunctive part comes with the relationship between the Low and Medium levels. Unless all of the criteria are at least at moderate levels, or the flaws causing the paper to get a low rating on a criterion are easy to fix, there is a fairly strong argument for rejecting the paper as not representing sufficiently high quality work.

However, to go from the minimally acceptable to high priority for acceptance, the relationship is disjunctive: any one of the criteria being high (especially very high) would move it up.
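
As a rough formalization of that two-part rule, here is a sketch in Python. It is entirely my own illustration, not SIG policy; the criterion names and the notion of a set of "fixable" low ratings are assumptions I have added for the example.

```python
# Hypothetical sketch of the part-conjunctive, part-disjunctive decision rule.
# Each criterion is rated "low", "medium", or "high"; some low ratings may be
# flagged as easy for the authors to fix before presentation.

def overall_decision(ratings, fixable=()):
    """Combine per-criterion ratings into an overall recommendation."""
    lows = {c for c, r in ratings.items() if r == "low"}
    # Conjunctive part: any Low rating that is not easily fixable argues for rejection.
    if lows - set(fixable):
        return "clear reject"
    # Disjunctive part: once everything clears the bar, any High rating moves it up.
    if any(r == "high" for r in ratings.values()):
        return "clear accept"
    return "borderline accept"

example = {"objectives": "high", "framework": "medium", "methods": "medium",
           "data": "medium", "results": "medium", "significance": "medium"}
print(overall_decision(example))  # -> clear accept
```

In this form a single unfixable Low dominates the decision, and only papers that clear that bar compete on the disjunctive part; mixed papers, as noted below, still come down to referee judgement.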

A bigger problem is what to do with a paper that is mixed: possibly high in some areas, and low in others. Here I think I need to rely on the judgement of the referees. With that said, here is my proposed rubric for overall evaluation.

Overall Evaluation
Clear Accept All criteria are at least at the Medium level, with some criteria at the High level. Research has at least one interesting element in the choice of objectives, framework, methods, data, or results that would make people want to hear the paper/see the poster.
Borderline Accept Most criteria are at the Medium level, with any flaws possible to correct by the authors before presentation. Research may be of interest to at least some members of the SIG.
Clear Reject One or more criteria at the Low level and flaws difficult to fix without fundamentally changing the research. Research will be of little interest to members of the SIG.

The last problem is that we are using short abstracts rather than full papers. In many cases, there may not be enough material in the abstract to judge. What are your feelings about that? The SIG management team has generally liked the abstract review format, as it makes the reviewing faster and makes it easier to submit in-progress work. Should we continue with this format? (Too late to change for 2017, but 2018 is still open.)

I’m sure that these rubrics have many more issues than the ones I’ve noticed here. I would encourage you to find all the holes in my work and point them out in the comments. Maybe we can get them fixed before we use this flawed rubric to evaluate your paper.

Edit: I’ve added the official AERA labels for the scales in parentheses, as AERA has FINALLY let me in to see them.

2017 AERA Call for Proposals


Cognition and Assessment SIG 167

You are invited to submit a proposal to the 2017 annual meeting of the AERA Cognition and Assessment SIG. The Cognition and Assessment SIG presents researchers and practitioners with an opportunity for cross-disciplinary research within education. We are a group of learning scientists and researchers who are interested in better assessing cognition and in leveraging cognitive theory and methods in the design and interpretation of assessment tools, including tests. Our research features many different methods, including psychometric simulations, empirical applications, cognitive model development, theoretical rationales, and combinations of all of the above.

The Cognition and Assessment SIG welcomes research proposals covering an array of topics that meet the broad needs and research interests of the SIG. We encourage you to submit 500-word proposals for symposium, paper, poster, or other innovative sessions. If you would like to use the full AERA word limit, however, we will accept that.

Please submit proposals to the Cognition and Assessment SIG through the AERA online program portal by July 22, 2016. Please find the specific proposal guidelines for each of the session types here: http://www.aera.net/Portals/38/docs/Annual_Meeting/2017%20Annual%20Meeting/2017_CallforSubmissions.pdf

If you have any questions about submitting a proposal to the Cognition and Assessment SIG, please contact the SIG Program Chair, Dr. Russell Almond <ralmond@fsu.edu>, or SIG Chair, Dr. Laine Bradshaw <laineb@uga.edu>.

Note:  Discussing and clarifying the rubric used to evaluate the proposals is an important issue related to our SIG.  Look for an upcoming blog post on this topic.

Objectives and Evidence

It’s time to mark my beliefs to market on my earlier statements about Behaviourist and Cognitive perspectives on assessment.

I’m now involved in a small consulting project in which Betsy Becker, some of our students, and I are helping an agency review their licensure exam.  Right now we have our students working on extracting objectives from the requirements documents.  We are trying this out with the cognitive approach rather than the behaviourist one, so our students are asking questions about how to write the objectives.  I thought I would share our advice, and possibly get some feedback from the community at large.

I don’t have permission from our client to discuss their project, so let me use something from my introductory statistics class.  Reading the chapter on correlation, I find the following objective:

  • Understand the effect of leverage points and outliers on the correlation coefficient.

The behaviourists wouldn’t like that one, because it uses an unmeasurable verb, understand.  They would prefer me to substitute an observable verb in the statement, something like:

  • Can estimate the effect of a leverage point or outlier on the correlation coefficient.

This is measurable, but it only captures part of what I had in mind in the original objective.  The solution is to return to the original objective, but add some evidence:

  • Understand the effect of leverage points and outliers on the correlation coefficient.
    1. Given a scatterplot, can identify outliers and leverage points.
    2. Given a scatterplot and with a potential leverage point identified, can estimate the effect of removing the leverage point on the correlation coefficient.
    3. Can describe the sensitivity of the correlation coefficient to leverage points in words.
    4. Given a small data set with a leverage point, can estimate the effect of the leverage point on the correlation.
    5. Can add a high leverage point to a data set to change the correlation in a predefined direction.

The list is not exhaustive.  In fact, a strong advantage of the cognitive approach is that it suggests novel ways of thinking about measuring, and perhaps then teaching, the objective.
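
As a small numerical illustration of evidence statements 4 and 5 above, here is a sketch in Python with simulated data. The numbers are arbitrary and the example is mine, not part of the course materials; it simply shows the kind of effect a single extreme point can have.

```python
# Illustrative sketch with made-up data: a single high-leverage point can
# change the correlation coefficient dramatically.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 0.3 * x + rng.normal(size=20)        # generated with a weak positive slope
print("r without leverage point:", round(np.corrcoef(x, y)[0, 1], 2))

# Add one extreme point far to the right and well below the trend.
x2 = np.append(x, 10.0)
y2 = np.append(y, -10.0)
print("r with leverage point:   ", round(np.corrcoef(x2, y2)[0, 1], 2))
```

The added point dominates the calculation and drags the correlation strongly negative, regardless of what the original twenty points showed; that is exactly the sensitivity the objective asks students to reason about.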

Listing at least a few examples of evidence when defining the objective helps make the objective more concrete.  It also ties it to relative difficulty, as we can move these sources of evidence up and down Bloom’s taxonomy to make the assessment harder or easier.  For example:

  1. [Easier] Define leverage point.
  2. [Harder] Generate an example with a high leverage point.

We are instructing our team to write objectives in this way.  Hopefully, I’ll get a chance later to tell you about how well it worked.

 

Cog & Assessment Poster Session (April 11, 10:00)


This is an open thread for the SIG poster session. Poster presenters, feel free to add more detail (and handouts) about your work.  Others, feel free to comment on and give feedback on the posters.

Cognition & Assessment SIG Poster Session
Mon, April 11, 10:00 to 11:30am,
Convention Center, Level Two, Exhibit Hall D

Posters:

  • Allocation of Visual Attention When Processing Multiple‐Choice Items With Different Types of Multiple  Representations ‐ Steffani Sass, IPN ‐ Leibniz Institute for Science and Mathematics Education; Kerstin Schütte, IPN;  Marlit Annalena Lindner, IPN ‐ Leibniz Institute for Science and Mathematics Education
  • Definition and Development of a Cognitively Diagnostic Assessment of the Early Concept of Angle Using the Q‐Matrix Theory and the Rule‐Space Model ‐ Elvira Khasanova, University at Buffalo ‐ SUNY
  • In‐Task Assessment Framework: A Framework for Assessing Individual Collaborative Problem‐Solving Skills in an Online Environment ‐ Jessica J Andrews, Educational Testing Service; Deirdre Song Kerr, Educational Testing Service; Paul Horwitz, The Concord Consortium; John Chamberlain, Center for Occupational Research and Development; Al Koon; Cynthia McIntyre, The Concord Consortium; Alina A. Von Davier, ETS
  • Measuring Reading Comprehension: Construction and Validation of a Cognitive Diagnostic Assessment ‐ Junli Wei, University of Illinois at Urbana‐Champaign
  • Assessing Metacognition in the Learning Process (MILP): Development of the MILP Inventory ‐ Inka Hähnlein, University of Passau; Pablo Nicolai Pirnay‐Dummer, University of Halle, Germany
  • Developing Bayes Nets for Modeling Student Cognition in Digital and Nondigital Assessment Environments ‐ Yuning Xu, Arizona State University; Roy Levy, Arizona State University; Kristen E. Dicerbo, Pearson; Emily R. Lai, Pearson; Laura Holland, Pearson Education, Inc.
  • Implementing Diagnostic Classification Modeling in Language Assessment: A Cognitive Model of Second Language  Reading Comprehension ‐ Tugba Elif Toprak, Gazi University
  • Introduction to Truncated Logistic Item Response Theory Model ‐ Jaehwa Choi, The George Washington University
  • Misconceptions in Middle‐Grades Statistics: Preliminary Findings From a Diagnostic Assessment ‐ Jessica Masters, Research Matters, LLC; Lisa Famularo, Measured Progress
  • Using Verbal Protocol Analysis to Detect Test Bias ‐ Megan E. Welsh, University of California ‐ Davis; Sandra M. Chafouleas, University of Connecticut; Gregory Fabiano; T. Chris Riley‐Tillman, University of Missouri ‐ Columbia

Using Multiple Learning Progressions to Support Assessment (April 10, 2:45)


This is an open thread for comments on the upcoming AERA symposium.  Feel free to share thoughts and discuss the papers below.

Sun, April 10, 2:45 to 4:15pm,
Marriott Marquis, Level Four, Independence Salon G
Title: Using Multiple Learning Progressions to Support Assessment

Abstract: To use learning progressions to support students’ conceptual and linguistic development, teachers must elicit samples of student reasoning and discourse. This requires a change in pedagogical approaches, from the traditional teacher-fronted direct instruction to the facilitation of meaning-making discourse and student reasoning. Both educators and students need resources to enact these changes, and it is critical that these resources support the full inclusion of English learners. This paper discusses an NSF-funded project to develop and pilot such resources. Teachers implementing this discourse-focused instruction reported that it provided them frequent opportunities to track students’ conceptual understanding and to assist them, in the moment, to probe more deeply and to model the academic language that students needed to express themselves effectively.

Papers:

  • Using a Proportional Reasoning Learning Progression to Develop, Score, and Interpret Student explanations –E. Caroline Wylie, ETS; Malcolm Bauer, ETS
  • Tandem Learning Progressions Provide a Salient Intersection of Student Mathematics and Language Abilities ‐ Alison L. Bailey, University of California ‐ Los Angeles; Margaret University of Minnesota
  • Simultaneous Assessment of Two Learning Progressions for Mathematical Practices ‐ Gabrielle Cayton; Leslie Nabors Olah, Educational Testing Service; Sarah Ohls, ETS; Allyson J. Kiss, University of Minnesota
  • Professional Development to Support Formative Assessment of Mathematics Constructs and Language in Mathematics Classrooms ‐ Christy Reveles, WIDA at Wisconsin Center for Educational Research; Rita MacDonald, University of Wisconsin ‐ Madison

Chair: E. Caroline Wylie, ETS
Discussant: Guadalupe Carmona, The University of Texas at San Antonio

 

Cognition & Assessment SIG Panel (April 9, 2:15)


This is an open comment thread for the Panel session.  Feel free to continue the panel discussion off-line.

Sat, April 9, 2:15 to 3:45pm,
Marriott Marquis, Level Four, Independence Salon F
Title: Principled Assessment Design in Action: Best Professional Practices for Digitally Delivered Learning and Assessment Systems

Abstract
In this moderated panel session, interdisciplinary experts with both technical and practical expertise discuss best practices for putting a principled design approach for digitally-delivered learning and assessment systems into practice. They will debate critical issues around successful professional practices within their teams and institutions in order to develop and nurture coherent ways of acting, reflecting, and planning. That is, they will discuss their “lessons learned”, both successful and not-so-successful. Panelists will also provide take-home handouts with key principles for further dissemination.

Panelists and topics:

  • Process and Product Data Capture ‐ Tiago Calico, University of Maryland ‐ College Park
  • Automated Writing Diagnostics and Scoring ‐ Peter W. Foltz, Pearson
  • Diagnostics for Digital Learning Environments ‐ Janice D. Gobert, Worcester Polytechnic Institute
  • Assessment of Professional Skills ‐ Vandhana Mehta, Cisco Systems Inc
  • Research Capacities for Technology‐Rich Assessment ‐ Andreas H. Oranje, Educational Testing Service
  • Stealth Assessment ‐ Valerie J. Shute, Florida State University
  • Computational Psychometrics ‐ Alina A. Von Davier, ETS

Chair: Andre A. Rupp, Educational Testing Service (ETS)

 

Cognitive Models for Assessment (April 9, 10:35)


This is an open comments section for the upcoming session at AERA.  Feel free to comment and discuss what was learned at the session.

Sat, April 9, 10:35am to 12:05pm,
Marriott Marquis, Level Four, Independence Salon F
Title: Cognitive Models for Assessment

Abstract
Cognitive modeling has been actively used to understand human cognition in a wide range of educational research. However, the application of cognitive modeling to educational assessment does not have an extensive history. With a growing emphasis on problem solving skills and inquiry practices, interactive games and simulations are becoming a common tool for educational assessments. Compared with traditional models, cognitive modeling offers enhanced capabilities to understand complex process data from game/simulation-based assessments at lower levels of grain size. This symposium will present a few examples of how cognitive modeling is being used to understand and assess cognitive processes in various game/simulation-based assessments. The benefits and limitations of the cognitive modeling approach and the implications for future assessment research will also be discussed.

Papers:

  • Modeling Science Inquiry in an Interactive Simulation Task ‐ Jung Aa Moon, Educational Testing Service
  • Extending the Additive Factors Model to Assess Student Learning Rates ‐ Ran Liu, The University of Pennsylvania; Kenneth R. Koedinger, Carnegie Mellon University
  • Evaluating the Efficacy of Real‐Time Scaffolding for Data Interpretation Skills ‐ Raha Moussavi, Worcester Polytechnic Institute; Michael A. Sao Pedro, Worcester Polytechnic Institute & Apprendis LLC; Janice D. Gobert, Worcester Polytechnic Institute
  • What Are Mental Models of Electronic Circuits? Basing an Assessment on Computational Simulations of Experts ‐ Kurt VanLehn, Arizona State University ‐ Tempe
  • From Artificial Intelligence to Intelligent Assessment ‐ Michelle LaMar, Educational Testing Service

Chairs: Jung Aa Moon, Educational Testing Service; Michelle LaMar, Educational Testing Service
Discussant: Irvin R. Katz, Educational Testing Service

Crowdsourcing an ECD database

In response to the recent AERA call for key findings and results, I wrote a short piece on evidence-centered assessment design (ECD) as a key technology.  I thought you might be interested, so here is a copy:

http://ecd.ralmond.net/ECDFindings3.rtf

You may also want to look at the bibliography separately.  Here it is in bibtex format:

http://ecd.ralmond.net/ECDBib.bib

As usual, I was doing this at the last minute.  So despite some feedback from Bob Mislevy and Val Shute, I’m sure I missed somebody important or some key reference.  Please use the comment section to tell me who and what I missed.

On a related note, I’m trying to get the page of ECD projects at

http://ecd.ralmond.net/ecdwiki/ECD/Projects

up to date.  Email me if you want the editing password.