Valerie Shute

A few examples of hard‐to‐measure constructs that we’ve assessed in our own work lately include creativity (see Kim & Shute, 2015; Shute & Wang, 2016), problem solving (see Shute, Ventura, & Ke, 2015; Shute, Wang, Greiff, Zhao, & Moore, 2016), persistence (see Ventura & Shute, 2013), systems thinking (see Shute, Masduki, & Donmez, 2010), gaming‐the‐system (see Wang, Kim, & Shute, 2013), and design thinking (see Razzouk & Shute, 2012), among others. In this blog post, I’d like to describe the approach we used to measure creativity, specifically within the context of a digital game called Physics Playground. My premise is that good games, coupled with evidence‐based embedded assessment, show promise as a means to dynamically assess hard‐to‐measure constructs more accurately and much more engagingly than traditional approaches. This information, then, can be used to close the loop (i.e., to use the estimates as the basis for providing targeted support).

Most of us would agree that creativity is a really valuable skill: in school, on the job, and throughout life. But it’s also particularly hard to measure, for several reasons. For instance, there’s no clear and agreed‐upon definition, and the construct is multidimensional, both psychologically and statistically. The generality of the construct is also unclear (e.g., is there a single “creativity” variable, or does it depend entirely on the context?). Finally, creativity is commonly measured with self‐report instruments, which make data easy to collect but are unfortunately flawed. That is, self‐report measures are subject to “social desirability effects” that can lead to false reports of the construct being assessed. In addition, people may interpret specific self‐report items differently, leading to unreliability and lower validity.

To accomplish our goal of measuring creativity based on gameplay data, we followed the series of steps outlined and illustrated in Shute, Ke, and Wang (in press). These steps include:
(1) Develop a competency model (CM) of targeted knowledge, skills, or other attributes based on full literature and expert reviews;
(2) Determine the game (or learning environment) in which the stealth assessment will be embedded;
(3) Compile a full list of relevant gameplay actions/indicators that serve as evidence to inform the CM variables;
(4) Create new tasks in the game, if necessary (Task model);
(5) Create a Q-matrix to link actions/indicators to relevant facets of target competencies;
(6) Determine how to score indicators by classifying them into discrete categories, which make up the “scoring rules” part of the evidence model (EM);
(7) Establish statistical relationships between each indicator and associated levels of competency variables (EM);
(8) Pilot test Bayesian networks (BNs) and modify parameters;
(9) Validate the stealth assessment with external measures; and
(10) Use the current estimates about a player’s competency states to provide adaptive learning support (e.g., targeted formative feedback, progressively harder levels relative to the player’s abilities, and so on).
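The scoring rules in step 6 can be pictured as a small function that bins a raw gameplay measure into ordered categories. The following is a minimal Python sketch under assumed values: the indicator, the cut points, and the names `score_indicator`, `CUT_POINTS`, and `CATEGORIES` are all hypothetical and are not taken from Physics Playground.

```python
from bisect import bisect_right

# Hypothetical cut points for a hypothetical indicator, e.g. the number of
# distinct agents a player tried in one level. Real cut points would be set
# from pilot data and expert judgment rather than fixed by hand like this.
CUT_POINTS = (2, 4)                      # boundaries between categories
CATEGORIES = ("low", "medium", "high")   # ordered discrete scores


def score_indicator(raw_value, cut_points=CUT_POINTS, categories=CATEGORIES):
    """Scoring rule: map a raw indicator value to a discrete category.

    Values below the first cut point fall in the first category; a value
    equal to a cut point moves up into the next category.
    """
    if len(categories) != len(cut_points) + 1:
        raise ValueError("need exactly one more category than cut points")
    return categories[bisect_right(cut_points, raw_value)]


for value in (1, 2, 3, 5):
    print(value, score_indicator(value))
```

With these assumed cut points, 1 scores as "low", 2 and 3 as "medium", and 5 as "high". Keeping the categories few and ordered keeps the conditional probability tables needed in step 7 small.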

In line with this process, the first thing we did once we decided to measure creativity was to conduct an extensive literature review. Based on the literature, we defined creativity as encompassing three main facets: fluency, flexibility, and originality. Fluency refers to the ability to produce a large number of ideas (also known as divergent thinking and brainstorming); flexibility is the ability to synthesize ideas from different domains or categories (i.e., the opposite of functional fixedness); and originality means that ideas are novel and relevant. There are other dispositional constructs that are an important part of creativity (e.g., openness to new experiences, willingness to take risks, and tolerance for ambiguity), but due to the nature of the game we were using as the vehicle for the assessment, we decided to focus on the cognitive skills of creativity. We shared the competency model with two well-known creativity experts and revised the model accordingly.

Next, we brainstormed indicators (i.e., specific gameplay behaviors) that were associated with each of the main facets. For instance, flexibility is the opposite of functional fixedness and represents one’s ability to switch things up while solving a problem in the game. The game knows, per level, the simple machines (or “agents,” as they’re called in the game) that are appropriate for a solution. So one flexibility indicator in the game would be the degree to which a player sticks with an inappropriate agent across solution attempts; this indicator is reverse coded, since sticking with an inappropriate agent signals less flexibility. We amassed all variables (the creativity node and its related facets) and associated indicators in an augmented Q-matrix, which additionally contained the discrimination and difficulty parameters for each indicator in each level (see Almond, 2010; and Shute & Wang, 2016). In the basic format of the Q-matrix, the rows represent the indicators relevant for problem solving in each level and the columns represent the main facets of creativity. If an indicator is relevant to a facet, the value of the cell is “1”; otherwise it is “0.” Translating the Q-matrix into Bayes nets (our statistical machinery to accumulate evidence across gameplay) involves using Almond’s CPTtools (
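To make the Q-matrix and evidence accumulation a bit more concrete, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: the indicator names, the agent names, the 0/1 loadings, the discrimination (`a`) and difficulty (`b`) values, the threshold, and the mapping of facet levels to numbers. It also swaps in a deliberately simplified model (a two-parameter logistic link and an independent posterior per facet) instead of a full Bayes net; it is not Almond’s CPTtools and does not reproduce its models.

```python
import math

FACETS = ("fluency", "flexibility", "originality")
# Hypothetical numeric anchors for three facet levels.
LEVELS = {"low": -1.0, "medium": 0.0, "high": 1.0}

# Hypothetical augmented Q-matrix: each row is an indicator, "loads" holds the
# 0/1 cells for the facet columns, and "a"/"b" are discrimination/difficulty.
Q_MATRIX = {
    "many_agents_tried": {"loads": (1, 0, 0), "a": 1.2, "b": 0.0},
    "avoids_bad_agent":  {"loads": (0, 1, 0), "a": 1.5, "b": 0.5},
    "uncommon_solution": {"loads": (0, 0, 1), "a": 1.0, "b": 1.0},
}


def flexibility_indicator(attempt_agents, appropriate_agents):
    """Reverse-coded stickiness: share of consecutive attempts that do NOT
    reuse an inappropriate agent (1.0 = never sticks with a bad agent)."""
    if len(attempt_agents) < 2:
        return 1.0
    sticky = sum(
        1
        for prev, curr in zip(attempt_agents, attempt_agents[1:])
        if curr == prev and curr not in appropriate_agents
    )
    return 1.0 - sticky / (len(attempt_agents) - 1)


def p_positive(theta, a, b):
    """Two-parameter logistic probability of a positive observation."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def update(prior, observed, a, b):
    """One Bayes-rule update of a facet's distribution over levels."""
    posterior = {}
    for level, theta in LEVELS.items():
        p = p_positive(theta, a, b)
        posterior[level] = prior[level] * (p if observed else 1.0 - p)
    total = sum(posterior.values())
    return {level: w / total for level, w in posterior.items()}


def accumulate(observations):
    """Fold (indicator, observed) pairs into per-facet beliefs via the Q-matrix."""
    uniform = {level: 1.0 / len(LEVELS) for level in LEVELS}
    beliefs = {facet: dict(uniform) for facet in FACETS}
    for indicator, observed in observations:
        row = Q_MATRIX[indicator]
        for facet, cell in zip(FACETS, row["loads"]):
            if cell == 1:  # only facets the Q-matrix links to this indicator
                beliefs[facet] = update(beliefs[facet], observed, row["a"], row["b"])
    return beliefs


# Hypothetical level in which the lever is appropriate and the ramp is not.
score = flexibility_indicator(["ramp", "ramp", "lever"], {"lever"})
# Hypothetical threshold: treat a reverse-coded score >= 0.5 as positive evidence.
beliefs = accumulate([("avoids_bad_agent", score >= 0.5)])
print(score, beliefs["flexibility"])
```

Here the player repeats the inappropriate ramp in one of two transitions, so the reverse-coded indicator is 0.5; counting that as positive evidence shifts the flexibility belief toward "high," while fluency and originality stay uniform because the Q-matrix row does not link to them. In a real Bayes net the facets are connected through the creativity node, so evidence on one facet would also inform the others; the independent per-facet update above ignores that.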

Based on our experience to date in designing valid assessments of hard-to-measure constructs in game environments, I feel that it’s best to bring educators, game designers, and assessment experts together from the outset. This type of diverse team is a critical part of creating an effective learning ecosystem. Having a shared understanding of educational and gaming goals is key to moving forward with the design of engaging, educational games. After establishing the validity of our creativity assessment and its particular facets, the next step in this process (which I’m trying to get funded via multiple research proposals) is to ask: how can we support the development of creativity?

Shute, V., & Wang, L. (2016). Assessing and supporting hard-to-measure constructs in video games. In A. A. Rupp & J. P. Leighton (Eds.), The Handbook on cognition and assessment: Frameworks, methodologies, & applications (pp. 535–563). West Sussex, UK: Wiley.