Why Do We Analyze Process Data?

Advances in computer technology have made it easy to collect process data, such as user click-stream data, in computer-based interactive environments. By analyzing process data in addition to product data (e.g., explicit responses to test or survey questions), we can potentially extract additional information and uncover patterns that would otherwise be missed. Below we list a few ways in which process data can further our understanding of users’ traits and behaviors and improve our predictions.

Collecting Collateral Information for Modeling and Predictions

It is often reasonable to assume that the sequence of events an individual engages in is correlated with the final outcome. For example, process data on problem-solving questions may reflect the specific strategies an individual uses, and those who use more effective strategies may have a higher chance of solving the problems correctly. In this case, features derived from the process data can serve as predictors that improve predictions of final outcomes.
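As a rough illustration of how an action sequence might be turned into predictors, the sketch below counts single actions and consecutive action pairs in one respondent's log. The function name `sequence_features` and the action labels are hypothetical, chosen only for this example. Real click-stream logs would need their own parsing and feature choices.

```python
from collections import Counter


def sequence_features(actions):
    """Summarize one respondent's ordered action log as simple features.

    `actions` is a list of action labels in the order they occurred.
    The features chosen here (counts of actions and of consecutive
    action pairs) are an illustrative assumption, not a prescribed set.
    """
    unigram_counts = Counter(actions)
    # Pair each action with the one that follows it.
    bigram_counts = Counter(zip(actions, actions[1:]))
    return {
        "n_actions": len(actions),
        "n_distinct_actions": len(unigram_counts),
        "unigrams": dict(unigram_counts),
        "bigrams": dict(bigram_counts),
    }


# Hypothetical log for a single problem-solving item.
log = ["start", "open_map", "select_route", "reset", "select_route", "submit"]
features = sequence_features(log)
print(features["n_actions"], features["unigrams"]["select_route"])
```

Features of this kind could then be placed alongside the product data in whatever predictive model is being used. Which features are informative depends on the task.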

Process data can also be a reflection of the unobserved latent trait(s) of interest. For example, suppose we are interested in measuring individuals’ latent problem-solving ability. Respondents with higher general problem-solving ability may not only be more likely to provide a final correct answer, but also be more likely to engage in specific behaviors in the process of solving the problem. The process data can thus be used in conjunction with the product data to measure the latent traits, and the measurement accuracy may be higher than that from using the product data alone.

Understanding Cognitive Processes Underlying Responses and Decisions

Scientists and practitioners are often interested in the “Whys” behind human behavior, that is, the underlying cognitive processes that lead to the final decision or response. Taking the PISA complex problem-solving items as an example, the steps that a student takes to identify the problem, plan and carry out solutions, and readjust strategies based on real-time progress are just as important as the final outcome. By understanding these processes, we can identify patterns of behavior that lead to successful attempts, as well as the differences in problem-solving strategies between those who can successfully solve the problems and those who cannot. Educators can then use this information to teach how to solve complex problems effectively. It is very difficult to infer the underlying cognitive processes from the product data alone (e.g., responses to test questions). Process data, however, record the full sequence of interactions between human and computer, and so offer a much more direct view of how a respondent arrived at the final response.