2019
Plenary Speech
Alistair Van Moere
How should we interpret score fluctuations in repeated test-taking?
In language assessment there is a tendency to interpret scores from single-administration tests as accurate indicators of student ability. In other words: when students take a test, we trust that their scores can be taken at face value. But the reality is that test scores are associated with uncertainty, and if a student were to take the same test (or a different form of the same test) just one or two weeks later, their score would likely be different. That is, their new test score could be lower or higher than their original test score, even though their English proficiency has not changed.
There can be various reasons for score fluctuations, such as measurement error in the test, the student’s motivation, or test conditions. This poses a problem for the validity of test scores in many different contexts.
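The kind of fluctuation described above is conventionally modeled in classical test theory, where an observed score is a fixed true score plus random measurement error. The sketch below is a minimal illustration of this idea (not material from the talk); the true score, error magnitude, and scale are assumed values chosen for demonstration.

```python
import random

random.seed(0)

# Assumed values for illustration only: a fixed proficiency ("true score")
# and a standard error of measurement (SEM) on a hypothetical scale.
TRUE_SCORE = 500
SEM = 25

def simulate_administration():
    """One test sitting: the true score plus normally distributed error."""
    return round(random.gauss(TRUE_SCORE, SEM))

# Five sittings by the same student: proficiency is constant,
# yet the reported scores differ from one administration to the next.
scores = [simulate_administration() for _ in range(5)]
print(scores)
```

Even in this idealized model, with no change in motivation or test conditions, repeated sittings produce visibly different scores, which is the core of the interpretation problem the abstract raises.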
For example, in high-stakes exams such as university entrance or immigration tests, students with financial resources can (unfairly) take expensive international exams in test centers every month until they get a high enough score. Similarly, in formative testing contexts where we would like to track a student’s progress or score gains every few months, it can be problematic for a teacher to explain why a student’s standardized test scores dropped even though their English proficiency should have increased.
In this presentation I will draw on data from numerous contexts: university speaking and writing tests, large-scale automatically scored tests, and PISA exams. I will outline the causes of test score fluctuation and how researchers quantify it, and discuss the consequences and social impact of score fluctuations. Finally, I will propose how researchers can mitigate these effects, both in the reporting of test scores and in statistical techniques for interpreting longitudinal data over many test administrations.
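One standard way researchers quantify score uncertainty is the standard error of measurement, SEM = SD × √(1 − reliability), which can be turned into an approximate confidence band around a reported score. The snippet below is a generic textbook calculation, not the method presented in the talk; the reliability, standard deviation, and score are assumed numbers for illustration.

```python
import math

# Assumed values for illustration: a reliability coefficient and the
# standard deviation of scores on the reporting scale.
reliability = 0.91
sd = 100.0

# Standard error of measurement from classical test theory.
sem = sd * math.sqrt(1 - reliability)

# An approximate 95% band for the true score behind one observed score.
observed = 520.0
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% band ~ [{low:.1f}, {high:.1f}]")
```

With these assumed numbers the band spans roughly ±59 points, illustrating why a drop of 20 or 30 points between administrations need not indicate any real loss of proficiency.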

Dr Alistair Van Moere is Chief Product Officer at MetaMetrics Inc, where he drives innovation and helps organizations make sense of test measurement. Previously Alistair was President of Pearson’s Knowledge Technologies group and managed artificial intelligence scoring in speaking and writing for tens of millions of learners. He has worked as a teacher, examiner, director of studies, university lecturer, and test developer, in the US, UK, Japan, and Thailand. Alistair’s PhD won the Jacqueline Ross TOEFL award for best dissertation in language testing; he has an MBA, and has authored over 20 research publications on assessment and educational technology.
