Department of Statistics
Dietrich College of Humanities and Social Sciences

Can the Internet grade math? Crowdsourcing a complex scoring task and picking the optimal crowd size

Publication Date

September 2011

Publication Type

Tech Report


Nathan Van Houdnos


This paper presents crowdsourcing as a novel approach to reducing the grading burden of constructed-response assessments. Using a proof-of-concept dataset of 3rd-8th grade constructed-response math questions, we find that the average rating of 10 workers from a commercial crowdsourcing service can grade student work cheaply ($0.35 per student response) and reliably (Pearson correlation with the teacher’s gold-standard scores ranging from ρ = 0.86 ± 0.04 for Grade 3 to ρ = 0.72 ± 0.07 for Grade 8). A secondary contribution of the paper is a novel subsampling procedure that splits a large data-collection experiment into many smaller pseudo-experiments in a way that respects within-worker and between-worker variance. The subsampling procedure plays a key role in our calculation that the average of 10 workers’ scores suffices to produce reliable scores.
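
The reliability figure quoted in the abstract is a Pearson correlation between the averaged crowd score and the teacher's score. The Python sketch below illustrates only that calculation on synthetic data. It is not the report's subsampling procedure, which the abstract does not specify: here workers are simply drawn at random, and every name and parameter (`simulate_ratings`, `crowd_correlation`, the noise level, the 0-4 score range, the pool of 30 workers) is a hypothetical choice made for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_ratings(gold, n_workers, noise_sd, rng):
    """Synthetic stand-in for crowd data (an assumption, not the report's data).

    ratings[w][i] is worker w's noisy score for student response i.
    """
    return [[g + rng.gauss(0, noise_sd) for g in gold] for _ in range(n_workers)]


def crowd_correlation(ratings, gold, k, rng):
    """Average the scores of k randomly chosen workers, then correlate with gold."""
    chosen = rng.sample(ratings, k)
    averaged = [mean(scores) for scores in zip(*chosen)]
    return pearson(averaged, gold)


if __name__ == "__main__":
    rng = random.Random(0)
    gold = [rng.randint(0, 4) for _ in range(200)]  # hypothetical teacher scores
    ratings = simulate_ratings(gold, n_workers=30, noise_sd=2.0, rng=rng)
    for k in (1, 5, 10):
        print(k, round(crowd_correlation(ratings, gold, k, rng), 2))
```

Under these assumptions the correlation tends to rise as `k` grows, because averaging reduces independent rater noise. A faithful reproduction would replace the random draw in `crowd_correlation` with the report's subsampling scheme.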