Research and Publications
Often people use technology in an attempt to enhance education, but they seldom directly assess the system they put in place. Instead, they simply have faith that the logic behind the system is sound and that the specific implementation is appropriately designed. peerScholar is an exception in this regard. From its very first use, Joordens and Pare have conducted research to ensure that the system is fair, that students are satisfied with it, and that the pedagogical principles behind it are sound. Some of this research is now published in peer-reviewed educational journals; much of it has been presented at national and international conferences on teaching and learning; and other papers are currently in press or submitted for publication. The remainder of this section briefly describes that work, with links, where appropriate, to papers or posters.
The first research project on peerScholar, now published in the Journal of Computer Assisted Learning (Pare & Joordens, 2008), was an assessment of the fairness of the grades that students received in peerScholar. An important point here is that while undergraduates are clearly just learning to think and write, and while their individual marking abilities are clearly not as good as those of graduate-level teaching assistants, the mark that a student receives in peerScholar is the average of six peer grades. Averaging reduces variability and, in so doing, increases reliability. The research project directly compared the reliability of this average peer mark to the reliability of graduate-level TAs and found that they did not differ. That is, the average peer mark correlated as well with the graduate-level mark as did another graduate-level marker.
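The statistical intuition behind averaging can be illustrated with a small simulation. The sketch below is not drawn from the study itself: it assumes, purely for illustration, that each peer mark equals the true mark plus independent Gaussian noise, and shows that the average of six such marks scatters around the true mark far less than any single mark does (by roughly a factor of the square root of six).

```python
import random
import statistics

random.seed(42)

def sd_of_average(n_peers, noise_sd=10.0, trials=10_000):
    """Estimate how widely the average of n noisy peer marks
    scatters around the true mark. The noise model and its
    parameters are illustrative assumptions, not study data."""
    errors = []
    for _ in range(trials):
        # Each peer mark deviates from the true mark by Gaussian noise.
        peer_marks = [random.gauss(0.0, noise_sd) for _ in range(n_peers)]
        errors.append(statistics.mean(peer_marks))
    return statistics.stdev(errors)

single = sd_of_average(1)   # spread of one peer's mark (about 10)
six = sd_of_average(6)      # spread of the six-peer average (about 10 / sqrt(6), i.e. ~4.1)
```

Under these assumptions the six-peer average is noticeably less noisy than any individual marker, which is the mechanism that lets a set of non-expert marks rival a single expert mark in reliability.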
Subsequent research examined this validity index as a function of how many assignments students are asked to grade (Pare & Joordens, 2009a). One can imagine a trade-off at play. As students are asked to mark more assignments, each composition ends up being graded by more students, and the average should contain even less noise, all else being equal. However, if students are asked to mark too many compositions, their attention and effort may wane and more noise might enter into the marks. Thus, one might imagine a "sweet spot" reflecting the trade-off between statistical properties and human performance. In assignments like the ones peerScholar uses, that sweet spot seems to be at the six-marker level. Validity measures are highest, sometimes even higher than the agreement between experts, when each student is asked to grade six compositions, meaning their composition is, in turn, graded by six peers.
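The trade-off described above can be captured in a toy model. The sketch below is hypothetical: it assumes a baseline per-mark noise level plus a fatigue term that grows with each additional assignment marked, with parameter values chosen purely so that the minimum lands near six, matching the empirical sweet spot. Neither the functional form nor the numbers come from the study.

```python
import math

def average_mark_noise(n_assignments, base_sd=10.0, fatigue=1.5):
    """Toy model of the marking trade-off. Assumptions (illustrative only):
    each extra assignment marked adds 'fatigue' to the per-mark noise SD,
    while averaging over n marks divides that SD by sqrt(n)."""
    per_mark_sd = base_sd + fatigue * (n_assignments - 1)
    return per_mark_sd / math.sqrt(n_assignments)

# Noise in the average mark as the number of markers per composition varies.
noise = {n: average_mark_noise(n) for n in range(1, 13)}
best = min(noise, key=noise.get)  # the interior minimum, i.e. the "sweet spot"
```

In this model, noise falls as more markers are averaged but eventually rises again as fatigue dominates, producing an interior optimum rather than "more markers is always better".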
All of this research suggests that, on average, the grades acquired within peerScholar are as valid as those provided by expert markers. However, that does not preclude the possibility that some compositions could end up being marked by, say, six particularly tough or uninformed markers. It also does not preclude the possibility that students may feel their grade is inappropriate, whether it is or isn't, and feel unfairly treated by their peers. To address this issue, a re-mark option was added to the peerScholar process (Pare & Joordens, 2009). Specifically, students were told the following: "If you have considered the average grade given to you by your peers and you truly feel it does not reflect the quality of your work, then you may ask to have your composition re-graded by a graduate-level teaching assistant. The teaching assistant will re-grade your composition without reference to the original marks or comments, and your final mark on the assignment will be whatever the teaching assistant assigns, for better or worse." With a class of 1500 students there was some concern about the take-up rate for this offer, but repeated offerings have shown that the rate is reliably in the 2% to 4% range. Thus, with 1500 students, 30 to 60 compositions might require a re-mark, a small price to pay to ensure quality grades and student satisfaction.
Also of interest, analyses of the re-mark data suggest that the average change in mark is zero, once again verifying the validity of the grades on average. However, some grades do go up, sometimes by a substantial degree. Others drop, suggesting that some students actually had better grades than they deserved but nonetheless felt unfairly graded. It is fair to say that those students have received a strong feedback signal that their writing needs work.
Most of the research described above targets issues of fairness and student satisfaction with the system, but perhaps the most interesting research findings focus on assessing the pedagogical power of the peer-assessment experience. To examine this, Pare & Joordens (2009b) performed an experiment in which participants were asked to grade their own written piece as they submitted it during the writing phase (Phase 1). They were then asked to grade their composition again after completing their assessments in the evaluating phase (Phase 2). Relative to their final mark on the assignment, their self-marks became significantly more accurate after just one exposure to the peer-assessment process. Thus, the discrimination process that lies at the heart of this innovation does indeed enhance learning.
Taken together, this research validates the peerScholar system by showing that (a) it is fair to students in terms of the grades they receive, (b) a re-mark option can be added to further enhance student satisfaction without a huge cost, and (c) the system does indeed enhance learning. It is worth emphasizing that the phase where this learning enhancement occurs, the reflecting phase (Phase 3), is a phase that students do not experience in typical written assignments, given that "experts" do the marking. It is for this reason that peerScholar represents not only a way to have written assignments in any size class but also a better way to do so than the traditional approach.