What did the researchers want to know?
What is the potential influence of adjudicators on performance ratings at a live large ensemble festival?
What did the researchers do?
Springer and Bradley (2018) collected evaluation forms from a concert band festival in the Pacific Northwest U.S. Each of the 31 middle school/junior high school bands performed three pieces and was rated by three expert judges on a scale from 5 (superior) to 1 (poor). Judges were also allowed to award “half points,” and they rated each group on eight criteria: tone quality, intonation, rhythm, balance/blend, technique, interpretation/musicianship, articulation, and “other performance factors” (such as appearance, posture, and general conduct). The researchers analyzed the data using the Many-Facets Rasch Model, a statistical measurement approach that places ensembles, criteria, and judges on a common scale.
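For readers curious about the model itself, its general textbook form can be written as shown here. This is a sketch of the standard formulation, not necessarily the exact specification the researchers used, and the symbols are illustrative labels rather than notation taken from the study.

```latex
% Many-Facets Rasch Model, general rating-scale form (illustrative notation)
% P_{nijk}: probability that band n receives rating category k
%           from judge j on criterion i
% B_n: performance level of band n
% D_i: difficulty of criterion i
% C_j: severity of judge j
% F_k: difficulty of moving from category k-1 to category k
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Read this way, a more severe judge (a larger severity term) lowers the odds of a higher rating for every band, which is why the model can separate a judge's severity from a band's actual performance level.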
What did the researchers find?
The use of half-points resulted in less precise measurement than if half-points had not been allowed. All but one of the performance criteria “did not effectively distinguish among the highest-performing ensembles or the lowest-performing ensembles” (p. 385), which could indicate a halo effect, in which judgements of certain criteria positively or negatively influence judgements of other criteria. Examination of judge severity revealed that one judge was more severe than the other two, though all three relied more heavily on the higher end of the rating scale, indicating “leniency or generosity error” (p. 386). Finally, numerous instances in which a band was rated unexpectedly higher or lower by one judge than by the other two suggest “evidence of bias” (p. 386).
What does this mean for my classroom?
Adjudicator training and calibration (ensuring that judges rate in similar ways) are critical. Adjudicator training for the band festival studied by Springer and Bradley involved only a 30-minute session in which the adjudicator instructions and evaluation form were discussed and adjudicators were allowed to ask questions. A more in-depth and ongoing adjudicator training process may help improve the validity and reliability of the ratings. For example, Springer and Bradley suggest that adjudicators might participate in an “anchoring technique,” a process in which judges rate sample recordings and then discuss the specific “aural qualities necessary for rating each performance criterion on the scales provided on the evaluation form” (p. 389). Festival coordinators might also try to hire adjudicators from other geographic regions to reduce bias due to prior familiarity with bands or directors.