RTRL.72: Effects of Clothing on Performance Evaluation (Urbaniak & Mitchell, 2022)

Source:

Urbaniak, O., & Mitchell, H. F. (2022). How to dress to impress: The effect of concert dress type on perception of female classical pianists. Psychology of Music, 50(2), 422–438.

What did the researchers want to know?

Does clothing influence judges’ evaluations of female pianists’ performances?

What did the researchers do?

Urbaniak and Mitchell (2022) recorded four female undergraduate pianists, each of whom gave nine performances of three musical works. Each pianist wore three black outfits: a long dress, a short dress, and a suit. (Each pianist performed each of the three pieces in each of the three outfits.) A total of 45 excerpts were prepared: nine audio-only and 36 with both audio and video.

Study participants were 30 graduate and undergraduate students, 20 of whom were pianists and 10 of whom were other classical instrumentalists. Each participant met with the researcher individually and was asked to evaluate the excerpts as if they were adjudicators in a piano competition. Participants rated each excerpt on technical proficiency, musicality, appropriateness of dress, and overall performance.

After this, each participant was informally interviewed about their spontaneous observations. At the end of the interview, the researchers revealed the true purpose of the study and asked participants to reflect on potential unconscious biases.

Urbaniak and Mitchell used a three-way repeated measures ANOVA (with two-way interactions) to examine the effects of dress, performer, and musical work on the ratings. They also thematically analyzed the interview transcripts.
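For readers curious what this kind of analysis looks like in practice, below is a minimal sketch in Python using the AnovaRM class from statsmodels. This is not the authors’ code or data: the synthetic ratings, effect sizes, and column names are all illustrative assumptions, and the sketch models only the 36 fully crossed audio-visual excerpts (dress × performer × piece, each rated by every participant).

```python
# Minimal sketch of a three-way repeated measures ANOVA of the kind
# Urbaniak and Mitchell describe. Synthetic data; not the study's dataset.
import itertools

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

raters = range(1, 31)                      # 30 participants
dresses = ["long", "short", "suit"]        # within-subject factor 1
performers = ["P1", "P2", "P3", "P4"]      # within-subject factor 2
pieces = ["piece1", "piece2", "piece3"]    # within-subject factor 3

# One rating per rater for every dress x performer x piece combination,
# so the design is fully balanced (a requirement of AnovaRM).
rows = []
for rater, dress, performer, piece in itertools.product(
        raters, dresses, performers, pieces):
    # Fabricated scores with a small built-in dress effect, for illustration.
    rating = 70 + {"long": 3, "suit": 0, "short": -3}[dress] + rng.normal(0, 5)
    rows.append((rater, dress, performer, piece, rating))

df = pd.DataFrame(
    rows, columns=["rater", "dress", "performer", "piece", "rating"])

# Repeated measures ANOVA with each rater as the "subject" and three
# within-subject factors; fit() reports F tests for the main effects
# and their interactions.
model = AnovaRM(df, depvar="rating", subject="rater",
                within=["dress", "performer", "piece"])
print(model.fit())
```

A significant main effect of dress in the printed table would correspond to the kind of result the study reports; the interaction terms show whether the dress effect differed across performers or pieces.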

What did the researchers find?

Results indicated significant differences in ratings of appropriateness of dress: the long dress was rated highest and the short dress lowest. The researchers also found a significant effect of dress on overall performance ratings, and the same pattern held for technical proficiency and musicality; in each case, performers in the long dress received the highest average ratings and performers in the short dress received the lowest.

Post-evaluation interviews revealed that most participants “were ashamed to find out that they had been unconsciously judging on dress and there was an element of shock which prompted introspection” (p. 433). One participant reflected, “My eyes tricked me into thinking I’m hearing things,” and another said, “Some of them sounded really different to me. […] How we look actually influences how we hear stuff” (p. 433).

What does this mean for my classroom?

One interpretation of these findings is that music educators and students should be conscious of their visual appearance when performing, including their clothing, hairstyle, and mannerisms. Another interpretation, however, is that unconscious bias is real and problematic. Rather than simply expecting performers to conform to the biases of raters and judges, it is the responsibility of those in a position to evaluate others to examine their own potential unconscious biases and work to deconstruct them. These findings indicate that unconscious bias toward women exists, as does policing of their dress, and that both need to be uncovered and eradicated.

Other researchers have found that performance evaluation can be influenced by a performer’s race, gender, body size, attire (formal vs. casual), and stage deportment, such as engaged vs. disengaged facial expression, proper vs. improper body alignment, and focused vs. wandering eye contact (Davidson & Edgar, 2003; Elliott, 1995; Howard, 2012; VanWeelden, 2002). Evaluation forms and processes should be made as objective as possible in order to reduce the influence of bias, and evaluators should reflect on their own potential biases in order to move toward as fair and equitable a process as possible.

References

  • Davidson, J. W., & Edgar, R. (2003). Gender and race bias in the judgement of Western art music performance. Music Education Research, 5(2), 169–181. https://doi.org/10.1080/1461380032000085540
  • Elliott, C. A. (1995). Race and gender as factors in judgments of musical performance. Bulletin of the Council for Research in Music Education, 127, 50–56. https://www.jstor.org/stable/40318766
  • Howard, S. A. (2012). The effect of selected nonmusical factors on adjudicators’ ratings of high school solo vocal performances. Journal of Research in Music Education, 60(2), 166–185. https://doi.org/10.1177/0022429412444610
  • VanWeelden, K. (2002). Relationships between perceptions of conducting effectiveness and ensemble performance. Journal of Research in Music Education, 50(2), 165–176. https://doi.org/10.2307/3345820

RTRL.06: “Investigating Adjudicator Bias in Concert Band Evaluations” (Springer & Bradley, 2018)

Source:

Springer, D. G., & Bradley, K. D. (2018). Investigating adjudicator bias in concert band evaluations: An application of the Many-Facets Rasch Model. Musicae Scientiae, 22(3), 377–393.

What did the researchers want to know?

What is the potential influence of adjudicators on performance ratings at a live large-ensemble festival?

What did the researchers do?

Springer and Bradley (2018) collected evaluation forms from a concert band festival in the Pacific Northwest of the United States. Each of the 31 middle school/junior high school bands performed three pieces and was rated by three expert judges on a scale from 5 (superior) to 1 (poor). Judges were also allowed to award “half points,” and they rated each group on eight criteria: tone quality, intonation, rhythm, balance/blend, technique, interpretation/musicianship, articulation, and “other performance factors” (such as appearance, posture, and general conduct). The researchers analyzed the data with a statistical technique called the Many-Facets Rasch Model.
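For the statistically curious, the core of the Many-Facets Rasch Model is commonly written (following Linacre’s rating scale formulation) as a log-odds equation. The notation below is the standard textbook form, not a formula reproduced from Springer and Bradley’s article:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

Here \(P_{nijk}\) is the probability that ensemble \(n\) receives rating \(k\) rather than \(k-1\) on criterion \(i\) from judge \(j\); \(B_n\) is the ensemble’s performance level, \(D_i\) is the difficulty of the criterion, \(C_j\) is the severity of the judge, and \(F_k\) is the difficulty of the step from rating category \(k-1\) to \(k\). Because each judge gets their own severity term, the model can flag a judge who rates systematically harsher than the others, and ratings that deviate sharply from the model’s expectations can be examined as potential bias.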

What did the researchers find?

The use of half points resulted in less clear and precise measurement than if half points had not been allowed. All but one of the performance criteria “did not effectively distinguish among the highest-performing ensembles or the lowest-performing ensembles” (p. 385), which could indicate a halo effect, in which judgments of certain criteria positively or negatively influence judgments of other criteria. Examination of judge severity revealed that one judge was more severe in their ratings than the other two, though all three judges relied heavily on the higher end of the rating scale, indicating “leniency or generosity error” (p. 386). Finally, numerous instances in which some bands were rated unexpectedly higher or lower by one judge than by the other two suggest “evidence of bias” (p. 386).

What does this mean for my classroom?

Adjudicator training and calibration (ensuring that judges rate in similar ways) is critical. Training for the band festival studied by Springer and Bradley involved only a 30-minute session in which the adjudicator instructions and evaluation form were discussed and adjudicators could ask questions. A more in-depth, ongoing training process may help improve the validity and reliability of the ratings given. For example, Springer and Bradley suggest that adjudicators participate in an “anchoring technique”: a process in which judges rate sample recordings and then discuss the specific “aural qualities necessary for rating each performance criterion on the scales provided on the evaluation form” (p. 389). Festival coordinators might also hire adjudicators from other geographic regions in order to reduce bias due to prior familiarity with bands or directors.