The reliability of the ITU-P.85 standard for the evaluation of text-to-speech systems

Vazquez-Alvarez, Y. and Huckvale, M. (2002) The reliability of the ITU-P.85 standard for the evaluation of text-to-speech systems. In: Interspeech 2002, Denver, CO, USA, 16-20 Sep 2002, pp. 329-332.

Full text not currently available from Enlighten.

Publisher's URL: http://www.informatik.uni-trier.de/~ley/db/conf/interspeech/interspeech2002.html

Abstract

The reliability of the ITU-T P.85 recommended standard for the evaluation of voice output systems was assessed using six English TTS systems. The P.85 standard is based on mean-opinion-score judgements made by a listening panel on a number of rating scales. The study examined how the ranking of the six systems on these scales varied across four text genres and across two listening sessions. Rankings were also compared with those from a much simpler pair-comparison test across the same genres and listening sessions. For the ITU test, a high degree of correlation was found across scales, implying that the scales were not really testing different aspects of the systems. Results were surprisingly similar across sessions, implying that listeners were indeed making real judgements. By comparison, the pair-comparison test gave (almost) identical system rankings with far less variability, making statistically significant comparisons between systems possible, even across genres.
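
As a rough illustration of the kind of analysis the abstract describes, the sketch below computes a mean opinion score per system on two rating scales, ranks the systems on each scale, and measures how closely the two rankings agree using Spearman's rank correlation. It is not the authors' procedure: the system names, scale contents, and ratings are invented toy values, and the helper functions are hypothetical. It also assumes tie-free rankings.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each system to the mean of its listener ratings on one scale."""
    return {system: mean(scores) for system, scores in ratings.items()}


def rank_systems(mos):
    """Order systems best-first by MOS (name breaks ties deterministically)."""
    return sorted(mos, key=lambda s: (-mos[s], s))


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two tie-free rankings of the same systems."""
    n = len(order_a)
    pos_b = {system: i for i, system in enumerate(order_b)}
    d2 = sum((i - pos_b[system]) ** 2 for i, system in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n**2 - 1))


# Invented toy ratings (assumption): three systems, three listeners, two scales.
scale_a = {"sysA": [4, 5, 4], "sysB": [3, 3, 4], "sysC": [2, 1, 2]}
scale_b = {"sysA": [4, 4, 5], "sysB": [2, 2, 1], "sysC": [3, 3, 2]}

rank_a = rank_systems(mean_opinion_scores(scale_a))
rank_b = rank_systems(mean_opinion_scores(scale_b))
rho = spearman_rho(rank_a, rank_b)
print(rank_a, rank_b, rho)
```

A value of rho near 1 across every pair of scales would correspond to the abstract's observation that the scales were not really testing different aspects of the systems.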

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Vazquez-Alvarez, Dr Yolanda
Authors: Vazquez-Alvarez, Y., and Huckvale, M.
College/School: College of Science and Engineering > School of Computing Science
