Automatic measures of pathological speech intelligibility are crucial for assisting the clinical diagnosis and treatment of speech disorders. The recently proposed pathological extended short-time objective intelligibility (P-ESTOI) measure has been shown to achieve high performance for several speech pathologies. However, to assess the intelligibility of an utterance from a patient, P-ESTOI requires recordings of the same utterance by several healthy speakers, from which an intelligible reference model is created. Such recordings are not always readily available, which limits the practical applicability of P-ESTOI. To enable the use of P-ESTOI in such scenarios, in this paper we propose to create the intelligible reference model from synthetic speech generated by state-of-the-art high-quality text-to-speech systems. Experimental results on a database of cerebral palsy patients show that the performance of P-ESTOI using synthetic speech references is comparable to its performance using natural speech references. This makes P-ESTOI a flexible measure that does not require healthy speech recordings and that outperforms state-of-the-art pathological speech intelligibility measures.

