Abstract:
Mainstream deep learning-based dysarthric speech detection approaches typically process the magnitude spectrum of the short-time Fourier transform (STFT) of the input signal while ignoring the phase spectrum. Although the magnitude spectrum provides considerable insight into the structure of a signal, the phase spectrum also contains inherent structures that are not immediately apparent due to phase discontinuities. To reveal meaningful phase structures, alternative phase representations such as the modified group delay (MGD) and instantaneous frequency (IF) spectra have been investigated in several applications. The objective of this paper is to investigate the applicability of the unprocessed phase, MGD, and IF spectra for dysarthric speech detection. Experimental results show that dysarthric cues are present in all considered phase representations. Furthermore, it is shown that using phase representations as features complementary to the magnitude spectrum is beneficial for deep learning-based dysarthric speech detection, with the combination of magnitude and IF spectra yielding high performance. The presented results should raise awareness in the research community of the potential of the phase spectrum for dysarthric speech detection and motivate research into novel architectures that optimally exploit both magnitude and phase information.
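The three phase representations named in the abstract can all be derived from the same STFT. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Hann window, frame length, hop size, lifter length, and the MGD exponents `alpha` and `gamma` are illustrative values; the IF is estimated from the wrapped frame-to-frame phase difference (one common discrete approximation among several); and the MGD follows the standard cepstrally smoothed group delay formulation. All function names are hypothetical.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)  # assumed window choice
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def unprocessed_phase(X):
    """Wrapped phase spectrum, values in [-pi, pi]."""
    return np.angle(X)


def instantaneous_frequency(X, n_fft, hop, fs):
    """IF in Hz from the wrapped frame-to-frame phase difference.

    Assumption: one common discrete IF estimate; other definitions exist.
    Returns an array of shape (frames - 1, bins).
    """
    k = np.arange(X.shape[1])
    # Phase advance per hop expected for a sinusoid exactly at the centre of bin k.
    expected = 2 * np.pi * k * hop / n_fft
    dphi = np.diff(np.angle(X), axis=0) - expected
    # Wrap the deviation to its principal value in [-pi, pi).
    dev = np.mod(dphi + np.pi, 2 * np.pi) - np.pi
    omega = 2 * np.pi * k / n_fft + dev / hop  # rad/sample
    return omega * fs / (2 * np.pi)


def cepstral_smooth(mag, n_fft, lifter):
    """Smooth a magnitude spectrum by keeping only the first `lifter` cepstral coefficients."""
    ceps = np.fft.irfft(np.log(np.maximum(mag, 1e-10)), n=n_fft)  # real cepstrum
    ceps[lifter : n_fft - lifter + 1] = 0.0  # zero high quefrencies in both halves
    return np.exp(np.fft.rfft(ceps, n=n_fft).real)


def mgd_frame(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=30):
    """Modified group delay of a single frame.

    tau(k)   = (X_R Y_R + X_I Y_I) / |S(k)|^(2 gamma), where Y is the DFT of n * x[n]
               and S is the cepstrally smoothed |X|;
    tau_m(k) = sign(tau) * |tau|^alpha.
    alpha, gamma and lifter are assumed example values, not taken from the paper.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n=n_fft)
    Y = np.fft.rfft(n * frame, n=n_fft)
    S = cepstral_smooth(np.abs(X), n_fft, lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(tau) * np.abs(tau) ** alpha
```

For a sinusoid at 1010 Hz sampled at 16 kHz, the IF at the peak bin (1000 Hz centre) recovers roughly 1010 Hz even though the wrapped phase alone looks noisy, which illustrates why such representations can expose structure hidden in the raw phase. As one option, assumed here for illustration, each phase representation could be stacked with the log-magnitude spectrum as an additional network input channel.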
Type:
CONFERENCE PAPER