A hierarchical approach to the automatic identification of Putonghua unvoiced consonants in isolated syllables

by Yeung, Dit-yan

Abstract (Summary)
Abstract of thesis entitled "A Hierarchical Approach to the Automatic Identification of Putonghua Unvoiced Consonants in Isolated Syllables" submitted by Yeung Dit Yan for the degree of M.Phil. at the University of Hong Kong in July 1985.

In this thesis, the specification of a hypothetical Putonghua speech recognition system is described in the framework of distributed processing. The subpatterns of Putonghua speech waveforms are recognized using an integrated approach that combines phoneme-based and syllable-based methods. The suitability of each of these two methods depends on the module concerned and its classification domain. The emphasis of this study was on the Unvoiced Consonant Identification (UCI) Module, in which phoneme-based recognition was found to be applicable to identifying the unvoiced consonants using only the consonant portion of a Putonghua syllable. Among the several feature representation methods studied, the dynamic representation of integrated temporal and log power spectral features was found to be the best. A reduced set of features was selected by carrying out an automatic stepwise discriminant analysis during the training phase of the classifier. Using the feature vector after dimensionality reduction, each testing case was then classified into a fuzzy consonant group with a certain membership function value. Context-free recognition experiments were performed to identify the unvoiced consonants with unknown following vowel context. The average recognition rates for close and open testing experiments were 94% and 89% respectively. When context-dependent recognition experiments were performed with known following vowel context, the average recognition rate for close testing increased to 96% even with fewer features. The satisfactory results of context-free recognition show that the UCI Module need not be totally dependent on the performance of the Vowel Identification (VI) Module, while those of context-dependent recognition show that the context information supplied by the VI Module does improve the performance of the UCI Module. These results lead us to believe that the UCI Module can be made relatively independent and that information exchange between the UCI and VI Modules benefits overall recognition. The distance measure used in the classification was also used to derive the membership function, which helps in developing the protocol of the system control strategy for approximate reasoning between modules.

