A speech processing application for the Huberman-Hogg neural network model
Abstract
A typical word recognition system requires that several major tasks be performed. Its necessary components include (1) a preprocessor to extract the significant information from the speech time waveform, (2) a section that stores the training set of word models or templates and then compares an unknown input pattern with the training set, and (3) decision logic to determine the best matching word. This thesis reports on experiments that explore isolated word recognition with an artificial neural network based on the Huberman-Hogg (H-H) model. The results presented in this manuscript were obtained from computer simulations of the speech recognition system, but an electro-optical H-H system is also proposed and described.
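Components (2) and (3) amount to a template store plus a nearest-template decision. The sketch below illustrates that structure under stated assumptions. The patterns are hypothetical flattened feature vectors, and the three measures (Euclidean, city-block, and one minus normalized correlation) are common illustrative choices, not necessarily the similarity measures used in the thesis. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length patterns."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def city_block(a, b):
    """City-block (L1) distance between two equal-length patterns."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cosine_distance(a, b):
    """1 - normalized correlation: 0 when the patterns have the same shape."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / norm if norm else 1.0


def best_match(unknown, templates, distance):
    """Decision logic: label of the stored template closest to `unknown`."""
    return min(templates, key=lambda label: distance(unknown, templates[label]))


if __name__ == "__main__":
    # Hypothetical stored templates (component 2) and an unknown input pattern.
    templates = {"yes": [1.0, 1.0, 0.0, 0.0], "no": [0.3, 0.3, 0.3, 0.3]}
    unknown = [0.3, 0.3, 0.0, 0.0]  # shaped like "yes", but at lower amplitude
    for metric in (euclidean, city_block, cosine_distance):
        print(metric.__name__, "->", best_match(unknown, templates, metric))
```

With these toy numbers, the Euclidean and city-block distances select "no", while the correlation-based measure selects "yes", because the unknown has the shape of "yes" but a smaller amplitude. The same decision logic can therefore give different answers depending on the metric.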
The principal goal of the experimental work is to test the suitability of the ambiguity function representation for preprocessing speech data. Using the ambiguity function to represent the speech signal was expected to provide two advantages. First, the input patterns to the H-H network should become less sensitive to time shifts of the total speech waveform, perhaps even making time alignment of the words unnecessary. Second, the ambiguity function of a signal can be obtained in real time with a coherent optical processor, as shown by Marks, Walkup, and Krile (1977), and could therefore supply two-dimensional input to an electro-optical H-H network. Because studies indicate that the H-H neural network effectively processes a variety of input functions, this network was chosen as the classifier/recognizer for the ambiguity function patterns representing speech data. Ambiguity functions for isolated words are generated from digitized voice recordings and then submitted to the H-H network for training and recognition testing. Which training-set pattern best matches an unknown pattern depends on the distance metric employed, so these experiments explore several similarity measures.
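The expected shift insensitivity can be illustrated numerically. For a discrete sequence, one common form of the ambiguity function is A[τ, ν] = Σₙ x[n] x*[n − τ] e^(−j2πνn/N). Delaying x by d samples multiplies A by the phase factor e^(−j2πνd/N), so the magnitude surface |A| does not change. The sketch below assumes circular indexing and uses a toy decaying tone burst padded with silence as a stand-in for a digitized word. Because of the padding, the delay does not wrap around. The function names and the test signal are hypothetical, not taken from the thesis.

```python
import cmath
import math


def ambiguity_magnitude(x):
    """Discrete ambiguity surface |A[tau][nu]| of a real or complex sequence.

    A[tau][nu] = sum_n x[n] * conj(x[n - tau]) * exp(-j*2*pi*nu*n/N),
    with sample indices taken modulo N (an assumed circular convention).
    """
    n_samples = len(x)
    surface = []
    for tau in range(n_samples):
        # Lag product x[n] * conj(x[n - tau]) for every n (circular index).
        lag = [x[n] * complex(x[(n - tau) % n_samples]).conjugate()
               for n in range(n_samples)]
        row = []
        for nu in range(n_samples):
            # Discrete Fourier transform of the lag product at Doppler bin nu.
            acc = sum(lag[n] * cmath.exp(-2j * math.pi * nu * n / n_samples)
                      for n in range(n_samples))
            row.append(abs(acc))
        surface.append(row)
    return surface


def circular_shift(x, d):
    """Delay the sequence x by d samples, wrapping around at the end."""
    n_samples = len(x)
    return [x[(n - d) % n_samples] for n in range(n_samples)]


if __name__ == "__main__":
    # Toy stand-in for a digitized word: a decaying tone burst, then silence.
    word = [math.exp(-0.2 * n) * math.sin(0.9 * n) if n < 12 else 0.0
            for n in range(32)]
    delayed = circular_shift(word, 7)
    a0 = ambiguity_magnitude(word)
    a1 = ambiguity_magnitude(delayed)
    waveform_gap = max(abs(p - q) for p, q in zip(word, delayed))
    surface_gap = max(abs(p - q) for r0, r1 in zip(a0, a1)
                      for p, q in zip(r0, r1))
    print(f"max waveform difference: {waveform_gap:.3f}")
    print(f"max |A| difference:      {surface_gap:.2e}")
```

The raw waveforms differ substantially sample by sample, while the two magnitude surfaces agree to within floating-point rounding. This illustrates why a pattern built from |A| need not depend on exact time alignment.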
An introductory discussion gives an overview of speech processing, the radar ambiguity function, and the Huberman-Hogg neural network model. It is followed by a description of the experimental arrangement, covering both the components of the system and the simulation software. The next section gives the particulars of the various experimental conditions and results. The ambiguity function performed as desired, acting as a representation that made the system less shift sensitive. However, the neural network processing, at least with the parameter set and decision logic employed, did not improve the recognition capability of the system. Several potential problem areas are identified, and suggestions are made for future studies.