Voice input for decision support systems: the use of multiple discriminant analysis for word recognition
A Decision Support System (DSS) is characterized by flexibility, ease of use, interactive capability, and the capacity to support managerial decision making in ill-structured situations. The infrastructure of a DSS has been viewed as consisting of a database, a modelbase, a user interface, and perhaps a knowledgebase. Most DSS research has been directed towards the database, modelbase, and knowledgebase modules; work relevant to the user interface is limited. There is conclusive evidence that, within a problem-solving context, voice interaction is superior to other modes in terms of speed and task efficiency. Since speech recognition is an emerging field, only a few commercial systems are currently available. About 5% of the recognizers sold so far are still in use. Two major problems are: i) unpredictable performance in terms of recognition accuracy, and ii) compromises on algorithms in inexpensive systems. This study explores the possibility of a reliable voice input module for a DSS. Specifically, Multiple Discriminant Analysis (MDA) is used to model a speaker-trained, isolated word recognition environment. A design framework for MDA-based recognizers is proposed. It provides details of the alternatives available and guidelines for prototyping. Factors such as the training effort, the number of variables, estimation of covariance matrices, word population separations, computational requirements, and ease of implementation in a DSS environment lead to the choice of a Linear Multiple Discriminant Analysis (LMDA) approach. This study compares the proposed LMDA model to a model based on Dynamic Time Warping (DTW) on performance criteria including accuracy, storage, and computational requirements. Part of the same Texas Instruments (TI) database that was used in evaluating seven popular commercial recognizers was used to compare substitution error and rejection error.
Training size and order of analysis were controlled and maintained across the LMDA and DTW methods. The results validate previous work with respect to training size, in that performance improved with up to 4 repetitions. With respect to substitution error, the better performance of the LMDA models is statistically validated. There was no statistically significant difference with respect to rejection error. The results indicate that LMDA performance in reduced space peaks before the full discriminant space is reached: inclusion of the last few discriminant functions tends to introduce distortion. It is therefore recommended that the LMDA model be operated in reduced space. The computational requirements of the LMDA and DTW methods are compared using analysis of algorithms. Even in full discriminant space, the LMDA approach is superior to the DTW method with respect to computational requirements. For the user-trained isolated word recognition problem, the LMDA approach involves a higher computational cost in training and a lower cost in recognition. This study is limited to LMDA-based, user-trained isolated word recognition systems, and the vocabulary size was small. This research can be extended to a large DSS vocabulary with various interface modes, such as command-driven or menu-driven.
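
The contrast in recognition cost described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the absolute-difference local cost, the one-dimensional feature sequences, and the nearest-centroid decision rule are all assumptions made for illustration. DTW fills a table whose size is the product of the two sequence lengths for every stored template, whereas a recognizer working in reduced discriminant space, once its discriminant functions have been estimated during training, needs only r dot products and a distance to each word centroid.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Assumption: absolute difference as the local cost, no slope constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def project(x, functions):
    """Project feature vector x onto discriminant functions (weight vectors)."""
    return [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in functions]


def classify_reduced(x, functions, centroids, r):
    """Nearest-centroid decision using only the first r discriminant functions.

    Hypothetical inputs: `functions` are weight vectors estimated in training,
    `centroids` maps each word to its mean in full discriminant space.
    """
    z = project(x, functions[:r])
    best_word, best_dist = None, float("inf")
    for word, c in centroids.items():
        dist = sum((z_k - c_k) ** 2 for z_k, c_k in zip(z, c[:r]))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word
```

Here the parameter r selects how many discriminant functions are kept, so operating in reduced space amounts to choosing r smaller than the number of available functions.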