
A Microprocessor based Speech Recognizer for Isolated Hindi Digits

Ashutosh Saxena, Third year undergraduate, Department of Electrical Engineering, Indian Institute of Technology Kanpur.
Abhishek Singh, Final year undergraduate, Department of Electrical Engineering, Indian Institute of Technology Kanpur.

(As submitted to the IEEE India Student Paper Contest 2002; the project may have been improved since then.) (The paper won first prize in the MV Chauhan Student Paper Contest, IEEE India Council. It was presented at the Annual Convention and Exhibition, IEEE ACE 2002, IEEE India Council, held in Kolkata in December 2002. The gadget was presented at the All India Inter-collegiate technical festival, Techkriti 2002, IIT Kanpur.)

Abstract--A novel method for recognition of isolated spoken words on an 8-bit microprocessor is presented. The method uses a new but simple feature vector based on the zero crossings of the speech signal. The feature vector is the histogram of the time intervals between successive zero crossings of the speech signal. Dynamic time warping is used to calculate a time-aligned normalized distance between the feature vector and the reference templates.
The implementation needs only 1-bit A/D conversion and performs all its computations in integer arithmetic. A speaker-dependent recognition accuracy of 85% is obtained for Hindi digits spoken by 2 male speakers.

I. INTRODUCTION
Speech, being a natural mode of communication for humans, can provide a convenient interface for controlling devices. Some speech recognition applications require speaker-dependent isolated word recognition. Current speech recognizers have been implemented on personal computers and digital signal processors. However, some applications require a low-cost portable speech interface and cannot use a personal computer or digital signal processor based implementation on account of cost, portability and scalability. Examples include the control of a wheelchair by spoken commands and a speech interface for the Simputer [1].
The implementation of a speech recognizer on a fixed-point microprocessor could provide a solution for such applications. Standard algorithms based on hidden Markov models (HMM) and artificial neural networks (ANN) cannot be used on a fixed-point microprocessor because they require computation that cannot be done in real time on an 8-bit microprocessor. Hence, there is a need for a simpler algorithm.
We present a novel algorithm that uses only integer arithmetic and, hence, can be implemented efficiently on a fixed-point microprocessor in real time. The algorithm is used to perform speaker-dependent recognition of isolated Hindi digits.
Speech recognition algorithms employ a short-time feature vector to handle the non-stationary nature of the speech signal. Standard feature vectors, such as Mel frequency cepstrum coefficients (MFCC) or linear prediction coefficients (LPC), are computationally intensive. We designed a new but simple feature vector that uses only the zero crossings of the speech signal. Because it processes only zero crossings, the feature extraction requires only 1-bit A/D conversion. It is computationally very simple and does not require any pre-processing of the speech signal. The feature vector preserves all information regarding the duration of the time intervals between zero crossings.
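The zero-crossing feature can be illustrated with a minimal Python sketch for a single short-time window. It is not the authors' 8085 code: the function names, the bin width, the number of bins, and the rule that long intervals fall into the last bin are all assumptions made for illustration.

```python
from typing import List


def one_bit_quantize(samples: List[int]) -> List[int]:
    """Mimic 1-bit A/D conversion: keep only the sign of each sample."""
    return [1 if s >= 0 else 0 for s in samples]


def zero_crossing_intervals(bits: List[int]) -> List[int]:
    """Return the number of samples between successive sign changes."""
    intervals: List[int] = []
    last_crossing = None
    for i in range(1, len(bits)):
        if bits[i] != bits[i - 1]:           # a zero crossing occurs at index i
            if last_crossing is not None:
                intervals.append(i - last_crossing)
            last_crossing = i
    return intervals


def interval_histogram(intervals: List[int], num_bins: int, bin_width: int) -> List[int]:
    """Count intervals per bin using integer arithmetic only.

    Assumption: intervals longer than the histogram range are
    accumulated in the last bin.
    """
    hist = [0] * num_bins
    for d in intervals:
        b = min((d - 1) // bin_width, num_bins - 1)
        hist[b] += 1
    return hist


if __name__ == "__main__":
    window = [1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1, 1, -1]  # toy samples
    intervals = zero_crossing_intervals(one_bit_quantize(window))
    print(intervals)
    print(interval_histogram(intervals, num_bins=3, bin_width=1))
```

Only comparisons, subtractions, and counter increments are involved, which is in keeping with the claim that the feature can be computed in integer arithmetic.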
The short-time feature vector is the histogram of the time intervals between successive zero crossings of the speech utterance in a short-time window. The feature vectors, extracted for each window in the speech utterance, are combined to form a feature matrix. The matrix is then normalized by multiplication with a weight vector. Dynamic time warping (DTW) [3] is used to calculate the distance between the feature matrix of the input signal and each reference pattern. DTW finds the best time-aligned path, that is, the path of minimum distance under certain specified constraints [3,4]. The unknown input signal is identified as the pattern with the minimum distance, provided that this distance is less than a predetermined threshold.
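The matching step described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the city-block frame distance, the three-way local step rule, the normalization by the summed lengths with a fixed scale factor, and the template labels are assumptions, and the weight-vector normalization is omitted.

```python
from typing import Dict, List, Optional

Matrix = List[List[int]]  # one integer histogram (row) per short-time window


def frame_distance(u: List[int], v: List[int]) -> int:
    """City-block distance between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def dtw_distance(x: Matrix, y: Matrix, scale: int = 256) -> int:
    """Time-aligned distance between two feature matrices, integers only.

    Assumptions: steps (i-1, j), (i, j-1), (i-1, j-1) are allowed, and the
    accumulated cost is normalized by len(x) + len(y) after scaling.
    """
    n, m = len(x), len(y)
    big = 1 << 30                            # stands in for "infinity"
    cost = [[big] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] * scale // (n + m)


def recognize(x: Matrix, templates: Dict[str, Matrix], threshold: int) -> Optional[str]:
    """Return the label of the nearest template, or None if it is too far."""
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for label, ref in templates.items():
        d = dtw_distance(x, ref)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    if best_dist is not None and best_dist < threshold:
        return best_label
    return None


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with toy 2-bin histograms.
    templates = {"ek": [[1, 0], [0, 2]], "do": [[0, 3], [3, 0]]}
    utterance = [[1, 0], [0, 2], [0, 2]]     # "ek" with a stretched second frame
    print(recognize(utterance, templates, threshold=10))
```

The threshold test at the end lets the recognizer reject an input that is far from every stored template instead of forcing a match.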
The algorithm was implemented in Matlab to perform speaker-dependent isolated word recognition on a vocabulary of 10 isolated Hindi digits. Simulation of the algorithm with recorded utterances gave an accuracy of 95.5%.
The algorithm was then implemented on the Intel 8085, an 8-bit microprocessor. The details of the design, coding, memory and interfacing are given. With the microprocessor running at a clock frequency of 1.5 MHz, the average response time is 0.9 second, with a worst-case response time of 2 seconds.

REFERENCES
[1] The Simputer Trust. Simputer™: What is a Simputer? [Online] Available: http://www.simputer.org/simputer/about
[2] Lipovac and V. Sarajevo, "Zero-crossing-based linear prediction for speech recognition", Electronics Letters, vol. 25, issue 2, pages 90-92, 19 January 1989.
[3] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26(1):43-49, February 1978.
[4] Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition", PTR Prentice Hall, Englewood Cliffs, New Jersey 07632, 1993.