A Microprocessor based Speech Recognizer for Isolated Hindi Digits
Ashutosh Saxena, Third year undergraduate, Department of Electrical Engineering,
Indian Institute of Technology Kanpur.
Abhishek Singh, Final year undergraduate, Department of Electrical Engineering,
Indian Institute of Technology Kanpur.
(As submitted to the IEEE India Student Paper Contest 2002; the project may have been improved since. The paper won first prize in the MV Chauhan Student Paper Contest, IEEE India Council, and was presented at the Annual Convention and Exhibition, IEEE ACE 2002, IEEE India Council, held in Kolkata in December 2002. The gadget was presented at the All India Inter-collegiate technical festival, Techkriti 2002, IIT Kanpur.)
Abstract--A novel method for recognition of isolated spoken words on
an 8-bit microprocessor is presented. The method uses a new but simple feature
vector based on the zero-crossings of the speech signal. The feature vector
is the histogram of the time-interval between successive zero-crossings of the
speech signal. Dynamic time warping is used to calculate a time-aligned normalized
distance between the feature vector and the reference templates.
The implementation needs only 1-bit A/D conversion and performs all its computations
in integer arithmetic. A speaker-dependent recognition accuracy of 85% is obtained
for Hindi digits spoken by 2 male speakers.
I. INTRODUCTION
Speech, being a natural mode of communication for humans, can provide a convenient
interface to control devices. Some speech recognition applications require
speaker-dependent isolated word recognition. Current implementations of speech
recognizers target personal computers and digital signal processors.
However, some applications that require a low-cost portable speech interface
cannot use a personal computer or digital signal processor based implementation
on account of cost, portability and scalability. Examples include the control of
a wheelchair by spoken commands and a speech interface for the Simputer [1].
The implementation of a speech recognizer on a fixed-point microprocessor could
provide a possible solution for such applications. Standard algorithms based
on hidden Markov models (HMM) and artificial neural networks (ANN) cannot be
used on a fixed-point microprocessor because they require computation
that cannot be done in real time on an 8-bit microprocessor. Hence, there is
a need for a simpler algorithm.
We present a novel algorithm that uses only integer arithmetic and, hence, can
be efficiently implemented on a fixed-point microprocessor in real-time. The
algorithm is used for performing speaker-dependent recognition of isolated Hindi
digits.
Speech recognition algorithms employ a short-time feature vector to handle
the non-stationary nature of the speech signal. Standard feature vectors, such as
Mel-frequency cepstral coefficients (MFCC) or linear prediction coefficients (LPC),
are computationally intensive. We designed a new but simple feature vector that
uses only the zero crossings of the speech signal. Because it processes only zero
crossings, the feature extraction requires just 1-bit A/D conversion. It is
computationally very simple and does not require any pre-processing
of the speech signal. The feature vector preserves all information regarding
the duration of the time intervals.
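The idea of working from zero crossings alone can be illustrated with a minimal Python sketch. It is not the authors' 8085 code: the function names `one_bit` and `zero_crossing_intervals`, and the choice to measure intervals in sample counts, are assumptions made here for illustration.

```python
def one_bit(samples):
    """Quantize samples to 1 bit: 1 for non-negative, 0 for negative.

    This stands in for the 1-bit A/D conversion (an assumption about
    how the sign of the signal is captured).
    """
    return [1 if s >= 0 else 0 for s in samples]


def zero_crossing_intervals(bits):
    """Return the number of samples between successive zero crossings.

    A zero crossing is taken to be any change in the 1-bit value.
    Only integer arithmetic is used.
    """
    intervals = []
    last = None  # sample index of the previous crossing, if any
    for i in range(1, len(bits)):
        if bits[i] != bits[i - 1]:
            if last is not None:
                intervals.append(i - last)
            last = i
    return intervals


samples = [3, 5, -2, -4, -1, 6, 2, 1, -7, 4]
bits = one_bit(samples)
intervals = zero_crossing_intervals(bits)
```

In this toy input the sign changes at sample indices 2, 5, 8 and 9, so the intervals are 3, 3 and 1 samples.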
The short-time feature vector is the histogram of the time intervals between successive
zero crossings of the speech utterance in a short time window. The feature vectors,
extracted for each window in the speech utterance, are combined to form a feature
matrix. The matrix is then normalized by multiplication with a weight vector.
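A rough sketch of how such a feature matrix might be built is given below. Several details are assumptions not stated in the text: intervals are measured in samples, each interval is counted in the window where it ends, intervals longer than the last bin are clipped into it, and the weight vector holds one integer weight per histogram bin. The name `feature_matrix` and its parameters are hypothetical.

```python
def feature_matrix(bits, window_len, num_bins, weights):
    """Build a weighted histogram of zero-crossing intervals per window.

    bits       : 1-bit signal (list of 0/1 values)
    window_len : samples per short-time window (assumed, non-overlapping)
    num_bins   : histogram bins; bin k holds intervals of k + 1 samples,
                 with longer intervals clipped into the last bin (assumed)
    weights    : one integer weight per bin (assumed normalization form)
    """
    num_windows = len(bits) // window_len
    matrix = [[0] * num_bins for _ in range(num_windows)]
    last = None  # sample index of the previous zero crossing
    for i in range(1, num_windows * window_len):
        if bits[i] != bits[i - 1]:
            if last is not None:
                interval = i - last
                bin_index = min(interval, num_bins) - 1
                # Count the interval in the window where it ends.
                matrix[i // window_len][bin_index] += 1
            last = i
    # Normalize each row by element-wise multiplication with the weights.
    return [[c * w for c, w in zip(row, weights)] for row in matrix]


bits = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
        0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
features = feature_matrix(bits, window_len=10, num_bins=4,
                          weights=[1, 2, 3, 4])
```

Each row of `features` is the weighted histogram for one window, so the matrix has one row per window and one column per interval bin. All operations are integer additions and multiplications, in keeping with the fixed-point target.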
Dynamic time warping (DTW) [3] is used to calculate the distance between the
feature matrix of the input signal and the reference patterns. DTW finds the best
time-aligned path for minimum distance under certain specified constraints [3,4].
The unknown input signal is then identified as the pattern with the minimum
distance, provided that distance is less than a predetermined threshold.
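The matching step can be sketched as a textbook DTW in integer arithmetic. This is only an illustration: the city-block frame distance, the three-way step pattern without slope or window constraints, the normalization by the sum of the two sequence lengths, and the names `dtw_distance` and `recognize` are all assumptions, and the actual constraints used follow [3,4]. The labels "ek" and "do" are placeholder template names.

```python
def frame_distance(a, b):
    """City-block distance between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(test, ref):
    """Time-aligned distance between two feature matrices.

    Uses the basic step pattern (horizontal, vertical, diagonal) and
    normalizes by len(test) + len(ref) with integer division.
    """
    n, m = len(test), len(ref)
    big = 10 ** 9  # stands in for infinity; keeps everything integer
    cost = [[big] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m] // (n + m)


def recognize(test, templates, threshold):
    """Return the label of the closest template, or None if rejected."""
    best_label, best_dist = None, None
    for label, ref in templates.items():
        dist = dtw_distance(test, ref)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is not None and best_dist < threshold:
        return best_label
    return None


templates = {
    "ek": [[4, 0], [0, 4], [0, 4]],
    "do": [[0, 4], [4, 0], [4, 0]],
}
utterance = [[4, 0], [4, 0], [0, 4], [0, 4]]
result = recognize(utterance, templates, threshold=2)
```

Here `utterance` is a time-stretched copy of the "ek" template, so warping aligns it at zero distance and it is accepted. An input that is equally far from both templates, such as `[[2, 2]]`, exceeds the threshold and is rejected with `None`.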
The algorithm was implemented in MATLAB to perform speaker-dependent isolated
word recognition on a vocabulary of 10 isolated Hindi digits. Simulation of
the algorithm with recorded utterances gave an accuracy of 95.5%.
The algorithm was then implemented on the Intel 8085, an 8-bit microprocessor. The
details of the design, coding, memory and interfacing are given. With the microprocessor
running at a clock of 1.5 MHz, the average response time
is 0.9 second, with a worst-case response time of 2 seconds.
REFERENCES
[1] The Simputer Trust. Simputer™: What is a Simputer? [Online] Available: http://www.simputer.org/simputer/about
[2] Lipovac and V. Sarajevo, "Zero-crossing-based linear prediction for
speech recognition", Electronics Letters, vol. 25, issue 2, pages 90-92,
19 Jan 1989.
[3] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken
Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26(1):43-49,
February 1978.
[4] Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition",
PTR Prentice Hall, Englewood Cliffs, New Jersey 07632, 1993.