A Beginner's Guide to Microprocessor-Based Speech Recognizers
Ashutosh Saxena, Senior, Department of Electrical Engineering,
Indian Institute of Technology Kanpur.
To design a microprocessor-based speech recognizer, I would suggest proceeding in the following steps:
Know the constraints of the microprocessor. Microprocessors (uP) and microcontrollers (uC) have obvious limitations in terms of computational power, the type of arithmetic they support (many have no floating-point hardware), etc. Do not go into details at this stage; just know the limitations.
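To make "type of arithmetic" concrete: many small uPs/uCs have no floating-point hardware, so signal processing is usually done in fixed-point integer arithmetic. The following is a minimal sketch that assumes the common Q15 format, in which a 16-bit signed integer x stands for x/32768, giving values in [-1, 1). The function names are hypothetical.

```c
#include <stdint.h>

/* Q15 fixed point (assumed format): int16_t x represents x / 32768. */

/* Multiply two Q15 numbers. The 32-bit product is in Q30, so shift right
   by 15 to return to Q15. Assumes the compiler uses an arithmetic right
   shift for negative values, as common compilers do. The only overflow
   case, (-1) * (-1), is saturated to the largest Q15 value. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    int32_t result = product >> 15;             /* back to Q15 */
    if (result > INT16_MAX) result = INT16_MAX;
    return (int16_t)result;
}

/* Add two Q15 numbers, saturating instead of wrapping around on overflow. */
int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For example, `q15_mul(16384, 16384)` computes 0.5 * 0.5 and returns 8192, which represents 0.25. Notice that even this simple multiply needs a 32-bit intermediate and an explicit saturation step. On an 8-bit uP, each of these operations takes several instructions. This is exactly the kind of limitation to be aware of at this stage.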
Find or develop a speech recognition algorithm. Most of the algorithms described in papers and books are computationally intensive and hence cannot be implemented on a uP. Look for a simple one.
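One simple approach often used for small-vocabulary, speaker-dependent isolated-word recognition is template matching with dynamic time warping (DTW). DTW compares the input against a stored template of each word and tolerates differences in speaking rate. Choosing DTW here is my assumption and not necessarily what this guide has in mind.

The sketch below assumes each utterance has already been reduced to a sequence of integer features, one per frame. `MAX_FRAMES` and `dtw_distance` are hypothetical names.

```c
#include <limits.h>
#include <stdlib.h>

#define MAX_FRAMES 64  /* hypothetical upper bound on frames per utterance */

/* DTW distance between feature sequences a[0..n-1] and b[0..m-1], using
   |a[i] - b[j]| as the local cost. Only two rows of the cost matrix are
   kept, to save RAM. Returns -1 on invalid input. */
long dtw_distance(const int *a, int n, const int *b, int m)
{
    long prev[MAX_FRAMES + 1], curr[MAX_FRAMES + 1];
    int i, j;

    if (n <= 0 || m <= 0 || n > MAX_FRAMES || m > MAX_FRAMES) return -1;

    /* Row 0 of the cost matrix: D[0][0] = 0, D[0][j] = "infinity". */
    prev[0] = 0;
    for (j = 1; j <= m; j++) prev[j] = LONG_MAX;

    for (i = 1; i <= n; i++) {
        curr[0] = LONG_MAX;                              /* D[i][0] = infinity */
        for (j = 1; j <= m; j++) {
            long cost = labs((long)a[i - 1] - (long)b[j - 1]);
            long best = prev[j];                         /* from D[i-1][j]   */
            if (prev[j - 1] < best) best = prev[j - 1];  /* from D[i-1][j-1] */
            if (curr[j - 1] < best) best = curr[j - 1];  /* from D[i][j-1]   */
            curr[j] = (best == LONG_MAX) ? LONG_MAX : best + cost;
        }
        for (j = 0; j <= m; j++) prev[j] = curr[j];      /* move to next row */
    }
    return prev[m];                                      /* D[n][m] */
}
```

To recognize a word, compute `dtw_distance` between the input and each stored template, then pick the template with the smallest distance. Keeping two rows instead of the full matrix reduces memory from n*m to 2*(m+1) entries, which matters on a uP with little RAM.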
Decide what you can do:
Continuous speech recognition: possible only on good (expensive) DSPs.
Isolated word recognition: algorithms can be implemented on a reasonable uP.
Speaker independent: difficult to do on a uP.
Speaker dependent: very easy to do on a uP.
Noise robustness: DSPs only.
Number of classes: the more classes (words), the more difficult.
Simulate the algorithm in MATLAB (or in a language such as C, C++, or Java) to check its accuracy. This involves getting a speech database and checking accuracy on more than 20 utterances of each word. You should expect more than 96% accuracy at this stage; the higher, the better.
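Whatever language you simulate in, the accuracy check itself is just counting. The following is a minimal C sketch that assumes each test utterance's true word and recognized word are stored as integer class indices. That representation and the function names are my choices for illustration.

```c
#include <stddef.h>

/* Fraction of test utterances whose recognized word index matches the
   true word index. Returns 0.0 for an empty test set. */
double recognition_accuracy(const int *predicted, const int *actual, size_t count)
{
    size_t i, correct = 0;
    if (count == 0) return 0.0;
    for (i = 0; i < count; i++)
        if (predicted[i] == actual[i]) correct++;
    return (double)correct / (double)count;
}

/* The target from this guide: more than 96% accuracy in simulation. */
int meets_target(double accuracy)
{
    return accuracy > 0.96;
}
```

In practice, it helps to also report accuracy per word. A single confusable pair of words can hide behind a good overall figure.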
Choose a uP and get its simulator. A uP/uC is usually programmed in its assembly language. However, some can be programmed in a pseudo-C (C-like) language, which makes programming somewhat easier. Simulators are available for most DSPs, uPs, and uCs: code the algorithm, then check it on the simulator. Search the Internet for "free simulator uPName".
Download the code to the hardware (or, if a ready-made kit is not available, build a circuit with memory, an address decoder, and the uP). First check that simple test programs run. Build a microphone (MIC) circuit for sound input (filtering is required). Then check that it works!
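(Two pieces of advice: 1. Never try to check the whole program in one go; divide and conquer. 2. Always know the real-time computational constraints of your algorithm and of your uP. Ensure that the upper bound of the former, i.e. the algorithm's worst-case demand, lies below the lower bound of the latter, i.e. the throughput the uP can always deliver.)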
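Here is a back-of-the-envelope version of the second piece of advice: the uP must finish the per-sample work within one sampling period. The example numbers are illustrative assumptions, not measurements: 8 kHz sampling, a 12 MHz clock, and 12 clock periods per machine cycle, as on the classic 8051. The function names are hypothetical.

```c
/* Machine cycles available per audio sample: the uP's guaranteed budget
   (the lower bound in the advice above). */
unsigned long cycles_per_sample(unsigned long clock_hz,
                                unsigned long clocks_per_cycle,
                                unsigned long sample_rate_hz)
{
    return clock_hz / clocks_per_cycle / sample_rate_hz;
}

/* Real time is feasible only if the algorithm's worst-case cost per sample
   (the upper bound) lies strictly below the available budget. */
int fits_real_time(unsigned long worst_case_cycles_needed,
                   unsigned long cycles_available)
{
    return worst_case_cycles_needed < cycles_available;
}
```

With these numbers, `cycles_per_sample(12000000UL, 12UL, 8000UL)` gives 125 machine cycles per sample. Everything done per sample (reading the ADC, pre-emphasis, accumulating features) must fit inside that budget in the worst case.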
Signal Processing Fundamentals:
A speech recognition algorithm, in general, involves:
(1) Preprocessing: prefiltering, quantization, pre-emphasis, etc.
(2) Feature extraction, in which the complete signal is compressed into an N-dimensional feature vector, where N should be kept as small as possible.
(3) Classification of the feature vector into the final classes (i.e. outputs). This involves distance measures; popular ones are Euclidean, Mahalanobis, etc. Note that the speech signal varies in many ways (time warping, expansion, speaker dependency, noise, etc.), and to take care of these, use a combination of (1) and (2), at the cost of increased complexity.
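The three stages of preprocessing, feature extraction, and classification can be sketched end to end in integer C. This is a deliberately crude illustration, not the author's algorithm. The following choices and names are all my assumptions:

- the pre-emphasis coefficient 15/16;
- the feature, which is each of `N_FEATURES` = 8 equal segments' share of total energy, in parts per thousand;
- the nearest-template classifier.

Practical systems typically use richer features, such as LPC or cepstral coefficients.

```c
#include <stdint.h>
#include <stddef.h>

#define N_FEATURES 8  /* hypothetical feature-vector dimension N */

/* Preprocessing: pre-emphasis y[n] = x[n] - (15/16) * x[n-1], in place,
   using integer arithmetic and saturating to 16 bits. */
void preemphasize(int16_t *x, size_t len)
{
    int32_t prev = 0;  /* previous *unfiltered* sample */
    size_t n;
    for (n = 0; n < len; n++) {
        int32_t cur = x[n];
        int32_t y = cur - (15 * prev) / 16;
        if (y > INT16_MAX) y = INT16_MAX;
        if (y < INT16_MIN) y = INT16_MIN;
        x[n] = (int16_t)y;
        prev = cur;
    }
}

/* Feature extraction: split the utterance into N_FEATURES equal segments
   and store each segment's share of the total energy, in parts per
   thousand, which makes the feature insensitive to overall loudness.
   64-bit accumulators keep the sketch simple; a small uP would scale the
   samples down instead. */
void extract_features(const int16_t *x, size_t len, int32_t feat[N_FEATURES])
{
    uint64_t seg_energy[N_FEATURES] = {0};
    uint64_t total = 0;
    size_t n;
    int k;
    for (n = 0; n < len; n++) {
        size_t seg = n * N_FEATURES / len;  /* segment index 0..N_FEATURES-1 */
        int64_t s = x[n];
        seg_energy[seg] += (uint64_t)(s * s);
        total += (uint64_t)(s * s);
    }
    for (k = 0; k < N_FEATURES; k++)
        feat[k] = total ? (int32_t)(seg_energy[k] * 1000 / total) : 0;
}

/* Classification: return the index of the stored template with the
   smallest squared Euclidean distance to feat, or -1 if there are none. */
int classify(const int32_t feat[N_FEATURES],
             const int32_t templates[][N_FEATURES], int n_templates)
{
    int best = -1, t, k;
    int64_t best_dist = 0;
    for (t = 0; t < n_templates; t++) {
        int64_t d = 0;
        for (k = 0; k < N_FEATURES; k++) {
            int64_t diff = (int64_t)feat[k] - templates[t][k];
            d += diff * diff;
        }
        if (best < 0 || d < best_dist) {
            best = t;
            best_dist = d;
        }
    }
    return best;
}
```

Each reference template would be the feature vector of a training utterance of one word, as in speaker-dependent recognition. The code uses the squared Euclidean distance because skipping the square root does not change which template is nearest. A Mahalanobis distance would additionally weight the differences by the inverse covariance of each class.

Fixed equal segments handle time warping poorly. A DTW comparison of frame-level features is the usual remedy, at the cost of more computation.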
There are other methods, such as neural networks, but the problems I faced with them were computational constraints and the fact that, if something goes wrong, you will never understand why or how (a fact accepted by researchers).
Mail me with further questions, or for my code and papers (which, of course, should be acknowledged in any presentation or publication you make).