Beginner's Guide to Microprocessor based Speech Recognizer

Ashutosh Saxena, Senior, Department of Electrical Engineering, Indian Institute of Technology Kanpur.

To design a "microprocessor based speech recognizer", here are the steps I would suggest:

  1. Know the constraints of the microprocessor. Microprocessors/microcontrollers have obvious limitations in terms of computational power, type of arithmetic, etc. Do not go into details at this stage; just know the limitations.
  2. Find/develop a speech recognition algorithm. Most of the ones available in papers/books are COMPUTATIONALLY INTENSIVE, and hence cannot be implemented on a uP. Look for a simple one.
  3. Decide what you can do.
  4. Simulate the algorithm in MATLAB (or in a language like C++, C, or Java) to check its accuracy. This involves getting a database and checking accuracy on more than 20 utterances of each word. You should expect at least 96% accuracy at this stage; the more the better.
  5. Choose a uP and get its simulator. uP/uC programming can be done in assembly. However, some are programmed in pseudo-C (a C-like language), which makes programming somewhat easier. Simulators exist for most DSPs/uPs/uCs. Code the algorithm, and then check it on the simulator. Search the internet for "free simulator uPName".
  6. Download the code to the uP (or build a circuit with memory/decoder/uP if a ready-made kit is not available). Check it with simple test programs first. Make a MIC circuit for sound input (filtering required). Check if it works!!!
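The accuracy check in step 4 can be sketched as a small test harness. This is only an illustration in Python rather than MATLAB: the names `per_word_accuracy`, `overall_accuracy`, `toy_recognize` and the toy data are hypothetical stand-ins, and a real test would use recorded utterances and your own recognizer.

```python
from collections import defaultdict


def per_word_accuracy(recognize, test_set):
    """Return {word: fraction recognized correctly}.

    recognize: callable that maps one utterance to a predicted word.
    test_set:  list of (utterance, true_word) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for utterance, true_word in test_set:
        total[true_word] += 1
        if recognize(utterance) == true_word:
            correct[true_word] += 1
    return {word: correct[word] / total[word] for word in total}


def overall_accuracy(recognize, test_set):
    """Fraction of all utterances recognized correctly."""
    hits = sum(1 for utt, word in test_set if recognize(utt) == word)
    return hits / len(test_set)


# Toy stand-in data (assumption): strings play the role of recordings.
toy_test_set = [("y1", "yes"), ("y2", "yes"), ("n1", "no"), ("n2", "no")]


def toy_recognize(utterance):
    # Deliberately wrong on "n2" so the report shows an error.
    if utterance.startswith("y") or utterance == "n2":
        return "yes"
    return "no"


print(per_word_accuracy(toy_recognize, toy_test_set))  # {'yes': 1.0, 'no': 0.5}
print(overall_accuracy(toy_recognize, toy_test_set))   # 0.75
```

Reporting accuracy per word as well as overall makes it easier to see which words the algorithm confuses before any code goes onto the uP.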

(Two pieces of advice: 1. Never try to check the whole program in one go; divide and conquer. 2. Always know the real-time computational constraints of your algorithm and uP: ensure that the worst-case computation the algorithm needs per unit time stays below what the uP is guaranteed to deliver.)
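The second piece of advice can be turned into a quick back-of-the-envelope check. The sketch below is an assumption-laden illustration: the function names and all the figures (clock rate, clocks per instruction, frame length, instruction count) are made up for the example and are not taken from any particular chip or algorithm.

```python
def instruction_budget_per_frame(clock_hz, clocks_per_instruction, frame_ms):
    """Instructions the uP can execute during one frame of speech.

    Use the slowest instruction timing for clocks_per_instruction so the
    result is a lower bound on what the uP delivers.
    """
    instructions_per_second = clock_hz // clocks_per_instruction
    return instructions_per_second * frame_ms // 1000


def fits_in_real_time(worst_case_instructions, budget):
    """True if the algorithm's worst case stays within the uP's budget."""
    return worst_case_instructions <= budget


# Illustrative figures only (assumptions):
budget = instruction_budget_per_frame(
    clock_hz=12_000_000,        # 12 MHz clock
    clocks_per_instruction=12,  # slowest case: 12 clocks per instruction
    frame_ms=20,                # one frame of speech every 20 ms
)
print(budget)                             # 20000
print(fits_in_real_time(15_000, budget))  # True
print(fits_in_real_time(25_000, budget))  # False
```

If the check fails, either the algorithm must be simplified or a faster uP chosen; it is much cheaper to find this out on paper than on the hardware.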


Signal Processing Fundae: A speech recognition algorithm, in general, involves:
  1. Preprocessing: This involves prefiltering, quantization, pre-emphasis, etc.
  2. Feature extraction, in which the complete signal is compressed into an N-dimensional feature vector, where N should be kept as small as possible.
  3. Classifying the feature vector into final classes (i.e. outputs). Here distance measures are involved. Popular distance measures are Euclidean, Mahalanobis, etc. Note that a speech signal involves many effects (time warping, expansion, speaker dependency, noise, etc.), and to take care of these, use a combination of (1) and (2), at the cost of increased complexity.
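The three stages above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the algorithm used in my work: the pre-emphasis coefficient 0.95, the choice of normalized per-segment energy as the feature, N = 8, the function names, and the synthetic "early"/"late" test signals are all picked just for the example.

```python
import math


def pre_emphasize(samples, alpha=0.95):
    """Preprocessing: first-order pre-emphasis y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))
    ]


def segment_energy_features(samples, n_segments=8):
    """Feature extraction: compress the signal into N normalized segment energies."""
    seg_len = len(samples) // n_segments
    if seg_len == 0:
        raise ValueError("signal is shorter than the number of segments")
    feats = []
    for k in range(n_segments):
        seg = samples[k * seg_len:(k + 1) * seg_len]
        feats.append(sum(s * s for s in seg) / seg_len)
    total = sum(feats) or 1.0  # avoid dividing by zero on silence
    return [f / total for f in feats]  # normalizing removes loudness


def euclidean_sq(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(feature, templates):
    """Classification: return the word whose template is nearest."""
    return min(templates, key=lambda word: euclidean_sq(feature, templates[word]))


def extract(samples):
    return segment_energy_features(pre_emphasize(samples))


def make_burst(start, stop, length=800, amp=1.0):
    """Synthetic stand-in for a recording: a tone between start and stop."""
    return [amp * math.sin(0.3 * n) if start <= n < stop else 0.0
            for n in range(length)]


# One stored template per word (assumption: a two-word toy vocabulary).
templates = {
    "early": extract(make_burst(0, 400)),
    "late": extract(make_burst(400, 800)),
}

print(classify(extract(make_burst(50, 350, amp=0.5)), templates))   # early
print(classify(extract(make_burst(450, 780, amp=2.0)), templates))  # late
```

Everything here is multiply-accumulate plus one comparison loop, which is the kind of workload a small uP can handle; the squared distance is used so that no square root is needed.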

There are other methods, such as neural networks, but the problems I faced with them were the computational constraints, and that if something goes wrong you will never understand why and how (a fact accepted by researchers). Mail me for more questions, or for my code and papers (which of course should be acknowledged in any presentation or publication you make).