CS 224S/LINGUIST 236 - Winter 2009
Homework 5: A Digit Recognizer
Due: Tuesday, February 10, before the start of class.
The purpose of this homework is to familiarize you with the Hidden Markov Model Toolkit (HTK), a portable toolkit for building and manipulating hidden Markov models.
In this assignment, you will build a simple digit recognizer with monophone models and report digit recognition accuracy.
HTK can be downloaded from:
http://htk.eng.cam.ac.uk/download.shtml
Alternatively, we provide all the HTK 3.4 binary files you need on AFS. You can either copy them to your working directory or link to them.
/afs/ir/class/cs224s/htk-3.4
(These binaries have been tested on the myth, bramble, hedge, vine, and pod clusters. If you have issues getting HTK to run on one of these machines, please let us know.)
/afs/ir/class/cs224s/tidigits
The corpus includes both training and testing data in the train and test directories. In the training set, there are 13 males and 14 females, totaling 27 speakers. In the test set, there are 3 males and 3 females, totaling 6 speakers. (You may copy the tidigits directory to your machine, but it is several hundred MB, so you might not want to.)
The digit sequences were made up of the digits: "zero", "oh", "one", "two", "three", "four", "five", "six", "seven", "eight", and "nine". The digit sequences spoken by each speaker can be broken down as follows:
22 isolated digits (2 productions of each of 11 digits)
11 2-digit sequences
11 3-digit sequences
11 4-digit sequences
11 5-digit sequences
11 7-digit sequences
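As a quick sanity check on the breakdown above, the per-speaker totals can be worked out with a few lines of Python. This is only a sketch: it assumes every speaker recorded the complete set listed above, and the actual file counts in the class copy of the corpus may differ. The names `UTTERANCE_TYPES`, `per_speaker_totals`, and `corpus_totals` are made up for illustration.

```python
# (digits per utterance, number of such utterances), taken from the list above.
UTTERANCE_TYPES = [(1, 22), (2, 11), (3, 11), (4, 11), (5, 11), (7, 11)]


def per_speaker_totals(types=UTTERANCE_TYPES):
    """Return (utterances, digit tokens) for a single speaker."""
    utterances = sum(count for _, count in types)
    digits = sum(length * count for length, count in types)
    return utterances, digits


def corpus_totals(num_speakers, types=UTTERANCE_TYPES):
    """Scale the per-speaker totals by the number of speakers.

    Assumes every speaker produced the full set (an assumption,
    not something verified against the files on AFS).
    """
    utterances, digits = per_speaker_totals(types)
    return num_speakers * utterances, num_speakers * digits


if __name__ == "__main__":
    print("per speaker:", per_speaker_totals())
    print("train (27 speakers):", corpus_totals(27))
    print("test (6 speakers):", corpus_totals(6))
```

Under that assumption, each speaker contributes 77 utterances containing 253 digit tokens, which gives a rough idea of how many MFCC files and reference labels to expect.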
You'll need to follow each step in the scripts directory, changing the paths to fit your working directory. The scripts also refer to the tidigits directory on the class AFS space, so you'll probably want to run them on a machine with AFS access.
You can download the tar file here.
The scripts directory includes all the scripts for training the digit recognizer, from extracting MFCC features to evaluating word error rate. A brief description of each follows.
(5.1) 01_HCopy.sh: Generate MFCC from wave files.
(5.2) 02_HCompV.sh: Train an initial HMM with three states and a single Gaussian from the proto file.
(5.3) 03_hmmdef.sh: Generate the initial HMM for each phone from step (5.2).
(5.4) 04_HLEd.sh: Generate the Master Label File (MLF).
(5.5) 05_HERest.sh: Use the Baum-Welch algorithm to train the HMMs.
(5.6) 06_mix02.sh: Split into 2 Gaussians and do Baum-Welch training.
(5.7) 07_mix04.sh: Split into 4 Gaussians and do Baum-Welch training.
(5.10) 10_HParse.sh: Generate a digit network for decoding.
(5.11) 11_HVite.sh: Do Viterbi decoding.
(5.12) 12_HResult.sh: Do evaluation.
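To make step (5.10) more concrete, here is a minimal Python sketch that writes out a digit-loop grammar in the notation HParse reads, where `$name = ... ;` defines a variable, `|` separates alternatives, and `< >` means one or more repetitions. It is not the grammar shipped with the scripts: the word spellings, the absence of silence models, and the helper name `digit_loop_grammar` are all assumptions, and the words must match whatever appears in your dictionary.

```python
# The eleven digit words named in the corpus description.
# Spelling and case are assumptions; they must match your dictionary.
DIGIT_WORDS = [
    "zero", "oh", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def digit_loop_grammar(words=DIGIT_WORDS, variable="digit"):
    """Return an HParse-style grammar accepting one or more digit words."""
    alternatives = " | ".join(words)
    # Line 1 defines the variable; line 2 is the top-level network,
    # with < > denoting one-or-more repetitions of the variable.
    return f"${variable} = {alternatives};\n( < ${variable} > )\n"


if __name__ == "__main__":
    print(digit_loop_grammar(), end="")
```

Saving the output to a file (say `gram`, a hypothetical name) and running `HParse gram wdnet` would compile it into a word network that HVite can decode against. Because the loop allows any number of digits, the recognizer is free to insert or delete digits, which is why insertions and deletions show up in the evaluation.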
(1) The digit accuracy with all male speakers as training data and 4 Gaussians in each state.
(2) The digit accuracy with all male speakers as training data and 16 Gaussians in each state.
(3) The digit accuracy with all male and female speakers as training data and 4 Gaussians in each state.
(4) The digit accuracy with all male and female speakers as training data and 16 Gaussians in each state.
(5) A simple error analysis of what you found across those four digit recognizers.
(6) Optional extra point.
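For reference when reporting items (1) through (4), the sketch below shows how the percent correct and accuracy figures are defined in the HTK Book in terms of the number of reference labels N and the deletion (D), substitution (S), and insertion (I) counts from the alignment. The function name `digit_scores` is made up for illustration, and the counts in the example are invented numbers, not results from this corpus.

```python
def digit_scores(n, deletions, substitutions, insertions):
    """Return (percent correct, percent accuracy) from alignment counts.

    hits     = N - D - S
    correct  = hits / N * 100
    accuracy = (hits - I) / N * 100
    """
    if n <= 0:
        raise ValueError("n must be a positive number of reference labels")
    hits = n - deletions - substitutions
    correct = 100.0 * hits / n
    accuracy = 100.0 * (hits - insertions) / n
    return correct, accuracy


if __name__ == "__main__":
    # Invented counts, purely to show the arithmetic.
    corr, acc = digit_scores(n=1000, deletions=10, substitutions=20, insertions=5)
    print(f"Corr = {corr:.2f}%, Acc = {acc:.2f}%")
```

Accuracy penalizes insertions while percent correct does not, so the two can differ noticeably for a free digit loop. Looking at D, S, and I separately is a reasonable starting point for the error analysis in item (5).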