CS 224S/LINGUIST 281 - Winter 2009
Homework 7: Speaker Identification using Single-Gaussian Models
Due: Tuesday February 24 before the start of class.
Adapted by Rion Snow in Winter 2005 from an assignment by Jonas Samuelsson
The purpose of this assignment is to build and test a speaker identification system for twelve speakers.
We provide ~20-second clips from 12 different speakers in the binary MATLAB file hw7.mat (~6MB file) -- for Windows MATLAB users, try hw7_v4.mat (~24MB file). To hear what the original speech sounds like, open up MATLAB by typing "matlab" on the command line (say, on elaine), and try the following commands:
load '/usr/class/cs224s/WWW/hw7/hw7.mat';
If you're running MATLAB remotely, instead try:
load '/usr/class/cs224s/WWW/hw7/hw7.mat';
wavwrite(sp1_tr / max(sp1_tr), 'myfilename.wav');
This will write 'myfilename.wav' to your current directory, which you may then transfer to your own machine and play.
We have preprocessed each speech clip (using featurize.m) into corresponding "cepstral" feature matrices suitable for statistical classification. The speech data and extracted features have already been separated into training and evaluation sets, e.g. "sp1_tr" is the raw speech training set for speaker 1, and "ce11_ev" is the pre-processed cepstral feature evaluation set for speaker 11. (Use the function "whos" to see all the variables currently loaded into memory.)
Send your MATLAB script, classifier accuracy, and 12x12 matrix of (log) point estimates log(P(x; m,S)) in a single plain text ascii file (or e-mail body) to email@example.com
- Calculate the mean feature vector m and covariance matrix S from the extracted cepstral feature vector sequence for each speaker in the training set.
- Given the resulting twelve speaker models (one Gaussian per speaker), predict the identity of each speaker in the test set by choosing the Gaussian model that maximizes the point estimate of generating the feature vector sequence.
Note: You should calculate the point estimate of the sequence of feature vectors in an evaluation clip as the product of the individual point estimates P(x; m,S) of each feature vector x, taken over all frames in the clip. You may also want to try the simple point estimate of the mean feature vector of the clip and see how they differ! (The product of all point estimates should perform much better.)
- Calculate the accuracy of your classifier.
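Since the deliverable is a matrix of log point estimates, it may help to write the product over frames as a sum of logs, which also sidesteps floating-point underflow or overflow when a clip has many frames. As a sketch, with x_1, ..., x_T denoting the frames of one evaluation clip and (m_k, S_k) the model for speaker k (this indexing is notation introduced here, not taken from the assignment):

```latex
\log P(x_1,\dots,x_T;\, m, S) = \sum_{t=1}^{T} \log P(x_t;\, m, S),
\qquad
\hat{k} = \arg\max_{k \in \{1,\dots,12\}} \; \sum_{t=1}^{T} \log P(x_t;\, m_k, S_k)
```

Because the logarithm is monotonic, the speaker that maximizes the sum of logs is the same one that maximizes the product.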
The "mean" and "cov" MATLAB functions can do most of the work for you here. To save some lines of code, you may want to use the "reshape" function to put the vector of 12 10x10 covariance matrices into a single 3-dimensional (10x10x12) tensor.
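As a rough illustration of this hint, the sketch below estimates one mean vector and one covariance matrix per speaker and stores the covariances in a 10x10x12 array. It runs on synthetic random data rather than hw7.mat, and it assumes each feature matrix has one frame per row (transpose first if yours has one frame per column). The names feats, M, S, Swide and Sback are hypothetical, chosen only for this example.

```matlab
% Synthetic stand-in for the training features (NOT the hw7.mat data).
% Assumption: each matrix is T-by-D with one frame per row, D = 10.
numSpeakers = 12;
D = 10;
feats = cell(numSpeakers, 1);        % hypothetical container, one cell per speaker
for k = 1:numSpeakers
    feats{k} = randn(200, D) + k;    % 200 fake frames, shifted by k per speaker
end

M = zeros(D, numSpeakers);           % column k holds the mean vector of speaker k
S = zeros(D, D, numSpeakers);        % S(:,:,k) holds the covariance of speaker k
for k = 1:numSpeakers
    M(:, k)    = mean(feats{k}, 1)'; % average over frames (rows)
    S(:, :, k) = cov(feats{k});      % cov treats rows as observations
end

% reshape converts between a side-by-side 10x120 layout and the 10x10x12 tensor,
% because MATLAB stores arrays in column-major order.
Swide = reshape(S, D, D * numSpeakers);
Sback = reshape(Swide, D, D, numSpeakers);
```

Writing directly into S(:,:,k) avoids the reshape altogether; the reshape is only needed if the covariances were first concatenated side by side.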
Look at the probability density definition for the multivariate normal distribution in section 1.1 in the CS229 Notes above. To calculate the point estimate "probability" of generating a feature vector x, simply substitute x into the density calculation; you want to pick the Gaussian model that maximizes this point estimate. For more on the difference between a "point estimate" and a "true probability", see pg. 12 of Dan's lecture notes.
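One possible way to evaluate this in the log domain is sketched below, using the standard log-density of a D-dimensional Gaussian: -1/2 [ D log(2 pi) + log det(S) + (x - m)' inv(S) (x - m) ]. The helper names logmvn and clipLogLik are hypothetical, and the code again assumes a clip is stored with one frame per row; the inputs are toy values, not the homework data.

```matlab
% Log of the multivariate normal density for one frame x (D-by-1),
% given mean m (D-by-1) and covariance S (D-by-D).
% S \ (x - m) solves the linear system instead of forming inv(S) explicitly.
logmvn = @(x, m, S) -0.5 * ( numel(x) * log(2*pi) ...
                           + log(det(S)) ...
                           + (x - m)' * (S \ (x - m)) );

% Sanity check on a case with a known answer: a standard normal evaluated
% at its mean has density (2*pi)^(-D/2).
D = 10;
val = logmvn(zeros(D, 1), zeros(D, 1), eye(D));

% Log point estimate of a whole clip X (T-by-D, one frame per row):
% the sum of the per-frame log densities.
clipLogLik = @(X, m, S) sum(arrayfun(@(t) logmvn(X(t, :)', m, S), 1:size(X, 1)));

% Toy clip of 3 identical frames at the mean, so the total is 3 * val.
total = clipLogLik(zeros(3, D), zeros(D, 1), eye(D));
```

Evaluating clipLogLik for every (evaluation clip, speaker model) pair would fill a 12x12 matrix, and the predicted speaker for a clip is the model with the largest entry for that clip.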
General MATLAB hints: To find out more about a specific function, use "help function-name". To look for a function using keyword search, try "lookfor keyword" -- to find out more about "lookfor", try "help lookfor".
In your solution, the (1,1) entry in the matrix of point estimates should be 1.0e+03 * -0.0485.