STANFORD CS 224S/LINGUIST 281   -     Winter 2009
Homework 3: Letter to Sound Rules
Due: January 27 at the start of class.

Please read this entire page before beginning.

This homework is stolen by permission from Richard Sproat's Speech Synthesis class at UIUC, and relates to pronunciation modeling, specifically the pronunciation of personal names in English, a particularly hard problem.

THE FOLLOWING TEXT IS ALL THANKS TO RICHARD SPROAT! SO WHEN IT SAYS "I", THIS MEANS RICHARD!

Wagon

When you installed Festival, you also installed the Edinburgh speech tools package. The binaries are probably installed somewhere like /usr/local/festival/test/speech_tools in your installation. One of the tools is Wagon, the Edinburgh speech tools' version of the CART (Classification and Regression Tree) algorithm (Breiman et al. 1984). Documentation for Wagon is available in several places, such as here.

[I (Dan) have also installed a Linux-compiled copy of wagon at /afs/ir/class/cs224s/wagon; it should be executable from vine, raptor, and firebird.]

As you will see in the Wagon manual, you need to provide Wagon with two files: a data file consisting of a set of feature vectors, with the first element of each vector being the predicted feature/value, and a feature description file that tells Wagon what the possible values are for each feature. For those of you who have used CART, this is the same idea as in CART, except that the format of the files is different. Note that you can evaluate on held-out test data by using the -test flag.
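As a rough illustration of the description file, here is a minimal Python sketch that collects the values observed in each column of the data and prints one parenthesized entry per field. It assumes the description file is a parenthesized list of (name value1 value2 ...) entries, which is my reading of the Wagon manual; please check the manual before relying on it. The function name make_desc and the field names in the usage note are hypothetical.

```python
def make_desc(rows, names):
    """Build a description string from rows of categorical fields.

    ASSUMPTION: the description file is a list of (name val1 val2 ...)
    entries, one per field, in the same order as the data columns.
    Check the Wagon manual for the authoritative format.
    """
    if any(len(row) != len(names) for row in rows):
        raise ValueError("every row must have one value per field name")
    # Collect the distinct values seen in each column.
    seen = [set() for _ in names]
    for row in rows:
        for values, value in zip(seen, row):
            values.add(value)
    # Sort the values so the output is deterministic.
    entries = [
        "(" + " ".join([name] + sorted(values)) + ")"
        for name, values in zip(names, seen)
    ]
    return "(" + "\n ".join(entries) + ")\n"
```

For example, make_desc([["i", "pad", "vow"], ["m", "vow", "cons"]], ["phone", "prev_class", "cur_class"]) yields three entries, the first being (phone i m). The rows passed in should cover the test data as well as the training data, so that every value Wagon will encounter is declared.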

The problem

Background

You are to use Wagon to train a pronunciation model for a set of family names. The dictionary can be found here. The dictionary consists of, in alternating lines, the spelling of the name (all in lower case), and the transcription of the name into a single-character phonetic alphabet. Note that the dictionary has been aligned automatically (using the algorithm described in Sproat, 2001), so that letters are mapped one-for-one onto phones. In some cases this results in a "deletion" (indicated by a "#" on the phone side); in others this results in an "amalgamation", as in the combination of "i" and "k" into "_i_k_" in:

m c p h e r s o n 
m _i_k_ f # E R s & n 
(The phonetic transcription scheme is more or less the one listed as "JPO" (after Joe Olive) in Appendix A of this document by Jim Hieronymus.)
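To make the dictionary format concrete, here is a minimal Python sketch that reads the alternating spelling/transcription lines into aligned pairs. It assumes that symbols are separated by whitespace, as in the example above, and that blank lines can be skipped; the function name read_aligned_dictionary is hypothetical.

```python
def read_aligned_dictionary(lines):
    """Turn alternating letter/phone lines into (letters, phones) pairs.

    ASSUMPTION: symbols are whitespace-separated and blank lines carry
    no information, as in the mcpherson example.
    """
    content = [line.split() for line in lines if line.strip()]
    if len(content) % 2 != 0:
        raise ValueError("expected an even number of non-blank lines")
    entries = []
    # Even-indexed lines are spellings, odd-indexed lines are phones.
    for letters, phones in zip(content[0::2], content[1::2]):
        if len(letters) != len(phones):
            raise ValueError("misaligned entry: " + " ".join(letters))
        entries.append((letters, phones))
    return entries
```

Called on the two lines of the example, this returns a single entry whose second letter "c" is paired with the amalgamated phone "_i_k_" and whose fourth letter "h" is paired with the deletion symbol "#". In practice you would pass it an open file object for the dictionary.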

Assume that your task is to use only evidence from a fixed width left and right orthographic context to predict the phone (including amalgamated phones and deletion) for the current letter. For example, you might decide to use two letters on the left, two on the right, and the current letter. So your features would be the letters of the left and right context as well as the current letter. Ideally you would have one feature for each letter, with values, e.g., feat-2=p if the letter in position -2 is "p". Unfortunately CART is inefficient with categorical features that have a large number of possible values (>15, or so), so you will need to break up the features. One possible feature encoding is given here. The features include a general class "cons" versus "vowel", an indicator of case (redundant in this example since everything has been downcased), a vowel feature identifying the vowel or "n/a" if a consonant, and two sets of consonant features. Also used is the pad symbol, which is defined here as "#". Depending on your context, you will have to pad the left and right of the input and output strings with enough pads so that the leftmost and rightmost letters have sufficient context to their left/right.

Using these features, and assuming a window of 5 (including the target letter), the first few lines of an encoded data file might look as follows:

[rws@catarina hw3data]$ head -5 dict.data
i       pad pad pad pad pad     pad pad pad pad pad     vow lower Vi n/a n/a   cons lower n/a n/a Cm    pad pad pad pad pad
m       pad pad pad pad pad     vow lower Vi n/a n/a    cons lower n/a n/a Cm  pad pad pad pad pad      pad pad pad pad pad
i       pad pad pad pad pad     pad pad pad pad pad     vow lower Vi n/a n/a   cons lower n/a Cp n/a    pad pad pad pad pad
p       pad pad pad pad pad     vow lower Vi n/a n/a    cons lower n/a Cp n/a  pad pad pad pad pad      pad pad pad pad pad
v       pad pad pad pad pad     pad pad pad pad pad     cons lower n/a n/a Cv  vow lower Vo n/a n/a     pad pad pad pad pad
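The padding and windowing that produce lines like these can be sketched in Python as follows. This is only one way to do it. The five fields per letter (class, case, vowel, and two consonant sets) follow the sample lines above, but the split of consonants into two sets is an assumption of mine, chosen only so that "p" lands in the first set and "m" and "v" in the second, as in the sample; the linked encoding may divide them differently. The names encode_letter and encode_word and the value "upper" are hypothetical.

```python
PAD = "#"  # pad symbol, as defined in the text above
VOWELS = set("aeiou")
# ASSUMPTION: hypothetical split of the consonants into two sets.
# Anything that is not a vowel and not in this set, including any
# non-letter character, falls into the second set in this sketch.
CONS_SET_1 = set("bcdfghjklp")
N_FIELDS = 5  # class, case, vowel, consonant set 1, consonant set 2


def encode_letter(ch):
    """Encode one letter (or the pad symbol) as five categorical fields."""
    if ch == PAD:
        return ["pad"] * N_FIELDS
    case = "lower" if ch.islower() else "upper"  # "upper" is a made-up name
    low = ch.lower()
    if low in VOWELS:
        return ["vow", case, "V" + low, "n/a", "n/a"]
    if low in CONS_SET_1:
        return ["cons", case, "n/a", "C" + low, "n/a"]
    return ["cons", case, "n/a", "n/a", "C" + low]


def encode_word(letters, phones, left=2, right=2):
    """Return one row per letter: the phone, then the encoded window."""
    if len(letters) != len(phones):
        raise ValueError("letters and phones must be aligned one-for-one")
    # Pad both ends so the first and last letters have full context.
    padded = [PAD] * left + list(letters) + [PAD] * right
    rows = []
    for i, phone in enumerate(phones):
        window = padded[i : i + left + 1 + right]
        row = [phone]
        for ch in window:
            row.extend(encode_letter(ch))
        rows.append(row)
    return rows
```

With the defaults, encode_word(list("im"), ["i", "m"]) gives two rows whose fields match the first two lines of the sample; joining each row with spaces gives one line of the data file. Changing left and right changes the window width without touching the per-letter encoding.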

Specific Tasks

More on using Wagon

Caveat: you will want to increase the lisp heap size for Wagon for this task. Here was the invocation that I used, for example:

wagon -desc dict.desc \
      -stop 10 \
      -output dict.tree \
      -data dict.data \
      -test dict_test.data \
      -heap 5000000

It should take a few minutes to run, depending of course upon your memory and processor speed.
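The invocation above expects separate training and test files (dict.data and dict_test.data). Here is a minimal Python sketch of one way to produce such a split. It assumes you split by whole names rather than by individual letters, hold out 10% of the names, and use a fixed random seed; all three choices, and the function name split_entries, are my own assumptions and not part of the assignment.

```python
import random


def split_entries(entries, test_fraction=0.1, seed=0):
    """Shuffle dictionary entries and split them into (train, test).

    ASSUMPTION: each entry is one whole name, so all letters of a name
    end up on the same side of the split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(entries)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

The training portion would then be encoded and written to dict.data, and the held-out portion to dict_test.data.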

The parameters for wagon_test are:

./wagon_test -h
Usage: wagon_test 
Summary: program to test CART models on data
-desc      Field description file
-data      Datafile, one vector per line
-tree      File containing CART tree
-predict          Predict for each vector returning full vector
-predict_val      Predict for each vector returning just value
-predictee 
                  name of field to predict (default is first field)
-heap  {210000}
              Set size of Lisp heap, should not normally need
              to be changed from its default
-o         File to save output in

Extra credit

Many of you will have experience with other machine learning approaches. For extra credit, you can try your favorite approach on the same data. Report the algorithm used, and the results. How do they compare to CART/Wagon's performance? You must of course make the tests comparable: for example you have to keep the character window constant in both cases.

Homework Hint

We have provided the following hint in order to help you sanity check your code. We trained/tested with a window of three (one letter on either side) and got a per-letter accuracy of 87.642%. You can change your window to three and see if you approximately agree with this number (due to different ways of splitting the data, you may not perfectly agree).
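Per-letter accuracy here is simply the percentage of letters whose predicted phone equals the phone in the dictionary. A minimal Python sketch, assuming you already have the gold phones (the first field of each test vector) and the predicted phones as two lists of equal length; the function name per_letter_accuracy is hypothetical.

```python
def per_letter_accuracy(gold, predicted):
    """Return the percentage of positions where predicted equals gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    # Amalgamated phones and deletions count as ordinary labels.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)
```

For instance, gold phones ["m", "_i_k_", "f", "#"] against predictions ["m", "k", "f", "#"] score 75.0, since only the amalgamated phone is missed.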

References

Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth & Brooks, Pacific Grove CA, 1984.

Richard Sproat, "Pmtools: A Pronunciation Modeling Toolkit", Proceedings of the Fourth ISCA Tutorial and Research Workshop on Speech Synthesis, Blair Atholl, Scotland, 2001.

How to turn in the homework: