CS 224S/LINGUIST 281 - Winter 2009
Homework 2: TTS
Due: January 20, before the start of class.
WARNING: Please read this entire page before you start!!!!!
NOTE: These exercises are from Alan Black's course.
(2.0) Make sure you can get Festival to say "Hello World". Don't wait for this until the night before the homework is due.
(2.1) Make Festival say your entire name (first and last). If Festival doesn't say it correctly, fix it by adding explicit pronunciations to the lexicon. If it does say it correctly, find a friend's name that it doesn't say correctly and add a pronunciation to fix it.
(2.2) Copy 'text2pos' (in /afs/ir.stanford.edu/class/cs224s/newfestival/festival/examples) and modify it to output the number of nouns (of any type) in a given file.
(2.3) Copy 'text2pos' and modify it to output the number of vowels (phoneme vowels not letter vowels) in a given file.
(2.4) Add a token-to-word rule that says money values in dollars and cents in a standard full form. For example, given $56.54, Festival should say "fifty six dollars and fifty four cents". Hints for how to do this are given below.
Please send a plain text e-mail or file containing your code/commands, sample output, and responses to cs224s-win0809-ta@lists.stanford.edu.
Here are various useful hints and help for getting started.
First, what should I read to understand Festival?
Second, where is Festival?
http://festvox.org/festival/downloads.html
http://www.cstr.ed.ac.uk/downloads/festival/1.95
/afs/ir/class/cs224s/newfestival/festival/bin/festival
It is compiled for Dell (Intel) Linux machines. That means it runs on the myth, firebird, and raptor machines in Sweet Hall; it will not run on the Sun machines in Sweet Hall, only the Linux machines. Some of the existing Sweet Hall computers have been transferred to the School of Engineering for use in the new labs at Gates (room B08) and Terman (rooms 102-104).
You can only hear the output from Festival if you are physically sitting at the machine, listening to its speakers. So you have to go to Sweet Hall, Gates, or Terman, unless you have remarkably good hearing, since the sound will come out somewhere in Sweet Hall, Gates, or Terman (probably startling everyone there).
Some versions of Linux come with Festival preinstalled. If you have access to a machine with one of those (your own Linux machine, for example), feel free to use that instead of the class copy of Festival.
The simplest way to run festival is to create a small "Scheme" script file called "myrules.scm", which has the following first line:
(voice_kal_diphone)
and the lines afterward have your "lex.add.entry" commands.
Then you run your new file as follows:
festival myrules.scm
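As a rough sketch, the file could be created and run from the shell like this. The lexicon entry is the Ronald Reagan example given later on this page and is only a placeholder for your own name; the SayText line and the `command -v` guard are assumptions added for illustration, the guard so that the snippet does nothing harmful on a machine where festival is not on your PATH.

```shell
# Write a minimal myrules.scm: voice selection first, then lexicon entries.
# The quoted 'EOF' keeps the shell from touching the Scheme text.
cat > myrules.scm <<'EOF'
(voice_kal_diphone)
(lex.add.entry
 '("reagan" n (((r ey) 1) ((g ax n) 0))))
(SayText "Ronald Reagan")
EOF

# Run the file only if festival can be found (assumes it is on your PATH).
if command -v festival >/dev/null 2>&1; then
  festival myrules.scm
fi
```

Replace the placeholder entry and the SayText string with whatever name you are fixing.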
For the more advanced questions, a useful Festival script called saytime and other examples are in
/afs/ir/class/cs224s/newfestival/festival/examples
I recommend you add the following to your PATH variable:
/afs/ir/class/cs224s/newfestival/festival/bin/
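One way to do this, sketched for a Bourne-style shell such as bash (an assumption; csh or tcsh users would need the equivalent `setenv` form instead). `FESTIVAL_BIN` is just an illustrative variable name, not something Festival itself reads.

```shell
# Prepend the class Festival directory so that "festival" resolves to the class copy.
FESTIVAL_BIN=/afs/ir/class/cs224s/newfestival/festival/bin
export PATH="$FESTIVAL_BIN:$PATH"
```

Putting the same two lines in your shell startup file should make the setting persist across logins.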
Recall that there are three commonly used ways to run Festival. You can have Festival synthesize things directly from the shell:
echo My name is ... | festival --tts
or from within the command interpreter with the command:
(SayText "My name is ...")
or you can write a script like the example script saytime (in "examples/saytime").
If your name is not pronounced properly, you can add new
entries to the lexicon using the function lex.add.entry.
For example, the default synthesizer mispronounces Ronald Reagan's
last name, so we can redefine the pronunciation as
(lex.add.entry
'("reagan" n (((r ey) 1) ((g ax n) 0))))
To find out what the phoneme set is and what formats are possible, it is often useful
to look up similar words. Use the lex.lookup function, as in
(lex.lookup 'reagan)
then copy the entry, changing it as desired. To keep the pronunciation, add it to your `.festivalrc' in your home directory. This file is automatically loaded every time you run Festival, so Festival will then always know about your name. Because there are different lexicons for different languages/dialects, you must select the lexicon/voice before setting the new pronunciation:
(voice_kal_diphone)
(lex.add.entry ...)
But for this homework you should be able to use the default voice, so you probably shouldn't need to reset this.
(set! total_ns (+ 1 total_ns))
(format t "Total number of nouns %d\n" total_ns)
See `/afs/ir/class/cs224s/newfestival/festival/lib/synthesis.scm' for the definition of the Tokens UttType, which gives the list of extra modules to call. You want to look at the Segment relation.
(if (string-equal (item.feat seg "ph_vc") "+")
    (set! total_vs (+ 1 total_vs)))
You will need to add a new definition for token_to_words.
The normal convention here is to save the existing one and call that for things that
don't match what you are looking for. Thus your file will look
something like
(set! previous_token_to_words token_to_words)
(define (token_to_words token name)
  (cond
   ;; here insert the condition to recognize money tokens
   ;; return list of words
   (t
    (previous_token_to_words token name))))
Another hint: use previous_token_to_words to do the
hard parts (e.g., converting 56 to "fifty six" and 54 to "fifty four").
Your code should just do the easy part (adding the "dollars" and "cents"
in the right places).
Another hint: once you decide on the condition, remember that you need to return a list of words.