CS 224S / LINGUIST 281
Speech Recognition and Synthesis
Winter 2009

COURSE INFORMATION
Instructor Dan Jurafsky, jurafsky@stanford.edu
Office: Margaret Jacks Hall (bldg 460) 117
Office Hours: M 12:00-12:30, Tu 4:30-5:30
TA David Borowitz, cs224s-win0809-ta@lists.stanford.edu
Office: 460-022 (Margaret Jacks Hall Basement)
Office Hours: MW 2:15-4:15
Newsgroup We have set up a class newsgroup, su.class.cs224s. Please post questions there, but first check whether your question has already been asked.
Time Tuesdays and Thursdays, 3:15-4:30pm
Location 460-126
Textbook
  • Jurafsky and Martin. 2008. Speech and Language Processing (2nd edition). Prentice-Hall. This is at the bookstores. Note that you need the second edition, not the first.
  • Selected online chapters from the as-yet-unpublished Taylor, Paul. 2009. Text-to-Speech Synthesis.
  • Some online papers from the literature.
  • Also, on reserve: Huang, Acero, Hon. 2001. Spoken Language Processing. Prentice-Hall.
Description This course is an introduction to automatic speech recognition, speech synthesis, and dialogue systems, from the computer science and linguistics (rather than EE) perspective. The focus is on understanding key algorithms, including the noisy channel model, Hidden Markov Models (HMMs), Gaussian Mixture Model (GMM) acoustic models, A* and Viterbi decoding, N-gram language modeling, unit selection synthesis, and dialogue architectures, as well as the role of linguistic knowledge (esp. phonetics, intonation, pronunciation variation, disfluencies, emotion). Prerequisites: programming experience and familiarity with probability. Recommended: CS 221 or CS 229.
Required Work
  • Homeworks: 7 homeworks. Each homework is due at 3:14pm on its due date (i.e., just before class starts).
    • Homework Collaboration Policy: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual homeworks and programs yourself.
    • Late homeworks: You have 5 free late (calendar) days to use on the homeworks. Once these late days are exhausted, any homework turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day.
  • Readings: To be read before the class period in which they will be discussed. THERE IS A LOT OF READING IN THIS COURSE!!! We are covering what are really three entire fields (speech recognition, speech synthesis, dialogue systems) in one 10-week quarter, and not everything can be covered in each lecture, so you need to do all the reading.
  • Final Project: Any project in speech recognition, speech synthesis, speech understanding, dialogue design, speech user interface design, etc. Either individual or joint projects are fine. The final project will be presented as a poster at the poster session on Thursday March 12, and is due by email on Monday March 16 at noon PST. Details here.
  • Determination of final grade:
    • 45%: final project
    • 45%: 7 homeworks
    • 10%: class participation
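
As a rough illustration of how the late-day policy and the grade weights above combine, here is a minimal sketch. It rests on several assumptions that the policy text does not spell out: free late days are used up in the order the homeworks are turned in, the 20% penalty is additive (20% of that homework's score per penalized day, never going below zero), the 7 homeworks count equally within their 45%, and all scores are on a 0-100 scale. The function names are hypothetical and are not part of any course software.

```python
import math

FREE_LATE_DAYS = 5       # free late (calendar) days, from the policy above
PENALTY_PER_DAY = 0.20   # 20% per late day once free days are exhausted


def late_days_used(hours_late):
    """Each 24 hours or part thereof counts as one full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(submissions, free_days=FREE_LATE_DAYS):
    """Adjust homework scores for lateness.

    submissions: list of (raw_score, hours_late) pairs, in submission order.
    Assumption: free days are consumed in that order, and the penalty is
    additive with a floor of zero.
    """
    remaining = free_days
    adjusted = []
    for raw_score, hours_late in submissions:
        days = late_days_used(hours_late)
        covered = min(days, remaining)       # late days absorbed by free days
        remaining -= covered
        penalized_days = days - covered      # late days that cost 20% each
        factor = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
        adjusted.append(raw_score * factor)
    return adjusted


def final_grade(project, homework_scores, participation):
    """Weighted final grade: 45% project, 45% homeworks, 10% participation.

    Assumption: homeworks are averaged with equal weight.
    """
    homework_avg = sum(homework_scores) / len(homework_scores)
    return 0.45 * project + 0.45 * homework_avg + 0.10 * participation
```

For example, under these assumptions a homework turned in 25 hours late uses two late days, since the second day has already begun.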


SCHEDULE
Wk
Date
HW
Lec

Topic and Readings

1
Jan 6
Lec 1 (ppt)
Lec 1 (pdf)

Course Overview and History, Articulatory Phonetics and ARPAbet transcription

  • J+M Chapter 7: Phonetics, 215-230
  • pages 185-190 from Chapter 7 "Phonetics and Phonology" from Taylor, Paul. 2009. Text-to-Speech Synthesis
1
Jan 8
Lec 2 (ppt)
Lec 2 (pdf)

Phonetics: Acoustic Phonetics

  • J+M Chapter 7: Phonetics, 230-end
2
Jan 13
HW 1 due
Lec 3 (ppt)
Lec 3 (pdf)

TTS: Background (machine learning, classification, NLP) and Text Normalization

  • Note that there is a lot of reading for today. This is more than we will normally have.
  • J+M Chapter 8 Speech Synthesis, pages 249-257
  • SKIM J+M Chapter 5 Part-of-Speech Tagging, pages 1-35
  • Chapter 4 "Text Segmentation and Organisation" from Taylor, Paul. 2009. Text-to-Speech Synthesis
  • Chapter 5 "Text Decoding" from Taylor, Paul. 2009. Text-to-Speech Synthesis
  • If you haven't seen decision trees before, read an introduction to them, such as the one in Russell and Norvig or one on the web.
2
Jan 15
Lec 4 (ppt)
Lec 4 (pdf)

Grapheme-to-phoneme and the Festival software (plus Text Normalization spillover)

3
Jan 20
HW 2 due
Lec 5 (ppt)
Lec 5 (pdf)

TTS: Prosody (Intonation, Boundaries, and Duration)

3
Jan 22
Lec 6 (ppt)
Lec 6 (pdf)

TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis)

4
Jan 27
HW 3 due
Lec 7 (ppt)
Lec 7 (pdf)

ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi

  • J+M Chapter 6: Hidden Markov Models, pages 173-186
  • J+M Chapter 9: Automatic Speech Recognition, pages 285-295
4
Jan 29

Lec 8 (ppt)
Lec 8 (pdf)

ASR: HMMs continued: Baum Welch, application to ASR, Word Error Rate

  • J+M Chapter 6: Hidden Markov Models, pages 186-192
  • J+M Chapter 9: Automatic Speech Recognition pages 314-333
  • Getting a speech recognizer for your project: where are HTK and Sphinx?
5
Feb 3
HW 4 due
Lec 9 (ppt)
Lec 9 (pdf)

ASR: Feature Extraction

5
Feb 5
1-paragraph project proposal due
Lec 10 (ppt)
Lec 10 (pdf)
ASR: Acoustic Modeling

  • J+M Chapter 9: Automatic Speech Recognition pages 295-314
6
Feb 10
HW 5 due
Lec 11 (ppt)
Lec 11 (pdf)

N-grams and Language Modeling

6
Feb 12

Lec 12 (ppt)
Lec 12 (pdf)

Conversational Agents: Human conversation, simple frame-based dialogue systems, and VoiceXML

  • J+M Chapter 24, Dialogue and Conversational Agents, pages 811-838
7
Feb 17
HW 6 due
Lec 13 (ppt)
Lec 13 (pdf)
Conversational Agents: Grounding, Confirmation, Dialogue Acts, Evaluation
  • J+M Chapter 24, Dialogue and Conversational Agents, pages 838-846
7
Feb 19
Lec 14 (ppt)
Lec 14 (pdf)

Conversational Agents III: Markov Decision Processes (MDPs), etc
8
Feb 24
HW 7 due
Lec 15 (ppt)
Lec 15 (pdf)

ASR: Search (Lattices, N-best lists, A*, etc)

  • J+M Chapter 10: Speech Recognition: Advanced Topics, pages 335-352
8
Feb 26
Lec 16 (ppt)
Lec 16 (pdf)

ASR: Dealing with Variation (Adaptation, MLLR, Pronunciation modeling)
  • J+M Chapter 10, Speech Recognition: Advanced Topics, pages 352-360
9
Mar 3

Lec 17 (ppt)
Lec 17 (pdf)

Disfluencies and Metadata: Boundaries, Fillers, Edit Terms
9
Mar 5

Lec 18 (ppt)
Lec 18 (pdf)

Emotional speech
10
Mar 10

No Class Today
10
Mar 12

Final Project Presentations