CS 224S / LINGUIST 281
Speech Recognition and Synthesis
Winter 2006

COURSE INFORMATION
Instructor
Dan Jurafsky, jurafsky@stanford.edu
Office: Margaret Jacks Hall (bld 460) 113
Office Hours: Mondays 3:30-5:00pm
TA
Jenny Finkel, cs224s-ta@cs.stanford.edu
Office: Gates 232
Office Hours: Mondays and Wednesdays, 3:00-4:00pm
Newsgroup
We have set up a class newsgroup, su.class.cs224s. Please post questions there, and check whether your question has already been asked before posting it.
Time
Tuesdays and Thursdays, 3:30-4:45pm
Location
50-52H
Textbook
Huang, Acero, Hon. 2001. Spoken Language Processing. Prentice-Hall.
Plus selected chapters from the new edition of Jurafsky and Martin. 2007. Speech and Language Processing.

Description
This course is an introduction to automatic speech recognition, speech understanding, and speech synthesis/text-to-speech from the computer science and linguistics (rather than EE) perspective. The focus is on understanding key algorithms, including the noisy channel model, Hidden Markov Models (HMMs), Gaussian Mixture Model (GMM) acoustic models, A* and Viterbi decoding, N-gram language modeling, unit selection synthesis, and dialogue architectures, as well as the role of linguistic knowledge (esp. phonetics, intonation, pronunciation variation, disfluencies, emotion). Prerequisites: programming experience and basic familiarity with probability. Recommended: CS 221 or CS 229.
Required Work
  • Homeworks: 6 homeworks (Homework Collaboration Policy). Each homework is due at 3:30pm on its due date (i.e., before class starts). LATE HOMEWORK WILL NOT BE ACCEPTED, but I will drop your lowest homework grade.
  • Readings: To be read before the class period in which they will be discussed.
  • Final Project: Any project in speech recognition, speech synthesis, speech understanding, dialogue design, speech user interface design, etc. Individual and joint projects are both fine. The final project will be presented as a poster at the poster session on March 16, and is due by email on Monday, March 20 at noon PST. Details here
  • Determination of final grade:
    • 45%: final project
    • 45%: best 5 homeworks out of 6
    • 10%: class participation
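The weighting above can be sketched as a short calculation. This is only an illustration, not the official grading procedure: the function name `final_grade` is hypothetical, and it assumes every score is on a 0-100 scale and that the five counted homeworks are weighted equally.

```python
def final_grade(homeworks, project, participation):
    """Combine scores using the 45/45/10 weighting from the syllabus.

    Assumptions (not stated in the syllabus): all scores are on a
    0-100 scale and the counted homeworks are weighted equally.
    """
    if len(homeworks) != 6:
        raise ValueError("expected exactly 6 homework scores")
    # Drop the single lowest homework score, keeping the best 5 of 6.
    best_five = sorted(homeworks)[1:]
    homework_average = sum(best_five) / len(best_five)
    return 0.45 * project + 0.45 * homework_average + 0.10 * participation


if __name__ == "__main__":
    # A missed homework (score 0) is the one that gets dropped.
    print(final_grade([90, 85, 0, 95, 80, 100], project=88, participation=100))
```

Under these assumptions, one missed or weak homework does not lower the homework component, since only the best five scores are averaged.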


SCHEDULE
Week | Date | HW/Lec | Topic and Readings
1 | Jan 10 | lec1.ppt, lec1.6up.pdf | Course Overview and History, Articulatory Phonetics and ARPAbet transcription
1 | Jan 12 | lec2.ppt, lec2.6up.pdf | Phonetics: Acoustic Phonetics
2 | Jan 17 | HW 1 due; lec3.ppt, lec3.6up.pdf | TTS: Introduction, Architectures, Festival
2 | Jan 19 | lec4.ppt, lec4.6up.pdf | TTS: Text normalization, grapheme-to-phoneme
3 | Jan 24 | HW 2 due; lec5.ppt, lec5.6up.pdf | TTS: Prosody (Intonation, Boundaries, and Duration)
3 | Jan 26 | lec6.ppt, lec6.6up.pdf | TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis)
4 | Jan 31 | HW 3 due; lec7.ppt, lec7.6up.pdf | ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi
4 | Feb 2 | lec8.ppt, lec8.6up.pdf | ASR: HMMs continued, up to Baum-Welch (Forward-Backward) algorithm
5 | Feb 7 | lec9.ppt, lec9.6up.pdf | ASR: Acoustic Modeling
5 | Feb 9 | 1-paragraph project proposal; lec10.ppt, lec10.6up.pdf | ASR: Dealing with Variation (Adaptation, MLLR, Pronunciation modeling)
6 | Feb 14 | HW 4 due; lec11.ppt, lec11.6up.pdf | N-grams and Language Modeling
6 | Feb 16 | lec12.ppt, lec12.6up.pdf | ASR: Search (Lattices, N-best lists, A*, etc.)
7 | Feb 21 | lec13.ppt, lec13.6up.pdf | Conversational Agents: Simple dialogue systems (frame-based systems, VXML, evaluation)
7 | Feb 23 | HW 5 due; lec14.ppt, lec14.6up.pdf | Conversational Agents: Grounding, Confirmation, Dialogue Acts
8 | Feb 28 | Ferrer.pdf | Guest Speaker: Luciana Ferrer on Speaker Recognition
8 | Mar 2 | HW 6 due; lec15.ppt, lec15.6up.pdf | Conversational Agents III: Markov Decision Processes (MDPs), etc.
9 | Mar 7 | lec16.ppt, lec16.6up.pdf | Disfluencies and Metadata: Boundaries, Fillers, Edit Terms
9 | Mar 9 | | Guest Speaker: Mark Mao on Acoustic Modeling: Covariance Modeling
10 | Mar 14 | lec17.ppt, lec17.6up.pdf | Emotional speech
10 | Mar 16 | | Final Project Presentations