COURSE INFORMATION

| Item | Details |
|---|---|
| Instructor | Dan Jurafsky, jurafsky@stanford.edu<br>Office: Margaret Jacks Hall (bldg 460) 117<br>Office Hours: M 12:00-12:30, Tu 4:30-5:30 |
| TA | David Borowitz, cs224s-win0809-ta@lists.stanford.edu<br>Office: 460-022 (Margaret Jacks Hall Basement)<br>Office Hours: MW 2:15-4:15 |
| Newsgroup | We have set up a class newsgroup, su.class.cs224s. Please post questions there, and check whether your question has already been asked before posting it. |
| Time | Tuesdays and Thursdays, 3:15-4:30pm |
| Location | 460-126 |
| Textbook | |
| Description | This course is an introduction to automatic speech recognition, speech synthesis, and dialogue systems from a computer science and linguistics perspective (rather than an EE one). It focuses on understanding key algorithms, including the noisy channel model, Hidden Markov Models (HMMs), Gaussian Mixture Model (GMM) acoustic models, A* and Viterbi decoding, N-gram language modeling, unit selection synthesis, and dialogue architectures, as well as the role of linguistic knowledge (especially phonetics, intonation, pronunciation variation, disfluencies, and emotion). Prerequisites: programming experience and familiarity with probability. Recommended: CS221 or CS229. |
| Required Work | |

SCHEDULE

| Week | Date | Due | Lecture | Topic and Readings |
|---|---|---|---|---|
| 1 | Jan 6 | | Lec 1 (ppt), Lec 1 (pdf) | Course Overview and History; Articulatory Phonetics and ARPAbet Transcription |
| 1 | Jan 8 | | Lec 2 (ppt), Lec 2 (pdf) | Phonetics: Acoustic Phonetics |
| 2 | Jan 13 | HW 1 due | Lec 3 (ppt), Lec 3 (pdf) | TTS: Background (Machine Learning, Classification, NLP) and Text Normalization |
| 2 | Jan 15 | | Lec 4 (ppt), Lec 4 (pdf) | Grapheme-to-Phoneme Conversion and the Festival Software (plus Text Normalization spillover) |
| 3 | Jan 20 | HW 2 due | Lec 5 (ppt), Lec 5 (pdf) | TTS: Prosody (Intonation, Boundaries, and Duration) |
| 3 | Jan 22 | | Lec 6 (ppt), Lec 6 (pdf) | TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis) |
| 4 | Jan 27 | HW 3 due | Lec 7 (ppt), Lec 7 (pdf) | ASR: Noisy Channel Model, Bayes, HMMs, Forward and Viterbi Algorithms |
| 4 | Jan 29 | | Lec 8 (ppt), Lec 8 (pdf) | ASR: HMMs continued: Baum-Welch, Application to ASR, Word Error Rate |
| 5 | Feb 3 | HW 4 due | Lec 9 (ppt), Lec 9 (pdf) | ASR: Feature Extraction |
| 5 | Feb 5 | 1-paragraph project proposal due | Lec 10 (ppt), Lec 10 (pdf) | ASR: Acoustic Modeling |
| 6 | Feb 10 | HW 5 due | Lec 11 (ppt), Lec 11 (pdf) | N-grams and Language Modeling |
| 6 | Feb 12 | | Lec 12 (ppt), Lec 12 (pdf) | Conversational Agents: Human Conversation, Simple Frame-Based Dialogue Systems, and VoiceXML |
| 7 | Feb 17 | HW 6 due | Lec 13 (ppt), Lec 13 (pdf) | Conversational Agents: Grounding, Confirmation, Dialogue Acts, Evaluation |
| 7 | Feb 19 | | Lec 14 (ppt), Lec 14 (pdf) | Conversational Agents III: Markov Decision Processes (MDPs), etc. |
| 8 | Feb 24 | HW 7 due | Lec 15 (ppt), Lec 15 (pdf) | ASR: Search (Lattices, N-best Lists, A*, etc.) |
| 8 | Feb 26 | | Lec 16 (ppt), Lec 16 (pdf) | ASR: Dealing with Variation (Adaptation, MLLR, Pronunciation Modeling) |
| 9 | Mar 3 | | Lec 17 (ppt), Lec 17 (pdf) | Disfluencies and Metadata: Boundaries, Fillers, Edit Terms |
| 9 | Mar 5 | | Lec 18 (ppt), Lec 18 (pdf) | Emotional Speech |
| 10 | Mar 10 | | | No class |
| 10 | Mar 12 | | | Final Project Presentations |