COURSE INFORMATION
| | |
|---|---|
| Instructor | Dan Jurafsky, jurafsky@stanford.edu |
| TA | Yun-Hsuan Sung, cs224s-ta@cs.stanford.edu |
| Newsgroup | We have set up a class newsgroup, su.class.cs224s. Please post questions there, and check whether your question has already been asked before posting it. |
| Time | Tuesdays and Thursdays, 10:00-11:15am (NOTE SLIGHT CHANGE IN TIME TO START AT 10:00am) |
| Location | 200-217 |
| Textbook | |
| Description | This course is an introduction to automatic speech recognition, speech understanding, and speech synthesis/text-to-speech from the computer science and linguistics (rather than EE) perspective. The focus is on understanding key algorithms, including the noisy channel model, Hidden Markov Models (HMMs), Gaussian Mixture Model (GMM) acoustic models, A* and Viterbi decoding, N-gram language modeling, unit selection synthesis, and dialogue architectures, as well as the role of linguistic knowledge (especially phonetics, intonation, pronunciation variation, disfluencies, and emotion). Prerequisites: programming experience and familiarity with probability. Recommended: CS221 or CS229. |
| Required Work | |
SCHEDULE
| Wk | Date | HW | Lec | Topic and Readings |
|---|---|---|---|---|
| 1 | Jan 9 | | Lec 1 (ppt) Lec 1 (6up PDF) | Course Overview and History, Articulatory Phonetics and ARPAbet transcription |
| 1 | Jan 11 | | Lec 2 (ppt) Lec 2 (6up PDF) | Phonetics: Acoustic Phonetics |
| 2 | Jan 16 | HW 1 due | Lec 3 (ppt) Lec 3 (6up PDF) | TTS: Background (machine learning, classification, NLP) and Text Normalization |
| 2 | Jan 18 | | Lec 4 (ppt) Lec 4 (6up PDF) | Grapheme-to-phoneme and the Festival software (plus Text Normalization spillover) |
| 3 | Jan 23 | HW 2 due | Lec 5 (ppt) Lec 5 (6up PDF) | TTS: Prosody (Intonation, Boundaries, and Duration) |
| 3 | Jan 25 | | Lec 6 (ppt) Lec 6 (6up PDF) | TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis) |
| 4 | Jan 30 | HW 3 due | Lec 7 (ppt) Lec 7 (6up PDF) | ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi, Start of Baum-Welch |
| 4 | Feb 1 | | Lec 8 (ppt) Lec 8 (6up PDF) | ASR: HMMs continued, and application to ASR |
| 5 | Feb 6 | 1-paragraph project proposal | Lec 9 (ppt) Lec 9 (pdf) | ASR: Feature Extraction |
| 5 | Feb 8 | | Lec 10 (ppt) Lec 10 (6up PDF) | ASR: Acoustic Modeling |
| 6 | Feb 13 | HW 4 due | Lec 11 (ppt) Lec 11 (6up PDF) | N-grams and Language Modeling |
| 6 | Feb 15 | | Lec 12 (ppt) Lec 12 (6up PDF) | ASR: Search (Lattices, N-best lists, A*, etc.) |
| 7 | Feb 20 | HW 5 due | Lec 13 (ppt) Lec 13 (6up PDF) | Conversational Agents: Simple dialogue systems (frame-based systems, VXML, evaluation) |
| 7 | Feb 22 | | Lec 14 (ppt) Lec 14 (6up PDF) | Conversational Agents: Grounding, Confirmation, Dialogue Acts |
| 8 | Feb 27 | HW 6 due | Lec 15 (ppt) Lec 15 (6up PDF) | Conversational Agents III: Markov Decision Processes (MDPs), etc. |
| 8 | Mar 1 | | Lec 16 (ppt) Lec 16 (6up PDF) | ASR: Dealing with Variation (Adaptation, MLLR, Pronunciation modeling) |
| 9 | Mar 6 | | Lec 17 (ppt) Lec 17 (6up PDF) | Disfluencies and Metadata: Boundaries, Fillers, Edit Terms |
| 9 | Mar 8 | | Lec 18 (ppt) Lec 18 (6up PDF) | Emotional speech |
| 10 | Mar 13 | | | Guest Lecture: Luciana Ferrer on Speaker Recognition |
| 10 | Mar 15 | | | Final Project Presentations |