|
|
COURSE INFORMATION
|
|
| Instructor |
Dan Jurafsky, jurafsky@stanford.edu |
| TA |
Jenny Finkel, cs224s-ta@cs.stanford.edu |
| Newsgroup |
We have set up a class newsgroup, su.class.cs224s. Please post questions there, and check whether your question has already been asked before posting it. |
| Time |
Tuesdays and Thursdays, 3:30-4:45pm |
| Location |
50-52H |
| Textbook |
Huang, Acero, Hon. 2001. Spoken Language Processing. Prentice-Hall. |
| Description |
This course is an introduction to automatic speech recognition, speech understanding, and speech synthesis/text-to-speech from the computer science and linguistics (rather than EE) perspective. The focus is on understanding key algorithms, including the noisy channel model, Hidden Markov Models (HMMs), Gaussian Mixture Model (GMM) acoustic models, A* and Viterbi decoding, N-gram language modeling, unit selection synthesis, and dialogue architectures, as well as the role of linguistic knowledge (especially phonetics, intonation, pronunciation variation, disfluencies, and emotion). Prerequisites: programming experience and basic familiarity with probability. Recommended: CS221 or CS229. |
| Required Work |
|
|
SCHEDULE
|
|||
|
Week
|
Date
|
HW/Lec
|
Topic and Readings |
| 1 |
Jan 10 |
lec1.ppt lec1.6up.pdf |
Course Overview and History, Articulatory Phonetics and ARPAbet transcription
|
| 1 |
Jan 12 |
lec2.ppt lec2.6up.pdf |
Phonetics: Acoustic Phonetics |
| 2 |
Jan 17 |
lec3.ppt lec3.6up.pdf HW 1 due |
TTS: Introduction, Architectures, Festival
|
| 2 |
Jan 19 |
lec4.ppt lec4.6up.pdf |
TTS: Text normalization, grapheme-to-phoneme
|
| 3 |
Jan 24 |
lec5.ppt lec5.6up.pdf HW 2 due |
TTS: Prosody (Intonation, Boundaries, and Duration)
|
| 3 |
Jan 26 |
lec6.ppt lec6.6up.pdf |
TTS: Waveform Synthesis (Diphone and Unit Selection Synthesis)
|
| 4 |
Jan 31 |
lec7.ppt lec7.6up.pdf HW 3 due! |
ASR: Noisy Channel Model, Bayes, HMMs, Forward, Viterbi |
| 4 |
Feb 2 |
lec8.ppt lec8.6up.pdf |
ASR: HMMs continued, up to Baum-Welch (Forward-Backward) algorithm
|
| 5 |
Feb 7 |
lec9.ppt lec9.6up.pdf |
ASR: Acoustic Modeling
|
| 5 |
Feb 9 |
1-paragraph project proposal lec10.ppt lec10.6up.pdf |
ASR: Dealing with Variation (Adaptation, MLLR, Pronunciation modeling) |
| 6 |
Feb 14 |
HW 4 due lec11.ppt lec11.6up.pdf |
N-grams and Language Modeling
|
| 6 |
Feb 16 |
lec12.ppt lec12.6up.pdf |
ASR: Search (Lattices, N-best lists, A*, etc.) |
| 7 |
Feb 21 |
lec13.ppt lec13.6up.pdf |
Conversational Agents: Simple dialogue systems (frame-based systems, VXML, evaluation) |
| 7 |
Feb 23 |
HW 5 due
lec14.ppt lec14.6up.pdf |
Conversational Agents: Grounding, Confirmation, Dialogue Acts |
| 8 |
Feb 28 |
Ferrer.pdf |
Guest Speaker: Luciana Ferrer on Speaker Recognition
|
| 8 |
Mar 2 |
HW 6 due lec15.ppt lec15.6up.pdf |
Conversational Agents III: Markov Decision Processes (MDPs), etc. |
| 9 |
Mar 7 |
lec16.ppt lec16.6up.pdf |
Disfluencies and Metadata: Boundaries, Fillers, Edit Terms
|
| 9 |
Mar 9 |
Guest Speaker: Mark Mao on Acoustic Modeling: Covariance Modeling |
|
| 10 |
Mar 14 |
lec17.ppt lec17.6up.pdf |
Emotional speech |
| 10 |
Mar 16 |
Final Project Presentations |
|