CS 224S/LINGUIST 285
Spoken Language Processing

Spring 2014 · Dan Jurafsky

Introduction to spoken language technology with an emphasis on dialogue and conversational systems. Automatic speech recognition, extraction of affect and social meaning from speech, speech synthesis, dialogue management, and applications to digital assistants, search, and recommender systems.

Room: 260-113, TuTh 2:15-3:30pm


Schedule

Week 1: Apr 1 and 3 (no homework due)
Course Overview and History, Articulatory Phonetics and ARPAbet transcription [pptx] [pdf]
  • J+M Chapter 7: Phonetics, pages 215-230
Phonetics: Acoustic Phonetics [pptx] [pdf]
  • J+M Chapter 7: Phonetics, pages 230-end

Week 2: Apr 8 and 10 (Homework 1 due Apr 8, 2:00pm)
ASR: Noisy Channel Model, HMMs, Forward, Viterbi, Word Error Rate [pptx] [pdf]
  • J+M Chapter 6: Hidden Markov Models, pages 173-186
  • J+M Chapter 9: Automatic Speech Recognition, pages 285-295
ASR: HMMs continued: Baum-Welch, Advanced Decoding [pptx] [pdf]
  • J+M Chapter 6: Hidden Markov Models, pages 186-192
  • J+M Chapter 9: Automatic Speech Recognition, pages 314-333
  • J+M Chapter 10: Speech Recognition: Advanced Topics, section 10.1, pages 335-341
On your own: ASR: Language Modeling
  • If you have never had language modeling (i.e., have never taken CS124, CS224N, or a similar course), please either watch the CS124 language modeling videos on Stanford Coursera or read J+M Chapter 4 (sections 4.1-4.5.1, pages 83-100; section 4.6, page 104; and section 4.9.1, pages 109-111). The first six of the eight videos are sufficient. You may need to create an account. The lectures are listed under Language Modeling here:
    https://stanford.coursera.org/cs124-002/lecture

Week 3: Apr 15 and 17 (Homework 2 due Apr 15, 2:00pm)
ASR: Acoustic Modeling [pptx] [pdf]
  • J+M Chapter 9: Automatic Speech Recognition, pages 295-314
  • J+M Chapter 10: Speech Recognition: Advanced Topics, section 10.3, pages 345-349
ASR: Feature Extraction [pptx] [pdf]
  • J+M Section 9.3, pages 295-303

Week 4: Apr 22 and 24 (Homework 3 due Apr 22, 2:00pm)
Social Meaning Extraction

Week 5: Apr 29 and May 1 (Homework 4)
Conversational Agents: Human conversation, simple frame-based dialogue systems
  • J+M Chapter 24: Dialogue and Conversational Agents, pages 811-838
Conversational Agents: Grounding, Confirmation, Dialogue Acts, Evaluation
  • J+M Chapter 24: Dialogue and Conversational Agents, pages 838-846

Week 6: May 6 and 8 (Homework 5)
Extracting Social Meaning
Extracting Paralinguistics and Medical Informatics

Week 7: May 13 and 15
Conversational Agents III: Markov Decision Processes (MDPs), etc. [pptx] [pdf]
  • J+M Chapter 24: Dialogue and Conversational Agents, pages 846-end
Siri and Mobile Conversational Agents

Week 8: May 20 and 22
Text to Speech (TTS)
Speaker Identification and Verification (and conclusion of TTS)

Week 9: May 27 and 29
Deep Neural Networks for Acoustic Modeling (lecture by Andrew Maas)
Thursday: TBD

Week 10: June 3
Final Project Draft Presentations

Monday, June 9, 12:00 noon: Final Project Due

Course Information

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Mondays 12:30-1:30 or by appointment
Teaching Assistants

Andrew Maas
Peng Qi
Sushobhan Nayak
Frank Liu

Office Hours
  • Monday 12:30-1:30, Dan, 117 Margaret Jacks
  • Monday 6:00-7:00pm, Gates B28, Homework Office Hour
  • Tuesday 1:00-2:00pm, Gates 120, Andrew
  • Wednesday 7:30-8:30pm, Gates B28, Peng
  • Thursday 4:00-5:00pm, Gates B28, Sushobhan
  • Friday 4:00-5:00pm, Gates B28, Frank
Class Time

Tuesday and Thursday 2:15-3:30pm. The room is currently 260-113, although it might change, so watch this space.

Email

If you have a question that is not confidential or personal, post it on the Piazza forum; responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can email non-technical questions to the course staff list, cs224s-spr1314-staff@lists.stanford.edu. We cannot reply to email sent to individual staff members. If you have a matter to discuss privately, please come to office hours or use cs224s-spr1314-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Textbooks
  • Required: Jurafsky and Martin. 2009. Speech and Language Processing (2nd Edition). Pearson. There are two copies on reserve in the library.

Course Description

Introduction to spoken language technology with an emphasis on dialogue and conversational systems. Automatic speech recognition, extraction of affect and social meaning from speech, speech synthesis, dialogue management, and applications to digital assistants, search, and recommender systems.

Prerequisites

CS 124, 221, 224N, or 229

Required Work

Homeworks

There are 5 homeworks. Each homework is due at 2:00pm on its due date (i.e., before class starts).

Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other, but you must write the actual programs yourself.

Late homeworks

You have 5 free late (calendar) days to use on the programming assignments. Once these are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours, or part thereof, that a homework is late uses up one full late day.

Readings

We will expect you to do a significant amount of textbook reading in this course.

Final exam

There is no final exam for this course.

Final project

Any project in speech recognition, speech synthesis, speech understanding, dialogue design, speech user interface design, etc. Projects should be joint: 3 people is optimal, and 2 is acceptable only if you have a convincing reason. The final project will be presented as a poster at the poster session on Tuesday, June 3, and is due by email on Monday, June 9, at noon PST.

Final grade
  • 45% homeworks
  • 45% final project
  • 10% class participation