CS 124/LINGUIST 180
From Languages to Information

Winter 2012

Dan Jurafsky

We are excited to announce that "From Languages to Information" is going online, adopting the format used by CS145 and CS229A!

What this means:



SCHEDULE
Wk
Date
HW

Topic and Readings

1
Jan 10 & 12
Intro Lecture [pptx] [pdf]

Basic Text Processing   [slides pptx]   [slides pdf]

  • J+M Section 2.1 Regular Expressions (17-26)
  • J+M Section 3.9 Word and Sentence Tokenization (68-72)
  • MR+S Chapter 2: Term vocabulary and postings lists (Online version: 19-35, Paper version: 18-33)
  • Ken Church's tutorial Unix for Poets, at least pages 1-19

Edit Distance   [slides pptx]   [slides pdf]

  • J+M Section 3.11: Minimum Edit Distance (pages 72-77)
2
Jan 17 & 19
HW 1: Spamlord

Due Fri Jan 20, 5:00pm

Language Modeling   [slides pptx]   [slides pdf]

  • J+M Chapter 4, N-grams

Spelling Correction and the Noisy Channel   [slides pptx]   [slides pdf]

3
Jan 24 & 26
HW 2: AutoCorrect!

Due Fri Jan 27, 5:00pm

Naive Bayes and Text Classification   [slides pptx]   [slides pdf]

  • MR+S Chapter 13: Text classification and Naive Bayes (skip sections 13.3 and 13.5) (Online version: 253-270, Paper version: 234-250)
Sentiment Analysis   [slides pptx]   [slides pdf]
4
Jan 31 & Feb 2
HW 3: Thumbs up!

Due Fri Feb 3, 5:00pm

MaxEnt Classifiers   [slides pptx]   [slides pdf]

  • J+M Chapter 6: Logistic Regression and MaxEnt Models, pages 193-211 (=IE 227-245)

MEMM Sequence Models and Named Entity Tagging   [slides pptx]   [slides pdf]

  • J+M Chapter 22: Information Extraction, pages 727-734, 743-749 (=IE 761-768, 777-783)
5
Feb 7 & 9
HW 4: Extract!

Due Fri Feb 10, 5:00pm

Information Retrieval (I)   [slides pptx]   [slides pdf]

  • MR+S Chapter 1: Boolean Retrieval
  • The rest of MR+S Chapter 2: Term vocabulary and postings lists

Information Retrieval (II)   [slides pptx]   [slides pdf]

  • MR+S Chapter 6: Scoring, term weighting, and the vector space model
  • MR+S Chapter 8: Evaluation in Information Retrieval
6
Feb 14 & 16
HW 5: Search!

Due Fri Feb 17, 5:00pm
Relation Extraction   [slides pptx]   [slides pdf]
  • J+M Chapter 22: Information Extraction, pages 734-762 (=IE 768-785)

XML: accessing structured information

7
Feb 21 & 23

Word Meaning and Word Similarity   [slides pptx]   [slides pdf]
  • J+M Chapter 19: Lexical Semantics (pages 611-619 = IE 645-653)
  • J+M Chapter 20: Computational Lexical Semantics (pages 652-670 = IE 686-704)
Question Answering   [slides pptx]   [slides pdf]
8
Feb 28 & Mar 1
HW 6: Jeopardy!

Due Fri Mar 2, 5:00pm

Machine Translation 1   [slides pptx]   [slides pdf]

  • J+M Chapter 25: Machine Translation, pages 859-879 (=IE 895-915)

Machine Translation 2   [slides pptx]   [slides pdf]

  • J+M Chapter 25: Machine Translation, pages 879-897 (=IE 915-933)
9
Mar 6 & Mar 8
HW 7: Translate!

Due Fri Mar 9, 5:00pm

Web graphs, Links, and PageRank
10
Mar 13 & 15
Review for the final
Review Answer Sheet

Mar 22

Final Exam
  • Thurs March 22, 12:15-3:15 in room 370-370