CS 124/LINGUIST 180
From Languages to Information

Winter 2012

Dan Jurafsky

We are excited to announce that "From Languages to Information" is going online, adopting the format used by CS145 and CS229A!

What this means:



SCHEDULE
Wk
Date
HW

Topic and Readings

1
Jan 10 & 12
Intro Lecture [pptx] [pdf]

Basic Text Processing   [slides pptx]   [slides pdf]

  • J+M Section 2.1: Regular Expressions (pages 17-26)
  • J+M Section 3.9: Word and Sentence Tokenization (pages 68-72)
  • MR+S Chapter 2: Term vocabulary and postings lists (Online version: 19-35, Paper version: 18-33)
  • Ken Church's tutorial Unix for Poets, at least pages 1-19

Edit Distance   [slides pptx]   [slides pdf]

  • J+M Section 3.11: Minimum Edit Distance (pages 72-77)
2
Jan 17 & 19
HW 1: Spamlord

Due Fri Jan 20, 5:00pm

Language Modeling   [slides pptx]   [slides pdf]

  • J+M Chapter 4, N-grams

Spelling Correction and the Noisy Channel   [slides pptx]   [slides pdf]

3
Jan 24 & 26
HW 2: Darn You AutoCorrect!

Due Fri Jan 27, 5:00pm

Naive Bayes and Text Classification

  • MR+S Chapter 13: Text classification and Naive Bayes (skip sections 13.3 and 13.5) (Online version: 253-270, Paper version: 234-250)
Sentiment Analysis

4
Jan 31 & Feb 2
HW 3: Thumbs up!

Due Fri Feb 3, 5:00pm

MaxEnt Classifiers

  • J+M Chapter 6: Logistic Regression and MaxEnt Models, pages 193-211 (=IE 227-245)

MEMM Sequence Models and Named Entity Tagging

  • J+M Chapter 22: Information Extraction, pages 727-734, 743-749 (=IE 761-768, 777-783)
5
Feb 7 & 9
HW 4: Extract!

Due Fri Feb 10, 5:00pm

Information Retrieval (I)

  • MR+S Chapter 1: Boolean Retrieval
  • The rest of MR+S Chapter 2: Term vocabulary and postings lists

Information Retrieval (II)

  • MR+S Chapter 6: Scoring, term weighting, and the vector space model
  • MR+S Chapter 8: Evaluation in Information Retrieval
6
Feb 14 & 16
HW 5: Search!

Due Fri Feb 17, 5:00pm

Computing Word Meaning
  • J+M Chapter 19: Lexical Semantics (pages 611-619 = IE 645-653)
  • J+M Chapter 20: Computational Lexical Semantics (pages 652-670 = IE 686-704)
Question Answering
7
Feb 21 & 23

XML: accessing structured information (I)

XML: accessing structured information (II)
  • XML in a Nutshell via Safari Tech Books, Chapter 8 (XSLT)
  • XML in a Nutshell via Safari Tech Books, Chapter 9 (XPath)
    To get these, go to library.stanford.edu/ezproxy/, choose Safari Tech Books, and search for XML in a Nutshell.
8
Feb 28 & Mar 1
HW 6: Jeopardy!

Due Fri Mar 2, 5:00pm
Relation and Information Extraction
  • J+M Chapter 22: Information Extraction (including Biomedical Information Extraction), pages 734-743, 749-764 (=IE 768-777, 783-798)

Machine Translation (I)

  • J+M Chapter 25: Machine Translation, pages 859-879 (=IE 895-915)
9
Mar 6 & 8
HW 7: Translate!

Due Fri Mar 9, 5:00pm

Machine Translation (II)

  • J+M Chapter 25: Machine Translation, pages 879-897 (=IE 915-933)

Web graphs, Links, and PageRank
10
Mar 13 & 15

Networks: Small Worlds and Fat Tails


Mar 22

Final Exam
  • Thurs March 22, 12:15-3:15