CS 276 / LING 286 |
IIR = Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze. Cambridge University Press, 2008.
This book is available from the Stanford bookstore (or your favorite book purveyor). You can also download and print chapters at the book website. (The book is brand new and we’d appreciate any reports of typos or of higher-level problems for the second printing. Thanks.)
MG = Managing Gigabytes, by I. Witten, A. Moffat, and T. Bell.
IRAH = Information Retrieval: Algorithms and Heuristics by D. Grossman and O. Frieder.
IR = Modern Information Retrieval, by R. Baeza-Yates and B. Ribeiro-Neto.
FOA = Finding Out About, by R. Belew.
MTW = Mining the Web, by S. Chakrabarti.
FSNLP = Foundations of Statistical Natural Language Processing, by C. Manning and H. Schütze.
These books all have useful information on topics that we cover and are recommended as references. MG is particularly good as a detailed reference for technical IR in the first half of the course. MTW covers many of the topics from the latter part of the course.
CM = Christopher Manning
PR = Prabhakar Raghavan
TA = Teaching assistant
All lectures will be held at Gates B01.
Lectures are Tuesdays and Thursdays from 4:15 to 5:30.
Six review sessions are scheduled for assignments and exams.
Review sessions are on Tuesdays from 1:15 to 2:05 in Skilling 193.
Scroll down to see the specific dates for the review sessions.
The final exam is scheduled for December 11 from 12:15 to 3:15 in Gates B01.
Please contact the course staff if you wish to take the alternative exam on December 9 from 12:15 to 3:15.
Details of the schedule, slides and reading lists will be updated as the quarter progresses.
Date |
Topics |
Notes |
Who |
Readings |
Assignments |
Tue 23 Sep |
IR 1. Introduction to Information Retrieval. Inverted indices and boolean queries. Query optimization. The nature of unstructured and semi-structured text. Course administrivia. |
[powerpoint] |
PR |
IIR Ch. 1 |
|
Thu 25 Sep |
IR 2. The term vocabulary and postings lists. Text encoding: tokenization, stemming, lemmatization, stop words, phrases. Optimizing indices with skip lists. Proximity and phrase queries. Positional indices. |
[powerpoint] |
CM |
IIR Ch. 2 |
|
Tue 30 Sep |
IR 3. Dictionaries and tolerant retrieval. Dictionary data structures. Wild-card queries, permuterm indices, n-gram indices. Spelling correction and synonyms: edit distance, soundex, language detection. |
[powerpoint] |
CM |
IIR Ch. 3 |
[PS1 out] |
Thu 2 Oct |
IR 4. Index construction. Postings size estimation, sort-based indexing, dynamic indexing, positional indexes, n-gram indexes, distributed indexing, real-world issues. |
[powerpoint] |
CM |
IIR Ch. 4 |
|
Tue 7 Oct |
Review session for PS1 |
TA |
|||
Tue 7 Oct |
IR 5. Index compression: lexicon compression and postings lists compression. Gap encoding, gamma codes, Zipf's Law, variable-byte encoding. Blocking. Extreme compression. |
[powerpoint] |
PR |
IIR Ch. 5 |
[PE1 out] |
Thu 9 Oct |
IR 6. Scoring, term weighting, and the vector space model. Parametric or fielded search. Document zones. The vector space retrieval model. tf.idf weighting. The cosine measure. Scoring documents. [YK] |
[powerpoint] |
PR |
PS1 due |
|
Tue 14 Oct |
Review session for PE1 |
TA |
|||
Tue 14 Oct |
IR 7. Computing scores in a complete search system: Components of an IR system. Efficient vector space scoring. Nearest neighbor techniques, reduced dimensionality approximations, random projection. |
[powerpoint] |
PR |
||
Thu 16 Oct |
IR 8. Results summaries: static and dynamic. Evaluating search engines. User happiness, precision, recall, F-measure. Creating test collections: kappa measure, interjudge agreement. Relevance, approximate vector retrieval. |
[powerpoint] |
CM |
IIR Ch. 8 |
PE1 due |
Tue 21 Oct |
Review session for midterm |
TA |
|||
Tue 21 Oct |
IR 9. Relevance feedback. Pseudo relevance feedback. Query expansion. Automatic thesaurus generation. Sense-based retrieval. Experimental results of performance effectiveness. |
[powerpoint] |
PR |
IIR Ch. 9 |
|
Thu 23 Oct |
Midterm to be held in class |
CM |
Midterm |
||
Tue 28 Oct |
CLASSIFICATION 1. Introduction to text classification. Naive Bayes models. Spam filtering. |
[powerpoint] |
CM |
IIR Ch. 13 |
|
Thu 30 Oct |
CLASSIFICATION 2. K Nearest Neighbors, Decision boundaries, Vector space classification using centroids, Decision Trees. Comparative results. |
[powerpoint] |
CM |
IIR Ch. 14 |
|
Tue 4 Nov |
CLASSIFICATION 3. Support vector machine classifiers. Kernel Function. Evaluation of classification. Micro- and macro-averaging. Learning rankings. |
[powerpoint] |
CM |
IIR Ch. 15 |
[PS2 out] |
Thu 6 Nov |
Web 1: What makes the web different. Web search overview, web structure, the user, paid placement, search engine optimization/spam. Web size measurement. |
[powerpoint] |
PR |
||
Tue 11 Nov |
Review session for PS2 |
TA |
|||
Tue 11 Nov |
Web 2: Crawling and web indexes. Near-duplicate detection. |
[powerpoint] |
PR |
IIR Ch. 20 |
|
Thu 13 Nov |
Web 3: Link analysis |
[powerpoint] |
PR |
IIR Ch. 21 |
PS2 due |
Tue 18 Nov |
Review session for PE2 |
TA |
|||
Tue 18 Nov |
Web 4: Learning to rank |
[powerpoint] |
CM |
IIR 6.1.2-3, IIR 15.4 |
|
Thu 20 Nov |
CLUSTERING 1. Introduction to the problem. Partitioning methods: k-means clustering; Hierarchical clustering. |
[powerpoint] |
CM |
PE2 due |
|
Tue 25 Nov |
Thanksgiving (no class) |
-- |
|||
Thu 27 Nov |
Thanksgiving (no class) |
-- |
|||
Tue 2 Dec |
Review session for final |
TA |
|||
Tue 2 Dec |
CLUSTERING 2. Latent semantic indexing (LSI). Applications to clustering and to information retrieval. |
[powerpoint] |
CM |
||
Thu 4 Dec |
Text understanding and mining: Question Answering |
[powerpoint] |
CM |
AskMSR: Question answering using the worldwide web (Banko et al.) |
|
Tue 9 Dec |
Alternative Final Exam, 12:15-3:15 in Gates B01 | Please contact the course staff if you cannot make the scheduled final exam time and need to take the alternative exam. | Alt. Final |
||
Thu 11 Dec |
Final Exam, 12:15-3:15 in Gates B01 |
Final |