|
|
CS 276 / LING 286 |
IIR = Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze. Cambridge University Press, 2008.
This book is available from the Stanford bookstore (or your favorite book purveyor). You can also download and print chapters at the book website. (We’d appreciate any reports of typos or of higher-level problems for the third printing. Thanks.)
MG = Managing Gigabytes, by I. Witten, A. Moffat, and T. Bell.
IRAH = Information Retrieval: Algorithms and Heuristics, by D. Grossman and O. Frieder.
IR = Modern Information Retrieval, by R. Baeza-Yates and B. Ribeiro-Neto.
FOA = Finding Out About, by R. Belew.
MTW = Mining the Web, by S. Chakrabarti.
FSNLP = Foundations of Statistical Natural Language Processing, by C. Manning and H. Schütze.
These books all have useful information on topics
that we cover and are recommended as references. MG is particularly good as a
detailed reference for technical IR in the first half of the course. MTW covers
many of the topics from the latter part of the course.
CM = Christopher Manning
PR = Prabhakar Raghavan
TA = Teaching Assistant
All lectures will be held at Gates B01.
Lectures are Tuesdays and Thursdays from 4:15 to 5:30.
Six review sessions are scheduled for assignments
and exams.
Review sessions are held on Tuesdays from 1:15 to 2:05 in Skilling 193.
Scroll down to see the specific dates for the review sessions.
The final exam is scheduled for Wednesday, December 9th, 12:15-3:15pm.
Details of the schedule, slides and reading lists
will be updated as the quarter progresses.
|
Date |
Topics |
Notes |
Who |
Readings |
Assignments |
|
Tue 22 Sep |
IR 1. Introduction to Information Retrieval. Inverted
indices and Boolean queries. Query optimization.
The nature of unstructured and semi-structured text. Course administrivia. |
[powerpoint] |
CM |
IIR
Ch. 1 |
|
|
Thu 24 Sep |
IR 2. The term vocabulary and postings lists. Text
encoding: tokenization, stemming, lemmatization, stop
words, phrases. Optimizing indices with skip lists. Proximity and phrase
queries. Positional indices. |
[powerpoint] |
CM |
IIR
Ch. 2 |
|
|
Tue 29 Sep |
IR 3. Dictionaries and tolerant retrieval. Dictionary data
structures. Wild-card queries, permuterm indices,
n-gram indices. Spelling correction and synonyms: edit distance, soundex, language detection. |
[powerpoint] |
PR |
IIR
Ch. 3 |
[PS1 out] |
|
Thu 1 Oct |
IR 4. Index construction. Postings size estimation,
sort-based indexing, dynamic indexing, positional indexes, n-gram indexes,
distributed indexing, real-world issues. |
[powerpoint] |
CM |
IIR
Ch. 4 |
|
|
Tue 6 Oct |
Review session for PS1 |
|
TA |
|
|
|
Tue 6 Oct |
IR 5. Index compression: lexicon compression and postings
lists compression. Gap encoding, gamma codes, Zipf's
Law, variable-byte encoding. Blocking. Extreme compression. |
[powerpoint] |
CM |
IIR Ch. 5 |
[PE1 out] |
|
Thu 8 Oct |
IR 6. Scoring, term weighting, and the vector space model.
Parametric or fielded search. Document zones. The vector space retrieval
model. tf.idf weighting.
The cosine measure. Scoring documents. [YK] |
[powerpoint] |
CM |
|
PS1 due |
|
Tue 13 Oct |
Review session for PE1 |
|
TA |
|
|
|
Tue 13 Oct |
IR 7. Computing scores in a complete search system:
Components of an IR system. Efficient vector space scoring. Nearest neighbor
techniques, reduced dimensionality approximations, random projection. |
[powerpoint] |
PR |
|
|
|
Thu 15 Oct |
IR 8. Results summaries: static and dynamic. Evaluating
search engines. User happiness, precision, recall, F-measure. Creating test collections: kappa measure, interjudge
agreement. Relevance, approximate vector retrieval. |
[powerpoint] |
PR |
IIR
Ch. 8 |
PE1 due |
|
Tue 20 Oct |
Review session for midterm |
|
TA |
|
|
|
Tue 20 Oct |
IR 9. Relevance feedback. Pseudo relevance feedback. Query
expansion. Automatic thesaurus generation. Sense-based retrieval.
Experimental results of performance effectiveness. |
[powerpoint] |
PR |
IIR
Ch. 9 |
|
|
Thu 22 Oct |
Midterm, held in class |
|
CM |
Midterm Statistics |
Midterm |
|
Tue 27 Oct |
CLASSIFICATION 1. Introduction to text classification.
Naive Bayes models. Spam filtering. |
[powerpoint] |
CM |
IIR Ch. 13
Tackling the Poor Assumptions of Naive Bayes Text Classifiers (Rennie et al. 2003) (for PE2) |
|
|
Thu 29 Oct |
CLASSIFICATION 2. K Nearest Neighbors, Decision boundaries,
Vector space classification using centroids,
Decision Trees. Comparative results. |
[powerpoint] |
CM |
IIR
Ch. 14 |
|
|
Tue 3 Nov |
CLUSTERING 1. Introduction to the problem. Partitioning
methods: k-means clustering; Hierarchical clustering. |
[powerpoint] |
PR |
|
[PS2 out] |
|
Thu 5 Nov |
CLUSTERING 2. Latent semantic indexing (LSI). Applications
to clustering and to information retrieval. |
[powerpoint] |
PR |
|
|
|
Tue 10 Nov |
Review session for PS2 |
|
TA |
|
|
|
Tue 10 Nov |
CLASSIFICATION 3. Support vector machine classifiers.
Kernel functions. Evaluation of classification. Micro- and macro-averaging.
Learning rankings. |
[powerpoint] |
CM |
IIR
Ch. 15 |
|
|
Thu 12 Nov |
Web 4: Learning to rank |
[powerpoint] |
CM |
IIR
6.1.2-3, IIR 15.4 |
PS2 due |
|
Tue 17 Nov |
Review session for PE2 |
|
TA |
|
|
|
Tue 17 Nov |
Web 1: What makes the web different. Web search overview,
web structure, the user, paid placement, search engine optimization/spam. Web
size measurement. |
[powerpoint] |
PR |
|
|
|
Thu 19 Nov |
Web 2: Crawling and web indexes. Near-duplicate
detection. |
[powerpoint] |
PR |
IIR
Ch. 20 |
PE2 due |
|
Tue 24 Nov |
Thanksgiving
(no class) |
|
-- |
|
|
|
Thu 26 Nov |
Thanksgiving
(no class) |
|
-- |
|
|
|
Tue 1 Dec |
Review session for final |
|
TA |
|
[Practice Final] |
|
Tue 1 Dec |
Web 3: Link analysis |
[powerpoint] |
PR |
IIR
Ch. 21 |
|
|
Thu 3 Dec |
Text understanding and mining: Question Answering |
[powerpoint] |
CM |
AskMSR: Question answering using the worldwide web (Banko et al.) |
|
|
7-11 Dec |
Final Exam: Wednesday, December 9th, 12:15-3:15pm. |
|
|
|
Final |