CS 276 / LING 286
Course Syllabus
Required Textbook:
IIR = Introduction to Information Retrieval, by C. Manning, P. Raghavan, and
H. Schütze.
This book is available in draft form, as a reader, from the Stanford bookstore.
You can also download and print (perhaps updated) chapters at the book
website or from the course syllabus.
We’re trying to improve and finish this book, so any comments, from typos
to higher-level suggestions about content and organization, would be greatly
appreciated! (It’d be useful if you could mention the date of the version you
were reading.)
Other Good IR etc. Books:
MG = Managing Gigabytes, by I. Witten, A. Moffat, and T. Bell.
MIR = Modern Information Retrieval, by R. Baeza-Yates and B.
Ribeiro-Neto.
FOA = Finding Out About, by R. Belew.
MTW = Mining the Web, by S.
Chakrabarti.
FSNLP = Foundations of Statistical Natural Language Processing, by C.
Manning and H. Schütze.
IRAH = Information Retrieval: Algorithms and Heuristics, by D. Grossman
and O. Frieder.
These books all have useful information on topics that we cover and are recommended
as references. MG is particularly good as a detailed reference for technical IR
in the first half of the course. MTW covers many of the topics from the latter
part of the course.
Lecturers:
CM = Christopher Manning
PR = Prabhakar Raghavan
AN = Ani Nenkova
TA = TA
Locations and times:
All lectures will be held in Gates B03, and review sessions will be
held in Skilling 193. The final will be in Skilling Aud.
Lectures are Tuesdays and Thursdays from 4:15 to 5:30.
Review sessions, when scheduled, are Friday from
Schedule:
Details of the schedule, slides and reading lists will be updated as the quarter
progresses.
We have scheduled review sessions for assignments and exams on Friday
afternoons.
|
Date |
Topics |
Notes |
Who |
Readings |
Assignments |
|
Tue 26 Sep |
IR 1. Introduction to Information Retrieval. Inverted
indices and Boolean queries. Query optimization. The nature of unstructured
and semi-structured text. Course administrivia. |
|
CM |
IIR 1:
pdf |
|
|
Thu 28 Sep |
IR 2. Text encoding: tokenization, stemming,
lemmatization, stop words, phrases. Further optimizing indices for query
processing. Proximity and phrase queries. Positional indices. |
[powerpoint] |
CM |
IIR 2:
pdf |
|
|
Tue 3 Oct |
IR 3. Tolerant retrieval: spelling correction and
synonyms. Wild-card queries, permuterm indices, n-gram indices. Edit
distance, soundex, language detection. |
[powerpoint] |
PR |
IIR 3 pdf
Mikael Tillenius: Efficient Generation and Ranking of Spelling Error
Corrections. Master's thesis.
Efficient spell retrieval: K. Kukich. Techniques for automatically
correcting words in text. ACM Computing Surveys 24(4), Dec 1992. |
PS1 out |
|
Thu 5 Oct |
IR 4. Index construction. Postings size estimation, merge
sort, dynamic indexing, positional indexes, n-gram indexes, real-world
issues. |
[powerpoint] |
PR |
IIR 4
pdf |
|
|
Fri 6 Oct |
Review session for PS1 |
|
TA |
|
|
|
Tue 10 Oct |
IR 5. Index compression: lexicon compression and postings
lists compression. Gap encoding, gamma codes, Zipf's Law. Blocking. Extreme
compression. |
[powerpoint] |
AN |
IIR 5 pdf |
PE1 out |
|
Thu 12 Oct |
IR 6. Parametric or fielded search. Document zones. The
vector space retrieval model. tf.idf weighting. Scoring documents. |
[powerpoint] |
AN |
IIR 6 pdf
Exploring the Similarity Space - Zobel and Moffat (1998). |
PS1 due |
|
Fri 13 Oct |
Review session for PE1 |
|
TA |
|
|
|
Tue 17 Oct |
IR 7. Vector space scoring. The cosine measure.
Efficiency considerations. Nearest neighbor techniques, reduced
dimensionality approximations, random projection. |
[powerpoint] |
CM |
IIR 7 pdf
Anh, V.N., de Kretser, O., and A. Moffat. Vector-Space Ranking with Effective Early Termination, SIGIR 2001.
Random projection theorem: Dasgupta and Gupta. An elementary proof of the
Johnson-Lindenstrauss Lemma (1999).
Faster random projection: A.M. Frieze, R. Kannan, S. Vempala. Fast Monte-Carlo
Algorithms for finding low-rank approximations. IEEE Symposium on Foundations
of Computer Science, 1998. |
|
|
Thu 19 Oct |
IR 8. Results summaries: static and dynamic. Evaluating
search engines. User happiness, precision, recall, F-measure. Creating test collections:
kappa measure, interjudge agreement. Relevance, approximate vector retrieval.
|
[powerpoint] |
CM |
IIR 8 pdf; MG 4.5; MIR Chapter 3 |
PE1 due |
|
Fri 20 Oct |
Review session for midterm |
|
TA |
|
|
|
Tue 24 Oct |
IR 9. Relevance feedback. Pseudo relevance feedback. Query
expansion. Automatic thesaurus generation. Sense-based retrieval.
Experimental results of performance effectiveness. |
[powerpoint] |
CM |
IIR 9.
pdf |
|
|
Thu 26 Oct |
Midterm to be held in-class |
|
|
|
|
|
Tue 31 Oct |
CLUSTERING 1. Introduction to the problem. Partitioning
methods: k-means clustering; Hierarchical clustering.
|
[powerpoint] |
PR |
|
|
|
Thu 2 Nov |
CLUSTERING 2.
Latent semantic indexing (LSI). Applications to clustering and to information
retrieval. |
[powerpoint] |
PR |
|
|
|
Tue 7 Nov |
CLASSIFICATION 1. Introduction to text classification.
Naive Bayes models. Spam filtering. |
[powerpoint] |
CM |
IIR 13. |
PS2 out |
|
Thu 9 Nov |
CLASSIFICATION 2. k nearest neighbors, decision boundaries, vector space
classification using centroids, decision trees. |
[powerpoint] |
CM |
IIR 14. |
|
|
Fri 10 Nov |
Review session for PS2 |
|
TA |
|
|
|
Tue 14 Nov |
CLASSIFICATION 3. Support vector machine classifiers. Kernel
functions. Evaluation of classification. Micro- and macro-averaging.
Comparative results. |
[powerpoint] |
CM |
IIR 15. |
PE2 out |
|
Thu 16 Nov |
Web 1: What makes the web different. Web search overview,
web structure, the user, paid placement, search engine optimization/spam. |
[powerpoint] |
PR |
|
PS2 due |
|
Fri 17 Nov |
Review session for PE2 |
|
TA |
|
|
|
Tue 21 Nov |
Thanksgiving (no class) |
|
|
|
|
|
Thu 23 Nov |
Thanksgiving (no class) |
|
|
|
|
|
Tue 28 Nov |
Web 2: Web characteristics II: web size measurement,
near-duplicate detection. |
[powerpoint] |
PR |
IIR 19. |
|
|
Thu 30 Nov |
Web 3: Crawling and web indexes |
[powerpoint] |
PR |
IIR 20. |
PE2 due |
|
Tue 5 Dec |
Web 4: Link analysis |
[powerpoint] |
PR |
Ranking the Web Frontier |
|
|
Thu 7 Dec |
Text understanding and mining: Question Answering |
[powerpoint] |
CM |
|
|
|
Fri 8 Dec |
Review session for final |
|
TA |
|
|
|
Mon 11 Dec |
Final Exam 12:15-3:15, Skilling Aud |
|
|
|
|