
Course Description

Information retrieval is the process through which a computer system can respond to a user's query for text-based information on a specific topic. IR was one of the first and remains one of the most important problems in the domain of natural language processing (NLP). Web search is the application of information retrieval techniques to the largest corpus of text anywhere — the web — and it is the context where many people interact with IR systems most frequently.

In this course, we will cover basic and advanced techniques for building text-based information systems, including the following topics:

  • Efficient text indexing
  • Boolean and vector-space retrieval models
  • Evaluation and interface issues
  • IR techniques for the web, including crawling, link-based algorithms, and metadata usage
  • Document clustering and classification
  • Traditional and machine learning-based ranking approaches

Class time & location

Spring quarter 2019
Lecture times: Tues/Thurs, 4:30–5:50pm, April 1 to June 5
Location: Gates B1 (Basement)

Grading & course policies

See the course policies page for details on grading, late days, and other policies.

Required textbook

Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze (Cambridge University Press, 2008).

This book is available from Amazon, the Stanford bookstore, or your favorite book purveyor. You can also download and print chapters for free at the book website. (We’d appreciate any reports of typos or of higher-level problems for the third printing.)

This book will be referred to as IIR in the reading assignments listed in the course schedule section.

Other useful references

  • (MG) Managing Gigabytes, by I. Witten, A. Moffat, and T. Bell.
  • (IRAH) Information Retrieval: Algorithms and Heuristics, by D. Grossman and O. Frieder.
  • (MIR) Modern Information Retrieval, by R. Baeza-Yates and B. Ribeiro-Neto.
  • (FSNLP) Foundations of Statistical Natural Language Processing, by C. Manning and H. Schütze.
  • (SE) Search Engines: Information Retrieval in Practice, by B. Croft, D. Metzler, and T. Strohman.
  • (IRIE) Information Retrieval: Implementing and Evaluating Search Engines, by S. Büttcher, C. Clarke, and G. Cormack.

Prerequisites

Programming Tutorials

Note:
Some of the slides and video links are from a previous offering of the course. We leave them here for your reference; they will be updated or replaced as we reach each lecture. An asterisk (*) marks the most recently updated slides.
The complementary videos are on Canvas, and the slides for those videos are linked below.

Course Schedule


Week Date Event Description & materials Readings & other resources
Week 1 Tues. 4/2 Lecture (Pandu) Introduction to the course

Thurs. 4/4 Lecture (Chris) Inverted Indices: Dictionary and postings lists, boolean querying

Week 2 Tues. 4/9 Lecture (Pandu) Index Construction

Tues. 4/9 PA1 release Programming assignment #1 released
Thurs. 4/11 Lecture (Chris) Algorithms for postings list compression

Week 3 Tues. 4/16 Lecture (Pandu) Spelling correction

  • Videos: "Dictionaries and Tolerant Retrieval"
  • Slides: PPT | PDF/6 | PDF/1
Tues. 4/16 PS1 release Problem set #1 released
Tues. 4/16 Query quiz release Query quiz released
Thurs. 4/18 Lecture (Pandu) Scoring, term weighting and the vector space model

Sun. 4/20 Query quiz due Query quiz due
Week 4 Tues. 4/23 PA1 due Programming assignment #1 due
Tues. 4/23 Guest lecture Guest lecture by Joachim Kupke (Principal Software Engineer, Google)

NOTE: attendance required for on-campus students
Tues. 4/23 PA2 release Programming assignment #2 released
Thurs. 4/25 Lecture (Chris) Probabilistic IR: the binary independence model, BM25, BM25F

  • Videos: "Vector Space Model"
  • Slides: PPT | PDF/6 | PDF/1
    Week 5 Tues. 4/30 PS1 due Problem set #1 due
    Tues. 4/30 Lecture (Chris) Evaluation methods & NDCG

    Tues. 4/30 Ranking quiz release Ranking quiz released
    Thurs. 5/2 Lecture (Pandu) Systems issues in efficient retrieval and scoring

    Week 6 Tues. 5/7 PA2 due Programming assignment #2 due
    Tues. 5/7 Lecture (Pandu) Classification and clustering in vector spaces (Naive Bayes, kNN, decision boundaries)

    • Videos: "Naive Bayes"
    Thurs. 5/9 Lecture (Chris) Text classification

    Thurs. 5/9 PA3 release Programming assignment #3 released
    Week 7 Tues. 5/14 Lecture (Chris) Distributed word representations for IR

    Tues. 5/14 PS2 release Problem set #2 released
    Thurs. 5/16 Lecture (Chris) Learning to rank

    Week 8 Tues. 5/21 Lecture (Chris) Link analysis

    Thurs. 5/23 PS2 due Problem set #2 due
    Thurs. 5/23 Guest lecture Guest lecture by Susan Dumais (Distinguished Scientist & Deputy Managing Director, Microsoft Research Lab)

    NOTE: attendance required for on-campus students
    Week 9 Tues. 5/28 Lecture (Pandu) Crawling and near-duplicate pages

    Thurs. 5/30 PA3 due Programming assignment #3 due
    Thurs. 5/30 Lecture (Chris) Question answering

    Week 10 Tues. 6/4 Lecture (Pandu) Personalization

    Exam week Fri. 6/7 Final exam Alternate final exam (8:30-11:30am)
    Wed. 6/12 Final exam Final exam (3:30-6:30pm) Practice final and solution are on Canvas

    FAQ

    Can I take this course on a credit/no credit basis?
    Yes. Credit will be given to those who would have otherwise earned a C- or above.
    Can I audit or sit in?
    In general, we are very open to guests sitting in if you are a member of the Stanford community (registered student, staff, or faculty). As a courtesy, we would appreciate it if you emailed us first or talked to the instructor after the first class you attend.
    I have a question about the class. What is the best way to reach the course staff?
    In general, we ask students to use the Piazza forum for our class so that other students may benefit from your questions and our answers. If you have a personal matter that you believe is not appropriate to share on Piazza (even in a private post), you may email the course staff at cs276-spr1819-staff@lists.stanford.edu. We may NOT be able to reply to emails about the class that are sent to individual instructors or TAs.
    As an SCPD student, how do I take the final exam?
    If you are a local SCPD student, you are encouraged to come to Stanford for one of the on-campus exams. If you decide to take an on-campus exam, please let us know in advance (through a survey that we will send out closer to the final exam date). If you are not local or can't make it to an on-campus exam, you need to line up an exam monitor (usually your manager or a co-worker at your company) and submit the form specifying this person to SCPD in advance. You won't get an exam if you don't have an exam monitor on file. You need to make sure we get the exam back promptly (the monitor should scan it and email it directly to us). If you are taking the exam in the first 24-hour period, you need to make sure we get the exam back from your monitor by Saturday 12:30 pm PT. If you are taking the exam in the second 24-hour period, you need to make sure we get the exam back from your monitor by Wednesday 7:30 pm PT. We need to grade exams immediately after that in order to submit grades on time. Please refer to the course policies page for final exam details.
    Will there be virtual office hours for SCPD students?
    We will be sure to join a Google Hangout for at least some office hours. We will use QueueStatus and, for SCPD students, post the Google Hangout link on the QueueStatus page during each of those office hours.