STANFORD CME 340
Computational Methods in Data Mining
Winter 2007


Announcements

  • (01/10) Please e-mail me your preferences for the paper that you would like to present.
  • Course Information

    Professor: S. Kamvar
    Seminar: Thurs 5:15-6:45PM, 1 unit
    Location: TBD
    Office Hours: Tuesday 2:30-3:30PM
    Location: M08

    Syllabus

  • Week 1: Information Visualization
    We Feel Fine and Searching the Emotional Web, S Kamvar, J Harris.

  • Week 2: Search
    Authoritative Sources in a Hyperlinked Environment, JM Kleinberg.
    The PageRank Citation Ranking: Bringing Order to the Web, L Page, S Brin, R Motwani, T Winograd.

  • Week 3: Personalized Search
    Topic-Sensitive PageRank, TH Haveliwala.
    Exploiting the Block Structure of the Web for Computing PageRank, S Kamvar, T Haveliwala, C Manning, G Golub.

  • Week 4: Latent Semantic Indexing
    Indexing by Latent Semantic Analysis, Deerwester, Dumais, Furnas, Landauer, Harshman. Journal of the American Society for Information Science.
    Probabilistic Latent Semantic Indexing, T Hofmann.

  • Week 5: Recommender Systems
    Amazon.com Recommendations: Item-to-Item Collaborative Filtering, G Linden, B Smith, J York.
    Lessons from the Netflix Prize Challenge, R Bell, Y Koren.
    Item-Based Collaborative Filtering Recommendation Algorithms, B Sarwar, G Karypis, J Konstan, J Riedl.

  • Week 6: Classification and Clustering
    Hierarchically Classifying Documents Using Very Few Words, D Koller, M Sahami.
    Principal Direction Divisive Partitioning, D Boley.

  • Week 7: Human Computation
    Analyzing the Mechanical Turk Marketplace, P Ipeirotis.
    TurKit: Human Computation Algorithms on Mechanical Turk, G Little, L Chilton, M Goldman, R Miller.

  • Week 8: Peer-to-Peer and Social Search
    The Anatomy of a Large-Scale Social Search Engine, D Horowitz, S Kamvar.
    The EigenTrust Algorithm for Reputation Management in P2P Networks, S Kamvar, M Schlosser, H Garcia-Molina.

  • Course Description

    In April 1995, Lycos had the largest index of the web, with 3.6 million pages. Today, 15 years later, the largest web index is Google's, with over 10 billion pages. The proliferation of online data in the past decade has increased the visibility and importance of data mining and has led to fundamental changes in its methods.

    This project course focuses on methods for mining large-scale unstructured data sets. The format is seminar-style: students will read recent research papers in data mining and present them in class. Students should have basic knowledge of machine learning and statistics.

    Prerequisites