In April 1995, Lycos had the largest index of the web, with 3.6 million web
pages. Today, 15 years later, the largest web index is Google's, with over
10 billion pages. The proliferation of online data in the past decade has
increased the visibility and importance of data mining, and has also caused
some fundamental changes in data mining methods.
This project course will focus on methods for mining large-scale
unstructured data sets. The format is seminar-style: students will read
recent research papers in data mining and present them in class. Students
should have basic knowledge of machine learning and statistics.
Prerequisites
Familiarity with the basic concepts of probability theory. (Stat116 is sufficient but not necessary.)
Familiarity with linear algebra. (Math 113 or CS237A is sufficient but not necessary.)
Knowledge of basic computer science principles and skills at the level of CS103.
Ability to understand and analyze algorithms and data structures.