STATISTICAL LEARNING AND DATA MINING III
State-of-the-Art Statistical Methods for
Ten Hot Ideas for Learning from Data
A short course given by
Trevor Hastie and Robert Tibshirani,
both of Stanford University
This two-day course gives a detailed overview of statistical models
for data mining, inference and prediction. With the rapid
developments in internet technology, genomics, financial risk
modeling, and other high-tech industries, we rely increasingly on
data analysis and statistical models to exploit the vast amounts of
data at our fingertips.
In this course we emphasize some of the most useful tools for tackling modern-day
data analysis problems.
Our top-ten list of topics is:
- Regression and Logistic Regression (two golden oldies),
- Lasso and Related Methods,
- Support Vector and Kernel Methodology,
- Principal Components (SVD) and Variations: sparse SVD, supervised PCA,
- Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization, and Local Linear Embedding,
- Boosting, Random Forests and Ensemble Methods,
- Rule-based methods (PRIM),
- Feature Selection, False Discovery Rates and Permutation Tests.
This course is the fourth in a series, and follows our popular past courses:
- Modern Regression and Classification (1996-2000)
- Statistical Learning and Data Mining (2001-2005)
- Statistical Learning and Data Mining II (2005-2008)
Our earlier courses are not a prerequisite for this new course. Although there is some overlap with past courses, our new course contains many topics not covered by us before.
Software for these techniques will be illustrated, and a copy of the
text "Elements of Statistical Learning: data mining, inference and
prediction (2nd Edition)" and a comprehensive set
of class notes will be provided.
Professor Trevor Hastie of the Statistics and Biostatistics
Departments at Stanford University was formerly a member of the
Statistics and Data Analysis Research group, AT&T Bell
Laboratories. He co-authored with Tibshirani the monograph
Generalized Additive Models (1990) published by Chapman and
Hall, and has many research articles in the area of nonparametric
regression and classification. He also co-edited the Wadsworth book
Statistical Models in S (1991) with John Chambers. His
Ph.D. thesis Principal Curves introduced one of the first
nonlinear versions of principal components analysis. During his ten
years at Bell Laboratories he gained valuable experience with
classification and regression problems in industry.
Professor Robert Tibshirani of the Biostatistics and Statistics departments at Stanford University is a recipient of the COPSS award, given jointly by all the leading statistical societies to the most outstanding statistician under the age of 40. He also has many research articles on nonparametric regression and classification. With Bradley Efron he co-authored the best-selling text An Introduction to the Bootstrap in 1993, and has been an active researcher on bootstrap technology over the years. His 1984 Ph.D. thesis spawned the currently lively research area known as Local Likelihood. He has more than twenty years' experience in consulting on biostatistical problems.
This course is based on The Elements of Statistical Learning, the 2nd edition (2009) of the best-selling Springer book first published in 2001 by Hastie, Tibshirani and Friedman. The first edition received a terrific reception, with over 30,000 copies sold; the second edition, which appeared in February 2009, has been augmented and brought up to date. Both presenters are actively involved in research in regression, classification and clustering, and are well-known not only in the statistics community but in the machine-learning, neural network and bioinformatics fields as well.
In the past 10 years they have become leaders in the statistical analysis of DNA microarrays, working with leading-edge researchers such as Patrick Brown of Stanford University and David Botstein of Princeton. They have given many short courses together over the past 12 years to academic, government and industrial audiences. They are both actively involved in consulting in data analysis and modeling, for the Stanford medical community as well as local biotech and web-related industries. They have a reputation for being good instructors who respond well to the needs of the audience.
So far, "Statistical Learning and Data Mining III" has taken place at:
- Sheraton Palo Alto, California, March 16-17, 2009.
- Donau University, Krems, Austria, September 25-26, 2009.
- Sheraton Palo Alto, California, March 18-19, 2010.
- Georgetown University Conference Center, October 11-12, 2010.
- Sheraton Palo Alto, California, March 14-15, 2011.
- Georgetown University Conference Center, October 18-19, 2011.
- Sheraton Palo Alto, California, March 15-16, 2012.
- Executive Conference Center, New York, September 18-19, 2012.
- Sheraton Palo Alto, California, March 14-15, 2013.
The "Statistical Learning and Data Mining II" course took place at:
- Harvard Conference Center, Boston, Mass, Oct 31-Nov 1, 2005.
- Sheraton Palo Alto, California, April 3-4, 2006.
- Doubletree Hotel, Philadelphia, October 12-13, 2006.
- Sheraton Palo Alto, California, March 8-9, 2007.
- Georgetown University conference center, October 18-19, 2007.
- Sheraton Palo Alto, California, March 7-8, 2008.
- Harvard Conference Center, Boston, Mass, Oct 6-7, 2008.
The second course, "Statistical Learning and Data Mining" by Hastie and Tibshirani, took place at the venues listed below.
These courses were filled to capacity, and were enthusiastically received
by attendees from biotech, financial and other industrial areas.
- University Hotel@MIT, Cambridge, September 6-7, 2001
- Sheraton Hotel, Palo Alto, February 28-March 1, 2002
- Georgetown University conference center, September 19-20, 2002.
- Sheraton Hotel, Palo Alto, February 27-28, 2003.
- University Hotel@MIT, Cambridge, September 15-16, 2003
- Sheraton Hotel, Palo Alto, February 25-26, 2004.
- Georgetown University conference center, September 20-21, 2004.
- Sheraton Hotel, Palo Alto, February 24-25, 2005.
Their first course - "Modern Regression and Classification" - took place at:
- Cleveland Clinic, Cleveland, March 1996
- Stanford Park Hotel, Menlo Park, California, April 1996
- Beckman Instruments, Los Angeles CA, June 1996
- Stanford Park Hotel, Menlo Park, California, June 1996 (overflow from April course)
- Center for Disease Control, Atlanta, September 1996
- Hyatt Regency Hotel, Boston, December 1996
- Hilton Hawaiian Village, Waikiki, February 1997
- Financial Marriott, New York, June 1997
- ETH Zentrum, Zurich, September 1997
- St Bride Institute, London, September 1997
- Georgetown Conference Center, Washington DC, April 1998
- Leiden University, the Netherlands, October 1998.
- Chicago Radisson Hotel, November 1998
- Stanford Park Hotel, March 1-2 1999
- Stanford Park Hotel, March 29-30 1999 (overflow from previous course)
- Financial Marriott, New York, June 1999.
- Technical University, Munich, September 1999.
- NSA, Maryland, November 1999.
- Mission Valley Marriott, San Diego, January 2000.
- Grand Hyatt Washington, DC, June 2000.
- Sheraton Palo Alto, November 2000.
Some quotes from past attendees:
- "... the best presentation by professional statisticians I have ever had the pleasure of attending."
- "Superior to most courses in all aspects"
- "I really liked how you emphasized concepts rather than ..."
- "Topics were extremely well linked" ... " Very effective use of humor"
- "Your 2-day course has saved me months of research"
- "... Hastie and Tibshirani are excellent teachers..." - see the review by Steve Miller who attended the Boston 08 course
- Steve Miller also attended the Boston 2011 SLDMIII course - see his latest review
- March 14-15, 2013, Sheraton Hotel, Palo Alto
Sheraton Palo Alto
625 El Camino Real
Palo Alto, CA 94301
SCHEDULE: Days 1 and 2
- 8:00am-9:00am: Check-in and coffee/tea + pastries.
- 9:00am-10:20am: Technical Sessions Begin
- 10:20am-10:35am: Coffee break.
- Noon-1:30pm: Lunch
- 1:30pm-2:30pm: Technical Session
- 2:30pm-2:40pm: Break
- 3:45pm-5:00pm: Technical session + discussion.
PRICE: $1450 per attendee. Full-time student price: $1100.
Discounts for groups of 4 or more: the 4th and additional attendees receive a $300 discount off the $1450 price, and pay $1150 each.
Attendance is limited to the first 75 applicants, so sign up soon! These courses fill up quickly.
Read here for more details on
attend, and our
not to sell our course notes.