Note

A note from Prof. Jennifer Widom, June 2020:

This was the last offering of CS 102. Congratulations to the students who were able to persevere through a pandemic and horrific racism to complete the course and gain some mastery of working with data, and a big thanks to the teaching assistants for their tremendous efforts. I'm hopeful that within a few years Stanford will offer a cohesive curriculum in data science. In the meantime, all of the material from CS 102, including Jupyter notebooks and data sets, is being kept current on the website of Prof. Widom's Instructional Odyssey.



Course Description

Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.

Lectures

Tuesdays & Thursdays 1:30-2:50 PM
Delivered via Zoom. Link found on Canvas.
We prefer students participate live, but lectures will also be recorded.

Office Hours

The five TAs hold office hours throughout the week, and Professor Widom's office hours are usually Wednesdays 4:00-5:00 PM. All office hours are via Zoom, with each week's times and links posted on the course calendar. For TA office hours logistics, please refer to this Piazza post.

Optional Bootcamps

Some Wednesdays, 11:30 AM-12:30 PM
Bootcamp sessions are recorded and made available afterwards on Canvas. They provide extra setup help for the tools we are using, and additional programming examples for those who may have a weaker background in programming or seek additional practice.

Evaluation

There are 5 assignments, 2 projects, and 2 exams. The final grade weights the composite scores for assignments, projects, and exams equally, i.e., 33.3% each for the 5 homework assignments (weighted equally), the 2 projects (weighted equally), and the 2 exams (weighted equally). In spring quarter 2020, all courses are graded on an S/NC basis. We will compute a letter grade for each student: all students who receive a C- or better will be assigned a grade of S, while D+ and below will be assigned a grade of NC.
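
The weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's actual grading script: every score is assumed to be on a 0-100 scale, the function names and the `LETTERS` ordering are hypothetical, and the cutoffs that turn a numeric score into a letter grade are not given here, so that step is left out.

```python
from statistics import mean

# Hypothetical ordering of letter grades, lowest to highest (an assumption,
# used only to compare a grade against the C- threshold).
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def composite(scores):
    """Average a list of equally weighted scores (assumed 0-100 each)."""
    return mean(scores)


def final_score(assignments, projects, exams):
    """Weight the three composite scores equally (one third each)."""
    return mean([composite(assignments), composite(projects), composite(exams)])


def s_nc(letter):
    """Map a letter grade to S (C- or better) or NC (D+ and below)."""
    return "S" if LETTERS.index(letter) >= LETTERS.index("C-") else "NC"


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(final_score([90, 80, 100, 70, 85], [88, 92], [75, 85]))
    print(s_nc("C-"), s_nc("D+"))
```

Because each category is averaged before the three composites are combined, a single exam counts for more of the final grade than a single assignment (one sixth versus one fifteenth).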

Exams

Exams are held during the class period; see syllabus below for dates. Please make sure you will be available for both of the exam dates. Alternate times (but not dates) may be possible by petition for extenuating circumstances.

Communication

Please use Piazza for all questions related to the course. We use Piazza as our primary portal for course-related announcements, so make sure to sign up! For all Piazza posts, we guarantee that we will respond within 24 hours. Also check out the list of frequently asked questions.

Course Staff

Students with Documented Disabilities

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty. Unless the student has a temporary disability, Accommodation Letters are issued for the entire academic year. For CS102 we require Accommodation Letters to be filed with the instructor a minimum of two weeks before the requested accommodation. This policy is strictly enforced. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Schedule (Spring 2020)

Date | Topic and Assignments | Readings/References | Notes
Tue April 7 | Introduction & Course Logistics; Working with Data - Overview | Introductory Readings | Course Information; Working With Data - Overview Slides
Wed April 8 | Bootcamp: Google Sheets Setup | | Google Sheets Setup Instructions
Thu April 9 | Working with Data - Overview (cont'd); Data Analysis & Visualization Using Spreadsheets | Google Spreadsheets References | Data Analysis Using Spreadsheets Slides; Spreadsheet Analysis Notes (Part 1)
Mon April 13 | Assignment #1: Spreadsheets; Project #1: Personal Data Analysis | |
Tue April 14 | Data Analysis & Visualization Using Spreadsheets (cont'd) | | Spreadsheet Analysis Notes (Part 2)
Wed April 15 | Bootcamp: Tableau Setup | | Tableau Setup Instructions
Thu April 16 | Data Analysis & Visualization Using Spreadsheets (cont'd); Advanced Data Visualization Using Tableau | Common Visualization Mistakes; Tableau References | Data Visualization Using Spreadsheets Slides; Data Visualization Using Spreadsheets Notes; Advanced Data Visualization Using Tableau Slides; Advanced Data Visualization Using Tableau Notes
Mon April 20 | Bootcamp: Instabase Setup | | Instabase Setup Instructions
Mon April 20 | Assignment #1 due; Assignment #2: Tableau, SQL | |
Tue April 21 | Relational Databases and Basic SQL | SQL References; Project Jupyter home page | Relational Databases and SQL Slides; Basic SQL Notes
Thu April 23 | Advanced SQL | | Advanced SQL Notes
Mon April 27 | Project #1 proposal due | |
Tue April 28 | Introduction to Python; Python for Data Analysis & Visualization | Python References | Python for Data Analysis & Visualization Slides; Python Basics Notes; Python Data Notes
Wed April 29 | Bootcamp: SQL | | SQL Bootcamp Slides
Thu April 30 | Python for Data Analysis & Visualization (cont'd) | Pandas References | Python Pandas Notes
Thu April 30 | Assignment #2 due; Assignment #3: Python | |
Tue May 5 | Python for Data Analysis & Visualization (cont'd) | PyPlot Tutorial | Python Plotting Notes
Wed May 6 | Bootcamp: Python | | Bootcamp Notebooks
Thu May 7 | Machine Learning - Regression | ML References - Regression | Regression Slides; Regression Notes
Sat May 9 | Assignment #3 due (no late submissions) | |
Tue May 12 | Exam #1 | |
Thu May 14 | Machine Learning - Classification and Clustering | ML References - Classification and Clustering | Classification Slides; Clustering Slides; Classification & Clustering Notes
Mon May 18 | Project #1 due; Assignment #4: Machine Learning, R; Project #2: Movie-Rating Predictions | The Netflix Prize |
Tue May 19 | Using Python for Machine Learning | ML References - Python | Python Machine Learning Notes
Thu May 21 | The R Language - Data Analysis, Visualization, and Machine Learning | R Tutorial; Choosing R or Python for data analysis? An infographic | R Slides; R Notes
Fri May 22 | Bootcamp: R | |
Mon May 25 | Assignment #4 due | |
Tue May 26 | Data Mining Algorithms | Data Mining References | Data Mining Slides; Data Mining Notes
Thu May 28 | Assignment #5: Data Mining, Network Analysis | |
Thu May 28 | Data Mining Using Python | | Mining Python Notes
Fri May 29 | Bootcamp: Data Mining using SQL | |
Mon June 1 | Project #2 due | |
Tue June 2 | Network Analysis | Network References | Network Slides; Networks Notes
Thu June 4 | Project #2 results and discussion; Unstructured Data | | Unstructured Data Slides
Sat June 6 | Assignment #5 due | |
Tue June 9 | Exam #2 | |