HRP 223 - Data Management and Statistical Programming - 2008/2009 Edition

The goal of the course is to provide hands-on instruction in data management and analysis techniques.
Topics discussed include:

  1. Working with large databases - what makes a good database turn bad
  2. Data cleaning techniques
  3. Generating numerical and graphical presentations
  4. Descriptive statistics

Contact information

Professor

Raymond R. Balise
Redwood Bldg. T213D, MC 5092
Stanford, California  94305-5405

balise at stanford
Voice (650) 724-2602
Fax (650) 725-6951

Teaching Assistant(s)

Lamiya Sheikh

lamiyas at stanford

Prerequisites

Admission to Health Research and Policy and a comfortable working knowledge of Windows XP/Vista.

Lectures

Monday and Wednesday, 11:30-1:00, in Redwood Building T138B.

Office Hours

By appointment in Redwood Building T213D.  Directions can be found here: www.stanford.edu/~balise/FindBalise.htm

Newsgroup

If you would like to ask a question or help others, please visit the course newsgroup, which is named su.class.hrp223. While not strictly required for the class, you will suffer if you don’t have access to the newsgroup. If you do not know how to subscribe to a newsgroup, instructions are available for Windows (http://www.stanford.edu/services/email/config/thunderbird/newsreader/pc/) and for a Mac (http://www.stanford.edu/services/email/config/thunderbird/newsreader/mac/). Screenshots of my setup can be found here: www.stanford.edu/class/hrp223/2008/newsgroup.ppt

Readings

The Little SAS Book for Enterprise Guide 4.1: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=61054

SAS Programming for Enterprise Guide Users: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=61179

The Little SAS Book, 3rd Edition: http://www.sas.com/apps/pubscat/bookdetails.jsp?pc=59216

Optional Books

Common Statistical Methods for Clinical Research with SAS Examples: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58086

Grading

Grades will be based on four homework problem sets.   If you take the course for 2 units you must pass at least three of the four homework assignments and you must not violate the virus policy below.  If you take the course for 3 units you must pass all four assignments.  There will be many quick assignments that will not directly affect grades.

Turning in Homework and Viruses

All assignments and homework will be submitted via email to balise at stanford dot edu and lamiyas at stanford dot edu. Any student who sends me a virus (or any other malicious code) will fail the course. There will be no exceptions. Therefore, you are strongly advised to install the latest version of the Sophos Anti-Virus software, which you can download for free here: http://www.stanford.edu/services/ess/. If you have any questions, ask!

Late policy

Each of the assignments will be due at the beginning of class on the day specified.

That said, there are unforeseen emergencies (illness, bike accidents, disk crashes, network troubles, childbirth, etc.). Rather than having you ask for special allowances on an individual basis, I give each of you the privilege of granting yourself a small extension in case of crisis. You will have two late days, which you may use to extend the due dates of any assignments without penalty. To avoid any ambiguity, there are seven days in a week and each day ends at 5:00 PM. Thus, if your assignment was due on Wednesday but was turned in the following Monday before 5:00 PM, that assignment would be five days late. After the grace period is up, each assignment is down-weighted 20% per day. In all cases, assignments will not be accepted more than one calendar week after the original due date.
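As a rough illustration of the late-day arithmetic, here is a minimal Python sketch. It is not part of the course materials, and the function names are made up for the example. It rests on two assumptions that the policy does not spell out: work turned in on the due date before 5:00 PM counts as zero days late, and the 20% per day is subtracted linearly (three penalized days cost 60% of the credit). The dates use the Wednesday October 15th, 2008 due date as the example Wednesday.

```python
from datetime import date, datetime, timedelta

DAY_END_HOUR = 17        # the policy says each day ends at 5:00 PM
PENALTY_PER_DAY = 0.20   # 20% per day once the late days are used up
MAX_DAYS_LATE = 7        # nothing accepted more than one calendar week late


def days_late(due: date, submitted: datetime) -> int:
    """Count late days, rolling over to the next day at 5:00 PM.

    Assumption: anything turned in on the due date before 5:00 PM
    counts as zero days late.
    """
    effective = submitted.date()
    if submitted.hour >= DAY_END_HOUR:
        effective += timedelta(days=1)
    return max(0, (effective - due).days)


def weight(late: int, grace_left: int = 2) -> float:
    """Fraction of credit kept (assumed linear: 20% off per penalized day)."""
    if late > MAX_DAYS_LATE:
        return 0.0  # past one calendar week: not accepted
    penalized = max(0, late - grace_left)
    return max(0.0, 1.0 - PENALTY_PER_DAY * penalized)


# Due Wednesday, turned in the following Monday at 4:00 PM.
late = days_late(date(2008, 10, 15), datetime(2008, 10, 20, 16, 0))
print(late, round(weight(late), 2))  # prints: 5 0.4
```

With both late days still unused, the five-day-late assignment is penalized for three days and keeps 40% of its credit under these assumptions.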

Computer Platforms

The programs that you turn in must run on Windows SAS 9.1.3 SP4.  I can provide good support for Windows or a Mac running Parallels (http://www.parallels.com/), and fair support for UNIX.

 

Core Lecture Material

Lectures will be here.

Day 0

            Software somebody should have told you about a long time ago

            Using SAS as a calculator

 

The PowerPoint slides are here.

Assignment 1 is here and is due before class Monday September 29th.

 

Day 1 (September 24th into 29th)

            Loading data into SAS

            General issues with data

            Issues with Excel

            Libraries

 

The PowerPoint slides are here for PowerPoint 2007 or here for PowerPoint 2003.

First EG project is here.

Importing with EG project is here.

Excel workbook used in topic 1 is here.

 

TLSB 1.1-1.4, 1.8, 2.4-2.5, 2.12, 2.16-2.17, 2.19-2.20

TLSBEG Tutorial A, Chapter 1, especially 1.1-1.8

Day 2 (September 29th - October 1st)

            Loading data into SAS

            Organizing a project

            Adding variables with EG

            How data steps work

            Bugs

 

The PowerPoint slides are here for PowerPoint 2007 or here for PowerPoint 2003.

The revised PowerPoint slides are here for PowerPoint 2007 or here for PowerPoint 2003.

 

Teletubbies EG project is here.

Day 2 project is here.

Walker Diabetes data is here.

 

Assignment 2 is here and the solution is here.

 

TLSBEG Tutorial B, Tutorial D, Chapter 2 (2.1-2.12), Chapter 3

Day 3 (Oct 6th and 8th)

            Many Examples

            Organizing projects

            Comparing two files

            Custom Formats in Excel

            Converting Character to Numeric

            Filtering and Querying

            Subsetting

            Dates

 

The PowerPoint slides are here for PowerPoint 2007 or here for PowerPoint 2003.

 

Bad data EG project is here.

SDplan EG project is here.

SDdone EG project is here.

distinctDates EG project is here.

Day 3 project is here.

 

Homework 1 is here and is due before class Wednesday October 15th.

 

TLSBEG Tutorial B, Tutorial D, Chapter 2 (2.1-2.12), Chapter 3

Day 4 (Oct 13th)

            Make toy data sets

            Pretty Contingency Tables

            Simulations

            Summarizing Numeric variables

            Introduction to Macros

 

The PowerPoint slides are here for PowerPoint 2007 or here for PowerPoint 2003.

 

Day 4 project is here.

 

Other stuff

A set of useful links can be found here.

SAS keyboard macros can be found here.

 

The 2007 version of HRP 223 can be found here.