The materials on this course website are only for the use of students enrolled in this course for purposes associated with this course and may not be retained or further disseminated. The materials on this course website may be protected by copyright; any further use of this material may be in violation of federal copyright law.
The goal of the course is to provide hands on instruction in
data management and analysis techniques.
Topics discussed include:
|
Professor |
Teaching Assistant(s) |
|
Raymond R. Balise Balise at Stanford |
Kameelah Abdullah kameelah at
Stanford
|
Admission to Health Research and Policy and a comfortable knowledge of a Windows XP/Vista.
Monday and Wednesday 11:30-1:00 Redwood Building T138B.
By appointment in Redwood Building T213D. Directions can be found here: www.stanford.edu/~balise/FindBalise.htm
If you would like to ask a question or help others please visit the course newsgroup which is named: su.class.hrp223. While not truly required for the class, you will suffer if you don’t have access to the news. If you do not know how to subscribe to a newsgroup and you use Windows http://www.stanford.edu/services/usenet/thunderbird_pc/index.html or a Mac http://www.stanford.edu/services/usenet/thunderbird_mac/index.html. Important notes of my setup can be found here: www.stanford.edu/class/hrp223/2009/newsgroup.ppt
There
are no books published (other than the manuals) covering Enterprise Guide 4.2
or SAS 9.2 TS2.
Read about SQL vs. data step code here: http://www2.sas.com/proceedings/sugi31/250-31.pdf
The Little SAS Book for Enterprise Guide 4.1: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=61054
SAS Programming for Enterprise Guide Users: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=61179
The little SAS Book 3rd Edition : http://www.sas.com/apps/pubscat/bookdetails.jsp?pc=59216
Common Statistical Methods for
Clinical Research with SAS Examples: http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58086
Learning SAS in the Computer Lab http://www.amazon.com/Learning-SAS-Computer-Advanced-Cengage/dp/0495559687/ref=sr_1_1?ie=UTF8&s=books&qid=1256016343&sr=8-1
Grades will be based on four homework problem sets. If you take the course for 2 units you must pass at least three of the four homework assignments and you must not violate the virus policy below. If you take the course for 3 units you must pass all four assignments. There will be many quick assignments that will not directly affect grades.
All assignments and homework will be submitted via email to balise at Stanford and kameelah at Stanford. Any student that sends me a virus (or any other malicious code) will fail the course. There will be no exceptions made. Therefore, you are strongly advised to download the latest version of the Sophos Anti-Virus software. If you need virus protection check here http://www.stanford.edu/services/ess/ and you can download the software for free. If you have any questions ask!
Each of the assignments will be due at the beginning of class on the day specified.
That said, there are unforeseen emergencies (illness, bike accidents, disk crashes, network troubles, childbirth, etc.). Instead of having to ask for special allowances on an individual basis, I give each of you the privilege of granting yourself a small extension in case of crisis. You will have two late days which you may use to extend the due dates of any assignments without penalty. To avoid any ambiguity, there are seven days in a week and each day ends at 5:00 PM. Thus, if your assignment was due on Wednesday but turned in the following Monday before 5:00, that assignment would be five days late. After the grace period is up each assignment is down weighted 20% per day. In all cases, assignments will not be accepted more than one calendar week after the original assignment due date.
The programs that you turn in must run on Windows SAS 9.2 TS2 and/or Enterprise Guide 4.2. I can provide good support for Windows or a Mac running parallels (http://www.parallels.com/).
Software somebody should have told you about a long time ago
Essential Stanford Software
(free stuff)
Tools of the trade
SAS
or SAS/Enterprise Guide instructions on installing SAS are here.
Other software I use (not officially endorsed by anybody and
ask your IT security person for help)
Using SAS Enterprise Guide as a calculator
The PowerPoint slides are here
for Excel 2007/2008 or here
for older versions.
What is a database?
Critical registry teaks for Excel
How to organize data in Excel
Variable
names
Dummy
records
Using Excel
Making
tables
Validation
Formulas
Quick
counts
Subsets
Random
subsets
Duplicates
Discrepancies and Differences
PivotTables
Collecting data in general
How to
score and store answers
The
value/danger of redundancy
Using REDCap
What is it?
How to
set it up.
The PowerPoint slides are here
for PowerPoint 2007 or here
for PowerPoint 2003.
The REDCap data dictionary is here
in Excel 2007 format or here
for Excel 2003. This is based on a file
made by Vanderbilt University (http://www.project-redcap.org/).
An Excel 2007 spreadsheet showing Excel is here.
Assignment 1 is here and
also watch
the three REDCap tutorial videos found here: http://www.project-redcap.org/videos.php.
Assignment 2 is here and
should be done before Oct 7th.
Types of Files
File suffixes
Loading data from Excel
Using libraries
Simple subsets
Basic numeric summaries
Converting character variables to
numeric with the input function
The PowerPoint slides are here
for PowerPoint 2007 or here
for PowerPoint 2003.
An Excel 2007 spreadsheet showing mixed type issues is here
and the EG project is here.
Setting up Windows
File extensions
Folders
Office
DOS
Configuring SAS Enterprise Guide
Application options
Keyboard macros
Using SAS
Libraries
Subsets
Toy data sets
Loading real data sets
Simple descriptive statistics and graphics
The slides are here for PowerPoint 2007 and here for PowerPoint 2003. They were revised on Oct 6th. New/edited slides are marked with a splat in the upper left corner.
Homework 1 is here and is due Wednesday Oct 14th before class.
Working with a library
Copying data
Into libraries
Selecting a few
variables
Selecting a few records
Calculations
Numeric variables
Complex math
Basic text manipulation
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
A zip file with SAS programs showing the use of %include is here. My general SAS program header is here.
An Enterprise Guide project that includes exporting and making fake data is here.
An Enterprise Guide project that shows SQL is here.
Reading The
Little SAS Book for Enterprise Guide 4.1 Chapter 4.
Creating data
Recoding values to make data
Project organization
Fixing bad values
Using formats
The Enterprise Guide project is located here.
Reading The
Little SAS Book for Enterprise Guide 4.1 Chapter 4.
Recoding values to fix problems
Collapsing data into categories with a new variable or formats
SQL joins
Inner joins
Left joins
The coalesce() and coalescec() functions
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
An Enterprise Guide project that uses SQL to merge files is here.
Reading The
Little SAS Book for Enterprise Guide 4.1 Chapter 4.
Joins
Summary functions with SQL
Fixing bad values
By using functions like
left
By recoding
Finding discrepancies when matches fail
More SQL and querying voodoo is covered in a project here.
This assignment uses information in
an Excel workbook that can be found here.
Homework 2 is here and is due Oct 28th.
Simulations with EG
Creating macro variables as prompts
SQL to select subsets of records
from multiple tables
Outer union and/or outer union
corresponding – concatenate
Union – unique rows from both
tables
Except – rows that are part
of the first query
Intersect – rows common to
both queries
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
The Enterprise Guide file for today is here.
Creating fake data
Importing
Creating tables from a single table
Selecting variables
Recoded variables
Using NULL numeric values
Filters to subsets of records
Working with character data
From multiple tables
SQL Joins
SQL set operations
How the data step really works
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
The Enterprise Guide file for today is here.
Queries and corrections when data is
not normalized
Normalizing data with arrays and
loops in a data step
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
An Enterprise Guide project showing macros is here.
An Enterprise Guide project showing fixing wide tables with data step is here.
Homework 3 is here and is due
November 18th before class.
A solution to HW2 is here. This is not what I expected you to be able to
produce.
Information on numeric precision in
a computer is here: http://support.sas.com/techsup/technote/ts654.pdf.
By processing
Retain statements
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
An Enterprise Guide project showing working with grouped data is here.
Logistic Regression in EG
Events in Trials Syntax
SAS sort order
Using formats or recoding for picking reference groups
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
Esophageal cancer data from Breslow
and Day in exactly the wrong format for Enterprise Guide is here.
Enterprise Guide project with logistic regression is here.
Class on the 18th
meets in the lab.
What makes a bad graphic bad
The gurus of data visualization
Visualizing categorical data
Bar charts
Dot plot
Waffle plots
Visualizing continuous data
Histograms
Box plos
Violin plot
How to get image formats for
publication
Working with the SAS ODS
ODS Statistical Graphics
SAS Graphics Editor
Setting up R
Tinn-R
PERL
High quality SAS graphics
Sgplot
hbox / category =
Sgplot
scatter
Sgplot
reg / group =
Sgplot
histogram
Sgpanel
panelby histogram
Sgpanel
panelby scatter
Lattice plots in SAS
Ods
graphics on
Selecting a subset of ODS graphics
ODS statistical graphics with many
elements
Truly custom ODS
statistical graphics
The slides are here for PowerPoint 2007 and here for PowerPoint 2003.
Enterprise Guide project is here.
R code for common R graphics is here.
R code for categorical data is here.
Enterprise Guide project with logistic regression is here.
R code is here.
Something
Homework 4 assigned and before 11:30
on December 11th. This cannot
be turned in late.
A set
of useful links can be found here.
A
few of my old favorite books are listed here.
SAS 2009 keyboard macros can be found here.