Statistics 209 / HRP 239 / Education 260
                                        Winter 2015

Statistical Methods for Group Comparisons and Causal Inference

previous title: Understanding Statistical Models and their Social Science Applications


David Rogosa
rag {AT} stat {DOT} stanford {DOT} edu

Lecture: TTh 12:35-2:05, Sequoia 200
course web page at http://web.stanford.edu/~rag/stat209/


                To see full course materials from Winter 2014 go here

Instructor. David Rogosa, Sequoia 224, rag {AT} stanford {DOT} edu.
                   Office hours T 2:30-3:15.
TA. Wenfei Du, office hours Fri 10:00-12:00, room 221, wdu {AT} stanford {DOT} edu

Registrar's Information
Description
 Critical examination of statistical methods in social science and life sciences applications, especially for 
 cause and effect determinations. Topics include:  matching and  propensity score methods,  analysis of covariance, 
 instrumental variables, compliance, path analysis, multilevel models,  longitudinal data, mediating and moderating variables. 
 Prerequisite: intermediate-level statistical methods
Course Overview
For students who have had intermediate-level instruction in statistical methods including multiple regression, logistic regression, log-linear models.
At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods.
The goal is also to instill some introspection and critical analysis for the uses of statistical methods common in social science and medical applications, especially for observational studies.  
The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings.

Quick Course Outline
Week 1. Course Introduction;  properties of regression models
Week 2. Experiments vs observational studies;  Neyman-Rubin-Holland formulation
Week 3. Path analysis and causal modeling, multiple regression with pictures. Graphical models.
Week 4. Multilevel data. Contextual effects, aggregation bias, random effects models
Week 5. The many uses and forms of analysis of covariance (including regression discontinuity designs)
Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects
Week 7. Compliance and experimental protocols; encouragement designs; intent to treat
Week 8. Matching and propensity score methods
Week 9. Time-1, Time-2 group comparisons for experimental and non-experimental designs.
Dead Week. Overflow and course summary. 
Course Readings, Files and Examples

Texts (optional).    Class texts on reserve at Math/CS library
  Statistical Models: Theory and Practice, David Freedman (2005). Revised edition (2009).
The course was created around David Freedman's text, and covers that material using auxiliary texts and online materials.
One intent of this course is for students to read some statistical literature and actual research reports to augment the texts. On that theme, Freedman's text includes reprints of four published empirical research papers, which are also available through JSTOR.
Primary resource for R and data analysis.
  Data Analysis and Graphics Using R, J. Maindonald and J. Braun, Cambridge. 2nd edition 2007; 3rd edition 2010.    short draft version in CRAN
     Text resource page      UCLA DAAG page      R-packages for Text Data Sets etc    R-Package DAAG    R-Package DAAGxtras  
Design of observational studies. Rosenbaum, Paul R. New York : Springer, c2010. Stanford access
Auxiliary texts, also on reserve at Math/CS library.
Regression Analysis : A Constructive Critique  Richard A Berk (2003). Table of contents
     Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique"  
Data analysis and regression: A second course in statistics. Mosteller, F. and Tukey, J. W. (1977) (the green book)
Matched Sampling for Causal Effects, Donald B. Rubin Cambridge University Press 2006
Observational Studies, Paul R. Rosenbaum, Springer, 2nd edition (2002)
David Freedman, Statistical Models and Causal Inference, Cambridge 2010, ISBN 978-0-521-19500-3

Grading, Homework and Exams.
Weekly homework assignments following class content will be posted, with solutions posted the next class cycle. Homework is not graded.
Assessment. Two take home problem sets will be scheduled:
TH1 covering content weeks 1-4.
TH2 covering content weeks 5-8.
In-class exam: Exam 3, scheduled by the Registrar during exam week. My best reading of the Registrar's chart indicates Thurs March 19, 7PM (in our classroom). Note this is not Saint Patrick's Day. If needed, Exam 3 can be taken remotely.
See also class calendar

Course Assignments Page

Note to auditors. We should have plenty of room in Sequoia 200 (unlike in Bldg 160 last year) for auditors.
The Registrar does have a form (no-fee) for faculty, staff, post-docs: Application for Auditor or Permit to Attend (PTA) Status   

Statistical computing
Class presentation will be in R, and students are encouraged to use it (with occasional reference to SAS, Mathematica, and Matlab).
1/7/09.  NYTimes endorses R: Data Analysts Captivated by R's Power
We have a set of 4 computer labs to supplement lecture materials (weeks 2, 4, 6, 8).
Lab 1. Multiple regression basics  Lab1 posted 1/18/15
Lab 2. Multilevel analysis (mixed-effects models) High School and Beyond example.
Lab 2 has evolved in three pieces.
a.   Lab2, exposition and commands provides a full write up (annotated) of the analyses
b.    Lab 2, Rogosa R-session (nlme legacy version)
c.    Lab2 (abbreviated version) using lme4, lmer  (with additional plots)   Lecture slide, lme lmer for Bryk data
    For those who are strapped for time or otherwise saturated, I provide a full single Bryk dataset that skips over the data manipulation portion of the activity
          Lab 2 posted 1/30/15
Lab 3, Instrumental Variables.
  Lab3, exposition and commands     
  Lab 3, Rogosa R-session        Mroz87 data description     Lab3 posted 2/14/15
Note: I triple-checked; the dataset is where the description indicates, and read.table("http://statweb.stanford.edu/~rag/stat209/Mroz87.dat", header = TRUE) reads in the 753 cases.
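The read-in step from the note can be wrapped in a small check, sketched below. The helper name read_mroz and its expected_rows argument are hypothetical and not part of the lab materials; only the URL and the count of 753 cases come from the note. It is a minimal sketch, assuming the file is whitespace-delimited with a header row, as the read.table call in the note implies.

```r
# URL as given in the course note
mroz_url <- "http://statweb.stanford.edu/~rag/stat209/Mroz87.dat"

# Hypothetical helper (an assumption, not from the lab write-up):
# read the file and warn if the number of cases is not what is expected.
read_mroz <- function(src = mroz_url, expected_rows = 753) {
  dat <- read.table(src, header = TRUE)
  if (nrow(dat) != expected_rows) {
    warning("expected ", expected_rows, " cases but read ", nrow(dat))
  }
  dat
}

# Usage (needs network access, so left commented out):
# mroz <- read_mroz()
# dim(mroz)   # number of rows and columns actually read
# str(mroz)   # variable names and types, to compare with the data description
```

Checking the row count right after reading is a cheap way to catch a truncated download or a wrong header setting before starting the instrumental variables analysis.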
Lab 4 Matching and propensity scores. Lalonde job training data
This lab is arranged in pieces
a.   Lab4, exposition and commands   posted 2/27/15
b.   Lab 4, Rogosa R-session, Base (sections 1-3)  posted 2/27/15
c.   Lab 4, Rogosa R-session, additional matching exercises (incl secs 4-6)  posted 2/27/15
d.   Lab 4, Rogosa R-session: not done until ancova is run  posted 2/27/15


Current version of R is R 3.1.2 (Pumpkin Helmet), released October 31, 2014. For references and software: The R Project for Statistical Computing   Closest download mirror is Berkeley
The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. Also of interest are CRAN Task View: Psychometric Models and Methods and CRAN Task View: Design of Experiments (DoE) and Analysis of Experimental Data
This past fall quarter I taught a short 5-week intro R course intended for users of other statistical packages; see the Ed401 page: http://www.stanford.edu/~rag/ed401/   Older introductory materials on R are at the 2007 Stat141 site, especially the Course Files and Examples page
Among the infinite number of introduction-to-R resources is John Verzani's page.   A good R-primer on various applications (repeated measures and lots else).   Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li.   Another version
Even more stuff:   According to Peter Diggle: "The best resource for R that I have found is Karl Broman's Introduction to R page."   And a remarkably useful set of R-resources from Murray State
Wm. Revelle who develops the psych package also has a draft text which covers standard statistics plus specialized measurement topics (plus other R intros)
For those with a life sciences background a useful resource may be the book Analysis of epidemiological data using R and Epicalc and the Epicalc package.
An additional R resource that is efficient if you are experienced with another statistical package is the presentation An Introduction to R, John Verzani.  For categorical data, especially if you've had a course using Agresti, the lengthy guide by Laura Thompson has more than you want to know.