Education 161  FINAL PROBLEMS, Winter 2000

Solutions for these problems are to be submitted in hard-copy
form. Given that these problems are untimed, some care should be
taken in presentation, clarity, format.  Especially important is
to give full and clear answers to questions, not just to submit
unannotated computer output, although relevant output should
be included.
You may use any inanimate resources--no collaboration.  This
work is done under Stanford's Honor Code.
Please read the questions carefully and answer the question that
is asked.  
Papers will be scored into 3 categories: "Excellent" indicates
successful completion of all parts of all questions (within
perhaps one or two very trivial arithmetic errors);
"Satisfactory" indicates a good attempt was made at all parts of
all problems, but there were some serious errors or omissions;
"Incomplete" indicates inadequate effort or performance. 

Problems due Wed March 15 5PM in Rogosa's Cubberley mailbox
or deliver to Alex Harris in his office between 3-5 PM 3/15.
No extensions/exceptions.

Note data files are available in one of two locations:
path: /usr/class/ed161/[data file]
or using web-services at URL
http://www.stanford.edu/class/ed161/hw/[data file]

================================================================

Problem 1

This problem and adapted data are taken from a text by an
illustrious educational researcher.
Research Setting:
The general concern is with childhood aggressive behavior
(and especially its persistence into adult violent behavior
and other negative outcomes). 
Hudley&Graham (Child Development, 1993) conducted a study (based
on attribution theory) which had as one of its outcome measures 
an assessment of negative intent. The rationale for this measure
is that a boy's aggressive behavior might be a consequence of
his misattributing the intent of others ("actors") in ambiguous
situations.  Thus, an intervention that taught these boys to
interpret actors' intent as something other than negative in
ambiguous situations should ultimately lessen their aggressive
behavior.
Data:
In the Hudley&Graham study, African American boys, average age
about 10.5 years, were assigned at random to one of three
experimental groups: (1) a 12 session intervention to infer
non-hostile intent in ambiguous situations; (2) attention
training (a lesser program to deal with effects of participating
in a study); (3) control group (no intervention or training).
Data on 36 subjects are in file aggress.dat .
c1 has Negative Intent Rating and c2 contains membership in the
three experimental groups (intervention = 1; attention = 2;
control = 3).

a) Write a statistical model for this single classification data structure
b) carry out an anova for this one-way classification and test the
   omnibus null hypothesis of no differences between the group means
   for Negative Intent using Type I error rate .05.
c) carry out a post-hoc pairwise comparison procedure using the
   Tukey Method in order to construct interval estimates for all 
   pairwise comparisons with family-wise confidence coeff .95.
d) in planning a follow-up study which will have equal numbers of
   subjects in each group, how many subjects should there be in each
   group so that the interval estimate for these pairwise comparisons
   will have width of 1.5 units (using experimentwise error rate 
   .01, i.e. confidence coefficient .99)?
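
For parts (a) and (b), the usual single-classification model is
Y_ij = mu + alpha_i + e_ij, and the omnibus test compares the
between-group mean square with the within-group mean square. The
Python sketch below computes that F ratio by hand. The function name
one_way_anova and the three small groups are illustrative assumptions;
they are not the aggress.dat data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    # Between-group SS: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of observations from their own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up placeholder values, NOT the study data.
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df1, df2 = one_way_anova(example_groups)
print(f_stat, df1, df2)
```

With 36 subjects in 3 groups, the F ratio would be referred to the
F distribution with 2 and 33 degrees of freedom.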

------------------------------------


Problem 2

We are far enough away from Los Angeles that we should be able
to view this with detachment.....

Assume that a statistical consultant has been called in to assist
the police department of a large city in evaluating its human
relations course for new officers. After the human relations
course is completed, an outcome measure 'attitude toward minority
groups' is obtained (higher is more positive); assume that an 
instrument previously validated by the consultant is being used. 

A total of 45 officers are involved in this study. Each officer
in the training has a type of beat (Factor A): upper-class,
middle-class, inner-city. Also, the training program has been
developed in three versions/levels (Factor B): 5 hours of human
relations training, 10 hours of training, 15 hours of training.
The study design has 5 officers for each combination of beat
location and training duration. The data can be arrayed as
follows:


                5hrs training      10hrs training       15hrs training 

upper-class     24 33 37 29 42     44 36 25 27 43       38 29 28 47 48


middle-class    30 21 39 26 34     35 40 27 31 22       26 27 36 46 45


inner-city      21 18 10 31 20     41 39 50 36 34       42 52 53 49 64

-----------------------------------------------------------------

a. Construct both profile plots for this two-way data structure, one with
   length of course and one with type of beat on the horizontal axis.
   Comment on possible main effects and interactions.
b. Write out the statistical model for this two-way classification
c. Carry out the two-way anova and conduct the series of hypothesis tests 
   for main effects and interaction.
   Keep your overall error rate at or below .05 for the 3  tests.
   State conclusions and interpretation.
d. If appropriate, investigate further the interpretation of main
   effects analyzing row effects at each level of the column factor and/or 
   vice versa.
   Procedures for doing so were illustrated in problem 5 of HW2 for a 
   2x3 design.  Follow that approach
   for this data structure by restricting attention (for purposes of
   this problem) to just the upper-class and inner-city beats (temporarily
   setting aside the middle-class beat data).  For this subset of the
   data (now a 2x3 design) compare the results of procedures based on 
   pairwise Tukey intervals and  Bonferroni intervals
    [Instructor note: setting aside the middle-class beat data
    is a simplification merely for the purposes of keeping the
    work of this problem manageable.  In real life, inferences
    comparing all three beats could be constructed at each level
    of training duration for example]
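
As a cross-check on Minitab output for parts (a) and (c), the cell
means behind the profile plots and the F ratios for a balanced two-way
layout can be computed directly from the table above. This is a minimal
Python sketch; the function name two_way_anova and the dictionary
layout are assumptions made for illustration.

```python
from statistics import mean

# Attitude scores copied from the table above: beat (Factor A) x hours (Factor B).
data = {
    "upper-class":  {5: [24, 33, 37, 29, 42], 10: [44, 36, 25, 27, 43], 15: [38, 29, 28, 47, 48]},
    "middle-class": {5: [30, 21, 39, 26, 34], 10: [35, 40, 27, 31, 22], 15: [26, 27, 36, 46, 45]},
    "inner-city":   {5: [21, 18, 10, 31, 20], 10: [41, 39, 50, 36, 34], 15: [42, 52, 53, 49, 64]},
}

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction; returns cell means, F ratios, error df."""
    rows = list(data)
    cols = list(data[rows[0]])
    n = len(data[rows[0]][cols[0]])  # observations per cell (equal in every cell)
    cell = {(r, c): mean(data[r][c]) for r in rows for c in cols}
    grand = mean([y for r in rows for c in cols for y in data[r][c]])
    row_mean = {r: mean([cell[r, c] for c in cols]) for r in rows}
    col_mean = {c: mean([cell[r, c] for r in rows]) for c in cols}

    ss_a = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_b = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_cells = n * sum((m - grand) ** 2 for m in cell.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction SS by subtraction
    ss_e = sum((y - cell[r, c]) ** 2 for r in rows for c in cols for y in data[r][c])

    df_a, df_b = len(rows) - 1, len(cols) - 1
    df_ab, df_e = df_a * df_b, len(rows) * len(cols) * (n - 1)
    ms_e = ss_e / df_e
    f_ratios = {
        "A (beat)": (ss_a / df_a) / ms_e,
        "B (hours)": (ss_b / df_b) / ms_e,
        "A x B": (ss_ab / df_ab) / ms_e,
    }
    return cell, f_ratios, df_e

cell_means, f_ratios, df_error = two_way_anova(data)
for (beat, hours), m in cell_means.items():
    print(beat, hours, m)
print(f_ratios, df_error)
```

The F ratios are referred to F distributions with (2, 36), (2, 36) and
(4, 36) degrees of freedom. One common way to keep the overall error
rate at or below .05 is to run each of the three tests at level .05/3.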
-------------------------------------------------------------------------

Problem 3
IQ scores and reading ability
The file readiq.dat contains data (from a text) on 60
elementary school boys, 30 of whom were rated as poor or
very poor readers--at least 2 years below grade level. The
remaining 30 boys read normally, but otherwise resembled 
the poor readers in terms of schools, age, family background,
and other variables.  The 30 boys with reading problems 
consisted of 11 "very poor" readers and 19 who were merely "poor"
readers. In the data file c1= 1 for very poor; c1 = 2 for poor;
c1 = 3 for normal.
The relation of reading disability to IQ measures is currently seen
not to be as simple as "poor readers have lower intelligence".
We have in column c4 the full-scale WISC-R IQ score. In c2 we
have the attention/concentration sub-scale score  (composed of 
arithmetic, digit-span, coding subtests).  In c3 we have the 
spatial ability sub-scale score  (composed of picture completion, 
block design, object assembly subtests).

a)  Use Minitab to obtain scatterplots
    of the attention/concentration and spatial ability scores
    for each of the reading ability levels (1,2,3) in c1. Obtain the
    sample correlation coefficients for each of the three scatterplots.
    For the normal readers construct an interval estimate with .95
    confidence coefficient for the correlation.

b)  
    For the normal readers, use the subscale scores in c2 and
    c3 to form a prediction equation for the full-scale WISC-R
    scores in c4.  What are the coefficients and squared multiple
    correlation for this regression fit?  Plot the residuals
    versus the fits for this regression. Obtain a 95% prediction
    interval for the full-scale score for an individual having
    attention/concentration score of 32 and spatial ability score
    of 30.
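
For the interval estimate in part (a), one standard approach is the
Fisher z transformation: transform r with atanh, add and subtract a
normal critical value times 1/sqrt(n - 3), and transform back with
tanh. The Python sketch below shows the arithmetic. The function name
fisher_z_interval is an assumption, and r = 0.50 is a placeholder, not
the correlation in readiq.dat; n = 30 is the number of normal readers.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def fisher_z_interval(r, n, conf=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. about 1.96 for .95
    center = atanh(r)                  # Fisher z of the sample correlation
    half_width = z_crit / sqrt(n - 3)  # standard error of z is 1/sqrt(n - 3)
    return tanh(center - half_width), tanh(center + half_width)

# Placeholder correlation; replace with the sample r for the normal readers.
low, high = fisher_z_interval(0.50, 30)
print(round(low, 3), round(high, 3))
```

The resulting interval is not symmetric about r, which is expected for
this method.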

----------------------------------------------------------

Problem 4
 SLEEP  Looking forward to getting more?   

A simple experiment compared the effectiveness of two sedatives
in promoting length of sleep, labelled here as Drug A and Drug B.
Two groups of size 10 were formed by random assignment (in c3
group membership is coded A = 1, B =2) . The outcome measure in
c1 is the number of hours of sleep obtained by the subject after
taking the drug; the covariate in c2 is the number of hours of
sleep obtained by the subject normally with no medication.
Data reside in file sleep.dat.
------------

a. Construct a 90% confidence interval for difference of
group means on the outcome measure in c1 (i.e. do not use covariate c2 
information).

b. Now consider use of covariate information in c2. 
What are the sample within-group c1-on-c2 slopes? 
Carry out a preliminary test of the ancova assumption of equal c1-on-c2
slopes in each group with Type I error rate .10.  

c. Obtain a point and interval estimate for the 
analysis of covariance treatment effect.  Use confidence coefficient .90.
Compare the width of this confidence interval with part (a).  
Did use of the covariate help in the estimation? Comment.
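
The point estimates in parts (b) and (c) can be sketched as follows,
assuming the usual ANCOVA setup in which the outcome (c1) is regressed
on the covariate (c2) within each group. The adjusted treatment effect
is the difference in outcome means minus the pooled within-group slope
times the difference in covariate means. Function names and the tiny
data set are illustrative assumptions, not the sleep.dat values; the
interval estimate additionally needs a t critical value, which this
sketch leaves to Minitab.

```python
from statistics import mean

def centered_sums(x, y):
    """Return (Sxx, Sxy) computed about the group means."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    return sxx, sxy

def ancova_point_estimate(x1, y1, x2, y2):
    """Within-group slopes, pooled slope, and adjusted group difference (1 minus 2)."""
    sxx1, sxy1 = centered_sums(x1, y1)
    sxx2, sxy2 = centered_sums(x2, y2)
    slope1 = sxy1 / sxx1
    slope2 = sxy2 / sxx2
    pooled = (sxy1 + sxy2) / (sxx1 + sxx2)  # pooled within-group slope
    adjusted = (mean(y1) - mean(y2)) - pooled * (mean(x1) - mean(x2))
    return slope1, slope2, pooled, adjusted

# Made-up placeholder values: x is the covariate, y is the outcome.
x_a, y_a = [1, 2, 3], [3, 5, 7]
x_b, y_b = [2, 3, 4], [7, 9, 11]
print(ancova_point_estimate(x_a, y_a, x_b, y_b))
```

If the two within-group slopes differ markedly, the pooled slope (and
hence the adjusted difference) is hard to interpret, which is why the
preliminary test in part (b) comes first.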

------------------------------------------------

Problem 5
   But would you want to matriculate? 
   We consider data on admissions for Fall 1973 graduate study at 
U.C. Berkeley in the six largest departments.  The data on each 
applicant consists of the applicant's gender (G), whether admitted (A) 
and major department (D).
        Whether admitted, male         Whether admitted, female

Dept       Yes         No                    Yes       No
a          512        313                    89        19 
b          353        207                    17         8
c          120        205                   202       391 
d          138        279                   131       244 
e           53        138                    94       299 
f           22        351                    24       317 

Total      1198      1493                   557      1278

a) For the males, which department has the largest proportion admitted?
   Carry out a statistical test that for the males, the proportion 
   admitted is constant for the six departments considered here.
b) construct a 99% confidence interval for the population proportion of 
   females admitted (pooling over departments).
c) Construct a 2x2 table of gender by admit status.  Carry out a test for 
   independence for this table.  What might this result be taken to 
   indicate about gender equity etc in the admit process?
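
The counts in the table above are enough to check each part by hand.
The Python sketch below computes the male admission proportions, a
Pearson chi-square statistic for a general two-way table, a
normal-approximation interval for the pooled female proportion, and
the 2x2 gender-by-admit test. The helper name chi_square is an
assumption; the statistic is computed without a continuity correction,
so software that applies one may report a slightly different value.

```python
from math import erfc, sqrt
from statistics import NormalDist

# (admitted, not admitted) counts copied from the table above.
male = {"a": (512, 313), "b": (353, 207), "c": (120, 205),
        "d": (138, 279), "e": (53, 138), "f": (22, 351)}
female = {"a": (89, 19), "b": (17, 8), "c": (202, 391),
          "d": (131, 244), "e": (94, 299), "f": (24, 317)}

def chi_square(table):
    """Pearson chi-square statistic and df for a table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / total
            stat += (obs - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)

# (a) male proportion admitted by department, and the 6x2 homogeneity test
male_props = {d: yes / (yes + no) for d, (yes, no) in male.items()}
best_dept = max(male_props, key=male_props.get)
stat_male, df_male = chi_square([list(v) for v in male.values()])

# (b) 99% normal-approximation interval for the pooled female proportion
f_yes = sum(yes for yes, _ in female.values())
f_no = sum(no for _, no in female.values())
p_hat = f_yes / (f_yes + f_no)
half = NormalDist().inv_cdf(0.995) * sqrt(p_hat * (1 - p_hat) / (f_yes + f_no))
ci_female = (p_hat - half, p_hat + half)

# (c) 2x2 gender by admit status; for 1 df the upper tail equals erfc(sqrt(stat/2))
m_yes = sum(yes for yes, _ in male.values())
m_no = sum(no for _, no in male.values())
stat_2x2, df_2x2 = chi_square([[m_yes, m_no], [f_yes, f_no]])
p_2x2 = erfc(sqrt(stat_2x2 / 2))

print(best_dept, stat_male, df_male)
print(ci_female)
print(stat_2x2, df_2x2, p_2x2)
```

The pooled 2x2 table ignores department, so the interpretation asked
for in part (c) should keep in mind how differently the departments
admit.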

-------------------------------------------------
Problem 6
If you're not crazy yet, you'll do ok.  

A psychologist conducted a study to examine the nature of the 
relation, if any, between an employee's emotional stability (C2) 
and the employee's ability to perform in a task group (C1). Data 
on 27 employees are in file stable.dat .

Emotional stability was measured by a written test, and ability 
to perform in a task group (C1 = 1 if able, C1 = 0 if unable) was 
evaluated by the supervisor.


a.   From an OLS fit for a straight-line relation for predicting 
     C1 from C2, what level of emotional stability seems necessary 
     for a probability of successful performance of .75?
b.   Use Minitab's blog command to obtain a fit for a logistic response 
     function to these data.
     What is the predicted probability of success for an employee with 
     the median value of emotional stability?
     For the logistic fit, what level of emotional stability seems 
     necessary for a probability of successful performance of .75?
c.   For both the OLS regression and the logistic curve estimation,
     list the fitted-values for probability of success using the 
     emotional stability values in these data (C2). Comment on the
     similarity (or lack thereof) of these two fits.
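
Once the coefficients are in hand, "what level of C2 gives probability
.75" is a matter of inverting the fitted function: x = (p - b0)/b1 for
the straight line, and x = (log(p/(1 - p)) - b0)/b1 for the logistic
curve. The Python sketch below shows both inversions plus a
least-squares line fit. All function names are assumptions, and the
coefficients and data are hypothetical placeholders, not results from
stable.dat.

```python
from math import exp, log
from statistics import mean

def ols_fit(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    xb, yb = mean(x), mean(y)
    slope = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
             / sum((xi - xb) ** 2 for xi in x))
    return yb - slope * xb, slope

def x_for_prob_linear(b0, b1, p):
    """Solve p = b0 + b1*x for x (straight-line fit)."""
    return (p - b0) / b1

def logistic_prob(b0, b1, x):
    """Fitted probability from a logistic response function."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def x_for_prob_logistic(b0, b1, p):
    """Solve p = 1/(1 + exp(-(b0 + b1*x))) for x."""
    return (log(p / (1 - p)) - b0) / b1

# Hypothetical 0/1 outcomes and predictor values, for illustration only.
b0_ols, b1_ols = ols_fit([1, 2, 3, 4], [0, 0, 1, 1])
print(x_for_prob_linear(b0_ols, b1_ols, 0.75))

# Hypothetical logistic coefficients; use the ones Minitab reports instead.
b0_log, b1_log = -5.0, 2.0
x75 = x_for_prob_logistic(b0_log, b1_log, 0.75)
print(x75, logistic_prob(b0_log, b1_log, x75))
```

When comparing the fitted values in part (c), note that a straight-line
fit can produce values below 0 or above 1, while logistic fitted values
always lie strictly between 0 and 1.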
-------------------------
END