Education 161 FINAL PROBLEMS, Winter 2000

Solutions for these problems are to be submitted in hard-copy form. Given that these problems are untimed, some care should be taken in presentation, clarity, and format. It is especially important to give full and clear answers to questions, not just to submit unannotated computer output, although relevant output should be included. You may use any inanimate resources--no collaboration. This work is done under Stanford's Honor Code. Please read the questions carefully and answer the question that is asked.

Papers will be scored into 3 categories:
"Excellent" indicates successful completion of all parts of all questions (within perhaps one or two very trivial arithmetic errors);
"Satisfactory" indicates a good attempt was made at all parts of all problems, but there were some serious errors or omissions;
"Incomplete" indicates inadequate effort or performance.

Problems due Wed March 15 5PM in Rogosa's Cubberley mailbox or deliver to Alex Harris in his office between 3-5 PM 3/15. No extensions/exceptions.

Note data files are available in one of two locations:
path: /usr/class/ed161/[data file]
or using web-services at URL http://www.stanford.edu/class/ed161/hw/[data file]
================================================================
Problem 1

This problem and adapted data are taken from a text by an illustrious educational researcher.

Research Setting: The general concern is with childhood aggressive behavior (and especially its persistence into adult violent behavior and other negative outcomes). Hudley & Graham (Child Development, 1993) conducted a study (based on attribution theory) which had as one of its outcome measures an assessment of negative intent. The rationale for this measure is that a boy's aggressive behavior might be a consequence of his misattributing the intent of others ("actors") in ambiguous situations.
Thus, an intervention that taught these boys to interpret actors' intent as something other than negative in ambiguous situations should ultimately lessen their aggressive behavior.

Data: In the Hudley & Graham study, African American boys, average age about 10.5 years, were assigned at random to one of three experimental groups:
(1) a 12 session intervention to infer non-hostile intent in ambiguous situations;
(2) attention training (a lesser program to deal with effects of participating in a study);
(3) control group (no intervention or training).
Data on 36 subjects are in file aggress.dat. c1 has Negative Intent Rating and c2 contains membership in the three experimental groups (intervention = 1; attention = 2; control = 3).

a) Write a statistical model for this single classification data structure.
b) Carry out an anova for this one-way classification and test the omnibus null hypothesis of no differences between the group means for Negative Intent using Type I error rate .05.
c) Carry out a post-hoc pairwise comparison procedure using the Tukey Method in order to construct interval estimates for all pairwise comparisons with family-wise confidence coefficient .95.
d) In planning a follow-up study which will have equal numbers of subjects in each group, how many subjects should there be in each group so that the interval estimate for these pairwise comparisons will have width of 1.5 units (using experimentwise error rate .01, i.e. confidence coefficient .99)?
------------------------------------
Problem 2

We are far enough away from Los Angeles that we should be able to view this with detachment.....

Assume that a statistical consultant has been called in to assist the police department of a large city in evaluating its human relations course for new officers.
After the human relations course is completed, an outcome measure 'attitude toward minority groups' is obtained (higher is more positive); assume that an instrument previously validated by the consultant is being used. A total of 45 officers are involved in this study. Each officer in the training has a type of beat (Factor A): upper-class, middle-class, inner-city. Also, the training program has been developed in three versions/levels (Factor B): 5 hours of human relations training, 10 hours of training, 15 hours of training. The study design has 5 officers for each combination of beat location and training duration. The data can be arrayed as follows:

                5hrs training     10hrs training    15hrs training
upper-class     24 33 37 29 42    44 36 25 27 43    38 29 28 47 48
middle-class    30 21 39 26 34    35 40 27 31 22    26 27 36 46 45
inner-city      21 18 10 31 20    41 39 50 36 34    42 52 53 49 64
-----------------------------------------------------------------
a. Construct both profile plots for this two-way data structure (one with length of course on the horizontal axis, one with type of beat on the horizontal axis). Comment on possible main effects and interactions.
b. Write out the statistical model for this two-way classification.
c. Carry out the two-way anova and conduct the series of hypothesis tests for main effects and interaction. Keep your overall error rate at or below .05 for the 3 tests. State conclusions and interpretation.
d. If appropriate, investigate further the interpretation of main effects by analyzing row effects at each level of the column factor and/or vice versa. Procedures for doing so were illustrated in problem 5 of HW2 for a 2x3 design. Follow that approach for this data structure by restricting attention (for purposes of this problem) to just the upper-class and inner-city beats (temporarily setting aside the middle-class beat data).
For this subset of the data (now a 2x3 design) compare the results of procedures based on pairwise Tukey intervals and Bonferroni intervals.
[Instructor note: setting aside the middle-class beat data is a simplification merely for the purposes of keeping the work of this problem manageable. In real life, inferences comparing all three beats could be constructed at each level of training duration, for example.]
-------------------------------------------------------------------------
3. IQ scores and reading ability

The file readiq.dat contains data (from a text) on 60 elementary school boys, 30 of whom were rated as poor or very poor readers--at least 2 years below grade level. The remaining 30 boys read normally, but otherwise resembled the poor readers in terms of schools, age, family background, and other variables. The 30 boys with reading problems consisted of 11 "very poor" readers and 19 who were merely "poor" readers. In the data file c1 = 1 for very poor; c1 = 2 for poor; c1 = 3 for normal. The relation of reading disability to IQ measures is currently seen not to be as simple as "poor readers have lower intelligence". We have in column c4 the full-scale WISC-R IQ score. In c2 we have the attention/concentration sub-scale score (composed of arithmetic, digit-span, coding subtests). In c3 we have the spatial ability sub-scale score (composed of picture completion, block design, object assembly subtests).

a) Use Minitab to obtain scatterplots of the attention/concentration and spatial ability scores for each of the reading ability levels (1,2,3) in c1. Obtain the sample correlation coefficient for each of the three scatterplots. For the normal readers construct an interval estimate with .95 confidence coefficient for the correlation.
b) For the normal readers, use the subscale scores in c2 and c3 to form a prediction equation for the full-scale WISC-R scores in c4. What are the coefficients and squared multiple correlation for this regression fit?
Plot the residuals versus the fits for this regression. Obtain a 95% prediction interval for the full-scale score for an individual having attention/concentration score of 32 and spatial ability score of 30.
----------------------------------------------------------
4. SLEEP
Looking forward to getting more?

A simple experiment compared the effectiveness of two sedatives in promoting length of sleep, labelled here as Drug A and Drug B. Two groups of size 10 were formed by random assignment (in c3 group membership is coded A = 1, B = 2). The outcome measure in c1 is the number of hours of sleep obtained by the subject after taking the drug; the covariate in c2 is the number of hours of sleep obtained by the subject normally with no medication. Data reside in file sleep.dat.
------------
a. Construct a 90% confidence interval for the difference of group means on the outcome measure in c1 (i.e. do not use covariate c2 information).
b. Now consider use of covariate information in c2. What are the sample within-group c2-on-c1 slopes? Carry out a preliminary test of the ancova assumption of equal c2 on c1 slopes in each group with Type I error rate .10.
c. Obtain a point and interval estimate for the analysis of covariance treatment effect. Use confidence coefficient .90. Compare the width of this confidence interval with part (a). Did use of the covariate help in the estimation? Comment.
------------------------------------------------
5. But would you want to matriculate?

We consider data on admissions for Fall 1973 graduate study at U.C. Berkeley in the six largest departments. The data on each applicant consist of the applicant's gender (G), whether admitted (A), and major department (D).

         Whether admitted, male    Whether admitted, female
Dept       Yes       No              Yes       No
a          512      313               89       19
b          353      207               17        8
c          120      205              202      391
d          138      279              131      244
e           53      138               94      299
f           22      351               24      317
Total     1198     1493              557     1278

a) For the males, which department has the largest proportion admitted?
Carry out a statistical test of the hypothesis that, for the males, the proportion admitted is constant across the six departments considered here.
b) Construct a 99% confidence interval for the population proportion of females admitted (pooling over departments).
c) Construct a 2x2 table of gender by admit status. Carry out a test for independence for this table. What might this result be taken to indicate about gender equity etc. in the admit process?
-------------------------------------------------
6. If you're not crazy yet, you'll do ok.

A psychologist conducted a study to examine the nature of the relation, if any, between an employee's emotional stability (C2) and the employee's ability to perform in a task group (C1). Data on 27 employees are in file stable.dat. Emotional stability was measured by a written test, and ability to perform in a task group (C1 = 1 if able, C1 = 0 if unable) was evaluated by the supervisor.

a. From an OLS fit for a straight-line relation for predicting C1 from C2, what level of emotional stability seems necessary for a probability of successful performance of .75?
b. Use Minitab blog to obtain a fit for a logistic response function to these data. What is the predicted probability of success for an employee with the median value of emotional stability? For the logistic fit, what level of emotional stability seems necessary for a probability of successful performance of .75?
c. For both the OLS regression and the logistic curve estimation, list the fitted values for probability of success using the emotional stability values in these data (C2). Comment on the similarity (or lack thereof) of these two fits.
------------------------- END
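
Illustrative note on the table arithmetic in Problem 5: the course itself uses Minitab, so the Python below is only a minimal sketch, not the intended solution method. It takes the counts from the Problem 5 admissions table, computes the male admission proportion in each department, pools over departments to form the 2x2 gender-by-admit table, and evaluates the usual Pearson chi-square statistic for independence (no continuity correction). All names (ADMISSIONS, male_admit_proportions, pooled_gender_table, pearson_chi_square) are hypothetical helpers introduced for this sketch.

```python
from math import erfc, sqrt

# Counts copied from the Problem 5 table:
# dept -> (male admitted, male not admitted, female admitted, female not admitted)
ADMISSIONS = {
    "a": (512, 313, 89, 19),
    "b": (353, 207, 17, 8),
    "c": (120, 205, 202, 391),
    "d": (138, 279, 131, 244),
    "e": (53, 138, 94, 299),
    "f": (22, 351, 24, 317),
}


def male_admit_proportions(table):
    """Proportion of male applicants admitted, by department."""
    return {dept: my / (my + mn) for dept, (my, mn, _, _) in table.items()}


def pooled_gender_table(table):
    """2x2 table [[male yes, male no], [female yes, female no]], pooled over depts."""
    sums = [sum(row[i] for row in table.values()) for i in range(4)]
    return [[sums[0], sums[1]], [sums[2], sums[3]]]


def pearson_chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (count - expected) ** 2 / expected
    return stat


male_props = male_admit_proportions(ADMISSIONS)
best_dept = max(male_props, key=male_props.get)

pooled = pooled_gender_table(ADMISSIONS)
chi2 = pearson_chi_square(pooled)
# For a 2x2 table the statistic has 1 degree of freedom, and the upper-tail
# probability of a chi-square(1) variable at x equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

for dept, prop in male_props.items():
    print(f"dept {dept}: male proportion admitted = {prop:.3f}")
print("largest male proportion admitted in dept", best_dept)
print("pooled 2x2 table:", pooled)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3g}")
```

The pooled row sums should agree with the Total row of the table, which gives a quick check on the data entry. The same pearson_chi_square helper applies unchanged to the 6x2 male-only table (department by admit status) for part (a), though the p-value shortcut above is valid only for 1 degree of freedom.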