Sociology 388

 

Homework 1

 

Due Tuesday, Oct 9, in class.

 

Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due.  If you're stuck, email me or the TA.  If you still can't figure it out, just do the best you can and don't panic. 

 

General Notes:  Feel free to discuss this assignment with other students, but each student must turn in their own work.  Answers to the questions should be accompanied by Excel worksheets and edited Stata logs.  By "edited," I mean you should remove the wrong turns and give me just the important parts.  The Stata log is just a text file, so you can edit it in MS Word or any text editor.  Feel free to add comments to the log file.

 

Reading: Agresti Ch. 1-2; Hout Ch. 1.

 

Important ideas: independence, degrees of freedom, goodness of fit, odds ratio

 

Consider the 2 datasets from the syllabus:

 

A) Occupation by Race, USA 2000

                               Race
Occupational Class        White    Non White
Other                    42,012        7,146
White Collar             17,216        2,361

and

 

B) Intermarriage, LA 1990

                                              Wives
Husbands            NH Black     Mexican  Other Hisp  All Others    NH White
Non Hisp Black          4074          63          32          42         215
Mexican                   25        3947         143          95        1009
Other Hispanic            16         132         239          18         304
All Others                19          78          18        1022         360
Non Hisp White           103        1156         373         492       28453

1) For dataset A, use Excel to calculate the log odds ratio and the standard error of the log odds ratio (see Agresti pp. 22-24).  Is the log odds ratio significantly different from zero?  What does that mean about the association between race and occupational class in America?  Beyond the statistical significance of the log odds ratio, do you think the magnitude of the effect is large enough to be socially significant, or not?
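As a cross-check on the Excel worksheet, here is a minimal Python sketch of the sample log odds ratio and its usual large-sample standard error (the square root of the sum of the reciprocal cell counts), followed by a Wald z test.  The cell labels n11 through n22 are hypothetical names chosen for illustration.  The sign of the log odds ratio depends on which row and column you treat as first.

```python
import math

# Dataset A as a 2x2 table (cell labels are hypothetical, for illustration):
# rows = occupational class, columns = race.
n11, n12 = 42012, 7146   # Other:        White, Non White
n21, n22 = 17216, 2361   # White Collar: White, Non White

# Sample log odds ratio: log(n11 * n22 / (n12 * n21)).
log_or = math.log((n11 * n22) / (n12 * n21))

# Large-sample standard error: square root of the sum of reciprocal cell counts.
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# Wald z statistic for H0: log odds ratio = 0 (|z| > 1.96 rejects at the 5% level).
z = log_or / se_log_or

print(f"log OR = {log_or:.4f}, SE = {se_log_or:.4f}, z = {z:.2f}")
```

In Excel, the same formulas are LN(...) of the cross-product ratio and SQRT(...) of the summed reciprocals.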

 

2) For dataset A, how is the log odds ratio for non-White representation in the White Collar sector related to the log odds ratio for White representation in the White Collar sector?

 

3) For BOTH datasets A and B, use Excel to generate the predicted values of the 'independence' model.  Without using any statistics, how close do you think the 'independence' model is to the actual data for A and B?  (Any reasonable opinion is fine here.)
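Under independence, each predicted count is (row total × column total) / grand total.  The following is a minimal Python sketch of that calculation for both tables, useful for checking the Excel worksheet.  The function name independence_fit is hypothetical.

```python
def independence_fit(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Dataset A: rows = (Other, White Collar), columns = (White, Non White).
table_a = [[42012, 7146],
           [17216, 2361]]

# Dataset B: rows = husbands, columns = wives, both in the order
# NH Black, Mexican, Other Hispanic, All Others, NH White.
table_b = [[4074,   63,  32,   42,   215],
           [  25, 3947, 143,   95,  1009],
           [  16,  132, 239,   18,   304],
           [  19,   78,  18, 1022,   360],
           [ 103, 1156, 373,  492, 28453]]

for name, table in (("A", table_a), ("B", table_b)):
    expected = independence_fit(table)
    print(f"Dataset {name}:")
    for obs_row, exp_row in zip(table, expected):
        print("  observed:", obs_row)
        print("  expected:", [round(e, 1) for e in exp_row])
```

By construction, the predicted table has the same row and column totals as the observed table; only the pattern inside the table changes.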

 

4) When non-statisticians talk about overrepresentation and underrepresentation, they frequently talk in terms of observed and expected percentages.  Use the 'independence' model (see Question 3) to generate the expected percentages of non-Whites and Whites in White Collar jobs.  Then divide the observed percentage by the expected percentage to get a crude measure of over- or underrepresentation.  How can you compare this measure for Whites and non-Whites?  Can you think of any reasons why this method is less satisfactory than the odds ratio method?
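Here is a minimal Python sketch of the crude observed/expected ratio, assuming "percentage" means the share of each race group working in White Collar jobs.  The observed and expected percentages share the same denominator, so the ratio reduces to observed count / expected count.  The result is therefore the same if you instead compute the share of White Collar jobs held by each group.  The names WHITE_COLLAR and ratios are hypothetical.

```python
# Dataset A: rows = (Other, White Collar), columns = (White, Non White).
table_a = [[42012, 7146],
           [17216, 2361]]

row_totals = [sum(row) for row in table_a]
col_totals = [sum(col) for col in zip(*table_a)]
n = sum(row_totals)

WHITE_COLLAR = 1  # row index of the White Collar row (hypothetical name)
ratios = {}
for j, race in enumerate(("White", "Non White")):
    observed = table_a[WHITE_COLLAR][j]
    expected = row_totals[WHITE_COLLAR] * col_totals[j] / n  # independence model
    # Percentage of this race group in White Collar jobs, observed vs. expected.
    obs_pct = 100 * observed / col_totals[j]
    exp_pct = 100 * expected / col_totals[j]
    ratios[race] = obs_pct / exp_pct  # > 1 means over-, < 1 means underrepresented
    print(f"{race}: observed {obs_pct:.2f}%, expected {exp_pct:.2f}%, "
          f"ratio {ratios[race]:.3f}")
```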

 

5) Use Stata to generate the 'independence' model for both datasets A and B.  How many terms are in the model?  How many degrees of freedom are in the likelihood ratio chi-square test?  (Run the Stata command poisgof after you have run the Poisson regression.)  What does the likelihood ratio chi-square test tell you about how well the 'independence' model fits the data?  Now use the tabulate command with the lrchi2 option (and don't forget to use the weights, as in [fweight=count]).  How do these two measures of independence compare?
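To check the Stata output, here is a minimal Python sketch of the likelihood ratio chi-square statistic for independence, G^2 = 2 * sum(n * log(n / e)).  The degrees of freedom are (I-1)(J-1): the I×J cells minus the 1 + (I-1) + (J-1) parameters of the independence model.  The function names are hypothetical.  This sketch does not compute p-values; compare G^2 with a chi-square table at the printed df.

```python
import math

def independence_fit(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def lr_chi2(table):
    """Likelihood ratio statistic G^2 = 2 * sum(n * log(n / e)) and its degrees of freedom."""
    expected = independence_fit(table)
    g2 = 2 * sum(obs * math.log(obs / fit)
                 for obs_row, fit_row in zip(table, expected)
                 for obs, fit in zip(obs_row, fit_row)
                 if obs > 0)  # empty cells contribute 0 to G^2
    # I*J cells minus the 1 + (I-1) + (J-1) parameters of the independence model.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df

table_a = [[42012, 7146],
           [17216, 2361]]
table_b = [[4074,   63,  32,   42,   215],
           [  25, 3947, 143,   95,  1009],
           [  16,  132, 239,   18,   304],
           [  19,   78,  18, 1022,   360],
           [ 103, 1156, 373,  492, 28453]]

for name, table in (("A", table_a), ("B", table_b)):
    g2, df = lr_chi2(table)
    print(f"Dataset {name}: G^2 = {g2:.1f} on {df} df")
```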

 

6) Using Excel and dataset A, find the log odds ratio of White representation in White Collar jobs from the predicted values of the 'independence' model (see Question 3).  How do you interpret this?
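Here is a minimal Python sketch that computes the log odds ratio of the predicted table, for checking the Excel result.  The comment spells out the algebra behind the number.  The variable names are hypothetical.

```python
import math

# Dataset A: rows = (Other, White Collar), columns = (White, Non White).
table_a = [[42012, 7146],
           [17216, 2361]]

row_totals = [sum(row) for row in table_a]
col_totals = [sum(col) for col in zip(*table_a)]
n = sum(row_totals)

# Predicted counts under the independence model.
e = [[r * c / n for c in col_totals] for r in row_totals]

# Odds ratio of the predicted table:
# (r1*c1/n)(r2*c2/n) / ((r1*c2/n)(r2*c1/n)) = 1 algebraically, so its log is 0
# up to floating-point rounding.
log_or_indep = math.log((e[0][0] * e[1][1]) / (e[0][1] * e[1][0]))
print(f"log OR of the independence fit = {log_or_indep:.12f}")
```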

 

7) Use Stata to generate the 'saturated' model for dataset A, which is simply the 'independence' model plus one additional term (the race-by-occupational-class interaction).  How many terms are in the model?  How many degrees of freedom are in the likelihood ratio chi-square test?  What are the value and standard error of the new interaction term?  How do these values compare to what you calculated in Excel in Question 1?  Use the predict command in Stata to generate predicted values from this model.  How do the predicted values compare to the actual data?  Explain why the predicted values fit the actual data so well.
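Stata fits the Poisson model by iterative maximum likelihood.  For a saturated 2x2 model, however, the estimates have a closed form, because there are as many parameters as cells.  Here is a minimal Python sketch of that closed-form solution.  It assumes treatment coding with the first category of each variable as the baseline; your Stata coding may differ, which can flip signs.  The coefficient names are hypothetical.

```python
import math

# Dataset A, cells indexed [class][race]: class 0 = Other, 1 = White Collar;
# race 0 = White, 1 = Non White.
table_a = [[42012, 7146],
           [17216, 2361]]

# Saturated log-linear model (assumed treatment coding):
#   log mu_ij = b0 + b_class*[i=1] + b_race*[j=1] + b_inter*[i=1 and j=1]
# Four parameters for four cells, so the ML estimates reproduce the data exactly.
b0 = math.log(table_a[0][0])
b_class = math.log(table_a[1][0] / table_a[0][0])
b_race = math.log(table_a[0][1] / table_a[0][0])
b_inter = math.log(table_a[0][0] * table_a[1][1] / (table_a[0][1] * table_a[1][0]))

# Model-based SE of the interaction: sqrt(sum of 1 / fitted count), and the
# fitted counts equal the observed counts in a saturated model.
se_inter = math.sqrt(sum(1 / n for row in table_a for n in row))

# Predicted counts from the model.
fitted = [[math.exp(b0 + b_class * i + b_race * j + b_inter * i * j)
           for j in range(2)] for i in range(2)]

print(f"interaction = {b_inter:.4f} (SE {se_inter:.4f})")
for obs_row, fit_row in zip(table_a, fitted):
    print("observed:", obs_row, " fitted:", [round(f, 6) for f in fit_row])
```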