HW 3, Soc 388, updated Oct 11.

Due Tuesday, Oct 30

Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due.  If you're stuck, email me or the TA.

NOTE: All homeworks should include an edited Stata log.

Previous Reading Assignments: Hout Chapters 1-4; Agresti Ch 1-2, 6.

New Reading Assignment:  Agresti, Ch 3

Once again (since it isn't defined in either text):  BIC = LRT - df*ln(N), where LRT is the goodness of fit chi-square, df is the residual degrees of freedom, and N is the sample size from the whole dataset.  The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999).  Lower BIC indicates better fit, and BIC < 0 indicates a model that is preferred to the saturated model.
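
As a quick check on the arithmetic, here is a minimal Python sketch of the BIC formula above. The function name and the example LRT and df values are made up for illustration and are not results from the homework data; only N = 649,821 comes from the dataset description.

```python
import math

def bic(lrt, residual_df, n):
    """BIC = LRT - df * ln(N), following the definition given above."""
    return lrt - residual_df * math.log(n)

# Hypothetical LRT and df, for illustration only (NOT a homework answer).
example_bic = bic(lrt=250.0, residual_df=16, n=649821)
print(round(example_bic, 1))

# A negative result would mean the model is preferred to the saturated model.
print("preferred to saturated model:", example_bic < 0)
```

In Stata the same quantity can be computed by hand from the model's reported chi-square and residual df.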

ID, or Index of Dissimilarity (0 ≤ ID ≤ 100), is a simple measure that describes what percentage of the predicted counts of a model would have to be changed to reach the actual data. ID = sum (over all cells) of the quantity

50*abs((predicted/N) - (actual/N)), where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
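
The ID calculation can be sketched in a few lines of Python. This assumes the formula is read as 50 times the absolute value of the difference (predicted/N) - (actual/N), summed over all cells, and it takes N to be the sum of the actual counts. The function name and the four-cell toy table are hypothetical.

```python
def index_of_dissimilarity(predicted, actual):
    """Sum over cells of 50 * |predicted/N - actual/N|, with N = total actual count."""
    n = sum(actual)
    return sum(50.0 * abs(p / n - a / n) for p, a in zip(predicted, actual))

# Toy table with 4 cells and N = 100 (made-up numbers, not the homework data).
actual = [30, 20, 25, 25]
predicted = [25, 25, 25, 25]
print(index_of_dissimilarity(predicted, actual))
```

In the toy table, 5 of the 100 cases would have to move from the second cell to the first for the predicted counts to match the actual counts, so the ID comes out at about 5.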

Important ideas:  Goodness of fit measures, hypothesis testing, inference across many dimensions, different kinds of controls, hierarchical variables.

The data are available from my website, as well as my public folder via ftp (/afs/ir/users/m/r/mrosenfe/public) under the name "70-80-90 MR intermar.dta" (Stata ver 6) or "70-80-90 MR intermar.xls" if you'd rather start with the Excel file and copy it into Stata.

The data have 225 cells, and 5 variables (not including count).  There are 649,821 couples in the dataset (it's intermarriage data, surprise surprise).  The data consist of married people age 20-29 at the time of the census.  The variables are meth (husband's ethnicity) and feth (wife's ethnicity), with the same 5 categories we have seen before (non Hispanic Black, non Hispanic White, Mexican, Other Hispanic, non Hispanic Other).  There is a variable for census year (70, 80, and 90), and there is a variable for nativity of each spouse (born in the US vs Foreign born).  The dataset includes 3 of the possible 4 combinations of nativity; couples that are both foreign born are excluded.  The number of cells = 5*5*3*3 = 225.

In the following table, BW is the gender symmetric Black- White interaction;  MOh is the gender symmetric Mexican- Other Hispanic interaction; ethintdm is the dummy variable that treats all 5 kinds of ethnic endogamy the same, ethintct is the categorical variable that treats each kind of ethnic intermarriage differently.

In model descriptions, "year*meth*mgen" is a hierarchical description, which means that the interaction of the 3 variables, as well as all combinations of dual interactions and single variables, are included. I'll explain more about this in class.
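
To make the hierarchical shorthand concrete, here is a small Python sketch that lists every term implied by a description such as year*meth*mgen. This is only an illustration of the idea, not how desmat works internally. The function name is hypothetical, and the "." used in the output is this sketch's own notation for a single interaction term without its lower-order terms.

```python
from itertools import combinations

def expand_hierarchical(term):
    """List all single variables and interactions implied by a hierarchical term."""
    variables = term.split("*")
    expanded = []
    # Sizes 1..k: single variables, then two-way interactions, and so on.
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            expanded.append(".".join(combo))
    return expanded

print(expand_hierarchical("year*meth*mgen"))
# ['year', 'meth', 'mgen', 'year.meth', 'year.mgen', 'meth.mgen', 'year.meth.mgen']
```

A hierarchical term in k variables therefore stands for 2^k - 1 terms in total (7 for three variables).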

Note: in Model 7a, the '@' in front of year indicates (to desmat) that year should be treated as a continuous variable there.

Fill in the following Table

Model #  Model description                                   Residual  Goodness of fit  Goodness of fit  BIC  ID
         (terms in model)                                    df        Chi-square       Chi-square P
1        Constant only
2        year*meth year*feth
3        year*meth*mgen year*feth*fgen
4        year*meth*mgen year*feth*fgen BW, MOh
5        year*meth*mgen year*feth*fgen ethintdm
6        year*meth*mgen year*feth*fgen ethintct
7a       year*meth*mgen year*feth*fgen ethintct*@year
7b       year*meth*mgen year*feth*fgen ethintct*year
8        year*meth*mgen year*feth*fgen ethintct*year BW MOh
9        Your best fitting model here

1) Fill in the above table, models 1-8

2) Does racial endogamy vary significantly between groups? What is the statistical test that answers that question?

3) Does racial endogamy vary significantly over time? More so for some groups than for others?

4) Does US nativity affect racial endogamy? Describe the model(s), and the results you need to answer this question.

5) Based on models 1-8, which would you say is a more powerful force in the marriage market- racial endogamy or the division between Blacks and Whites?  Why?

6) Which of the models 1-8 fits the best by LRT and by BIC?  Do any of them fit reasonably well?

7) What is the difference between treating year as a continuous vs categorical variable in interactions with ethnic endogamy? How do models 7a and 7b differ? How do you interpret this difference?

8) Construct a model that fits better (by BIC or LRT) than any of the models 1-8. What have you added to the previous models?

9) Now here are some more abstract questions about a hypothetical dataset with 3 variables: A (5 categories) B(4 Categories) and C (3 categories).  Total number of cells is 5*4*3=60.  Fill in the following table.

Model #  Model description (terms in model)  Residual df
1        A
2        A,B
3        A*B
4        A*B,C
5        A*B, B*C, A*C
6        A*B*C