HW 2, Soc 388
Due Thursday, October 18, in class
Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due. If you're stuck, email me. If you still can't figure it out, just do the best you can and don't panic.
Note: What I refer to
as the independence model, Hout refers to as the model of 'perfect mobility',
and my Model 4, the
Also note that some of the characteristics of the models
that Hout describes, especially the parts about coefficients summing to zero,
are not characteristics of the models, but rather characteristics of the way in
which some programs construct dummy variables.
Stata's built-in xi function always constructs dummy variables with one
excluded category (equal to zero) in each variable. The user-written Stata function desmat can
construct dummy variables in any number of ways, including the way Hout
describes (this is the dev option), and by default the same way xi does it
(this is the
BIC (since it isn't defined in either text): BIC = LRT - df × ln(N), where LRT is the goodness-of-fit chi-square, df is the residual degrees of freedom, and N is the sample size from the whole dataset. The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999).
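The BIC formula above is simple enough to check by hand, but a small helper can guard against arithmetic slips when filling in the table. The following is a minimal sketch in Python (the course itself uses Stata); the function name `bic` and the numbers in the example call are hypothetical and are not results from this homework.

```python
import math


def bic(lrt: float, residual_df: int, n: int) -> float:
    """BIC = LRT - df * ln(N), following the definition in this handout.

    lrt         -- goodness-of-fit chi-square of the model
    residual_df -- residual degrees of freedom of the model
    n           -- sample size of the whole dataset
    """
    return lrt - residual_df * math.log(n)


# Hypothetical inputs, for illustration only (not homework answers).
print(round(bic(lrt=100.0, residual_df=10, n=1000), 2))
```

Note that `math.log` is the natural logarithm, which is what ln(N) in the formula calls for.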
ID, or Index of Dissimilarity (0 ≤ ID ≤ 100), is a simple measure that describes what percentage of the predicted counts of a model would have to be changed to reach the actual data. ID = the sum, over all cells, of the quantity 50 × abs((predicted/N) - (actual/N)), where N is the sample size, predicted is the model's predicted count for the cell, and actual is the observed count for the cell.
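As a rough illustration of the ID formula, the sketch below computes it in Python from two flat lists of cell counts. The function name `index_of_dissimilarity` and the toy counts are hypothetical; the sketch also assumes N can be taken as the sum of the actual counts.

```python
def index_of_dissimilarity(predicted, actual):
    """Sum over all cells of 50 * |predicted/N - actual/N|.

    Assumption: N is taken to be the total of the actual cell counts.
    """
    n = sum(actual)
    return sum(50 * abs(p / n - a / n) for p, a in zip(predicted, actual))


# Toy four-cell example (hypothetical counts, not the homework data).
toy_predicted = [10, 20, 30, 40]
toy_actual = [20, 10, 30, 40]
print(index_of_dissimilarity(toy_predicted, toy_actual))
```

In the toy example, 10 of the 100 cases would have to move from the second cell to the first for the predicted counts to match the actual counts, so the ID is 10 percent (up to floating-point rounding).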
Important ideas: Goodness of fit measures, hypothesis testing.
Consider the Los Angeles intermarriage dataset:
Intermarriage, LA 1990 (rows are husbands, columns are wives)
Husbands \ Wives | NH Black | Mexican | Other Hisp | All Others | NH White |
Non Hisp Black   |     4074 |      63 |         32 |         42 |      215 |
Mexican          |       25 |    3947 |        143 |         95 |     1009 |
Other Hispanic   |       16 |     132 |        239 |         18 |      304 |
All Others       |       19 |      78 |         18 |       1022 |      360 |
Non Hisp White   |      103 |    1156 |        373 |        492 |    28453 |
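For checking data entry, it may help to type the table in once and confirm the margins before fitting any models. The sketch below is in Python rather than Stata, and the names `GROUPS`, `TABLE`, `long_format`, and `same_group` are hypothetical. It also builds one record per cell, with a flag marking the diagonal (endogamous) cells, which is the kind of layout a log-linear model of counts typically needs.

```python
# Group order is the same for husbands (rows) and wives (columns).
GROUPS = ["NH Black", "Mexican", "Other Hisp", "All Others", "NH White"]

# Cell counts from the LA 1990 intermarriage table above.
TABLE = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]

row_totals = [sum(row) for row in TABLE]        # husbands' margins
col_totals = [sum(col) for col in zip(*TABLE)]  # wives' margins
N = sum(row_totals)                             # total number of couples

# One record per cell: (husband group, wife group, count, same_group flag).
long_format = [
    (GROUPS[i], GROUPS[j], TABLE[i][j], int(i == j))
    for i in range(len(GROUPS))
    for j in range(len(GROUPS))
]

print(N, row_totals, col_totals)
```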
Fill in the following Table
Model # | Model Description | Terms in model | Residual df | Goodness of fit Chi-square | Goodness of fit Chi-square P | BIC | ID | Notes |
1 | Constant only | | | | | | | |
2 | Independence Model | | | | | | | |
3 | Independence plus single level of endogamy (same for all groups) | | | | | | | |
4 | Independence plus separate endogamy term for each group | | | | | | | |
5 | Same as 4, plus Black-White and Mexican-Other Hispanic interactions | | | | | | | |
6 | Your best fitting model here | | | | | | | |
 | | | | | | | | |
 | | | | | | | | |
1) Fill in the above table for models 1-5. Leave the 'notes' column blank for now. For model 5, the Black-White and Mexican-Other Hispanic terms are gender-symmetric.
2) Verify that model 1, the 'constant' model, is the comparison model for the likelihood ratio chi-square that Stata lists as the second line of output for each subsequent model. How do you interpret that chi-square test?
3) Does racial endogamy vary significantly between groups? What is the statistical test that answers that question?
4) In model 4, which group has the strongest ethnic or racial endogamy? Which group has the weakest endogamy? Is the difference between the strongest and weakest statistically significant?
5) Generate the predicted values for Model 5. Where do the predicted values and the actual values correspond exactly?
6) How do you interpret the coefficients for Black-White and Mexican-Other Hispanic intermarriage in Model 5?
7) If you add a gender-specific dimension to Black-White intermarriage in Model 5, is it significant?
8) Make a new model 7, which consists of Model 2, the independence model, plus the gender-symmetric Black-White interaction term. Compare the resulting Black-White interaction term to the same term from Model 5. Why is it different? Think about how the comparison group is different in the two cases.
9) Of models 1-5, which is the best fitting by BIC? Which fits best by the goodness of fit chi-square? Which fits best by Index of Dissimilarity?
10) Find a model that fits better than model 5 by either BIC or the goodness of fit chi-square.
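As a cross-check on software output, Models 1 and 2 are the two models in the table whose predicted counts have a closed form, so their goodness-of-fit chi-squares can be sketched without an iterative fit. The Python below is only an illustration under that assumption, not the course's Stata procedure; the names `g_squared`, `constant_fit`, and `indep_fit` are hypothetical. Models 3 onward still need a fitted log-linear model.

```python
import math

# LA 1990 intermarriage counts (rows: husbands, columns: wives).
TABLE = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]


def g_squared(observed, expected):
    """Likelihood ratio chi-square: 2 * sum of O * ln(O / E) over all cells.

    This form assumes the expected counts sum to the observed total.
    """
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


k = len(TABLE)
n = sum(sum(row) for row in TABLE)
rows = [sum(row) for row in TABLE]
cols = [sum(col) for col in zip(*TABLE)]

# Model 1 (constant only): the same predicted count in every cell.
constant_fit = [[n / (k * k)] * k for _ in range(k)]
df_constant = k * k - 1            # 25 cells minus 1 parameter

# Model 2 (independence): row total * column total / N.
indep_fit = [[r * c / n for c in cols] for r in rows]
df_indep = (k - 1) * (k - 1)       # 25 cells minus 9 parameters

lrt_constant = g_squared(TABLE, constant_fit)
lrt_indep = g_squared(TABLE, indep_fit)

print(df_constant, lrt_constant)
print(df_indep, lrt_indep)
# Likelihood ratio chi-square comparing Model 2 against Model 1.
print(lrt_constant - lrt_indep)
```

The last printed value is the difference between the two goodness-of-fit chi-squares, which is the likelihood ratio test of the independence model against the constant-only model, on 24 - 16 = 8 degrees of freedom.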