----------------------------------------------------------------------------------
       log:  C:\AAA Miker Files\current class files\methods tabular arrays\some logistic - loglin comparison stuff.log
  log type:  text
 opened on:  11 Nov 2002, 09:07:37

. edit
(3 vars, 4 obs pasted into editor)
- preserve

. table race occ, contents (sum count) row col

-------------------------------
          |          occ
     race |    Oth      WC  Total
----------+--------------------
        n |   7146    2361   9507
        w |  42012   17216  59228
          |
    Total |  49158   19577  68735
-------------------------------

. *This is the White-Black labor market data from HW1.

. desmat: poisson count race occ
-------------------------------------------------------------------------------
poisson
-------------------------------------------------------------------------------
Dependent variable                     count
Number of observations:                    4
Initial log likelihood:           -26656.550
Log likelihood:                      -59.074
LR chi square:                     53194.953
Model degrees of freedom:                  2
Pseudo R-squared:                      0.998
Prob:                                  0.000
-------------------------------------------------------------------------------
nr Effect                                                      Coeff       s.e.
-------------------------------------------------------------------------------
count
   race
1    w                                                       1.829**      0.011
   occ
2    WC                                                     -0.921**      0.008
3  _cons                                                     8.825**      0.011
-------------------------------------------------------------------------------
*  p < .05
** p < .01

. poisgof

         Goodness-of-fit chi2  =  73.77235
         Prob > chi2(1)        =    0.0000

. poisgof, pearson

         Goodness-of-fit chi2  =  72.06174
         Prob > chi2(1)        =    0.0000

. *And this is the independence model.

.
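For a two-way table the independence model has closed-form fitted counts (row total times column total, divided by N), so the desmat coefficients above can be checked by hand. A minimal sketch in Python (standing in for Stata here), assuming desmat's default indicator coding with the first category as reference, which is what the `w` and `WC` labels suggest; `independence_fit` is a hypothetical helper:

```python
import math


def independence_fit(table):
    """Fitted counts under row-column independence: E[i][j] = R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Race (rows: n, w) by occupation (columns: Oth, WC), from the HW1 table above.
race_occ = [[7146, 2361],
            [42012, 17216]]

fitted = independence_fit(race_occ)

# Assuming indicator coding with the first category as reference, the
# log-linear independence model reproduces these fitted counts, so each
# coefficient is a log ratio of fitted counts.
cons = math.log(fitted[0][0])                    # log E(n, Oth)
race_w = math.log(fitted[1][0] / fitted[0][0])   # log(R_w / R_n)
occ_wc = math.log(fitted[0][1] / fitted[0][0])   # log(C_WC / C_Oth)

print(f"race w = {race_w:.3f}, occ WC = {occ_wc:.3f}, _cons = {cons:.3f}")
# prints: race w = 1.829, occ WC = -0.921, _cons = 8.825
```

Because the fitted counts depend only on the margins, the race coefficient is simply the log of the ratio of the two race totals, whatever the occupation.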
. desmat: poisson count race*occ
-------------------------------------------------------------------------------
poisson
-------------------------------------------------------------------------------
Dependent variable                     count
Number of observations:                    4
Initial log likelihood:           -26656.550
Log likelihood:                      -22.196
LR chi square:                     53268.708
Model degrees of freedom:                  3
Pseudo R-squared:                      0.999
Prob:                                  0.000
-------------------------------------------------------------------------------
nr Effect                                                      Coeff       s.e.
-------------------------------------------------------------------------------
count
   race
1    w                                                       1.771**      0.013
   occ
2    WC                                                     -1.107**      0.024
   race.occ
3    w.WC                                                    0.215**      0.025
4  _cons                                                     8.874**      0.012
-------------------------------------------------------------------------------
*  p < .05
** p < .01

. *And this is the saturated model. The log odds ratio of the interaction is 0.215, SE 0.025.

. *Now a look at logistic...

. desmat: logistic race occ [fweight=count]
r(2000);

. *That's right: logistic needs 0/1 variables.

. desmat race occ
Desmat generated the following design matrix:

nr  Variables          Term     Parameterization
    First    Last
 1  _x_1               race     ind(1)
 2  _x_2               occ      ind(1)

. logistic _x_1 _x_2 [fweight=count]

Logit estimates                                   Number of obs   =      68735
                                                  LR chi2(1)      =      73.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -27587.081                       Pseudo R2       =     0.0013

------------------------------------------------------------------------------
        _x_1 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _x_2 |   1.240298   .0315088     8.48   0.000     1.180054    1.303617
------------------------------------------------------------------------------

. logistic _x_1 _x_2 [fweight=count], coef

Logit estimates                                   Number of obs   =      68735
                                                  LR chi2(1)      =      73.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -27587.081                       Pseudo R2       =     0.0013

------------------------------------------------------------------------------
        _x_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _x_2 |   .2153514   .0254043     8.48   0.000       .16556    .2651428
       _cons |   1.771403   .0127961   138.43   0.000     1.746323    1.796483
------------------------------------------------------------------------------

. *The interaction coefficient is the same here.

. lfit

Logistic model for _x_1, goodness-of-fit test

       number of observations =     68735
 number of covariate patterns =         2
             Pearson chi2(0)  =         0.00
                  Prob > chi2 =           .

. *This is the saturated model, so the goodness-of-fit test has no df. The LR chi-square listed at the top of the model compares the saturated model with the model without the interaction term, which we know as the independence model.

. logistic _x_1 [fweight=count], coef

Logit estimates                                   Number of obs   =      68735
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -27623.959                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        _x_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   1.829366   .0110485   165.58   0.000     1.807711    1.851021
------------------------------------------------------------------------------

. lfit

Logistic model for _x_1, goodness-of-fit test

       number of observations =     68735
 number of covariate patterns =         1
             Pearson chi2(0)  =         0.00
                  Prob > chi2 =           .

. *The goodness-of-fit test for this logistic regression tells us nothing: lfit only uses the covariates in the model, so with _x_2 dropped it sees a single covariate pattern and ignores the second variable altogether. This intercept-only logit is the independence model, but its fit has to be judged against the saturated logit, 2 x (-27587.081 - (-27623.959)) = 73.76, which is essentially the poisgof statistic for the log-linear independence model. lfit cannot give that test, even in the simplest 2x2 case.

. exit, clear

----------------------------------------------------------------------------------
       log:  C:\AAA Miker Files\current class files\methods tabular arrays\some logistic - loglin comparison stuff.log
  log type:  text
 opened on:  11 Nov 2002, 13:59:34

. use "C:\AAA Miker Files\current class files\methods tabular arrays\HW2.dta", clear
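The 2x2 results from the first session can be tied together by hand. The saturated log-linear interaction and the logistic slope on `_x_2` are both the sample log odds ratio, with the usual large-sample (Woolf) standard error, and the LR chi2(1) that logistic prints is twice the gap between the saturated and intercept-only log likelihoods. A minimal Python sketch under those assumptions; `group_loglik` is a hypothetical helper, and the log likelihoods treat each frequency-weighted case as one Bernoulli observation:

```python
import math

# Race (n, w) by occupation (Oth, WC) counts from the HW1 table.
n_oth, n_wc = 7146, 2361
w_oth, w_wc = 42012, 17216

# Saturated log-linear interaction = logit slope on _x_2 = log odds ratio.
log_or = math.log((n_oth * w_wc) / (n_wc * w_oth))
# Large-sample (Woolf) standard error of a log odds ratio.
se_log_or = math.sqrt(1 / n_oth + 1 / n_wc + 1 / w_oth + 1 / w_wc)


def group_loglik(w_count, n_count):
    """Bernoulli log likelihood of one group when P(race == w) is fit as its own share."""
    total = w_count + n_count
    return w_count * math.log(w_count / total) + n_count * math.log(n_count / total)


# Intercept-only logit of _x_1: one share of w for everybody (occupation ignored).
ll_null = group_loglik(w_oth + w_wc, n_oth + n_wc)
# Saturated logit (constant + _x_2): a separate share of w within each occupation.
ll_sat = group_loglik(w_oth, n_oth) + group_loglik(w_wc, n_wc)
lr_chi2 = 2 * (ll_sat - ll_null)

print(f"log OR = {log_or:.4f} (SE {se_log_or:.4f})")
print(f"ll_null = {ll_null:.3f}, ll_sat = {ll_sat:.3f}, LR chi2(1) = {lr_chi2:.2f}")
```

The printed values should agree with the logistic output above (.2153514, .0254043, -27623.959, -27587.081, 73.76) up to rounding. The LR statistic is the test of independence, which is why it also matches the likelihood-ratio chi2 that tabulate reports for this table later in the log.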
. *Now I'm going to look at the Husb race * wife race intermarriage dataset from HW2

. table wife husb, contents (sum count) row col

------------------------------------------------------------------------------
           |                            husb
      wife |      black    mexican   oth hisp  all others      white      Total
-----------+------------------------------------------------------------------
     black |       4074         25         16          19        103       4237
   mexican |         63       3947        132          78       1156       5376
  oth hisp |         32        143        239          18        373        805
all others |         42         95         18        1022        492       1669
     white |        215       1009        304         360      28453      30341
           |
     Total |       4426       5219        709        1497      30577      42428
------------------------------------------------------------------------------

. desmat: poisson count wife husb
-------------------------------------------------------------------------------
poisson
-------------------------------------------------------------------------------
Dependent variable                     count
Number of observations:                   25
Initial log likelihood:           -80138.505
Log likelihood:                   -22065.255
LR chi square:                    116146.499
Model degrees of freedom:                  8
Pseudo R-squared:                      0.725
Prob:                                  0.000
-------------------------------------------------------------------------------
nr Effect                                                      Coeff       s.e.
-------------------------------------------------------------------------------
count
   wife
1    mexican                                                 0.238**      0.021
2    oth hisp                                               -1.661**      0.038
3    all others                                             -0.932**      0.029
4    white                                                   1.969**      0.016
   husb
5    mexican                                                 0.165**      0.020
6    oth hisp                                               -1.831**      0.040
7    all others                                             -1.084**      0.030
8    white                                                   1.933**      0.016
9  _cons                                                     6.091**      0.021
-------------------------------------------------------------------------------
*  p < .05
** p < .01

. poisgof

         Goodness-of-fit chi2  =   43952.7
         Prob > chi2(16)       =    0.0000

. poisgof, pearson

         Goodness-of-fit chi2  =  79604.85
         Prob > chi2(16)       =    0.0000
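The two poisgof statistics for the independence model are the ordinary Pearson and likelihood-ratio chi-square tests for independence, so they can be computed straight from the counts without fitting anything. A minimal Python sketch; `independence_tests` is a hypothetical helper, using the standard (r-1)(c-1) degrees of freedom:

```python
import math


def independence_tests(table):
    """Pearson X^2, likelihood-ratio G^2, and df for independence in an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    lr = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G^2 (limit of O*log O)
                lr += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, lr, df


# Wife's race (rows) by husband's race (columns), from the HW2 table above.
# Order: black, mexican, oth hisp, all others, white.
marriages = [
    [4074, 25, 16, 19, 103],
    [63, 3947, 132, 78, 1156],
    [32, 143, 239, 18, 373],
    [42, 95, 18, 1022, 492],
    [215, 1009, 304, 360, 28453],
]

x2, g2, df = independence_tests(marriages)
print(f"Pearson chi2({df}) = {x2:.4f}, likelihood-ratio chi2({df}) = {g2:.4f}")
```

This should reproduce the 79604.85 and 43952.7 on 16 df that poisgof reports; applied to the 2x2 race-by-occupation counts it gives the 72.06 and 73.76 seen earlier.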
. tabulate wife husb [fweight=count], chi2 lrchi2 exact row col

           |                         husb
      wife |     black    mexican   oth hisp  all other      white |     Total
-----------+-------------------------------------------------------+----------
     black |      4074         25         16         19        103 |      4237
           |     96.15       0.59       0.38       0.45       2.43 |    100.00
           |     92.05       0.48       2.26       1.27       0.34 |      9.99
-----------+-------------------------------------------------------+----------
   mexican |        63       3947        132         78       1156 |      5376
           |      1.17      73.42       2.46       1.45      21.50 |    100.00
           |      1.42      75.63      18.62       5.21       3.78 |     12.67
-----------+-------------------------------------------------------+----------
  oth hisp |        32        143        239         18        373 |       805
           |      3.98      17.76      29.69       2.24      46.34 |    100.00
           |      0.72       2.74      33.71       1.20       1.22 |      1.90
-----------+-------------------------------------------------------+----------
all others |        42         95         18       1022        492 |      1669
           |      2.52       5.69       1.08      61.23      29.48 |    100.00
           |      0.95       1.82       2.54      68.27       1.61 |      3.93
-----------+-------------------------------------------------------+----------
     white |       215       1009        304        360      28453 |     30341
           |      0.71       3.33       1.00       1.19      93.78 |    100.00
           |      4.86      19.33      42.88      24.05      93.05 |     71.51
-----------+-------------------------------------------------------+----------
     Total |      4426       5219        709       1497      30577 |     42428
           |     10.43      12.30       1.67       3.53      72.07 |    100.00
           |    100.00     100.00     100.00     100.00     100.00 |    100.00

          Pearson chi2(16) =  79604.8564   Pr = 0.000
 likelihood-ratio chi2(16) =  43952.7234   Pr = 0.000
--Break--
r(1);

. tabulate wife husb [fweight=count], chi2 lrchi2 exact

           |                         husb
      wife |     black    mexican   oth hisp  all other      white |     Total
-----------+-------------------------------------------------------+----------
     black |      4074         25         16         19        103 |      4237
   mexican |        63       3947        132         78       1156 |      5376
  oth hisp |        32        143        239         18        373 |       805
all others |        42         95         18       1022        492 |      1669
     white |       215       1009        304        360      28453 |     30341
-----------+-------------------------------------------------------+----------
     Total |      4426       5219        709       1497      30577 |     42428

          Pearson chi2(16) =  79604.8564   Pr = 0.000
 likelihood-ratio chi2(16) =  43952.7234   Pr = 0.000
--Break--
r(1);
. tabulate wife husb [fweight=count], chi2 lrchi2 exact

           |                         husb
      wife |     black    mexican   oth hisp  all other      white |     Total
-----------+-------------------------------------------------------+----------
     black |      4074         25         16         19        103 |      4237
   mexican |        63       3947        132         78       1156 |      5376
  oth hisp |        32        143        239         18        373 |       805
all others |        42         95         18       1022        492 |      1669
     white |       215       1009        304        360      28453 |     30341
-----------+-------------------------------------------------------+----------
     Total |      4426       5219        709       1497      30577 |     42428

          Pearson chi2(16) =  79604.8564   Pr = 0.000
 likelihood-ratio chi2(16) =  43952.7234   Pr = 0.000
--Break--
r(1);

. *OK, two points. First, the log-linear model for independence generates, as its goodness-of-fit statistics, the chi-square tests for independence that we are used to seeing elsewhere, which can be computed by hand without actually fitting the log-linear model (see HW1).

.
. *Second point has to do with Fisher's exact test of the fit of the independence model. The Pearson and LR statistics are only asymptotically chi-square, not exactly chi-square. The exact test is exact, of course, but for a table this size it is too computationally intensive to be of much use, even with modern hardware.

. *I had to interrupt Stata before I got an exact test out of it.

. clear all

. edit
(3 vars, 4 obs pasted into editor)
- preserve

. *Now I'm going to try an exact test with a smaller table (4 cells instead of 25).

. tabulate race occ [fweight=count]

           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |      7146       2361 |      9507
         w |     42012      17216 |     59228
-----------+----------------------+----------
     Total |     49158      19577 |     68735
. desmat: poisson count race occ
-------------------------------------------------------------------------------
poisson
-------------------------------------------------------------------------------
Dependent variable                     count
Number of observations:                    4
Initial log likelihood:           -26656.550
Log likelihood:                      -59.074
LR chi square:                     53194.953
Model degrees of freedom:                  2
Pseudo R-squared:                      0.998
Prob:                                  0.000
-------------------------------------------------------------------------------
nr Effect                                                      Coeff       s.e.
-------------------------------------------------------------------------------
count
   race
1    w                                                       1.829**      0.011
   occ
2    WC                                                     -0.921**      0.008
3  _cons                                                     8.825**      0.011
-------------------------------------------------------------------------------
*  p < .05
** p < .01

. poisgof

         Goodness-of-fit chi2  =  73.77235
         Prob > chi2(1)        =    0.0000

. poisgof, pearson

         Goodness-of-fit chi2  =  72.06174
         Prob > chi2(1)        =    0.0000

. tabulat race occ [fweight=count], chi2 lrchi2 exact

           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |      7146       2361 |      9507
         w |     42012      17216 |     59228
-----------+----------------------+----------
     Total |     49158      19577 |     68735

          Pearson chi2(1) =  72.0617   Pr = 0.000
 likelihood-ratio chi2(1) =  73.7553   Pr = 0.000
           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

. *Stata only gives you the p-value for Fisher's exact test, not a test statistic. So here it isn't easy to compare that p-value with the chi-square p-values, since they are all zero to three digits.

. edit
- preserve

. clear all

. edit
(3 vars, 4 obs pasted into editor)
- preserve

. *Now we have a dataset whose independence model is more reasonable.

. tabulate color live [fweight=count]

           |         live
     Color |         L          W |     Total
-----------+----------------------+----------
         B |        23         27 |        50
         G |        10         15 |        25
-----------+----------------------+----------
     Total |        33         42 |        75
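With one degree of freedom the chi-square tail has a closed form through the complementary error function, so the p-values that tabulate prints as 0.000 can be written out in full. A minimal Python sketch; `chi2_1df_upper_tail` is a hypothetical helper, and only `math.erfc` from the standard library is used:

```python
import math


def chi2_1df_upper_tail(stat):
    """P(X > stat) for X ~ chi-square(1), i.e. P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


# Statistics printed by tabulate for the race-by-occupation table above.
pearson_p = chi2_1df_upper_tail(72.0617)
lr_p = chi2_1df_upper_tail(73.7553)
print(f"Pearson p = {pearson_p:.2e}, likelihood-ratio p = {lr_p:.2e}")
```

Both p-values come out on the order of 1e-17, so "0.000" here only says "below 0.0005"; a three-digit display cannot separate them from Fisher's exact p-value.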
. desmat: poisson count color live
-------------------------------------------------------------------------------
poisson
-------------------------------------------------------------------------------
Dependent variable                     count
Number of observations:                    4
Initial log likelihood:              -14.328
Log likelihood:                       -9.540
LR chi square:                         9.578
Model degrees of freedom:                  2
Pseudo R-squared:                      0.334
Prob:                                  0.008
-------------------------------------------------------------------------------
nr Effect                                                      Coeff       s.e.
-------------------------------------------------------------------------------
count
   color
1    G                                                      -0.693**      0.245
   live
2    W                                                       0.241        0.233
3  _cons                                                     3.091**      0.192
-------------------------------------------------------------------------------
*  p < .05
** p < .01

. poisgof

         Goodness-of-fit chi2  =  .2445188
         Prob > chi2(1)        =    0.6210

. poisgof, pearson

         Goodness-of-fit chi2  =  .2435065
         Prob > chi2(1)        =    0.6217

. tabulate color live [fweight=count], chi2 lrchi2 exact

           |         live
     Color |         L          W |     Total
-----------+----------------------+----------
         B |        23         27 |        50
         G |        10         15 |        25
-----------+----------------------+----------
     Total |        33         42 |        75

          Pearson chi2(1) =   0.2435   Pr = 0.622
 likelihood-ratio chi2(1) =   0.2445   Pr = 0.621
           Fisher's exact =                 0.805
   1-sided Fisher's exact =                 0.404

. *Note that the exact test gives a different p-value for the reasonableness of the independence model (0.805 two-sided) than the chi-square tests do (0.622 and 0.621).

. exit, clear
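Fisher's exact test conditions on both margins, so the count in one cell of a 2x2 table follows a hypergeometric distribution. A minimal Python sketch; `fisher_exact_2x2` and `log_comb` are hypothetical helpers, and the two-sided rule (sum every table no more probable than the observed one) is the common definition, assumed here to be what produced Stata's 0.805:

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (two_sided, upper), where upper = P(A >= a) for the top-left
    cell A under the hypergeometric null with both margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def log_pmf(x):
        # log P(A = x) = log[C(row1, x) * C(n - row1, col1 - x) / C(n, col1)]
        return log_comb(row1, x) + log_comb(n - row1, col1 - x) - log_comb(n, col1)

    observed = log_pmf(a)
    two_sided = upper = 0.0
    for x in range(lo, hi + 1):
        lp = log_pmf(x)
        p = math.exp(lp)
        if lp <= observed + 1e-7:  # no more probable than the observed table
            two_sided += p
        if x >= a:
            upper += p
    return min(two_sided, 1.0), min(upper, 1.0)


# Color (B, G) by live (L, W) counts from the table above.
two_sided, upper = fisher_exact_2x2(23, 27, 10, 15)
print(f"two-sided p = {two_sided:.3f}, upper-tail p = {upper:.3f}")

# The same test on the 68,735-case race-by-occupation table.
big_two_sided, _ = fisher_exact_2x2(7146, 2361, 42012, 17216)
print(f"race x occ two-sided p = {big_two_sided:.2e}")
```

For the color-by-live table this should give about 0.805 and 0.404, in line with the tabulate output. The 2x2 case needs only one pass over the possible values of one cell (at most 9,508 for the race-by-occupation table); the cost that forced the interrupt comes from enumerating r x c tables such as the 5x5 intermarriage table.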