log type: text
opened on: 11 Oct 2005, 15:52:23
. edit
(3 vars, 4 obs pasted into editor)
- preserve
. tabulate race occ [fweight=count
invalid syntax
r(198);
. tabulate race occ [fweight=count]
           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |     7,146      2,361 |     9,507
         w |    42,012     17,216 |    59,228
-----------+----------------------+----------
     Total |    49,158     19,577 |    68,735
. tabulate race occ [fweight=count], lrchi2 chi2
           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |     7,146      2,361 |     9,507
         w |    42,012     17,216 |    59,228
-----------+----------------------+----------
     Total |    49,158     19,577 |    68,735
          Pearson chi2(1) =  72.0617   Pr = 0.000
 likelihood-ratio chi2(1) =  73.7553   Pr = 0.000
. desmat: poisson count race occ
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 4
Initial log likelihood: -26656.550
Log likelihood: -59.074
LR chi square: 53194.953
Model degrees of freedom: 2
Pseudo R-squared: 0.998
Prob: 0.000
-------------------------------------------------------------------------------
nr   Effect                   Coeff       s.e.
-------------------------------------------------------------------------------
     count
       race
1        w                    1.829**    0.011
       occ
2        WC                  -0.921**    0.008
3      _cons                  8.825**    0.011
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 73.77235
Prob > chi2(1) = 0.0000
. poisgof, pearson
Goodness-of-fit chi2 = 72.06174
Prob > chi2(1) = 0.0000
. *The independence model takes 3 terms (the constant, race, and occ) to fit 4 cells, leaving 1 residual df.
. *The tabulate command and poisgof after the loglinear model yield essentially the same statistics: 72.06 (Pearson) or 73.77 (LR chi2) on 1 df, relative to the saturated model.
. *By the likelihood ratio test, then, the independence model does not fit the data well. This is consistent with the high degree of statistical significance we found when calculating the log odds ratio of this dataset by hand; the interaction between race and occupation is precisely what the independence model leaves out.
. *It is interesting to note that the eyeball test (Question 3) showed the independence model and the actual data to be not too far apart. With samples this large, the LRT flags even modest departures as highly significant, and some scholars consider such tests to overstate the power to distinguish between two competing hypotheses.
. *What is the probability of chisquare 73.77 on one df?
. display chi2tail(1,77.3)
1.469e-18
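. *SMALL. (The command above was typed with 77.3 rather than 73.77; the tail probability for 73.77 on 1 df is somewhat larger but still below 1e-16.)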
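. *A minimal sketch: the same tail probability evaluated at the statistics actually reported by poisgof above (73.77235 for the LR chi2, 72.06174 for Pearson). chi2tail(df, x) returns the upper-tail probability of a chi-square with df degrees of freedom.

```stata
* Upper-tail probability of the LR goodness-of-fit statistic on 1 df
display chi2tail(1, 73.77235)

* Same for the Pearson goodness-of-fit statistic on 1 df
display chi2tail(1, 72.06174)
```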
. predict A_independence
(option n assumed; predicted number of events)
. table race occ, contents (sum count sum A_independence) row col
----------------------------------------
          |             occ
     race |      Oth        WC     Total
----------+-----------------------------
        n |     7146      2361      9507
          |  6799.23   2707.77      9507
          |
        w |    42012     17216     59228
          | 42358.77  16869.23     59228
          |
    Total |    49158     19577     68735
          |    49158     19577     68735
----------------------------------------
. *That's a comparison of the actual data (first line of each cell) with the counts expected under the assumption of independence (second line).
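. *A sketch of how the two goodness-of-fit statistics follow from these observed and expected counts, assuming dataset A is still in memory with count and A_independence. The variable names g2_term and x2_term are hypothetical. The sums should agree with the poisgof results above (about 73.77 and 72.06).

```stata
* Expected count for the n/Oth cell: row total * column total / N
* (compare with 6799.23 in the table above)
display 9507*49158/68735

* Cell-by-cell contributions to the LR (deviance) statistic: 2*O*ln(O/E)
generate double g2_term = 2*count*ln(count/A_independence)

* Cell-by-cell contributions to the Pearson statistic: (O-E)^2/E
generate double x2_term = (count - A_independence)^2/A_independence

* Sum the contributions over the 4 cells
quietly summarize g2_term
display "LR chi2      = " r(sum)
quietly summarize x2_term
display "Pearson chi2 = " r(sum)
```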
. *Now on to question 7:
. desmat: poisson count race*occ
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 4
Initial log likelihood: -26656.550
Log likelihood: -22.196
LR chi square: 53268.708
Model degrees of freedom: 3
Pseudo R-squared: 0.999
Prob: 0.000
-------------------------------------------------------------------------------
nr   Effect                   Coeff       s.e.
-------------------------------------------------------------------------------
     count
       race
1        w                    1.771**    0.013
       occ
2        WC                  -1.107**    0.024
       race.occ
3        w.WC                 0.215**    0.025
4      _cons                  8.874**    0.012
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = .0170694
Prob > chi2(0) = .
. *Of course the residual degrees of freedom are zero, since the model has 4 terms for 4 cells. The model fits the data exactly because it has as many terms as there are cells. The goodness-of-fit statistic differs from zero only because the iterative fitting stops just short of reproducing the data exactly.
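. *A quick check using only figures already printed in this log: twice the difference between the two log likelihoods, and the difference between the two poisgof statistics, both come to about 73.76, matching the likelihood-ratio chi2 of 73.7553 that tabulate reported.

```stata
* LR test of independence against the saturated model,
* from the log likelihoods of the two desmat: poisson fits
display 2*(-22.196 - (-59.074))

* Same comparison from the two poisgof deviance statistics
display 73.77235 - .0170694
```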
. *And note, of course, that the interaction term (0.215, s.e. 0.025) is simply the log odds ratio and associated standard error we calculated by hand.
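. *A sketch of that hand calculation, assuming the usual 2x2 formulas: log odds ratio = ln(ad/bc), with standard error sqrt(1/a + 1/b + 1/c + 1/d). The cell counts are taken from the table above; the results should round to 0.215 and 0.025.

```stata
* Log odds ratio of WC vs. Oth for w relative to n
display ln((17216*7146)/(42012*2361))

* Its asymptotic standard error
display sqrt(1/7146 + 1/2361 + 1/42012 + 1/17216)
```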
. predict A_saturated
(option n assumed; predicted number of events)
. table race occ, contents (sum count sum A_saturated) row col
-------------------------------
          |        occ
     race |   Oth     WC  Total
----------+--------------------
        n |  7146   2361   9507
          |  7146   2361   9507
          |
        w | 42012  17216  59228
          | 42012  17216  59228
          |
    Total | 49158  19577  68735
          | 49158  19577  68735
-------------------------------
. clear all.
. *Now let me load dataset B and get back to question 5.
. edit
(3 vars, 25 obs pasted into editor)
- preserve
. *starting with the independence model for dataset B
. desmat: poisson count husb wife
-------------------------------------------------------------------------------
Poisson regression
-------------------------------------------------------------------------------
Dependent variable count
Optimization: ml
Number of observations: 25
Initial log likelihood: -80138.505
Log likelihood: -22065.255
LR chi square: 116146.499
Model degrees of freedom: 8
Pseudo R-squared: 0.725
Prob: 0.000
-------------------------------------------------------------------------------
nr   Effect                   Coeff       s.e.
-------------------------------------------------------------------------------
     count
       husb
1        Black                1.084**    0.030
2        Mexican              1.249**    0.029
3        Oth Hisp            -0.747**    0.046
4        White                3.017**    0.026
       wife
5        Black                0.932**    0.029
6        Mexican              1.170**    0.028
7        Oth Hisp            -0.729**    0.043
8        White                2.900**    0.025
9      _cons                  4.076**    0.035
-------------------------------------------------------------------------------
* p < .05
** p < .01
. poisgof
Goodness-of-fit chi2 = 43952.7
Prob > chi2(16) = 0.0000
. *Independence has r+c-1 = 9 terms for the 25 cells, so the residual df is 25-9 = 16.
. display chi2tail(16,43953)
0
. *The probability here is so small that Stata displays it as zero. We already noted that dataset B seems to be much further from independence than dataset A, though the goodness-of-fit chi-square for the independence model from dataset A has a very small probability as well.
. tabulate husb wife [fweight=count], lrchi2
           |                          wife
      husb |  All Other      Black    Mexican   Oth Hisp      White |     Total
-----------+-------------------------------------------------------+----------
All Others |     1,022         19         78         18        360 |     1,497
     Black |        42      4,074         63         32        215 |     4,426
   Mexican |        95         25      3,947        143      1,009 |     5,219
  Oth Hisp |        18         16        132        239        304 |       709
     White |       492        103      1,156        373     28,453 |    30,577
-----------+-------------------------------------------------------+----------
     Total |     1,669      4,237      5,376        805     30,341 |    42,428
 likelihood-ratio chi2(16) =  4.4e+04   Pr = 0.000
. scalar lrt_B=r(chi2_lr)
. display lrt_B
43952.723
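. *The output from the tabulate command gave us only two significant digits for the chi-square statistic, but you can get the exact figure from Stata: tabulate saves it in r(chi2_lr), which is what the scalar above picks up. See the help for tabulate or the manuals for the other saved results.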
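. *A minimal sketch of how to inspect those saved results, assuming dataset B is in memory. return list shows everything the last r-class command saved; r(chi2) and r(p) are filled in only when the chi2 option is given, r(chi2_lr) and r(p_lr) only with lrchi2.

```stata
* Re-run the table quietly, asking for both test statistics
quietly tabulate husb wife [fweight=count], chi2 lrchi2

* List all saved results from tabulate
return list

* Show the LR statistic with more digits, and its p-value
display %12.4f r(chi2_lr)
display r(p_lr)
```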
. *Anyway, the point is that the chi-square test for independence that accompanies every table you have ever seen is really a likelihood ratio test comparing the actual data to the independence model.
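. *A sketch of the same independence fit without the user-written desmat prefix. This assumes a Stata release that supports factor-variable notation (i.varname) and estat gof, and that husb and wife are numeric variables with value labels; neither assumption is confirmed by this log.

```stata
* Independence model for dataset B: main effects of husb and wife only
poisson count i.husb i.wife

* Deviance and Pearson goodness-of-fit tests against the saturated model;
* the deviance statistic is the likelihood-ratio chi2 for independence
estat gof
```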
. log close
log type: text
closed on: 11 Oct 2005, 16:26:12
-------------------------------------------------------------------------------