log type:  text

opened on:  11 Oct 2005, 15:52:23

. edit

(3 vars, 4 obs pasted into editor)

- preserve

. tabulate race occ [fweight=count

invalid syntax

r(198);

. tabulate race occ [fweight=count]

           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |     7,146      2,361 |     9,507
         w |    42,012     17,216 |    59,228
-----------+----------------------+----------
     Total |    49,158     19,577 |    68,735

. tabulate race occ [fweight=count], lrchi2 chi2

           |          occ
      race |       Oth         WC |     Total
-----------+----------------------+----------
         n |     7,146      2,361 |     9,507
         w |    42,012     17,216 |    59,228
-----------+----------------------+----------
     Total |    49,158     19,577 |    68,735

Pearson chi2(1) =  72.0617   Pr = 0.000

likelihood-ratio chi2(1) =  73.7553   Pr = 0.000

. desmat: poisson count race occ

-------------------------------------------------------------------------------

Poisson regression

-------------------------------------------------------------------------------

Dependent variable                                                    count

Optimization:                                                            ml

Number of observations:                                                   4

Initial log likelihood:                                          -26656.550

Log likelihood:                                                     -59.074

LR chi square:                                                    53194.953

Model degrees of freedom:                                                 2

Pseudo R-squared:                                                     0.998

Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

count

race

1      w                                                     1.829**     0.011

occ

2      WC                                                   -0.921**     0.008

3    _cons                                                   8.825**     0.011

-------------------------------------------------------------------------------

*  p < .05

** p < .01

. poisgof

Goodness-of-fit chi2  =  73.77235

Prob > chi2(1)        =    0.0000

. poisgof, pearson

Goodness-of-fit chi2  =  72.06174

Prob > chi2(1)        =    0.0000

. *The independence model uses 3 terms (the constant, race, and occ) to fit 4 cells, leaving 1 residual df.

. *The tabulate command and poisgof after the loglinear model yield essentially the same statistics: 72.06 (Pearson) or 73.77 (LR chi2) on 1 df, relative to the saturated model. By either test the independence model does not fit the data well, which is consistent with the high degree of statistical significance we found when calculating the log odds ratio of this dataset by hand; the interaction between race and occupation is precisely what the independence model leaves out. It is interesting to note that the eyeball test (Question 3) showed that the independence model and the actual data were not too far apart. With samples this large, the LRT rejects even small departures from a model, and some scholars consider it overly sensitive as a way to distinguish between two competing hypotheses.
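
. *A minimal do-file sketch, not run in this session, of how the two statistics could be reproduced by hand from the cell counts in the table above. The scalar names are illustrative assumptions; the results should come out close to the Pearson and likelihood-ratio values that tabulate reports.

```stata
* Expected count under independence = row total * column total / N
scalar N    = 68735
scalar e_nO = 9507*49158/N      // race n, occ Oth
scalar e_nW = 9507*19577/N      // race n, occ WC
scalar e_wO = 59228*49158/N     // race w, occ Oth
scalar e_wW = 59228*19577/N     // race w, occ WC

* Pearson X2: sum of (observed - expected)^2 / expected
scalar X2 = (7146-e_nO)^2/e_nO + (2361-e_nW)^2/e_nW ///
          + (42012-e_wO)^2/e_wO + (17216-e_wW)^2/e_wW

* Likelihood-ratio G2: 2 * sum of observed * ln(observed / expected)
scalar G2 = 2*(7146*ln(7146/e_nO) + 2361*ln(2361/e_nW) ///
          + 42012*ln(42012/e_wO) + 17216*ln(17216/e_wW))

display "Pearson X2 = " scalar(X2) "   LR G2 = " scalar(G2)
```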

. *What is the probability of chisquare 73.77 on one df?

. display chi2tail(1,77.3)

1.469e-18

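. *SMALL. (The command above used 77.3 where 73.77 was intended; the tail probability for 73.77 on 1 df is likewise vanishingly small.)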

. predict A_independence

(option n assumed; predicted number of events)

. table race occ, contents (sum count sum  A_independence) row col

----------------------------------------
          |             occ
     race |      Oth        WC     Total
----------+-----------------------------
        n |     7146      2361      9507
          |  6799.23   2707.77      9507
          |
        w |    42012     17216     59228
          | 42358.77  16869.23     59228
          |
    Total |    49158     19577     68735
          |    49158     19577     68735
----------------------------------------

. *That's a comparison of the actual data and the data under the assumption of independence.
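
. *One way to put a number on that comparison, as a sketch that was not run in this session: Pearson residuals for each cell. It assumes the independence model's predictions are still stored in A_independence; the new variable names pearson_res and pearson_sq are hypothetical.

```stata
* Pearson residual for each cell: (observed - expected) / sqrt(expected)
generate double pearson_res = (count - A_independence)/sqrt(A_independence)
list race occ count A_independence pearson_res

* The squared residuals sum to the Pearson goodness-of-fit chi2
generate double pearson_sq = pearson_res^2
summarize pearson_sq
display r(sum)
```

. *The sign of each residual shows which cells the independence model over- or under-predicts, and the displayed sum should match the statistic from poisgof, pearson.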

. *Now on to question 7:

. desmat: poisson count race*occ

-------------------------------------------------------------------------------

Poisson regression

-------------------------------------------------------------------------------

Dependent variable                                                    count

Optimization:                                                            ml

Number of observations:                                                   4

Initial log likelihood:                                          -26656.550

Log likelihood:                                                     -22.196

LR chi square:                                                    53268.708

Model degrees of freedom:                                                 3

Pseudo R-squared:                                                     0.999

Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

count

race

1      w                                                     1.771**     0.013

occ

2      WC                                                   -1.107**     0.024

race.occ

3      w.WC                                                  0.215**     0.025

4    _cons                                                   8.874**     0.012

-------------------------------------------------------------------------------

*  p < .05

** p < .01

. poisgof

Goodness-of-fit chi2  =  .0170694

Prob > chi2(0)        =         .

. *Of course the residual degrees of freedom are zero, since the model has 4 terms for 4 cells, and so the model reproduces the data exactly. The goodness-of-fit statistic differs from zero only slightly, presumably because of numerical approximation in the iterative fitting. Note that .017 is also the gap between poisgof's 73.772 and tabulate's LR chi2 of 73.755 for the independence model.

. *And note, of course, that the interaction term is simply the log odds ratio and associated standard error we calculated by hand.
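
. *A sketch of that hand calculation, for reference (not part of the logged session): the log odds ratio is the log of the cross-product ratio of the four cells, and its standard error is the square root of the summed reciprocal counts. To three decimals, the two results should agree with the w.WC row above (0.215 and 0.025).

```stata
* Log odds ratio from the 2x2 table: ln( (n,Oth * w,WC) / (n,WC * w,Oth) )
display ln((7146*17216)/(2361*42012))

* Its standard error: sqrt of the sum of reciprocal cell counts
display sqrt(1/7146 + 1/2361 + 1/42012 + 1/17216)
```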

. predict A_saturated

(option n assumed; predicted number of events)

. table race occ, contents (sum count sum   A_saturated) row col

-------------------------------
          |         occ
     race |   Oth     WC  Total
----------+--------------------
        n |  7146   2361   9507
          |  7146   2361   9507
          |
        w | 42012  17216  59228
          | 42012  17216  59228
          |
    Total | 49158  19577  68735
          | 49158  19577  68735
-------------------------------

. clear all.

. *Now let me load dataset B and get back to question 5.

. edit

(3 vars, 25 obs pasted into editor)

- preserve

. *starting with the independence model for dataset B

. desmat: poisson count husb wife

-------------------------------------------------------------------------------

Poisson regression

-------------------------------------------------------------------------------

Dependent variable                                                    count

Optimization:                                                            ml

Number of observations:                                                  25

Initial log likelihood:                                          -80138.505

Log likelihood:                                                  -22065.255

LR chi square:                                                   116146.499

Model degrees of freedom:                                                 8

Pseudo R-squared:                                                     0.725

Prob:                                                                 0.000

-------------------------------------------------------------------------------

nr Effect                                                    Coeff        s.e.

-------------------------------------------------------------------------------

count

husb

1      Black                                                 1.084**     0.030

2      Mexican                                               1.249**     0.029

3      Oth Hisp                                             -0.747**     0.046

4      White                                                 3.017**     0.026

wife

5      Black                                                 0.932**     0.029

6      Mexican                                               1.170**     0.028

7      Oth Hisp                                             -0.729**     0.043

8      White                                                 2.900**     0.025

9    _cons                                                   4.076**     0.035

-------------------------------------------------------------------------------

*  p < .05

** p < .01

. poisgof

Goodness-of-fit chi2  =   43952.7

Prob > chi2(16)       =    0.0000

. *Independence has r+c-1=9 terms for the 25 cells, so the residual df is 16.
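
. *The same count, written out as arithmetic for a 5x5 table (a sketch, not run in this session). Both lines should display 16: the number of cells minus the number of fitted terms, and the familiar (r-1)(c-1).

```stata
* cells minus terms in the independence model (constant + 4 husb + 4 wife)
display 5*5 - (5 + 5 - 1)

* the usual shortcut for an r x c table
display (5 - 1)*(5 - 1)
```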

. display chi2tail(16,43953)

0

. *The probability here displays as zero because it is too small to represent. We already noted that dataset B seems to be much further from independence than dataset A, though the goodness-of-fit chi-square for the independence model from dataset A has a fairly small probability as well.

. tabulate husb wife [fweight=count], lrchi2

           |                          wife
      husb | All Other      Black    Mexican   Oth Hisp      White |     Total
-----------+-------------------------------------------------------+----------
All Others |     1,022         19         78         18        360 |     1,497
     Black |        42      4,074         63         32        215 |     4,426
   Mexican |        95         25      3,947        143      1,009 |     5,219
  Oth Hisp |        18         16        132        239        304 |       709
     White |       492        103      1,156        373     28,453 |    30,577
-----------+-------------------------------------------------------+----------
     Total |     1,669      4,237      5,376        805     30,341 |    42,428

likelihood-ratio chi2(16) =  4.4e+04   Pr = 0.000

. scalar lrt_B=r(chi2_lr)

. display lrt_B

43952.723

. *The output from the tabulate command gave us only two significant digits for the chi-square statistic, but you can get the exact figure from Stata's stored results, as done above with r(chi2_lr). See the help or the manuals for this.

. *Anyway, the point is that the chi-square test for independence that accompanies every table you have ever seen is really a goodness-of-fit test comparing the actual data to the independence model; the likelihood-ratio version is exactly the statistic poisgof reports for that model.
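
. *A short sketch of how the stored results can be inspected (not run in this session; it assumes dataset B is still in memory). The name r(chi2_lr) is the one used above; r(p_lr) is, to my knowledge, the matching p-value that tabulate stores when lrchi2 is requested.

```stata
* Re-run the table, then list everything tabulate left behind in r()
tabulate husb wife [fweight=count], lrchi2
return list

* Show the likelihood-ratio statistic with three decimals, and its p-value
display %12.3f r(chi2_lr)
display r(p_lr)
```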

. log close

log type:  text

closed on:  11 Oct 2005, 16:26:12

-------------------------------------------------------------------------------