-----------------------------------------------------------------------------------------------

log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\third_class_notes.log

log type:  text

opened on:   2 Oct 2007, 11:03:04

. set linesize 75

. *We have talked a bit about likelihood ratio tests and comparing models. Now let's look at the output from Stata and see what that looks like.

. use "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\frogs.dta", clear

. *frogs dataset

. *first, let's look at the independence model.

. desmat: poisson count color live

---------------------------------------------------------------------------

Poisson regression

---------------------------------------------------------------------------

Dependent variable                                                count

Optimization:                                                        ml

Number of observations:                                               4

Initial log likelihood:                                         -14.328

Log likelihood:                                                  -9.540

LR chi square:                                                    9.578

Model degrees of freedom:                                             2

Pseudo R-squared:                                                 0.334

Prob:                                                             0.008

---------------------------------------------------------------------------

nr Effect                                                Coeff        s.e.

---------------------------------------------------------------------------

count

color

1      Green                                            -0.693**     0.245

live

2      Water                                             0.241       0.233

3    _cons                                               3.091**     0.192

---------------------------------------------------------------------------

*  p < .05

** p < .01

. *At the top of every Poisson regression, we get a likelihood ratio test (LRT) statistic that compares the fitted model to the constant-only model. That comparison is usually not very interesting, because the constant-only model is usually a poor description of any real data.

. *For this test between the independence model and the constant-only model, we get a statistic of 9.578 on 2 df, which rejects the constant-only model, but not dramatically.

. display chi2tail(2,9.578)

.00832077

. *The answer is a little less than 1 percent. So we are reasonably sure that the constant-only model doesn't fit the data. Still, given that the number of frogs is not that great and the distribution across the 4 cells is not dramatically skewed, there is a remote chance (a bit under 1 in 100) that a constant-only distribution, with random variation, could yield at least as much deviation from evenness as we saw in this frog dataset.
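. *As a rough sketch of where these numbers come from (these lines were not part of the logged session): the LRT statistic is twice the gap between the two log likelihoods, and on 2 df the chi-square tail probability has the closed form exp(-x/2). The first line uses the rounded log likelihoods printed above, so it will differ from 9.578 in the third decimal.

```stata
* LR statistic: 2 * (log likelihood of the model - log likelihood of the constant-only model)
display 2*(-9.540 - (-14.328))

* tail probability on 2 df, computed two equivalent ways
display chi2tail(2, 9.578)
display exp(-9.578/2)
```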

. *The more interesting test compares our model to the saturated model, which reproduces the actual data exactly. We get that test by running the “poisgof” command after the Poisson regression. By default poisgof reports the deviance (likelihood-ratio) statistic; with the pearson option it reports the Pearson statistic.

. poisgof

Goodness-of-fit chi2  =  .2445188

Prob > chi2(1)        =    0.6210

. poisgof, pearson

Goodness-of-fit chi2  =  .2435065

Prob > chi2(1)        =    0.6217

. *Because the statistic here (about 0.24) is below the median of the chi-square distribution on 1 df (about 0.455), the p value is greater than .5, meaning we cannot reject the hypothesis that the data came from the independence distribution.

. *Another way of saying this is that if the data were generated from an independence model, we would expect at least this much deviation from independence about 62% of the time, which is a lot.

. *Put differently, there doesn't seem to be a significant interaction between frog color and where the frog lives. We tested this a different way by looking at the odds ratio for the interaction, which was also not significant.
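. *A short sketch, not part of the logged session, of how to check these p values and the odds ratio by hand. The cell counts in the last line are taken from the tabulate output that appears later in this log.

```stata
* p values for the deviance and Pearson statistics on 1 df
display chi2tail(1, .2445188)
display chi2tail(1, .2435065)

* median of the chi-square distribution on 1 df: any statistic below this has p > .5
display invchi2(1, .5)

* sample odds ratio: (Blue,Lilly * Green,Water) / (Blue,Water * Green,Lilly)
display (23*15)/(27*10)
```

. *The odds ratio works out to 345/270, or about 1.28, which is close to the value of 1 that independence implies.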

.

. *That is the basic start.

. *One thing to keep straight in Stata is the difference between tabulate and table.

. *Both can give you frequency cross-tabulations.

. *table can put almost any summary statistic you like in the cells, whereas tabulate can give you fit statistics such as the Pearson and likelihood-ratio chi-square.
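. *For instance, a sketch that assumes the frogs data are still in memory with the variables count, color, and live. These lines were not run in this session, and table is written in the older contents() syntax that this log uses.

```stata
* frequency cross-tabulation built with table: each cell holds the sum of count
table color live, contents(sum count)

* the same cross-tabulation with tabulate, adding expected counts under independence
tabulate color live [fweight=count], expected
```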

. tabulate color live [fweight=count], lrchi2 chi2

|         live

Color |     Lilly      Water |     Total

-----------+----------------------+----------

Blue |        23         27 |        50

Green |        10         15 |        25

-----------+----------------------+----------

Total |        33         42 |        75

Pearson chi2(1) =   0.2435   Pr = 0.622

likelihood-ratio chi2(1) =   0.2445   Pr = 0.621

. *You don't need to run a log-linear model to calculate the chi-square statistics for goodness of fit. In fact, tabulate in Stata, like most frequency table programs, will generate these statistics for you. Note that these goodness-of-fit statistics are exactly what we got from poisgof after our independence model.
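. *To make that concrete, here is a hand calculation sketched as display commands (not part of the logged session). It uses only the cell counts and margins from the tabulate output above.

```stata
* expected counts under independence: row total * column total / N
display 50*33/75
display 50*42/75
display 25*33/75
display 25*42/75

* Pearson chi-square: sum of (observed - expected)^2 / expected
display (23-22)^2/22 + (27-28)^2/28 + (10-11)^2/11 + (15-14)^2/14

* likelihood-ratio chi-square: 2 * sum of observed * ln(observed / expected)
display 2*(23*ln(23/22) + 27*ln(27/28) + 10*ln(10/11) + 15*ln(15/14))
```

. *The expected counts come out to 22, 28, 11, and 14, and the two sums should reproduce the 0.2435 and 0.2445 reported by tabulate and poisgof.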

. desmat: poisson count color live, verbose

Desmat generated the following design matrix:

nr   Variables       Term                        Parameterization

First    Last

1    _x_1           color                       ind(1)

2    _x_2           live                        ind(1)

Iteration 0:   log likelihood = -9.5395876

Iteration 1:   log likelihood = -9.5395873

Poisson regression                                Number of obs   =          4

                                                  LR chi2(2)      =       9.58

                                                  Prob > chi2     =     0.0083

Log likelihood = -9.5395873                       Pseudo R2       =     0.3342

------------------------------------------------------------------------------

       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        _x_1 |  -.6931472    .244949    -2.83   0.005    -1.173238    -.213056

        _x_2 |   .2411621   .2326211     1.04   0.300    -.2147668    .6970909

       _cons |   3.091042   .1922751    16.08   0.000      2.71419    3.467895

------------------------------------------------------------------------------

----------------------------------------------------------------------------

Poisson regression

----------------------------------------------------------------------------

Dependent variable                                                 count

Optimization:                                                         ml

Number of observations:                                                4

Initial log likelihood:                                          -14.328

Log likelihood:                                                   -9.540

LR chi square:                                                     9.578

Model degrees of freedom:                                              2

Pseudo R-squared:                                                  0.334

Prob:                                                              0.008

----------------------------------------------------------------------------

nr Effect                                                 Coeff        s.e.

----------------------------------------------------------------------------

count

color

1      Green                                             -0.693**     0.245

live

2      Water                                              0.241       0.233

3    _cons                                                3.091**     0.192

----------------------------------------------------------------------------

*  p < .05

** p < .01

. *The verbose option of desmat shows us the dummy variables it creates and how many iterations the software takes to find the set of parameters that maximizes the likelihood. In this case, because it is an easy model, it took only one step. But big datasets with sparse data can take thousands of steps or can fail to converge altogether.
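. *A hedged sketch of how one might cap and inspect the iterations, assuming the dummy variables _x_1 and _x_2 that desmat created are still in memory (not part of the logged session). iterate() is a standard maximization option, and e(ic) and e(converged) are results that poisson stores after fitting.

```stata
* refit the independence model directly on the desmat dummies, allowing at most 50 iterations
poisson count _x_1 _x_2, iterate(50)

* number of iterations used, and whether the optimizer reported convergence (1 = yes)
display e(ic)
display e(converged)
```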

. *In the department of learning Stata, the table command can do some nice things.

. table color live, contents(mean  _x_1)

------------------------

|     live

Color | Lilly  Water

----------+-------------

Blue |     0      0

Green |     1      1

------------------------

. *In this case, it shows us what the dummy variable for color (_x_1) looks like: 0 for Blue frogs and 1 for Green frogs.
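. *To connect the dummy variables to the fitted counts, here is a sketch (not run in this session) that plugs the coefficients from the verbose output into the log-linear equation, fitted count = exp(_cons + b1*_x_1 + b2*_x_2). It assumes _x_2 is coded 0 for Lilly and 1 for Water, as the ind(1) parameterization and the "Water" effect label suggest.

```stata
* Blue, Lilly (_x_1 = 0, _x_2 = 0)
display exp(3.091042)

* Green, Lilly (_x_1 = 1, _x_2 = 0)
display exp(3.091042 - .6931472)

* Blue, Water (_x_1 = 0, _x_2 = 1)
display exp(3.091042 + .2411621)

* Green, Water (_x_1 = 1, _x_2 = 1)
display exp(3.091042 - .6931472 + .2411621)
```

. *Up to rounding, these are the expected counts under independence: 22, 11, 28, and 14.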

. exit, clear