-----------------------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\third_class_notes.log

  log type:  text

 opened on:   2 Oct 2007, 11:03:04

 

. set linesize 75

 

. *We have talked a bit about likelihood ratio tests and comparing models,

> now let's look at the output from Stata and see what that looks like.

. use "C:\AAA Miker Files\newer web pages\soc_388_notes\soc_388_2007\frogs.dta", clear

 

. *frogs dataset

. *first, let's look at the independence model.

. desmat: poisson count color live

---------------------------------------------------------------------------

   Poisson regression

---------------------------------------------------------------------------

   Dependent variable                                                count

   Optimization:                                                        ml

   Number of observations:                                               4

   Initial log likelihood:                                         -14.328

   Log likelihood:                                                  -9.540

   LR chi square:                                                    9.578

   Model degrees of freedom:                                             2

   Pseudo R-squared:                                                 0.334

   Prob:                                                             0.008

---------------------------------------------------------------------------

nr Effect                                                Coeff        s.e.

---------------------------------------------------------------------------

   count

     color

1      Green                                            -0.693**     0.245

     live

2      Water                                             0.241       0.233

3    _cons                                               3.091**     0.192

---------------------------------------------------------------------------

*  p < .05

** p < .01

 

. *At the top of every poisson regression we get an LRT statistic that compares the fitted model to the constant only model. That comparison is usually not very interesting, because the constant only model is usually rather stupid.

. *For this test between the independence model and the constant only model, we get a statistic of 9.578 on 2df, which rejects the constant only model but not dramatically.

. display chi2tail(2,9.578)

.00832077

 

. *The answer is a little less than 1 percent. So we are reasonably sure that the constant only model doesn't fit the data, but given that the number of frogs is not that great, and the distribution across the 4 cells was not dramatically skewed, there is a remote chance (1 in 100 or so) that a constant only distribution (with random variation) could yield as much deviation from evenness as we saw in this frog dataset.
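
. *A sketch of where the 9.578 comes from: the LR statistic is twice the gap between the log likelihood of the fitted model and that of the constant only model, 2*(-9.540 - (-14.328)), which matches up to rounding of the printed values. The lines below assume the poisson results are still in memory after desmat and that e(ll), e(ll_0), e(chi2) and e(df_m) are stored; they are illustrative and were not run in the original session.

```stata
* twice the difference between the two log likelihoods
display 2*(e(ll) - e(ll_0))
* the same tail probability, taken from the stored results
display chi2tail(e(df_m), e(chi2))
```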

. *The more interesting test is the comparison to the saturated model, that is, to the actual data. We get it by running the poisgof command right after the poisson regression.

 

. poisgof

 

         Goodness-of-fit chi2  =  .2445188

         Prob > chi2(1)        =    0.6210

 

. poisgof, pearson

 

         Goodness-of-fit chi2  =  .2435065

         Prob > chi2(1)        =    0.6217

 

. *because the statistic here (about 0.24) is well below 0.455, the median of the chi-square distribution on 1df, the p value is greater than .5, meaning we cannot reject that the data actually came from the independence distribution.

. *Another way of saying this is that if the data were generated from an independence model, we would expect at least this much deviation from independence 62% of the time, which is a lot.

. *In other words, there doesn't seem to be a significant interaction between frog color and where the frog lives. We tested this a different way by looking at the odds ratio of interaction, which was also not significant.
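
. *A sketch of that odds ratio check, computed directly from the four cell counts (23, 27, 10, 15) using display as a calculator. The Wald z uses the usual standard error of a log odds ratio, the square root of the sum of the reciprocal cell counts. These lines are illustrative and were not run in the original session.

```stata
* odds ratio for the 2x2 table: (Blue,Lilly)*(Green,Water) / ((Blue,Water)*(Green,Lilly))
display (23*15)/(27*10)
* log odds ratio and its standard error
display ln((23*15)/(27*10))
display sqrt(1/23 + 1/27 + 1/10 + 1/15)
* Wald z statistic; its square should be close to the chi-square values from poisgof
display ln((23*15)/(27*10)) / sqrt(1/23 + 1/27 + 1/10 + 1/15)
```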

.

. *That is the basic start.

. *One thing to keep straight in Stata is the difference between tabulate and table.

. *Both can give you frequency cross-tabulations

. *table can put any statistic you like in the cells, whereas tabulate can give you fit statistics such as the Pearson and likelihood-ratio chi-square.

. tabulate color live [fweight=count], lrchi2 chi2

 

           |         live

     Color |     Lilly      Water |     Total

-----------+----------------------+----------

      Blue |        23         27 |        50

     Green |        10         15 |        25

-----------+----------------------+----------

     Total |        33         42 |        75

 

          Pearson chi2(1) =   0.2435   Pr = 0.622

 likelihood-ratio chi2(1) =   0.2445   Pr = 0.621

 

. *You don't need to run a log linear model to calculate the chi-square statistics for goodness of fit. In fact, tabulate in Stata, like any frequency table program, will generate these statistics for you. Note that these are exactly the goodness of fit statistics we got from poisgof after our independence model.
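
. *To make that concrete, here is a sketch that rebuilds both statistics from the table alone. Under independence each expected count is (row total * column total)/75, which gives 22, 28, 11 and 14 for the four cells. The display lines are illustrative and were not run in the original session.

```stata
* Pearson chi-square: sum of (observed - expected)^2 / expected
display (23-22)^2/22 + (27-28)^2/28 + (10-11)^2/11 + (15-14)^2/14
* likelihood-ratio chi-square: 2 * sum of observed * ln(observed/expected)
display 2*(23*ln(23/22) + 27*ln(27/28) + 10*ln(10/11) + 15*ln(15/14))
* p value on 1 df, here for the Pearson value reported by tabulate
display chi2tail(1, 0.2435)
```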

. desmat: poisson count color live, verbose

 

Desmat generated the following design matrix:

 

nr   Variables       Term                        Parameterization

     First    Last

 

 1    _x_1           color                       ind(1)

 2    _x_2           live                        ind(1)

 

Iteration 0:   log likelihood = -9.5395876 

Iteration 1:   log likelihood = -9.5395873 

 

Poisson regression                                Number of obs   =          4

                                                  LR chi2(2)      =       9.58

                                                  Prob > chi2     =     0.0083

Log likelihood = -9.5395873                       Pseudo R2       =     0.3342

 

------------------------------------------------------------------------------

       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        _x_1 |  -.6931472    .244949    -2.83   0.005    -1.173238    -.213056

        _x_2 |   .2411621   .2326211     1.04   0.300    -.2147668    .6970909

       _cons |   3.091042   .1922751    16.08   0.000      2.71419    3.467895

------------------------------------------------------------------------------

----------------------------------------------------------------------------

   Poisson regression

----------------------------------------------------------------------------

   Dependent variable                                                 count

   Optimization:                                                         ml

   Number of observations:                                                4

   Initial log likelihood:                                          -14.328

   Log likelihood:                                                   -9.540

   LR chi square:                                                     9.578

   Model degrees of freedom:                                              2

   Pseudo R-squared:                                                  0.334

   Prob:                                                              0.008

----------------------------------------------------------------------------

nr Effect                                                 Coeff        s.e.

----------------------------------------------------------------------------

   count

     color

1      Green                                             -0.693**     0.245

     live

2      Water                                              0.241       0.233

3    _cons                                                3.091**     0.192

----------------------------------------------------------------------------

*  p < .05

** p < .01

 

. *The verbose option to desmat shows us the creation of the dummy variables, and how many steps it takes the software to find the set of parameters that maximize the likelihood. In this case, because it is an easy model, it took only one step. But big datasets with sparse data can take thousands of steps or can fail to converge altogether.
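
. *A sketch of how to check convergence directly, assuming the _x_1 and _x_2 dummies created by desmat are still in the dataset. iterate() is the standard maximization option that caps the number of iterations; e(ic) and e(converged) are assumed here to be stored by poisson, as they are by Stata's ml-based estimators. These lines were not part of the original session.

```stata
* refit the independence model, allowing at most 50 iterations
poisson count _x_1 _x_2, iterate(50)
* number of iterations used, and 1 if the maximization converged
display e(ic)
display e(converged)
```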

 

. *In the department of learning Stata, table can do some nice things.

. table color live, contents(mean  _x_1)

 

------------------------

          |     live   

    Color | Lilly  Water

----------+-------------

     Blue |     0      0

    Green |     1      1

------------------------

 

. *in this case showing us what the dummy variable for color looks like: _x_1 is 1 for green frogs and 0 for blue frogs.
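
. *The same idea works for the other dummy. A sketch, assuming _x_2 is still in the dataset from the desmat run; not part of the original session.

```stata
* the dummy for where the frog lives (the coefficient table labels it Water)
table color live, contents(mean _x_2)
* or list all four cells with both dummies side by side
list color live count _x_1 _x_2, noobs
```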

. exit, clear