--------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2011_381_logs\class5.log

  log type:  text

 opened on:  11 Oct 2011, 13:02:26

 

* First of all: when I later get to methods for creating dummy variables, one useful free add-on to have is desmat.

 

. ssc install desmat, replace

checking desmat consistency and verifying not already installed...

 

the following files will be replaced:

    c:\ado\plus\d\desmat.ado

    c:\ado\plus\d\desmat.hlp

    c:\ado\plus\d\desrep.ado

    c:\ado\plus\d\desrep.hlp

    c:\ado\plus\d\destest.ado

    c:\ado\plus\d\destest.hlp

    c:\ado\plus\o\outshee2.ado

    c:\ado\plus\o\outshee2.hlp

    c:\ado\plus\s\showtrms.ado

    c:\ado\plus\s\showtrms.hlp

 

installing into c:\ado\plus\...

installation complete.

 

. which desmat

c:\ado\plus\d\desmat.ado

*! version 3.2, 17Sep2004, John_Hendrickx@yahoo.com 

. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

* take a look at the Excel page for “understanding dummy variables”

 

. table metro if age >=30 & age <=64 & sex==1, contents (freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

 

 

. * update all

* Running the “update all” command on your Stata installation every once in a while is a good idea.
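
* A sketch, not run in this session: “update all” refreshes official Stata only. For user-written add-ons installed from SSC, such as desmat, Stata versions of this vintage provide adoupdate. The first line below only reports which installed packages have updates available; the second installs them.

. adoupdate

. adoupdate, update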

 

. codebook metro

 

--------------------------------------------------------------------------------

metro                                           Metropolitan central city status

--------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

. regress incwage metro if age >=30 & age<=64 & sex==1 & metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  1, 29239) =  400.78

       Model |  6.0432e+11     1  6.0432e+11           Prob > F      =  0.0000

    Residual |  4.4088e+13 29239  1.5078e+09           R-squared     =  0.0135

-------------+------------------------------           Adj R-squared =  0.0135

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38831

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |   4563.546   227.9541    20.02   0.000     4116.745    5010.346

       _cons |   25213.42    605.392    41.65   0.000     24026.83    26400.02

------------------------------------------------------------------------------

 

* Please don’t do this! metro is a categorical variable with nominal (not ordinal) categories, and regress treats every predictor as continuous unless you tell it otherwise, so this command fits a single slope across the arbitrary codes 1 through 4. Using a categorical variable as a continuous one is a crime against proper data analysis. To treat metro properly as a categorical variable, we need Stata to generate dummy variables for us. One way is to use Stata’s built-in i.variable syntax, which Stata calls factor-variable syntax; in Stata versions 11 and up it is documented under help fvvarlist.

 

. regress incwage i.metro if age>=30 & age <=64 & sex==1 & metro ~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          2  |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

          3  |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

          4  |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

             |

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

* Stata’s built-in factor-variable syntax has some advantages: for example, it is easy to set the excluded category with ib#, where # is the category value. On the other hand, it does not add dummy variables to your variable list, and those are sometimes handy to have.

* Note, and this is important, that the choice of excluded category is arbitrary: the model fit is the same (same R-squared, same F-test), and the output only looks different because the comparison category has changed.

 

. regress incwage ib2.metro if age>=30 & age <=64 & sex==1 & metro ~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          1  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297

          3  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443

          4  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

             |

       _cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82

------------------------------------------------------------------------------
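
* A quick arithmetic check on the two i.metro tables above (the three-decimal values 16013.388 and 27189.646 are taken from the desmat output later in this log). Moving the excluded category from 1 to 2 subtracts the category-2 coefficient from every other category's coefficient and adds it to the constant:

*     category 1:        0     - 7255.712 = -7255.712

*     category 3:    16013.388 - 7255.712 =  8757.676

*     category 4:     8368.313 - 7255.712 =  1112.601   (printed as 1112.602; rounding)

*     _cons:         27189.646 + 7255.712 = 34445.358   (printed as 34445.36)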

 

* The older Stata syntax for creating dummy variables is xi (this works on all versions through Stata 11; I am not sure about Stata 12). xi can be used as a stand-alone command or as a prefix for a regression command.

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_2 omitted)

 

* With the xi syntax, you need a separate command, char varname[omit], to specify which category value will be the excluded (omitted) category.

 

. char metro[omit] 0

 

. xi i.metro

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

 

. table metro, contents(mean  _Imetro_1 mean  _Imetro_2 mean  _Imetro_3 mean  _Imetro_4)

 

--------------------------------------------------------------------------------

Metropolitan central city   |

status                      |    __000002     __000003     __000004     __000005

----------------------------+---------------------------------------------------

           Not identifiable |           0            0            0            0

          Not in metro area |           1            0            0            0

               Central city |           0            1            0            0

       Outside central city |           0            0            1            0

Central city status unknown |           0            0            0            1

--------------------------------------------------------------------------------

 

* The table above shows what the dummy variables actually look like: each one equals 1 for its own category and 0 for every other category.
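
* A sketch of two other ways to get dummy variables into the variable list, not run in this session. The names metro_d and central are made up for illustration. tabulate with generate() creates one dummy per category, numbered in order of the category values (so metro_d1 would be the dummy for metro==0); the generate line builds a single dummy by hand. Because codebook shows no missing values on metro, the logical expression needs no extra guard here.

. tabulate metro, generate(metro_d)

. generate byte central = (metro == 2)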

 

. char metro[omit] 1

 

* Change the excluded category, then run the regression again with xi creating the dummies.

 

. xi: regress incwage i.metro if age>=30 & age <=64 & sex==1 & metro ~=0

i.metro           _Imetro_0-4         (naturally coded; _Imetro_1 omitted)

note: _Imetro_0 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |  (omitted)

   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

* When we have the dummy variables on hand, as we do after running xi, we can test contrasts other than the contrasts against the excluded category, which are the only ones the regression output reports.

 

. lincom _Imetro_4- _Imetro_2

 

 ( 1)  - _Imetro_2 + _Imetro_4 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

------------------------------------------------------------------------------
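
* A sketch, not run in this session: the same contrast should also be available without xi. After a regression that uses factor-variable syntax, coefficients can be referred to as #.varname, so the lincom line below should reproduce the test above. testparm then gives a joint test of all the metro dummies.

. regress incwage i.metro if age>=30 & age<=64 & sex==1 & metro~=0

. lincom 4.metro - 2.metro

. testparm i.metro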

 

* Now using desmat. Note that when you use the desmat prefix, desmat assumes that all predictor variables are categorical unless you mark a variable as continuous with the prefix “@”.

 

. desmat: regress incwage metro=ind(2) if age>=30 & age <=64 & sex==1 & metro ~=0

--------------------------------------------------------------------------------

   Linear regression

--------------------------------------------------------------------------------

   Dependent variable                                                   incwage

   Number of observations:                                                29241

   F statistic:                                                         252.703

   Model degrees of freedom:                                                  3

   Residual degrees of freedom:                                           29237

   R-squared:                                                             0.025

   Adjusted R-squared:                                                    0.025

   Root MSE                                                           38600.339

   Prob:                                                                  0.000

--------------------------------------------------------------------------------

nr Effect                                                     Coeff        s.e.

--------------------------------------------------------------------------------

   metro

1    Not identifiable                                         0.000           .

2    Central city                                          7255.712**   668.053

3    Outside central city                                 16013.388**   593.985

4    Central city status unknown                           8368.313**   758.706

5  _cons                                                  27189.646**   474.133

--------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. lincom _x_4-_x_2

 

 ( 1)  - _x_2 + _x_4 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

------------------------------------------------------------------------------
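
* A sketch, not run in this session, of the “@” prefix mentioned above. It assumes we want to add age as a continuous covariate; without the “@”, desmat would treat age as categorical and create one dummy per year of age. The exact form should be checked against help desmat.

. desmat: regress incwage metro=ind(2) @age if age>=30 & age<=64 & sex==1 & metro~=0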

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web

> pages\soc_meth_proj3\fall_2011_381_logs\class5.log

  log type:  text

 closed on:  11 Oct 2011, 15:34:29

--------------------------------------------------------------------------------