--------------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_pro

> j3\2011_180B_logs\class6.log

  log type:  text

 opened on:  10 Feb 2011, 12:12:49

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

* Take a look at the worksheet "understanding dummy variables" in my Excel file.

 

 

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

. codebook metro

 

---------------------------------------------------------------------------------------

metro                                                  Metropolitan central city status

---------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

. regress incwage ib3.metro if age>29 & age<65 & sex==1& metro!=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          1  |  -16013.39   593.9852   -26.96   0.000    -17177.63   -14849.15

          2  |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91

          4  |  -7645.075   691.9894   -11.05   0.000    -9001.405   -6288.744

             |

       _cons |   43203.03   357.7942   120.75   0.000     42501.74    43904.33

------------------------------------------------------------------------------

 

* Here metro==3 (outside central city, i.e. the suburbs) is the comparison category, set by ib3. The constant, 43203.03, is therefore the mean of incwage for metro==3, and each coefficient is the difference between that category's mean and the suburban mean.
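
* As a check against the first table, each coefficient is the category mean minus the suburban mean: 27189.6465 - 43203.0348 = -16013.3883 for metro==1, 34445.35841 - 43203.0348 = -8757.67639 for metro==2, and 35557.95997 - 43203.0348 = -7645.07483 for metro==4. A minimal sketch of the same check in Stata (not run in this session; the numbers are typed in from the table above):

*     display 27189.6465  - 43203.0348

*     display 34445.35841 - 43203.0348

*     display 35557.95997 - 43203.0348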

 

. regress incwage ib2.metro if age>29 & age<65 & sex==1& metro!=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          1  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297

          3  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443

          4  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

             |

       _cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82

------------------------------------------------------------------------------

 

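* If we change the comparison category we get a different constant, different coefficients, and different t statistics, because each coefficient now measures a difference from metro==2 (central city). The R-squared (0.0253), the F statistic, and the Root MSE are unchanged because it is the same model.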
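
* The two sets of coefficients carry the same information. Starting from the ib3 results: -16013.39 - (-8757.676) = -7255.714 for metro==1, -7645.075 - (-8757.676) = 1112.601 for metro==4, and 43203.03 - 8757.676 = 34445.354 for the constant, which match the ib2 results up to rounding. One way to see this directly, sketched here and not run in this session, is to ask for the predicted mean of each category after either regression; margins should report the same four means whichever base is used (this assumes Stata 11 or later, which the ib3. syntax already requires):

*     regress incwage ib3.metro if age>29 & age<65 & sex==1 & metro!=0

*     margins metro

*     regress incwage ib2.metro if age>29 & age<65 & sex==1 & metro!=0

*     margins metro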

 

* A second syntax for dummy variables is the xi: prefix, which requires us to set the omitted category beforehand:

 

. char metro[omit] 3

 

. xi: regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0

i.metro           _Imetro_0-4         (naturally coded; _Imetro_3 omitted)

note: _Imetro_0 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_0 |  (omitted)

   _Imetro_1 |  -16013.39   593.9852   -26.96   0.000    -17177.63   -14849.15

   _Imetro_2 |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91

   _Imetro_4 |  -7645.075   691.9894   -11.05   0.000    -9001.405   -6288.744

       _cons |   43203.03   357.7942   120.75   0.000     42501.74    43904.33

------------------------------------------------------------------------------

 

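* These are the same results as the first regression (ib3.metro). _Imetro_0 is dropped because the if condition excludes metro==0, so that dummy is zero for every observation in the sample. With xi: Stata generates the dummy variables and also adds them to your variable list, so you can examine them and manipulate them.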
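
* A short sketch of ways to inspect the dummies that xi: created (not run in this session). Because metro==3 is the omitted category, xi: does not create _Imetro_3; the last line builds that dummy by hand, and the name suburb is only an illustrative choice:

*     describe _Imetro_*

*     summarize _Imetro_*

*     char list metro[]

*     generate byte suburb = (metro == 3)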

 

 

. table metro, contents(mean  _Imetro_1 mean  _Imetro_2 mean   _Imetro_4)

 

----------------------------------------------------------------------------

Metropolitan central city   |

status                      | mean(_Imetr~1)  mean(_Imetr~2)  mean(_Imetr~4)

----------------------------+-----------------------------------------------

           Not identifiable |              0               0               0

          Not in metro area |              1               0               0

               Central city |              0               1               0

       Outside central city |              0               0               0

Central city status unknown |              0               0               1

----------------------------------------------------------------------------

 

* You can also get the estimate and t statistic for a contrast between any two categories, including a contrast that the regression did not report directly because of the comparison category you picked.

 

. lincom  _Imetro_4- _Imetro_2

 

 ( 1)  - _Imetro_2 + _Imetro_4 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

------------------------------------------------------------------------------

 

* Here we get the contrast between metro==4 and metro==2, which is the same coefficient and t statistic that we got for metro==4 above when metro==2 was the comparison category. lincom is a post-estimation command, meaning it can only be used after an estimation command such as regress, and it works from the most recent estimates.
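
* The point estimate is just the difference between the two coefficients, -7645.075 - (-8757.676) = 1112.601, which agrees with the lincom result up to rounding; what lincom adds is the correct standard error for that difference. A sketch of two related commands, not run in this session. The first is the same contrast written for the factor-variable syntax, and the second is an F test of the same hypothesis after the xi: regression, whose F statistic should equal the square of the t statistic (about 2.16):

*     regress incwage ib3.metro if age>29 & age<65 & sex==1 & metro!=0

*     lincom 4.metro - 2.metro

*     test _Imetro_4 = _Imetro_2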

 

* One last syntax for generating dummy variables is desmat, a user-written command that you would have to install (for free). The xi: and i. syntaxes assume that predictors are continuous unless you put the i. in front of the variable to indicate a categorical predictor variable. desmat assumes the opposite: all predictors are categorical, and you would need to put the @ in front of any continuous predictor. See Stata help for more details.
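
* To make the difference in defaults concrete, here is a sketch of one model, metro as categorical plus age as continuous, written in each of the three syntaxes (not run in this session). Adding age is only for illustration, and the desmat line follows the @ rule described above, so check it against the desmat help file before relying on it:

*     regress incwage ib3.metro age if age>29 & age<65 & sex==1 & metro!=0

*     xi: regress incwage i.metro age if age>29 & age<65 & sex==1 & metro~=0

*     desmat: regress incwage metro @age if age>29 & age<65 & sex==1 & metro~=0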

 

. *net install desmat

 

. desmat: regress incwage metro=ind(2) if age>29 & age<65 & sex==1 & metro~=0

---------------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------------

   Dependent variable                                                          incwage

   Number of observations:                                                       29241

   F statistic:                                                                252.703

   Model degrees of freedom:                                                         3

   Residual degrees of freedom:                                                  29237

   R-squared:                                                                    0.025

   Adjusted R-squared:                                                           0.025

   Root MSE                                                                  38600.339

   Prob:                                                                         0.000

---------------------------------------------------------------------------------------

nr Effect                                                            Coeff        s.e.

---------------------------------------------------------------------------------------

   metro

1    Not identifiable                                                0.000           .

2    Central city                                                 7255.712**   668.053

3    Outside central city                                        16013.388**   593.985

4    Central city status unknown                                  8368.313**   758.706

5  _cons                                                         27189.646**   474.133

---------------------------------------------------------------------------------------

*  p < .05

** p < .01
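
* In this output the constant, 27189.646, matches the mean of incwage for metro==1 (Not in metro area) in the first table, so nonmetro appears to be the comparison category here. The row labeled Not identifiable carries no estimate, consistent with metro==0 being excluded by the if condition. Each coefficient is again a difference in means: 34445.35841 - 27189.6465 = 7255.71191 for central city, 43203.0348 - 27189.6465 = 16013.3883 for outside central city, and 35557.95997 - 27189.6465 = 8368.31347 for central city status unknown. A sketch of the same check in Stata (not run in this session; the numbers are typed in from the first table):

*     display 34445.35841 - 27189.6465

*     display 43203.0348  - 27189.6465

*     display 35557.95997 - 27189.6465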

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s

> oc_meth_proj3\2011_180B_logs\class6.log

  log type:  text

 closed on:  10 Feb 2011, 15:05:37

---------------------------------------------------------------------------------------