--------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_l

> ogs\class6.log

  log type:  text

 opened on:  11 Oct 2012, 12:09:02

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

 

. *class starts here...

 

* A couple of examples of box plots (vertical, then horizontal):

 

. graph box age if occ1990==178| occ1990==95| occ1990==125, over(occ1990)

 

. graph hbox age if occ1990==178| occ1990==95| occ1990==125, over(occ1990)

 

. codebook metro

 

------------------------------------------------------------------------------------

metro                                               Metropolitan central city status

------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

. table metro if age>29 & age<65 & sex==1, contents( freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

* OK, we have 5 categories of metro. The first (“not identifiable”) is not really useful, so we will discard it in the analyses below.

 

* When dealing with categorical predictors, one thing you never want to do is treat them as continuous predictors. The numeric codes of metro are only labels; their order and spacing carry no meaning.

 

. regress incwage metro if age>29 & age<65

 

      Source |       SS       df       MS              Number of obs =   60477

-------------+------------------------------           F(  1, 60475) =  464.31

       Model |  5.0002e+11     1  5.0002e+11           Prob > F      =  0.0000

    Residual |  6.5126e+13 60475  1.0769e+09           R-squared     =  0.0076

-------------+------------------------------           Adj R-squared =  0.0076

       Total |  6.5626e+13 60476  1.0852e+09           Root MSE      =   32816

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |   2870.889   133.2332    21.55   0.000     2609.752    3132.027

       _cons |   20308.34   353.9993    57.37   0.000      19614.5    21002.18

------------------------------------------------------------------------------

 

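* The regression above is wrong in several ways: it treats the arbitrary codes 0-4 as a scale, so every one-step change in metro is forced to be worth the same 2870.889 in incwage; it still includes the “not identifiable” category (metro==0); and, unlike the table of means, it is not restricted to men.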
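
* A minimal sketch (not run in class) of one way to see the problem: put the straight-line predictions next to the actual category means. The variable name yhat_linear is made up for this illustration, and contents() assumes the older table syntax used elsewhere in this log.

```stata
* Refit the (wrong) model that treats metro as a continuous predictor
regress incwage metro if age>29 & age<65

* Fitted values for the estimation sample only; yhat_linear is a hypothetical name
predict double yhat_linear if e(sample)

* Side by side: actual mean wage income and the straight-line prediction, by category
table metro if e(sample), contents(freq mean incwage mean yhat_linear)
```

* If the category means do not rise in equal steps from code 0 to code 4, the two columns will disagree.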

 

* On the subject of how to use the syntax to make dummy variables, see the “understanding dummy vars” page of my class Excel file.

 

 

. xi: regress incwage i.metro if age>29 & age<65 & sex==1 &metro~=0

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

note: _Imetro_1 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_1 |  (omitted)

   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

* Note that the constant (27189.65) is the actual mean of incwage for men aged 30-64 who are not in a metro area, and each of the other coefficients is that category's mean minus the non-metro mean, as the table of means above confirms.
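
* The arithmetic can be checked by hand. A small sketch using the means copied from the table of means above; up to rounding, the results (7255.712, 16013.388, 8368.313) should match the three coefficients.

```stata
* Each line: category mean minus the mean for men not in a metro area
display %9.3f 34445.35841 - 27189.6465
display %9.3f 43203.0348 - 27189.6465
display %9.3f 35557.95997 - 27189.6465
```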

 

* What do the dummy variables actually look like? Each one equals 1 for its own category of metro and 0 for every other category:

 

. table metro, contents(mean  _Imetro_1 mean  _Imetro_2 mean  _Imetro_3 mean  _Imetro_4)

 

------------------------------------------------------------------------------------

Metropolitan central city   |

status                      |     __000002      __000003      __000004      __000005

----------------------------+-------------------------------------------------------

           Not identifiable |            0             0             0             0

          Not in metro area |            1             0             0             0

               Central city |            0             1             0             0

       Outside central city |            0             0             1             0

Central city status unknown |            0             0             0             1

------------------------------------------------------------------------------------
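
* You can also build indicators like these yourself, without xi. A minimal sketch, not run in class; the names metro_d and central_city are made up for this illustration.

```stata
* One 0/1 indicator per category of metro: metro_d1 (code 0) through metro_d5 (code 4)
tabulate metro, generate(metro_d)

* A single indicator: 1 if central city (code 2), 0 otherwise
generate byte central_city = (metro == 2) if !missing(metro)

* Check: both should average 1 in "Central city" and 0 in every other category
table metro, contents(mean metro_d3 mean central_city)
```

* tabulate numbers the new variables by the sorted order of the categories, not by their codes, so metro_d3 is the indicator for metro==2.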

 

. table metro if age>29 & age<65 & sex==1, contents( freq mean incwage)

 

----------------------------------------------------------

Metropolitan central city   |

status                      |         Freq.  mean(incwage)

----------------------------+-----------------------------

           Not identifiable |            94    31743.04255

          Not in metro area |         6,628     27189.6465

               Central city |         6,727    34445.35841

       Outside central city |        11,639     43203.0348

Central city status unknown |         4,247    35557.95997

----------------------------------------------------------

 

. xi: regress incwage i.metro if age>29 & age<65 & sex==1 &metro~=0

i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)

note: _Imetro_1 omitted because of collinearity

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   _Imetro_1 |  (omitted)

   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------

 

* Using the ib#.variable_name syntax (no xi: prefix needed), it is easy to change the comparison category.

 

. regress incwage ib2.metro if age>29 & age<65 & sex==1 &metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          1  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297

          3  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443

          4  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419

             |

       _cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82

------------------------------------------------------------------------------

 

. regress incwage ib1.metro if age>29 & age<65 & sex==1 &metro~=0

 

      Source |       SS       df       MS              Number of obs =   29241

-------------+------------------------------           F(  3, 29237) =  252.70

       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000

    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253

-------------+------------------------------           Adj R-squared =  0.0252

       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       metro |

          2  |   7255.712   668.0533    10.86   0.000     5946.297    8565.127

          3  |   16013.39   593.9852    26.96   0.000     14849.15    17177.63

          4  |   8368.313   758.7058    11.03   0.000     6881.216    9855.411

             |

       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97

------------------------------------------------------------------------------
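
* The base category can also be picked by rule rather than by number. A sketch of some variations, not run in class; I am assuming Stata 11 or later, where factor-variable syntax is available.

```stata
* Base chosen by value: "Outside central city" (code 3) as the comparison group
regress incwage ib3.metro if age>29 & age<65 & sex==1 & metro~=0

* Base chosen by rule: the highest code in use
regress incwage ib(last).metro if age>29 & age<65 & sex==1 & metro~=0

* Base chosen by rule: the most frequent category
regress incwage ib(freq).metro if age>29 & age<65 & sex==1 & metro~=0
```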

 

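* But notice: changing the comparison category changes the coefficients, but the regression goodness of fit is the same, and each specific comparison, when recovered, is exactly the same. The models are identical, just expressed differently. For example, after the ib1 regression, lincom recovers the ib2 coefficient for category 3 (with the sign reversed, because the subtraction runs the other way):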

 

. lincom 2.metro-3.metro

 

 ( 1)  2.metro - 3.metro = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91

------------------------------------------------------------------------------
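
* Other comparisons can be recovered the same way. A minimal sketch, not run in class; pwcompare assumes Stata 12 or later.

```stata
* Refit with "Not in metro area" (code 1) as the base so the results are in memory
regress incwage ib1.metro if age>29 & age<65 & sex==1 & metro~=0

* One specific comparison: status unknown (4) minus central city (2)
lincom 4.metro - 2.metro

* All pairwise comparisons between categories at once
pwcompare metro, effects

* Joint test that all the metro coefficients are zero
testparm i.metro
```

* The lincom result here should equal the coefficient on category 4 in the ib2 regression, since that coefficient is the same difference.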

 

* My favorite dummy variable syntax is from the free add-on, desmat.

* To install it, try: ssc install desmat, replace

 

 

. desmat: regress incwage metro=ind(3) if age>29 & age<65 & sex==1 & metro~=0,desrep (zval ci)

------------------------------------------------------------------------------------

   Linear regression

------------------------------------------------------------------------------------

   Dependent variable                                                       incwage

   Number of observations:                                                    29241

   F statistic:                                                             252.703

   Model degrees of freedom:                                                      3

   Residual degrees of freedom:                                               29237

   R-squared:                                                                 0.025

   Adjusted R-squared:                                                        0.025

   Root MSE                                                               38600.339

   Prob:                                                                      0.000

------------------------------------------------------------------------------------

nr Effect                           Coeff        s.e.       t      lo 95%    hi 95%

------------------------------------------------------------------------------------

   metro

1    Not identifiable               0.000           .         .         .         .

2    Not in metro area          -7255.712**   668.053   -10.861 -8565.127 -5946.297

3    Outside central city        8757.676**   591.194    14.814  7598.910  9916.443

4    ntral city status unknown   1112.602     756.522     1.471  -370.216  2595.419

5  _cons                        34445.358**   470.631    73.190 33522.901 35367.816

------------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* For homework 2, it will be easiest to make the dummy variables by hand, in part because occupation has several hundred categories besides the 3 we are interested in.

 

. gen byte nurses=0

 

. replace nurses=1 if occ1990==95

(966 real changes made)

 

. gen byte lawyers=0

 

. replace lawyers=1 if occ1990==178

(441 real changes made)

 

. gen byte sociologists=0

 

. replace sociologists=1 if occ1990==125

(6 real changes made)
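
* A one-line shortcut for the same thing: a logical expression evaluates to 1 when true and 0 when false. A sketch, not run in class; the _d names are made up so they do not clash with the variables created above.

```stata
* Same coding as gen/replace: 1 for the occupation, 0 for everyone else
generate byte nurse_d       = (occ1990 == 95)
generate byte lawyer_d      = (occ1990 == 178)
generate byte sociologist_d = (occ1990 == 125)

* Stop with an error if the shortcut disagrees with the by-hand versions
assert nurse_d == nurses
assert lawyer_d == lawyers
assert sociologist_d == sociologists
```

* Like the gen/replace version, this codes any case with a missing occ1990 as 0. Add "if !missing(occ1990)" to each generate if you would rather leave those cases missing.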

 

. table occ1990 if occ1990==178| occ1990==95| occ1990==125, contents (freq mean inctot)

 

--------------------------------------------------

Occupation, 1990      |

basis                 |        Freq.  mean(inctot)

----------------------+---------------------------

    Registered nurses |          966    40787.1677

Sociology instructors |            6   44363.33333

              Lawyers |          441   99242.58277

--------------------------------------------------

 

. regress inctot lawyers if occ1990==178|occ1990==95

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  522.88

       Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000

    Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712

-------------+------------------------------           Adj R-squared =  0.2707

       Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

     lawyers |   58455.42   2556.381    22.87   0.000     53440.68    63470.15

       _cons |   40787.17   1431.192    28.50   0.000     37979.66    43594.67

------------------------------------------------------------------------------

 

. regress inctot nurses if occ1990==178|occ1990==95

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  522.88

       Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000

    Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712

-------------+------------------------------           Adj R-squared =  0.2707

       Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

 

------------------------------------------------------------------------------

      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      nurses |  -58455.42   2556.381   -22.87   0.000    -63470.15   -53440.68

       _cons |   99242.58   2118.201    46.85   0.000     95087.41    103397.8

------------------------------------------------------------------------------

 

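* Notice how the two regressions above are the same model with the comparison category reversed: the slope only flips sign, and each constant is the mean of inctot for the omitted group (40787.17 for nurses, 99242.58 for lawyers).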
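
* The same logic extends to all three occupations, with nurses as the omitted category. A minimal sketch, not run in class; the differences come from the means in the table of occupations above.

```stata
* Nurses are the comparison group because their dummy is left out
regress inctot lawyers sociologists if occ1990==178 | occ1990==95 | occ1990==125

* Differences in means that the two coefficients should reproduce, up to rounding
display %9.2f 99242.58277 - 40787.1677
display %9.2f 44363.33333 - 40787.1677
```

* The constant should again be the nurses' mean, and with only 6 sociology instructors the second coefficient will be estimated very imprecisely.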

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381

> _logs\class6.log

  log type:  text

 closed on:  11 Oct 2012, 15:51:46

------------------------------------------------------------------------------------