      name:  <unnamed>
       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class6.log
  log type:  text
 opened on:  10 Oct 2013, 13:43:13

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

* One point left over from HW1: Q4 called for a direct comparison between the incomes of veterans and non-veterans, and most of the student homework I read skipped over this. But the simple comparison is important, and revealing.

. tabulate vetlast

     Veteran's most recent |
         period of service |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
                       NIU |     30,904       23.11       23.11
                No service |     91,149       68.17       91.28
              World War II |      2,428        1.82       93.10
                Korean War |      1,716        1.28       94.38
               Vietnam Era |      3,683        2.75       97.14
             Other service |      3,830        2.86      100.00
---------------------------+-----------------------------------
                     Total |    133,710      100.00

. gen byte veteran=0 if vetlast~=0
(30904 missing values generated)

. replace veteran=1 if vetlast>1
(11657 real changes made)

. tabulate vetlast veteran

Veteran's most recent |        veteran
    period of service |         0          1 |     Total
----------------------+----------------------+----------
           No service |    91,149          0 |    91,149
         World War II |         0      2,428 |     2,428
           Korean War |         0      1,716 |     1,716
          Vietnam Era |         0      3,683 |     3,683
        Other service |         0      3,830 |     3,830
----------------------+----------------------+----------
                Total |    91,149     11,657 |   102,806
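* The same recode, sketched in Python on the frequency counts from the tabulate output above rather than on the microdata. This assumes NIU is coded 0 and No service is coded 1, which is what the 30,904 missing values and the 11,657 real changes imply. The codes 2 to 5 for the service periods are hypothetical placeholders, since only "greater than 1" matters here.

```python
# Frequencies from ". tabulate vetlast". Codes 2-5 are hypothetical
# placeholders; only NIU == 0 and No service == 1 matter for the recode.
vetlast_counts = {
    0: 30904,  # NIU
    1: 91149,  # No service
    2: 2428,   # World War II (hypothetical code)
    3: 1716,   # Korean War (hypothetical code)
    4: 3683,   # Vietnam Era (hypothetical code)
    5: 3830,   # Other service (hypothetical code)
}


def veteran_code(vetlast):
    """Mimic: gen veteran=0 if vetlast~=0; replace veteran=1 if vetlast>1."""
    if vetlast == 0:
        return None  # Stata leaves these observations missing (.)
    return 1 if vetlast > 1 else 0


# Add up the counts that land in each value of the new dummy.
veteran_counts = {}
for code, n in vetlast_counts.items():
    key = veteran_code(code)
    veteran_counts[key] = veteran_counts.get(key, 0) + n

print(veteran_counts)  # {None: 30904, 0: 91149, 1: 11657}
```

* The three totals reproduce the missing-value count, the No service row, and the 11,657 veterans in the crosstab.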

. table veteran [aweight= perwt_rounded] , contents (mean inctot)

------------------------
  veteran | mean(inctot)
----------+-------------
        0 |  25052.93274
        1 |   38866.1566
------------------------

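* So note: on average, veterans have a lot more income than non-veterans, about $38,900 versus $25,100 (weighted by perwt_rounded). Why? Because veterans are more likely to be male, and more likely to be older, closer to the ages when earnings peak. This raw comparison controls for none of that.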
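* The size of the raw gap, worked out in Python from the two weighted means in the table above. This is plain arithmetic on those two numbers and adjusts for nothing, which is why composition (sex, age) can drive it.

```python
# Weighted mean inctot by veteran status, copied from the table output above.
mean_nonvet = 25052.93274
mean_vet = 38866.1566

gap = mean_vet - mean_nonvet    # dollars more, on average, for veterans
ratio = mean_vet / mean_nonvet  # veterans' mean as a multiple of non-veterans'

print(f"gap = ${gap:,.0f}, ratio = {ratio:.2f}")  # gap = $13,813, ratio = 1.55
```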

. graph box age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

. graph hbox age if occ1990==178| occ1990==95 | occ1990==125, over (occ1990)

* Two orientations of the same box plot. Look up graph box in the Stata Graphics manual for an explanation of how the outliers and whiskers are calculated.
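* A minimal Python sketch of the usual box-plot rules. The box runs from the lower to the upper quartile. Each whisker ends at the most extreme observation within 1.5 interquartile ranges of the box (the "adjacent value"), and anything beyond that is plotted as an outlier. I believe graph box follows this convention, but check the manual. Stata's percentile definition may also differ slightly from the one assumed here (statistics.quantiles with method="inclusive"). The ages below are made up for illustration.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends (adjacent values), and outliers for one group.

    Assumes the 1.5 * IQR rule; quartiles come from statistics.quantiles with
    the "inclusive" method, which may not match Stata's percentile definition.
    """
    xs = sorted(values)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the limits.
    upper_adjacent = max(x for x in xs if x <= upper_limit)
    lower_adjacent = min(x for x in xs if x >= lower_limit)
    outliers = [x for x in xs if x < lower_adjacent or x > upper_adjacent]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_adjacent": lower_adjacent,
        "upper_adjacent": upper_adjacent,
        "outliers": outliers,
    }


# Hypothetical ages for one occupation group (not from the CPS file).
ages = [25, 30, 32, 35, 38, 40, 41, 45, 50, 80]
print(box_plot_stats(ages))
```

* With these made-up ages, the upper whisker stops at 50 and the 80-year-old is drawn as a separate outlier point.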

* Now on to a brief discussion of dummy variables, with metro as the predictor. Note that this is covered in more detail in my Excel sheet, “understanding dummy variables.”

. codebook metro

-----------------------------------------------------------------------
metro                                  Metropolitan central city status
-----------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  metrolbl

                 range:  [0,4]                        units:  1
         unique values:  5                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                           340         0  Not identifiable
                         29658         1  Not in metro area
                         32481         2  Central city
                         51468         3  Outside central city
                         19763         4  Central city status unknown

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

----------------------------------------------------------
Metropolitan central city   |
status                      |         Freq.  mean(incwage)
----------------------------+-----------------------------
           Not identifiable |            94    31743.04255
          Not in metro area |         6,628     27189.6465
               Central city |         6,727    34445.35841
       Outside central city |        11,639     43203.0348
Central city status unknown |         4,247    35557.95997
----------------------------------------------------------

. regress incwage metro if age>29 & age<65

      Source |       SS       df       MS              Number of obs =   60477
-------------+------------------------------           F(  1, 60475) =  464.31
       Model |  5.0002e+11     1  5.0002e+11           Prob > F      =  0.0000
    Residual |  6.5126e+13 60475  1.0769e+09           R-squared     =  0.0076
-------------+------------------------------           Adj R-squared =  0.0076
       Total |  6.5626e+13 60476  1.0852e+09           Root MSE      =   32816

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |   2870.889   133.2332    21.55   0.000     2609.752    3132.027
       _cons |   20308.34   353.9993    57.37   0.000      19614.5    21002.18
------------------------------------------------------------------------------

* Please don’t ever do this: don’t treat a categorical variable like a continuous variable and just plug it into the regression. Stata will let you, but it is wrong, wrong, wrong. One way to think about how wrong it is: what are the units of metro? If metro doesn’t have units, you need to go the dummy variable route. (This regression also leaves out the sex==1 restriction used in the other regressions, so its sample differs as well.)
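* Another way to see the problem, sketched in Python with the men-only means from the table of mean incwage above (category 0 left out). A single linear term in metro forces every one-step move in the code (1 to 2, 2 to 3, 3 to 4) to change predicted income by the same amount. The actual steps between the category means are nothing like equal, and one of them is negative.

```python
# Mean incwage for men aged 30-64, copied from the table metro output above.
means = {
    1: 27189.6465,   # Not in metro area
    2: 34445.35841,  # Central city
    3: 43203.0348,   # Outside central city
    4: 35557.95997,  # Central city status unknown
}

# Change in mean income for each one-unit step in the metro code.
steps = [means[k + 1] - means[k] for k in (1, 2, 3)]
for k, step in zip((1, 2, 3), steps):
    print(f"metro {k} -> {k + 1}: {step:+,.0f}")
# metro 1 -> 2: +7,256
# metro 2 -> 3: +8,758
# metro 3 -> 4: -7,645
```

* Code 4, "status unknown," is not a point on any scale at all, which is another sign that metro has no units.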

* First, using the old syntax of xi: and i.variable to generate the dummy variables.

. xi: regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0
i.metro           _Imetro_0-4         (naturally coded; _Imetro_0 omitted)
note: _Imetro_1 omitted because of collinearity

      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Imetro_1 |          0  (omitted)
   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127
   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63
   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411
       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97
------------------------------------------------------------------------------

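* Note that the coefficients are exactly the differences between the category means (compare the table of mean incwage above). Here everything is compared to category 1, not in metro area. xi omits _Imetro_0 by default, but because I left category zero (not identifiable) out of the analysis, _Imetro_1 is dropped for collinearity instead and becomes the reference category. The constant, 27189.65, is the mean for men not in a metro area.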
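* A Python check that the dummy-variable coefficients are just differences of category means. Subtract the mean of the reference category (1, not in metro area) from each other category's mean, using the men-only means from the table above. The results match the _Imetro_ coefficients and the constant up to rounding. The helper name dummy_coefficients is hypothetical.

```python
# Men-only mean incwage by metro category, from the table output above.
means = {
    1: 27189.6465,   # Not in metro area (reference category)
    2: 34445.35841,  # Central city
    3: 43203.0348,   # Outside central city
    4: 35557.95997,  # Central city status unknown
}


def dummy_coefficients(group_means, base):
    """Constant and dummy coefficients implied by an OLS regression on one
    categorical predictor, with `base` as the omitted category."""
    constant = group_means[base]
    coefs = {k: m - constant for k, m in group_means.items() if k != base}
    return constant, coefs


constant, coefs = dummy_coefficients(means, base=1)
print(round(constant, 2), {k: round(v, 2) for k, v in coefs.items()})
```

* Calling it with base=2 gives the ib2.metro coefficients instead. This exact match holds only while metro is the sole predictor. Once you add controls, the coefficients are no longer raw mean differences.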

. table metro, contents (mean _Imetro_1 mean _Imetro_2 mean _Imetro_3 mean _Imetro_4)

-------------------------------------------------------------------------------------
Metropolitan central city   |
status                      |      __000002      __000003      __000004      __000005
----------------------------+--------------------------------------------------------
           Not identifiable |             0             0             0             0
          Not in metro area |             1             0             0             0
               Central city |             0             1             0             0
       Outside central city |             0             0             1             0
Central city status unknown |             0             0             0             1
-------------------------------------------------------------------------------------

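* What the dummy variables actually look like. The columns labelled __000002 through __000005 are the means of _Imetro_1 through _Imetro_4, in that order. Each category has a 1 in exactly one column, except category 0, which is 0 in all four.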
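* Indicator columns like these can be built by hand. Here is a Python sketch using a hypothetical helper, make_dummies, on five made-up observations, one per category. It makes one 0/1 column per category and leaves out the base category (0 here, as xi did with _Imetro_0). The category codes and labels follow the codebook above.

```python
# Category codes and labels from "codebook metro".
LABELS = {
    0: "Not identifiable",
    1: "Not in metro area",
    2: "Central city",
    3: "Outside central city",
    4: "Central city status unknown",
}


def make_dummies(values, categories, base):
    """Return {category: [0/1 indicators]} for every category except base."""
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != base
    }


# Hypothetical observations, one per category.
metro = [0, 1, 2, 3, 4]
dummies = make_dummies(metro, categories=sorted(LABELS), base=0)
for i, v in enumerate(metro):
    row = [dummies[c][i] for c in sorted(dummies)]
    print(f"{LABELS[v]:>28} {row}")
```

* Observations in the base category get all zeros. That is why the base category's mean ends up in the constant.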

. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)

----------------------------------------------------------
Metropolitan central city   |
status                      |         Freq.  mean(incwage)
----------------------------+-----------------------------
           Not identifiable |            94    31743.04255
          Not in metro area |         6,628     27189.6465
               Central city |         6,727    34445.35841
       Outside central city |        11,639     43203.0348
Central city status unknown |         4,247    35557.95997
----------------------------------------------------------

. regress incwage ib2.metro if age>29 & age<65 & sex==1 & metro~=0

      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------------
           incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             metro |
Not in metro area  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297
Outside central..  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443
Central city st..  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419
                   |
             _cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82
------------------------------------------------------------------------------------

* First, compared to central city (ib2. makes metro==2, central city, the base category, so the constant, 34445.36, is the central-city mean).

. regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0

      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600

------------------------------------------------------------------------------------
           incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             metro |
     Central city  |   7255.712   668.0533    10.86   0.000     5946.297    8565.127
Outside central..  |   16013.39   593.9852    26.96   0.000     14849.15    17177.63
Central city st..  |   8368.313   758.7058    11.03   0.000     6881.216    9855.411
                   |
             _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97
------------------------------------------------------------------------------------

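* Next, compared to the default base, which here is category 1, not in metro area (roughly, rural). The two regressions above use different comparison categories for metro, so the coefficients are all different. But the model is the same (identical F, R-squared, and Root MSE), and the same contrasts can be recovered: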

. lincom 2.metro-3.metro

 ( 1)  2.metro - 3.metro = 0

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91
------------------------------------------------------------------------------

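* The urban-suburban contrast (central city minus outside central city). It equals the Outside central city coefficient from the ib2.metro regression with the sign reversed.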
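* The same recovery done by hand in Python, starting only from the i.metro coefficients (base: not in metro area). To switch the base to central city, subtract the central-city coefficient from every coefficient and add it to the constant. That reproduces the ib2.metro point estimates up to rounding, and the central-city minus outside-central-city difference matches lincom. The helper name rebase is hypothetical.

```python
# Coefficients from "regress incwage i.metro ..." (base: 1, not in metro area).
constant = 27189.65
coefs = {1: 0.0, 2: 7255.712, 3: 16013.39, 4: 8368.313}


def rebase(constant, coefs, new_base):
    """Re-express dummy coefficients relative to a different base category."""
    shift = coefs[new_base]
    new_constant = constant + shift
    new_coefs = {k: b - shift for k, b in coefs.items() if k != new_base}
    return new_constant, new_coefs


new_constant, new_coefs = rebase(constant, coefs, new_base=2)
print(round(new_constant, 2), {k: round(v, 3) for k, v in new_coefs.items()})

# Central city minus outside central city, as in "lincom 2.metro-3.metro".
contrast = coefs[2] - coefs[3]
print(round(contrast, 3))
```

* Only the point estimates can be rebased this way. The standard errors need the covariance of the coefficients, which is what lincom uses.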

* Generating the three occupational dummy variables by hand, which I highly recommend.

. gen byte nurses=0

. replace nurses=1 if occ1990==95
(966 real changes made)

. gen byte lawyers=0

. replace lawyers=1 if occ1990==178
(441 real changes made)

. gen byte sociologists=0

. replace sociologists=1 if occ1990==125
(6 real changes made)

. table occ1990 if occ1990==178| occ1990==95 | occ1990==125, contents (freq mean inctot)

--------------------------------------------------
Occupation, 1990      |
basis                 |        Freq.  mean(inctot)
----------------------+---------------------------
    Registered nurses |          966    40787.1677
Sociology instructors |            6   44363.33333
              Lawyers |          441   99242.58277
--------------------------------------------------

. regress inctot nurses if occ1990==178| occ1990==95

      Source |       SS       df       MS              Number of obs =    1407
-------------+------------------------------           F(  1,  1405) =  522.88
       Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000
    Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712
-------------+------------------------------           Adj R-squared =  0.2707
       Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

------------------------------------------------------------------------------
      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nurses |  -58455.42   2556.381   -22.87   0.000    -63470.15   -53440.68
       _cons |   99242.58   2118.201    46.85   0.000     95087.41    103397.8
------------------------------------------------------------------------------

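* Nurses compared to lawyers: the constant is the lawyers' mean inctot, and the nurses coefficient is the nurse-minus-lawyer difference.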

. regress inctot lawyers if occ1990==178| occ1990==95

      Source |       SS       df       MS              Number of obs =    1407
-------------+------------------------------           F(  1,  1405) =  522.88
       Model |  1.0346e+12     1  1.0346e+12           Prob > F      =  0.0000
    Residual |  2.7800e+12  1405  1.9787e+09           R-squared     =  0.2712
-------------+------------------------------           Adj R-squared =  0.2707
       Total |  3.8146e+12  1406  2.7131e+09           Root MSE      =   44482

------------------------------------------------------------------------------
      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lawyers |   58455.42   2556.381    22.87   0.000     53440.68    63470.15
       _cons |   40787.17   1431.192    28.50   0.000     37979.66    43594.67
------------------------------------------------------------------------------

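* Lawyers compared to nurses: it is the same model, so the coefficient simply flips sign and the constant becomes the nurses' mean.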

* Without restricting the sample, we would get nurses compared to everyone else, which is not what we want in this case.

. regress inctot nurses

      Source |       SS       df       MS              Number of obs =  103226
-------------+------------------------------           F(  1,103224) =  207.52
       Model |  2.1289e+11     1  2.1289e+11           Prob > F      =  0.0000
    Residual |  1.0590e+14 103224 1.0259e+09           R-squared     =  0.0020
-------------+------------------------------           Adj R-squared =  0.0020
       Total |  1.0611e+14 103225 1.0279e+09           Root MSE      =   32029

------------------------------------------------------------------------------
      inctot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nurses |   14915.35   1035.387    14.41   0.000        12886    16944.69
       _cons |   25871.82   100.1605   258.30   0.000     25675.51    26068.13
------------------------------------------------------------------------------
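* A quick Python check on what changed. In the unrestricted regression the constant is the mean inctot of everyone in the estimation sample who is not a nurse, and constant plus coefficient is the nurses' mean. Adding up either regression gives about 40,787, the nurses' mean in the occupation table. The nurses group is the same in both; only the comparison group changed.

```python
# From "regress inctot nurses" on the unrestricted sample.
constant_all = 25871.82   # mean inctot of non-nurses
coef_all = 14915.35       # nurses minus everyone else

# From "regress inctot nurses if occ1990==178| occ1990==95".
constant_restricted = 99242.58  # mean inctot of lawyers
coef_restricted = -58455.42     # nurses minus lawyers

# Both sums should give the nurses' mean from the occupation table.
nurse_mean_all = constant_all + coef_all
nurse_mean_restricted = constant_restricted + coef_restricted
print(round(nurse_mean_all, 2), round(nurse_mean_restricted, 2))
```

* The nurses coefficient goes from -58,455 to +14,915 only because the comparison group went from lawyers to everyone else.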

. log close

      name:  <unnamed>
       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class6.log
  log type:  text
 closed on:  10 Oct 2013, 15:51:04
-------------------------------------------------------------------------------------