---------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_met

> h_proj3\fall_2010_s381_logs\class9.log

  log type:  text

 opened on:  19 Oct 2010, 13:54:49

 

*Note: the early part of the log was stuff I did before class…

 

. label val  union_adj union_adj

 

. label var  union_adj "union, with missing or unknown values set to missing"

 

. tabulate union  union_adj

 

                      |  union, with missing

                      | or unknown values set

                      |      to missing

     Union membership | non Union      Union |     Total

----------------------+----------------------+----------

    No union coverage |    11,383          0 |    11,383

Member of labor union |         0      1,883 |     1,883

----------------------+----------------------+----------

                Total |    11,383      1,883 |    13,266

 

 

. tabulate union  union_adj, miss

 

                      |  union, with missing or unknown

                      |      values set to missing

     Union membership | non Union      Union          . |     Total

----------------------+---------------------------------+----------

                  NIU |         0          0    120,249 |   120,249

    No union coverage |    11,383          0          0 |    11,383

Member of labor union |         0      1,883          0 |     1,883

Covered by union but  |         0          0        195 |       195

----------------------+---------------------------------+----------

                Total |    11,383      1,883    120,444 |   133,710

 

* The union variable has a lot of missing values.
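
* For reference, a minimal sketch of how a variable like union_adj could be built. This is an assumption, not the commands actually run before class: the numeric codes 0-3 for union and the label name unionlbl are hypothetical, and the toy data has just one row per code.

```stata
* Toy data only: one row per union code (codes 0-3 are assumed, not taken from the log).
clear
input byte union
0
1
2
3
end
label define unionlbl 0 "NIU" 1 "No union coverage" 2 "Member of labor union" 3 "Covered by union but not a member"
label values union unionlbl

* Keep only the two clear-cut categories; NIU and "covered but not a member" stay missing.
generate byte union_adj = .
replace union_adj = 0 if union == 1
replace union_adj = 1 if union == 2
label define union_adj 0 "non Union" 1 "Union"
label values union_adj union_adj

* The missing option shows which rows ended up in the missing column.
tabulate union union_adj, missing
```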

 

 

. regress incwage lawyer yrsed if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  2, 69302) = 4251.20

       Model |  7.6321e+12     2  3.8161e+12           Prob > F      =  0.0000

    Residual |  6.2209e+13 69302   897646975           R-squared     =  0.1093

-------------+------------------------------           Adj R-squared =  0.1093

       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   29961

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

     lawyers |   36894.98    1490.42    24.75   0.000     33973.76     39816.2

       yrsed |   3274.522   38.00591    86.16   0.000     3200.031    3349.014

       _cons |  -17352.28   519.5092   -33.40   0.000    -18370.51   -16334.04

------------------------------------------------------------------------------

 

. regress incwage i.lawyer yrsed i.union_adj if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   10833

-------------+------------------------------           F(  3, 10829) =  484.60

       Model |  1.1919e+12     3  3.9729e+11           Prob > F      =  0.0000

    Residual |  8.8780e+12 10829   819837960           R-squared     =  0.1184

-------------+------------------------------           Adj R-squared =  0.1181

       Total |  1.0070e+13 10832   929643520           Root MSE      =   28633

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

   1.lawyers |   43120.76   3192.596    13.51   0.000     36862.68    49378.83

       yrsed |   3454.145   102.2866    33.77   0.000     3253.645    3654.646

 1.union_adj |   3512.319    745.304     4.71   0.000     2051.387    4973.252

       _cons |  -13502.27   1439.835    -9.38   0.000    -16324.61   -10679.93

------------------------------------------------------------------------------

 

* When we put union_adj into the regression, every observation with union_adj set to missing drops out, so our sample size shrinks from 69,305 to 10,833.
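
* A small sketch of the same listwise-deletion behavior, using the auto dataset that ships with Stata rather than the class data (an assumption for the sake of a runnable example). The variable rep78 has some missing values, so adding it to the model shrinks the estimation sample.

```stata
sysuse auto, clear

* Model without rep78: every car is used.
regress price mpg
display "N without rep78: " e(N)

* Model with rep78: cars with missing rep78 are dropped automatically.
regress price mpg rep78
display "N with rep78:    " e(N)

* The gap between the two sample sizes is the number of missing rep78 values.
count if missing(rep78)
```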

 

. summarize incwage if age>24 & age<65

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     69305    26602.35    31745.03          0     364302

 

 

. desmat: regress incwage lawyer @yrsed metro=ind(2) if age>24 & age<65

---------------------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------------------

   Dependent variable                                                                incwage

   Number of observations:                                                             69305

   F statistic:                                                                     1558.061

   Model degrees of freedom:                                                               6

   Residual degrees of freedom:                                                        69298

   R-squared:                                                                          0.119

   Adjusted R-squared:                                                                 0.119

   Root MSE                                                                        29799.949

   Prob:                                                                               0.000

---------------------------------------------------------------------------------------------

nr Effect                                                                  Coeff        s.e.

---------------------------------------------------------------------------------------------

   lawyer

1    1                                                                 36261.543**  1483.334

2  based on educrec                                                     3202.844**    37.906

   metro

3    Not identifiable                                                   5012.005*   2158.816

4    Central city                                                       4875.786**   334.675

5    Outside central city                                               8277.317**   303.468

6    Central city status unknown                                        4317.507**   383.190

7  _cons                                                              -21458.367**   551.138

---------------------------------------------------------------------------------------------

*  p < .05

** p < .01

 

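* Note that the constant term is negative, even though incwage is never negative for ages 25-64. The constant is only the predicted incwage when every predictor is zero or at its reference category (here yrsed=0, not a lawyer, not in a metro area), and the fitted line is not constrained to stay above zero there.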
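
* A worked bit of arithmetic with the coefficients printed above, for a non-lawyer not in a metro area (the reference categories). This is only a sketch of how to read the constant, using display as a calculator.

```stata
* Coefficients copied from the desmat output above.
* Prediction at yrsed = 0 is the constant itself:
display -21458.367 + 3202.844*0
* Prediction at yrsed = 12 (about 16,976):
display -21458.367 + 3202.844*12
* Years of education at which the fitted line crosses zero (about 6.7):
display 21458.367/3202.844
```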

 

. codebook metro

 

---------------------------------------------------------------------------------------------

metro                                                        Metropolitan central city status

---------------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  metrolbl

 

                 range:  [0,4]                        units:  1

         unique values:  5                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                           340         0  Not identifiable

                         29658         1  Not in metro area

                         32481         2  Central city

                         51468         3  Outside central city

                         19763         4  Central city status unknown

 

. lincom  _x_5- _x_4

 

 ( 1)  - _x_4 + _x_5 = 0

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

         (1) |   3401.531   293.0987    11.61   0.000     2827.058    3976.004

------------------------------------------------------------------------------

* The suburb vs. central city comparison: outside central city minus central city, 8277.317 - 4875.786 = 3401.531.
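
* lincom is needed because the standard error of a difference between two coefficients depends on their covariance, so it cannot be read off the two separate standard errors. A minimal sketch of the same kind of comparison with factor-variable notation, using the auto dataset that ships with Stata (an assumption, since the class data is not included here):

```stata
sysuse auto, clear

* rep78 enters as a set of dummies; category 1 is the excluded category.
regress price mpg i.rep78

* Difference between two non-reference categories, with its own standard error.
lincom 4.rep78 - 3.rep78

* The point estimate is just the difference of the two coefficients.
scalar diff = _b[4.rep78] - _b[3.rep78]
display "difference of coefficients: " diff
```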

 

* Now here is where class actually started.

 

. regress incwage yrsed if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  1, 69303) = 7820.56

       Model |  7.0821e+12     1  7.0821e+12           Prob > F      =  0.0000

    Residual |  6.2759e+13 69303   905571287           R-squared     =  0.1014

-------------+------------------------------           Adj R-squared =  0.1014

       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   30093

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |   3361.393   38.01022    88.43   0.000     3286.893    3435.893

       _cons |  -18294.31   520.3955   -35.15   0.000    -19314.28   -17274.34

------------------------------------------------------------------------------

 

. regress incwage  monthsed if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  1, 69303) = 7820.56

       Model |  7.0821e+12     1  7.0821e+12           Prob > F      =  0.0000

    Residual |  6.2759e+13 69303   905571287           R-squared     =  0.1014

-------------+------------------------------           Adj R-squared =  0.1014

       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   30093

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

    monthsed |   280.1161   3.167519    88.43   0.000     273.9078    286.3244

       _cons |  -18294.31   520.3955   -35.15   0.000    -19314.28   -17274.34

------------------------------------------------------------------------------

 

* Take a look at "what changes and what doesn't change in regression," posted on my website. Changing the units of X1 (here from yrsed to monthsed) divides the coefficient and its standard error by 12, but does not change the goodness of fit of the model or the relevant t-statistic.
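
* A small sketch of the same check on the auto dataset that ships with Stata (an assumption; length_ft is a hypothetical variable name). Rescaling X rescales the coefficient by the inverse factor and leaves t and R-squared alone.

```stata
sysuse auto, clear

* length is recorded in inches; length_ft is the same quantity in feet.
generate double length_ft = length/12

regress price length
scalar b_in  = _b[length]
scalar t_in  = _b[length]/_se[length]
scalar r2_in = e(r2)

regress price length_ft
scalar b_ratio = _b[length_ft]/b_in
scalar t_ft    = _b[length_ft]/_se[length_ft]
scalar r2_ft   = e(r2)

* The coefficient is 12 times larger; t and R-squared match.
display "coefficient ratio: " b_ratio
display "t (inches): " t_in
display "t (feet):   " t_ft
display "R2 (inches): " r2_in
display "R2 (feet):   " r2_ft
```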

 

 

. gen twice_incwage=incwage*2

(30484 missing values generated)

 

* How about if we change the units of Y?

 

. regress twice_incwage yrsed if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  1, 69303) = 7820.56

       Model |  2.8328e+13     1  2.8328e+13           Prob > F      =  0.0000

    Residual |  2.5104e+14 69303  3.6223e+09           R-squared     =  0.1014

-------------+------------------------------           Adj R-squared =  0.1014

       Total |  2.7936e+14 69304  4.0310e+09           Root MSE      =   60185

 

------------------------------------------------------------------------------

twice_incw~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |   6722.787   76.02045    88.43   0.000     6573.787    6871.787

       _cons |  -36588.62   1040.791   -35.15   0.000    -38628.57   -34548.67

------------------------------------------------------------------------------

 

* No difference to the t-statistics or to R-square. Doubling Y does double B1, the constant, and their standard errors, though.
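
* The same point as a runnable sketch on the auto dataset that ships with Stata (an assumption; twice_price is a hypothetical name). Everything measured in units of Y scales by 2, and the sums of squares scale by 4.

```stata
sysuse auto, clear
generate double twice_price = 2*price

regress price mpg
scalar b1    = _b[mpg]
scalar se1   = _se[mpg]
scalar rmse1 = e(rmse)
scalar rss1  = e(rss)

regress twice_price mpg
scalar b_ratio    = _b[mpg]/b1
scalar se_ratio   = _se[mpg]/se1
scalar rmse_ratio = e(rmse)/rmse1
scalar rss_ratio  = e(rss)/rss1

* Coefficient, standard error, and Root MSE ratios should all be 2; the residual SS ratio should be 4.
display "coefficient ratio: " b_ratio
display "std. err. ratio:   " se_ratio
display "Root MSE ratio:    " rmse_ratio
display "residual SS ratio: " rss_ratio
```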

 

. regress incwage yrsed if age>24 & age<65

 

      Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  1, 69303) = 7820.56

       Model |  7.0821e+12     1  7.0821e+12           Prob > F      =  0.0000

    Residual |  6.2759e+13 69303   905571287           R-squared     =  0.1014

-------------+------------------------------           Adj R-squared =  0.1014

       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   30093

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

       yrsed |   3361.393   38.01022    88.43   0.000     3286.893    3435.893

       _cons |  -18294.31   520.3955   -35.15   0.000    -19314.28   -17274.34

------------------------------------------------------------------------------

 

* What if we change the excluded category of one variable?

 

. desmat: regress incwage @yrsed sex=-ind(1) if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 69305

   F statistic:                                                         6982.245

   Model degrees of freedom:                                                   2

   Residual degrees of freedom:                                            69302

   R-squared:                                                              0.168

   Adjusted R-squared:                                                     0.168

   Root MSE                                                            28961.412

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3344.693**    36.582

   sex

2    Female                                               -16356.280**   220.128

3  _cons                                                   -9645.429**   514.180

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

. summarize incwage if age>24 & age<65

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     69305    26602.35    31745.03          0     364302

 

. desmat: regress incwage @yrsed sex=-ind(2) if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 69305

   F statistic:                                                         6982.245

   Model degrees of freedom:                                                   2

   Residual degrees of freedom:                                            69302

   R-squared:                                                              0.168

   Adjusted R-squared:                                                     0.168

   Root MSE                                                            28961.412

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3344.693**    36.582

   sex

2    Male                                                  16356.280**   220.128

3  _cons                                                  -26001.709**   511.461

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

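* The coefficient for yrsed doesn't care what the excluded category of gender is. The sex coefficient just flips sign, and the constant shifts by that same amount: -9645.429 - 16356.280 = -26001.709.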
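
* A sketch of the same comparison using factor-variable base operators instead of desmat, on the auto dataset that ships with Stata (an assumption; foreign stands in for sex). ib0. and ib1. pick which category is excluded.

```stata
sysuse auto, clear

* Domestic (foreign == 0) is the excluded category.
regress price mpg ib0.foreign
scalar b_mpg_0 = _b[mpg]
scalar b_dummy = _b[1.foreign]
scalar cons_0  = _b[_cons]

* Foreign (foreign == 1) is the excluded category.
regress price mpg ib1.foreign
scalar b_mpg_1       = _b[mpg]
scalar b_dummy_new   = _b[0.foreign]
scalar cons_1        = _b[_cons]
scalar neg_b_dummy   = -b_dummy
scalar cons_expected = cons_0 + b_dummy

* Same slope on mpg; the dummy flips sign; the constant absorbs the shift.
display "mpg slope:  " b_mpg_0 " and " b_mpg_1
display "dummy:      " b_dummy_new " and " neg_b_dummy
display "constant:   " cons_1 " and " cons_expected
```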

 

. desmat: regress incwage @yrsed sex=-ind(2) [aweight= perwt_rounded] if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 69305

   aweight:                                                        perwt_rounded

   F statistic:                                                         7165.374

   Model degrees of freedom:                                                   2

   Residual degrees of freedom:                                            69302

   R-squared:                                                              0.171

   Adjusted R-squared:                                                     0.171

   Root MSE                                                            29527.875

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3594.318**    38.604

   sex

2    Male                                                  16742.425**   224.386

3  _cons                                                  -29212.258**   543.215

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* Aweights give us slightly different coefficients, slightly different standard errors, therefore slightly different T-statistics. Also, R-square is a little different.

 

. desmat: regress incwage @yrsed sex=-ind(2) [fweight= perwt_rounded] if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                             142609350

   fweight:                                                        perwt_rounded

   F statistic:                                                     14744875.139

   Model degrees of freedom:                                                   2

   Residual degrees of freedom:                                        142609347

   R-squared:                                                              0.171

   Adjusted R-squared:                                                     0.171

   Root MSE                                                            29527.237

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3594.318**     0.851

   sex

2    Male                                                  16742.425**     4.946

3  _cons                                                  -29212.258**    11.975

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* Using perwt_rounded as an fweight does not change our R-square or our coefficients from the aweight case, but it does drop the standard errors dramatically, which would thus increase our t-statistics dramatically. And of course the number of observations is multiplied by approximately 2000, because fweights treat each case as perwt_rounded identical cases.
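
* In the two outputs above, the standard errors shrink by roughly the square root of the ratio of residual degrees of freedom: sqrt(142609347/69302) is about 45.4, and 38.604/0.851 is also about 45.4. A sketch of that comparison on the auto dataset that ships with Stata (an assumption; rep78 is used as a stand-in integer weight purely for illustration, not because it is a real sampling weight):

```stata
sysuse auto, clear

* Analytic weights: coefficients are weighted, N stays at the number of cases.
regress price mpg [aweight = rep78]
scalar b_a  = _b[mpg]
scalar se_a = _se[mpg]
scalar df_a = e(df_r)

* Frequency weights: each case counts as rep78 identical cases, so N inflates.
regress price mpg [fweight = rep78]
scalar b_f      = _b[mpg]
scalar ratio_se = se_a/_se[mpg]
scalar ratio_df = sqrt(e(df_r)/df_a)

* Coefficients should agree; the two ratios should be close to each other.
display "coef, aweight: " b_a
display "coef, fweight: " b_f
display "SE ratio (aweight/fweight):    " ratio_se
display "sqrt of residual df ratio:     " ratio_df
```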

 

. desmat: regress incwage @yrsed sex=-ind(2)  union_adj if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 10833

   F statistic:                                                          770.079

   Model degrees of freedom:                                                   3

   Residual degrees of freedom:                                            10829

   R-squared:                                                              0.176

   Adjusted R-squared:                                                     0.176

   Root MSE                                                            27683.913

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3710.291**    98.433

   sex

2    Male                                                  16460.064**   533.973

   union_adj

3    Union                                                  1544.136*    722.263

4  _cons                                                  -24829.410**  1422.855

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* Again, putting union in reduces sample size dramatically.

 

. desmat: regress incwage @yrsed sex=-ind(2) if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 69305

   F statistic:                                                         6982.245

   Model degrees of freedom:                                                   2

   Residual degrees of freedom:                                            69302

   R-squared:                                                              0.168

   Adjusted R-squared:                                                     0.168

   Root MSE                                                            28961.412

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3344.693**    36.582

   sex

2    Male                                                  16356.280**   220.128

3  _cons                                                  -26001.709**   511.461

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

*class ended here.

 

. desmat: regress incwage @yrsed sex=-ind(2) union if age>24 & age<65

---------------------------------------------------------------------------------

   Linear regression

---------------------------------------------------------------------------------

   Dependent variable                                                    incwage

   Number of observations:                                                 69305

   F statistic:                                                         2953.313

   Model degrees of freedom:                                                   5

   Residual degrees of freedom:                                            69299

   R-squared:                                                              0.176

   Adjusted R-squared:                                                     0.176

   Root MSE                                                            28823.438

   Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

1  based on educrec                                         3283.470**    36.491

   sex

2    Male                                                  16210.792**   219.236

   union

3    No union coverage                                      7519.901**   325.953

4    Member of labor union                                  9115.449**   697.180

5    Covered by union but not a member                      3646.325    2146.293

6  _cons                                                  -26339.022**   509.243

---------------------------------------------------------------------------------

*  p < .05

** p < .01

 

* But if we don't properly account for the missing values of union, that is, if we treat the NIU code as just another union classification, we incorrectly keep the full sample of 69,305, and each union coefficient becomes a comparison against the NIU group.

 

. tabulate union

 

                 Union membership |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                              NIU |    120,249       89.93       89.93

                No union coverage |     11,383        8.51       98.45

            Member of labor union |      1,883        1.41       99.85

Covered by union but not a member |        195        0.15      100.00

----------------------------------+-----------------------------------

                            Total |    133,710      100.00

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_pr

> oj3\fall_2010_s381_logs\class9.log

  log type:  text

 closed on:  19 Oct 2010, 16:12:08

-------------------------------------------------------------------------------------------------