--------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\
> soc_meth_proj3\fall_2010_s381_logs\class13b.log
log type: text
opened on: 4 Nov 2010, 12:41:37
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
. regress incwage vietnam_vet male age age_sq yrsed wkswork1 if age>24 & age<65
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  6, 69298) = 4795.02
       Model |  2.0489e+13     6  3.4149e+12           Prob > F      =  0.0000
    Residual |  4.9352e+13 69298   712167136           R-squared     =  0.2934
-------------+------------------------------           Adj R-squared =  0.2933
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26686
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 vietnam_vet |   421.8738   482.5732     0.87   0.382    -523.9687    1367.716
        male |   11708.68   213.2031    54.92   0.000      11290.8    12126.56
         age |   1616.429   80.65353    20.04   0.000     1458.348    1774.509
      age_sq |  -16.25437   .9189218   -17.69   0.000    -18.05546   -14.45329
       yrsed |   2594.228   34.48973    75.22   0.000     2526.628    2661.828
    wkswork1 |   560.5039   5.327079   105.22   0.000     550.0628     570.945
       _cons |  -73597.04   1730.189   -42.54   0.000     -76988.2   -70205.87
------------------------------------------------------------------------------
. stepwise, pe(.05) pr(.1) forward: regress incwage vietnam_vet male age age_sq yrsed wkswork1 if age>24 & age<65
begin with empty model
p = 0.0000 < 0.0500 adding male
p = 0.0000 < 0.0500 adding yrsed
p = 0.0000 < 0.0500 adding wkswork1
p = 0.0000 < 0.0500 adding age
p = 0.0000 < 0.0500 adding age_sq
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  5, 69299) = 5753.89
       Model |  2.0489e+13     5  4.0977e+12           Prob > F      =  0.0000
    Residual |  4.9352e+13 69299   712164714           R-squared     =  0.2934
-------------+------------------------------           Adj R-squared =  0.2933
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26686
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   11751.57   207.4817    56.64   0.000     11344.91    12158.23
       yrsed |   2595.948   34.43355    75.39   0.000     2528.458    2663.437
    wkswork1 |   560.4922   5.327053   105.22   0.000     550.0512    570.9332
         age |   1621.023   80.48197    20.14   0.000     1463.279    1778.768
      age_sq |   -16.2854   .9182346   -17.74   0.000    -18.08514   -14.48566
       _cons |   -73754.5   1720.784   -42.86   0.000    -77127.24   -70381.77
------------------------------------------------------------------------------
* Forward stepwise again, this time with age and age_sq grouped in parentheses so they enter as one term.
. stepwise, pe(.05) pr(.1) forward: regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 if age>24 & age<65
begin with empty model
p = 0.0000 < 0.0500 adding male
p = 0.0000 < 0.0500 adding yrsed
p = 0.0000 < 0.0500 adding wkswork1
p = 0.0000 < 0.0500 adding age age_sq
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  5, 69299) = 5753.89
       Model |  2.0489e+13     5  4.0977e+12           Prob > F      =  0.0000
    Residual |  4.9352e+13 69299   712164714           R-squared     =  0.2934
-------------+------------------------------           Adj R-squared =  0.2933
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26686
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   11751.57   207.4817    56.64   0.000     11344.91    12158.23
       yrsed |   2595.948   34.43355    75.39   0.000     2528.458    2663.437
    wkswork1 |   560.4922   5.327053   105.22   0.000     550.0512    570.9332
         age |   1621.023   80.48197    20.14   0.000     1463.279    1778.768
      age_sq |   -16.2854   .9182346   -17.74   0.000    -18.08514   -14.48566
       _cons |   -73754.5   1720.784   -42.86   0.000    -77127.24   -70381.77
------------------------------------------------------------------------------
* If terms belong together, enclose them in parentheses (as above with age and age_sq). Stata then adds or removes them as a group, testing their joint significance.
. stepwise, pe(.05) pr(.1): regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 if age>24 & age<65
begin with full model
p = 0.3820 >= 0.1000 removing vietnam_vet
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  5, 69299) = 5753.89
       Model |  2.0489e+13     5  4.0977e+12           Prob > F      =  0.0000
    Residual |  4.9352e+13 69299   712164714           R-squared     =  0.2934
-------------+------------------------------           Adj R-squared =  0.2933
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26686
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    wkswork1 |   560.4922   5.327053   105.22   0.000     550.0512    570.9332
        male |   11751.57   207.4817    56.64   0.000     11344.91    12158.23
         age |   1621.023   80.48197    20.14   0.000     1463.279    1778.768
      age_sq |   -16.2854   .9182346   -17.74   0.000    -18.08514   -14.48566
       yrsed |   2595.948   34.43355    75.39   0.000     2528.458    2663.437
       _cons |   -73754.5   1720.784   -42.86   0.000    -77127.24   -70381.77
------------------------------------------------------------------------------
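* A minimal sketch, not part of the original session: the joint significance of a grouped term such as (age age_sq) can also be checked by hand with an F test after fitting the model. The variable names and sample restriction are taken from the commands above. The p-value from this test refers to the fitted model shown here, not to any intermediate stepwise step.

```stata
* Fit the model that stepwise settled on (vietnam_vet dropped)
regress incwage male age age_sq yrsed wkswork1 if age>24 & age<65

* Joint F test of H0: the coefficients on age and age_sq are both zero
test age age_sq
```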
* What about categorical variables in stepwise? First, let's generate dummy variables for the categorical variables.
. desmat race metro=ind(2)
Desmat generated the following design matrix:
 nr  Variables        Term     Parameterization
     First   Last
  1  _x_1    _x_3     race     ind(100)
  2  _x_4    _x_7     metro    ind(1)
* Then we will enter the race and metro terms as groups.
. stepwise, pe(.05) pr(.1) forward: regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 (_x_1-_x_3) (_x_4-_x_7) if age>24 & age<65
begin with empty model
p = 0.0000 < 0.0500 adding male
p = 0.0000 < 0.0500 adding yrsed
p = 0.0000 < 0.0500 adding wkswork1
p = 0.0000 < 0.0500 adding _x_4 _x_5 _x_6 _x_7
p = 0.0000 < 0.0500 adding age age_sq
p = 0.0000 < 0.0500 adding _x_1 _x_2 _x_3
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F( 12, 69292) = 2520.66
       Model |  2.1223e+13    12  1.7686e+12           Prob > F      =  0.0000
    Residual |  4.8618e+13 69292   701636523           R-squared     =  0.3039
-------------+------------------------------           Adj R-squared =  0.3038
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26488
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   11736.29    206.038    56.96   0.000     11332.46    12140.12
       yrsed |   2523.478    34.3151    73.54   0.000     2456.221    2590.736
    wkswork1 |   558.1918   5.292014   105.48   0.000     547.8195    568.5642
        _x_4 |   4410.837   1919.035     2.30   0.022     649.5327    8172.141
        _x_5 |   6421.715   304.7193    21.07   0.000     5824.466    7018.964
        _x_6 |   8328.797   271.2832    30.70   0.000     7797.083    8860.512
        _x_7 |   4266.681   341.3702    12.50   0.000     3597.596    4935.766
         age |   1644.661   79.94449    20.57   0.000      1487.97    1801.352
      age_sq |  -16.42797   .9118828   -18.02   0.000    -18.21526   -14.64068
        _x_1 |  -3195.205   350.8409    -9.11   0.000    -3882.852   -2507.557
        _x_2 |  -1046.089   906.2542    -1.15   0.248    -2822.345    730.1677
        _x_3 |   -1093.86   540.5724    -2.02   0.043    -2153.381   -34.33916
       _cons |  -78514.23    1723.63   -45.55   0.000    -81892.54   -75135.92
------------------------------------------------------------------------------
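* A rough sketch (not run in this session) of how the group tests could be examined by hand. Note that _x_2 is not individually significant (p = 0.248), yet it stays in the model because race entered as a single three-dummy term. testparm is a standard Stata post-estimation command. The tests below are evaluated in the final model, so their p-values need not equal those printed at the step where each group entered.

```stata
* Refit the final model chosen by the grouped forward stepwise run
regress incwage male yrsed wkswork1 age age_sq _x_1-_x_7 if age>24 & age<65

* Joint F test of the three race dummies
testparm _x_1-_x_3

* Joint F test of the four metro dummies
testparm _x_4-_x_7
```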
. desmat race metro=ind(2)
Desmat generated the following design matrix:
 nr  Variables        Term     Parameterization
     First   Last
  1  _x_1    _x_3     race     ind(100)
  2  _x_4    _x_7     metro    ind(1)
. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace
file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved
. stepwise, pe(.01) pr(.1) forward: regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 _x_* if age>24 & age<65
begin with empty model
p = 0.0000 < 0.0100 adding male
p = 0.0000 < 0.0100 adding yrsed
p = 0.0000 < 0.0100 adding wkswork1
p = 0.0000 < 0.0100 adding age age_sq
p = 0.0000 < 0.0100 adding _x_6
p = 0.0000 < 0.0100 adding _x_5
p = 0.0000 < 0.0100 adding _x_7
p = 0.0000 < 0.0100 adding _x_1
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  9, 69295) = 3359.33
       Model |  2.1216e+13     9  2.3573e+12           Prob > F      =  0.0000
    Residual |  4.8625e+13 69295   701714098           R-squared     =  0.3038
-------------+------------------------------           Adj R-squared =  0.3037
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26490
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |    11738.8   206.0421    56.97   0.000     11334.96    12142.64
       yrsed |   2521.935   34.27647    73.58   0.000     2454.753    2589.117
    wkswork1 |   558.5336   5.289941   105.58   0.000     548.1654    568.9019
         age |   1644.306   79.94615    20.57   0.000     1487.611        1801
      age_sq |  -16.41599   .9118984   -18.00   0.000    -18.20331   -14.62867
        _x_6 |   8264.421   268.9224    30.73   0.000     7737.333    8791.508
        _x_5 |     6319.9   301.6441    20.95   0.000     5728.678    6911.122
        _x_7 |   4226.566   339.9594    12.43   0.000     3560.247    4892.886
        _x_1 |  -3123.385   349.4589    -8.94   0.000    -3808.324   -2438.446
       _cons |  -78508.63   1723.007   -45.56   0.000    -81885.72   -75131.54
------------------------------------------------------------------------------
* Another way to handle categorical variables in stepwise is to use the xi prefix together with stepwise.
. xi: stepwise, pe(.01) pr(.1) forward: regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 i.race i.metro if age>24 & age<65
i.race _Irace_100-650 (naturally coded; _Irace_100 omitted)
i.metro _Imetro_0-4 (naturally coded; _Imetro_0 omitted)
begin with empty model
p = 0.0000 < 0.0100 adding male
p = 0.0000 < 0.0100 adding yrsed
p = 0.0000 < 0.0100 adding wkswork1
p = 0.0000 < 0.0100 adding age age_sq
p = 0.0000 < 0.0100 adding _Imetro_1
p = 0.0000 < 0.0100 adding _Imetro_3
p = 0.0000 < 0.0100 adding _Irace_200
p = 0.0000 < 0.0100 adding _Imetro_2
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  9, 69295) = 3360.18
       Model |  2.1219e+13     9  2.3577e+12           Prob > F      =  0.0000
    Residual |  4.8622e+13 69295   701660264           R-squared     =  0.3038
-------------+------------------------------           Adj R-squared =  0.3037
       Total |  6.9841e+13 69304  1.0077e+09           Root MSE      =   26489
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   11738.54   206.0338    56.97   0.000     11334.72    12142.37
       yrsed |   2521.323   34.27599    73.56   0.000     2454.142    2588.503
    wkswork1 |   558.4997   5.289757   105.58   0.000     548.1318    568.8676
         age |   1645.217   79.94374    20.58   0.000     1488.527    1801.906
      age_sq |   -16.4273   .9118703   -18.01   0.000    -18.21457   -14.64004
   _Imetro_1 |  -4285.404      338.9   -12.65   0.000    -4949.647    -3621.16
   _Imetro_3 |   4035.274   306.6309    13.16   0.000     3434.278     4636.27
  _Irace_200 |  -3123.361   349.4428    -8.94   0.000    -3808.268   -2438.454
   _Imetro_2 |   2090.469    334.854     6.24   0.000     1434.155    2746.782
       _cons |  -74286.69   1727.303   -43.01   0.000    -77672.21   -70901.18
------------------------------------------------------------------------------
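* In the run above the xi dummies were entered one at a time, so only some race and metro categories were selected. A possible variant, sketched here and not run in the original session, first runs xi on its own to create the _Irace_* and _Imetro_* dummies and then groups each set in parentheses so that stepwise treats it as one term. The wildcard grouping is an assumption about how one might write it; the rest of the command is copied from above.

```stata
* Create the dummy variables without fitting a model
xi i.race i.metro

* Enter race and metro as whole groups rather than dummy by dummy
stepwise, pe(.01) pr(.1) forward: regress incwage vietnam_vet male (age age_sq) yrsed wkswork1 (_Irace_*) (_Imetro_*) if age>24 & age<65
```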
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_p
> roj3\fall_2010_s381_logs\class13b.log
log type: text
closed on: 4 Nov 2010, 15:45:31
------------------------------------------------------------------------------------------------