-----------------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\

> 2011_180B_logs\class5.log

log type:  text

opened on:   8 Feb 2011, 14:00:48

* In class 5 we mostly discussed the class Excel file, regression, and best-fit lines. Below are the 3 models that correspond to the worksheet named "regression graphs and fits" in my Excel file.

. regress incwage age if age>24 & age<65 [aweight=perwt_rounded]

(sum of wgt is   1.4261e+08)

      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  1, 69303) =    0.37
       Model |   384750491     1   384750491           Prob > F      =  0.5454
    Residual |  7.2919e+13 69303  1.0522e+09           R-squared     =  0.0000
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   32437

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -7.044124   11.64878    -0.60   0.545    -29.87572    15.78747
       _cons |   27764.45   511.3078    54.30   0.000     26762.29    28766.61
------------------------------------------------------------------------------

* Age by itself is not a good linear predictor of incwage among people aged 25 to 64. The t statistic is smaller than 1 in absolute value, because the standard error of the coefficient is larger than the coefficient itself. We cannot reject the null hypothesis that there is no linear relationship between age and incwage. The R-squared rounds to zero, because this model explains essentially none of the variance in incwage.
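
* As a check on the reading above, the t statistic and the R-squared can be recomputed by hand from the printed output. This is only a sketch: the numbers are typed in from the table above, not produced by a new run.

```stata
* t statistic = coefficient / standard error (about -0.60)
display -7.044124/11.64878

* R-squared = Model SS / Total SS (about 0.000005, which prints as 0.0000)
display 384750491/7.2919e+13
```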

* Notice also that in this model and in the following models, the constant term is not particularly meaningful or helpful. It is the predicted income of a person who is zero years old, which is far outside the age range we analyzed and is not a number we care about.
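
* To see what the constant does in the first model, compare the prediction at age 0 with the prediction at an age inside the sample. A minimal sketch using the coefficients printed above; age 40 is an arbitrary choice for illustration.

```stata
* Predicted incwage at age 0 is just the constant
display 27764.45

* Predicted incwage at age 40 = _cons + b_age*40 (about 27482.69)
display 27764.45 + (-7.044124)*40
```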

. gen age_sq=age^2

* One thing we noticed when we graphed age versus income is that the relationship looks like an upside-down U, or a parabola. We need a second-order age term (age squared) to fit a parabolic shape.

. regress incwage age age_sq if age>24 & age<65 [aweight=perwt_rounded]

(sum of wgt is   1.4261e+08)

      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  2, 69302) =  589.92
       Model |  1.2206e+12     2  6.1032e+11           Prob > F      =  0.0000
    Residual |  7.1698e+13 69302  1.0346e+09           R-squared     =  0.0167
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   32165

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3252.025   95.59684    34.02   0.000     3064.655    3439.395
      age_sq |  -37.33998   1.087252   -34.34   0.000    -39.47099   -35.20897
       _cons |  -39131.12   2012.747   -19.44   0.000    -43076.11   -35186.14
------------------------------------------------------------------------------

* Once age-squared is included as a predictor, both age and age-squared are highly significant. The R-squared is still only 0.0167 (1.67%), but that is clearly better than zero.
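
* The positive age coefficient and negative age_sq coefficient describe the upside-down U. The age at which predicted income peaks is -b_age/(2*b_age_sq). A minimal sketch: the first line uses the coefficients typed in from the table above, and the second assumes it is run immediately after the regression so that the stored results _b[age] and _b[age_sq] are available.

```stata
* Peak of the fitted parabola, from the printed coefficients (about age 43.5)
display 3252.025/(2*37.33998)

* The same quantity from the stored results of the last regression
display -_b[age]/(2*_b[age_sq])
```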

. regress incwage age age_sq yrsed if age>24 & age<65 [aweight=perwt_rounded]

(sum of wgt is   1.4261e+08)

      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  3, 69301) = 3083.86
       Model |  8.5881e+12     3  2.8627e+12           Prob > F      =  0.0000
    Residual |  6.4331e+13 69301   928282733           R-squared     =  0.1178
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   30468

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2891.918   90.64298    31.90   0.000     2714.258    3069.578
      age_sq |  -32.46307   1.031339   -31.48   0.000     -34.4845   -30.44165
       yrsed |   3561.537   39.97782    89.09   0.000     3483.181    3639.894
       _cons |  -81235.26   1964.252   -41.36   0.000    -85085.19   -77385.33
------------------------------------------------------------------------------

* The way we would interpret this yrsed coefficient is that each additional year of education is associated with about $3,562 more annual income in 1999, net of the effects of age and age-squared. That is, each coefficient is estimated net of the other predictors in the model.
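
* To make the interpretation concrete, the coefficients can be combined by hand. A minimal sketch with the numbers typed in from the table above; the 4-year gap and the example person (age 40, 16 years of education) are arbitrary choices for illustration. The lincom line assumes it is run immediately after the regression.

```stata
* Predicted income gap for 4 more years of education at the same age (about 14246)
display 4*3561.537

* The same gap from the stored results, with a standard error and confidence interval
lincom 4*yrsed

* Predicted incwage for a 40-year-old with 16 years of education (about 39485)
display -81235.26 + 2891.918*40 - 32.46307*40^2 + 3561.537*16
```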

. log close

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s

> oc_meth_proj3\2011_180B_logs\class5.log

log type:  text

closed on:   8 Feb 2011, 15:36:35

---------------------------------------------------------------------------------------