----------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth
> _proj3\2011_180B_logs\class8.log
log type: text
opened on: 17 Feb 2011, 12:46:12
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new_extras.dta", clear
* In this class we mostly talked about the "what changes and doesn't change in regression" page on my website. Below are some examples of how regression output changes when you alter the inputs.
. table sex if age >=25 & age<=34, contents(freq mean yrsed sd yrsed)
-------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)
----------+--------------------------------------
     Male |       9,027     13.31212     2.967666
   Female |       9,511     13.55657     2.854472
-------------------------------------------------
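* A familiar male-female education gap: among 25-34 year olds, women average about 0.24 more years of education than men (13.56 versus 13.31).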
. regress yrsed male if age>=25 & age<=34
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
* Note that the constant is the women's average educational attainment (the group with male=0), and the coefficient for male is the men's average minus the women's average.
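* A sketch of the same comparison as a two-sample t-test, assuming the data set is still in memory. With the default pooled-variance test, the difference in means and the t-statistic should match the male coefficient and its t-statistic in size. ttest reports the difference as group 0 minus group 1, so the sign can be reversed.

```stata
* Two-sample t-test of yrsed by gender (pooled variance is the default)
ttest yrsed if age>=25 & age<=34, by(male)
```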
. regress yrsed female if age>=25 & age<=34
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .2444469 .0427623 5.72 0.000 .1606289 .3282649
_cons | 13.31212 .0306297 434.62 0.000 13.25208 13.37216
------------------------------------------------------------------------------
* When we reverse the comparison category for gender, we get the same coefficient and t-statistic, but with the opposite sign. The sign only tells us the direction of the comparison, so the substantive meaning is the same. Note that the constant is now the average educational attainment for men. The R-square is the same because it is the same model.
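* A sketch, assuming the regression of yrsed on female: the average for the group coded 1 is the constant plus the coefficient, and lincom reports that sum with its own standard error. From the output above, 13.31212 + .2444469 is about 13.55657, the constant in the first regression.

```stata
regress yrsed female if age>=25 & age<=34
* Women's mean = men's mean (the constant) + the female coefficient
lincom _cons + female
```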
. regress monthsed female if age>=25 & age<=34
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 39850.9104 1 39850.9104 Prob > F = 0.0000
Residual | 22605108.7 18536 1219.52464 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 22644959.6 18537 1221.60865 Root MSE = 34.922
------------------------------------------------------------------------------
monthsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | 2.933363 .5131471 5.72 0.000 1.927547 3.939178
_cons | 159.7454 .367556 434.62 0.000 159.025 160.4659
------------------------------------------------------------------------------
* What if we change the units of the dependent variable by multiplying it by 12 (monthsed is just yrsed*12)? The coefficient, the constant, and their standard errors are all 12 times bigger, but the t-statistics and R-square are exactly the same as before, because this is the same model.
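* A sketch of the rescaling check. monthsed_check is a hypothetical variable name, used here so that the existing monthsed is not overwritten. The first two ratios displayed should be 12 and the last should be 1, up to rounding.

```stata
* Rebuild the months variable under a hypothetical name
generate monthsed_check = yrsed*12

* Store the coefficient and standard error from the model in years
regress yrsed female if age>=25 & age<=34
scalar b_years = _b[female]
scalar se_years = _se[female]

* Same model with the outcome measured in months
regress monthsed_check female if age>=25 & age<=34

* Coefficient and standard error scale with the units of the outcome
display _b[female]/b_years
display _se[female]/se_years

* The t-statistic is a ratio of the two, so the factor of 12 cancels
display (_b[female]/_se[female])/(b_years/se_years)
```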
. regress incwage yrsed male ib3.metro if age<=25 & age<=64 & metro~=0
Source | SS df MS Number of obs = 19649
-------------+------------------------------ F( 5, 19643) = 710.96
Model | 3.8515e+11 5 7.7029e+10 Prob > F = 0.0000
Residual | 2.1282e+12 19643 108345204 R-squared = 0.1532
-------------+------------------------------ Adj R-squared = 0.1530
Total | 2.5134e+12 19648 127919983 Root MSE = 10409
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 1582.685 27.6587 57.22 0.000 1528.471 1636.898
male | 2823.806 148.9492 18.96 0.000 2531.853 3115.759
|
metro |
1 | -885.8211 201.2487 -4.40 0.000 -1280.286 -491.3565
2 | 12.15895 189.503 0.06 0.949 -359.2831 383.601
4 | -275.0225 226.7472 -1.21 0.225 -719.4662 169.4212
|
_cons | -12039.34 358.3797 -33.59 0.000 -12741.8 -11336.89
------------------------------------------------------------------------------
. regress incwage yrsed female ib3.metro if age<=25 & age<=64 & metro~=0
Source | SS df MS Number of obs = 19649
-------------+------------------------------ F( 5, 19643) = 710.96
Model | 3.8515e+11 5 7.7029e+10 Prob > F = 0.0000
Residual | 2.1282e+12 19643 108345204 R-squared = 0.1532
-------------+------------------------------ Adj R-squared = 0.1530
Total | 2.5134e+12 19648 127919983 Root MSE = 10409
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 1582.685 27.6587 57.22 0.000 1528.471 1636.898
female | -2823.806 148.9492 -18.96 0.000 -3115.759 -2531.853
|
metro |
1 | -885.8211 201.2487 -4.40 0.000 -1280.286 -491.3565
2 | 12.15895 189.503 0.06 0.949 -359.2831 383.601
4 | -275.0225 226.7472 -1.21 0.225 -719.4662 169.4212
|
_cons | -9215.538 347.9368 -26.49 0.000 -9897.523 -8533.552
------------------------------------------------------------------------------
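* Reversing the comparison category of female and male in the multiple regression context leaves the other coefficients and the R-square unchanged (the gender coefficient just flips sign), but the constant changes, because the constant reflects the predicted value for cases who are zero on all the variables. Note also that the if condition as typed (age<=25 & age<=64) keeps people aged 25 and under, so these income models describe a young sample.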
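* A sketch of where the new constant comes from, assuming the model with male is run first. The constant in the female model is the prediction for a man with yrsed=0 in metro category 3, which in the male model is the constant plus the male coefficient: -12039.34 + 2823.806 is about -9215.53, matching the -9215.538 above up to rounding.

```stata
regress incwage yrsed male ib3.metro if age<=25 & age<=64 & metro~=0
* Prediction for male=1 with yrsed at zero and metro at its base category
lincom _cons + male
```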
. regress incwage yrsed female ib2.metro if age<=25 & age<=64 & metro~=0
Source | SS df MS Number of obs = 19649
-------------+------------------------------ F( 5, 19643) = 710.96
Model | 3.8515e+11 5 7.7029e+10 Prob > F = 0.0000
Residual | 2.1282e+12 19643 108345204 R-squared = 0.1532
-------------+------------------------------ Adj R-squared = 0.1530
Total | 2.5134e+12 19648 127919983 Root MSE = 10409
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 1582.685 27.6587 57.22 0.000 1528.471 1636.898
female | -2823.806 148.9492 -18.96 0.000 -3115.759 -2531.853
|
metro |
1 | -897.98 214.8246 -4.18 0.000 -1319.055 -476.9055
3 | -12.15895 189.503 -0.06 0.949 -383.601 359.2831
4 | -287.1814 238.8192 -1.20 0.229 -755.2874 180.9245
|
_cons | -9203.379 357.8541 -25.72 0.000 -9904.803 -8501.954
------------------------------------------------------------------------------
* The model above is the same as the one before it; all we have changed is the comparison category of the metro variable (here category 2 instead of category 3), which changes the metro coefficients and, in turn, the constant. The coefficients for yrsed and female and the R-square are unchanged.
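* A sketch showing that the ib2 results are already contained in the ib3 model, assuming factor-variable notation in lincom. Each metro coefficient with base 2 is a difference of two coefficients with base 3, for example -885.8211 - 12.15895 = -897.98 for category 1, and the new constant is the old constant plus the old category 2 coefficient.

```stata
regress incwage yrsed female ib3.metro if age<=25 & age<=64 & metro~=0
* Category 1 versus category 2
lincom 1.metro - 2.metro
* Category 4 versus category 2
lincom 4.metro - 2.metro
* Constant when category 2 is the base
lincom _cons + 2.metro
```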
. desmat: regress incwage @yrsed female metro=ind(2) if age<=25 & age<=64 & metro~=0
---------------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 19649
F statistic: 710.963
Model degrees of freedom: 5
Residual degrees of freedom: 19643
R-squared: 0.153
Adjusted R-squared: 0.153
Root MSE 10408.900
Prob: 0.000
---------------------------------------------------------------------------------------
 nr  Effect                                   Coeff        s.e.
---------------------------------------------------------------------------------------
  1  based on educrec                      1582.685**    27.659
     female
  2    1                                  -2823.806**   148.949
     metro
  3    Not identifiable                       0.000           .
  4    Central city                         897.980**   214.825
  5    Outside central city                 885.821**   201.249
  6    Central city status unknown          610.799*    248.100
  7  _cons                               -10101.359**   353.967
---------------------------------------------------------------------------------------
*  p < .05
** p < .01
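* Here I am using desmat so that I can recover the dummy variables and compare them. Note that ind(2) makes the second category in desmat's list (not in metro area) the comparison category, which is why it has no row in the table. I will compare _x_6 (central city status unknown) to _x_5 (outside central city, i.e. the suburbs).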
. lincom _x_6- _x_5
( 1) - _x_5 + _x_6 = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -275.0225 226.7472 -1.21 0.225 -719.4662 169.4212
------------------------------------------------------------------------------
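* A sketch of the same comparison as a Wald test, assuming the desmat dummies _x_5 and _x_6 from the regression above are still in memory. With a single restriction, the F statistic from test is the square of the lincom t-statistic, so the p-value should be the same.

```stata
* Test whether the two metro coefficients are equal
test _x_6 = _x_5
```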
. desmat: regress incwage @yrsed female metro=ind(3) if age<=25 & age<=64 & metro~=0
---------------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 19649
F statistic: 710.963
Model degrees of freedom: 5
Residual degrees of freedom: 19643
R-squared: 0.153
Adjusted R-squared: 0.153
Root MSE 10408.900
Prob: 0.000
---------------------------------------------------------------------------------------
 nr  Effect                                   Coeff        s.e.
---------------------------------------------------------------------------------------
  1  based on educrec                      1582.685**    27.659
     female
  2    1                                  -2823.806**   148.949
     metro
  3    Not identifiable                       0.000           .
  4    Not in metro area                   -897.980**   214.825
  5    Outside central city                 -12.159     189.503
  6    Central city status unknown         -287.181     238.819
  7  _cons                                -9203.379**   357.854
---------------------------------------------------------------------------------------
*  p < .05
** p < .01
* Note that in the above regression, the Not in metro area coefficient (-897.980) is the same size as the Central city coefficient in the regression before (897.980) but with the opposite sign, because it is the same contrast in reverse. And if we look at the same contrast (here _x_6 versus _x_5), we always get the same result. So changing the comparison category of a categorical variable *seems* to change things, but in reality nothing changes if you compare apples to apples…
. lincom _x_6- _x_5
( 1) - _x_5 + _x_6 = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -275.0225 226.7472 -1.21 0.225 -719.4662 169.4212
------------------------------------------------------------------------------
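* A sketch of the same contrast without desmat, assuming Stata's factor-variable notation as in the regressions earlier in this log. Category 4 is central city status unknown and category 3 is outside central city, so this should reproduce the lincom result above whichever base category is chosen.

```stata
regress incwage yrsed female ib2.metro if age<=25 & age<=64 & metro~=0
* Central city status unknown (4) versus outside central city (3)
lincom 4.metro - 3.metro
```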
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_180B_logs\class8.log
log type: text
closed on: 17 Feb 2011, 15:43:36
---------------------------------------------------------------------------------------