----------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth

> _proj3\2011_180B_logs\class8.log

log type:  text

opened on:  17 Feb 2011, 12:46:12

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new_extras.dta", clear

* In this class we mostly talked about the "what changes and doesn't change in regression" page on my website. Below are some examples of how regression output changes when you alter the inputs.

. table sex if age >=25 & age<=34, contents(freq mean yrsed sd yrsed)

-------------------------------------------------

Sex |       Freq.  mean(yrsed)    sd(yrsed)

----------+--------------------------------------

Male |       9,027     13.31212     2.967666

Female |       9,511     13.55657     2.854472

-------------------------------------------------

* A familiar male-female education gap.

. regress yrsed male if age>=25 & age<=34

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

_cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

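* Note that the constant (13.55657) is the women's mean educational attainment from the table above, and the men are compared to that: the male coefficient is the men's mean minus the women's mean (13.31212 - 13.55657 = -.24445).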

. regress yrsed female if age>=25 & age<=34

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

_cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

* When we reverse the comparison category for gender, the coefficient and T-statistic keep the same magnitude but flip sign. The sign only reflects the direction of the comparison, so the substantive meaning is the same. Note that now the constant is the average educational attainment for men. The R-square is the same because it is the same model.
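
* A minimal sketch of the same reversal, not part of the class session. It assumes the auto dataset that ships with Stata rather than the CPS file; mpg and foreign are that dataset's variables, and domestic is a hypothetical dummy created here for illustration.

```stata
* Load the example dataset shipped with Stata (an assumption; not the CPS data)
sysuse auto, clear

* Build the reversed dummy: 1 for domestic cars, 0 for foreign cars
generate byte domestic = 1 - foreign

* First coding: domestic cars are the comparison category
regress mpg foreign
scalar b_foreign    = _b[foreign]
scalar cons_foreign = _b[_cons]
scalar r2_foreign   = e(r2)

* Reversed coding: foreign cars are the comparison category
regress mpg domestic

* Same magnitude, opposite sign
assert reldif(_b[domestic], -b_foreign) < 1e-8
* The new constant is the old constant plus the old coefficient
assert reldif(_b[_cons], cons_foreign + b_foreign) < 1e-8
* Same model, same R-square
assert reldif(e(r2), r2_foreign) < 1e-8
```

* If every assert passes silently, the two codings describe the same fitted model.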

. regress monthsed female if age>=25 & age<=34

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  39850.9104     1  39850.9104           Prob > F      =  0.0000

Residual |  22605108.7 18536  1219.52464           R-squared     =  0.0018

Total |  22644959.6 18537  1221.60865           Root MSE      =  34.922

------------------------------------------------------------------------------

monthsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   2.933363   .5131471     5.72   0.000     1.927547    3.939178

_cons |   159.7454    .367556   434.62   0.000      159.025    160.4659

------------------------------------------------------------------------------

* What if we change the units of the dependent variable by multiplying it by 12 (monthsed is just yrsed*12)? The coefficient, the constant, and their standard errors are all 12 times bigger, but the T-statistics and R-square are exactly the same as before, because this is the same model.
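
* A minimal sketch of the rescaling result, not part of the class session. It assumes the auto dataset that ships with Stata; mpg12 is a hypothetical variable created here to play the role that monthsed plays above.

```stata
* Load the example dataset shipped with Stata (an assumption; not the CPS data)
sysuse auto, clear

* Rescale the dependent variable by 12, as monthsed = yrsed*12 does above
generate mpg12 = 12 * mpg

* Original units
regress mpg foreign
scalar b1  = _b[foreign]
scalar se1 = _se[foreign]
scalar t1  = _b[foreign] / _se[foreign]
scalar r21 = e(r2)

* Rescaled units
regress mpg12 foreign

* Coefficient and standard error are both 12 times bigger
assert reldif(_b[foreign], 12 * b1)  < 1e-6
assert reldif(_se[foreign], 12 * se1) < 1e-6
* The T-statistic and R-square do not move
assert reldif(_b[foreign] / _se[foreign], t1) < 1e-6
assert reldif(e(r2), r21) < 1e-6
```

* The T-statistic is the coefficient divided by its standard error, so the factor of 12 cancels.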

. regress  incwage yrsed male ib3.metro if age<=25 & age<=64 & metro~=0

Source |       SS       df       MS              Number of obs =   19649

-------------+------------------------------           F(  5, 19643) =  710.96

Model |  3.8515e+11     5  7.7029e+10           Prob > F      =  0.0000

Residual |  2.1282e+12 19643   108345204           R-squared     =  0.1532

Total |  2.5134e+12 19648   127919983           Root MSE      =   10409

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |   1582.685    27.6587    57.22   0.000     1528.471    1636.898

male |   2823.806   148.9492    18.96   0.000     2531.853    3115.759

|

metro |

1  |  -885.8211   201.2487    -4.40   0.000    -1280.286   -491.3565

2  |   12.15895    189.503     0.06   0.949    -359.2831     383.601

4  |  -275.0225   226.7472    -1.21   0.225    -719.4662    169.4212

|

_cons |  -12039.34   358.3797   -33.59   0.000     -12741.8   -11336.89

------------------------------------------------------------------------------

. regress  incwage yrsed female ib3.metro if age<=25 & age<=64 & metro~=0

Source |       SS       df       MS              Number of obs =   19649

-------------+------------------------------           F(  5, 19643) =  710.96

Model |  3.8515e+11     5  7.7029e+10           Prob > F      =  0.0000

Residual |  2.1282e+12 19643   108345204           R-squared     =  0.1532

Total |  2.5134e+12 19648   127919983           Root MSE      =   10409

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |   1582.685    27.6587    57.22   0.000     1528.471    1636.898

female |  -2823.806   148.9492   -18.96   0.000    -3115.759   -2531.853

|

metro |

1  |  -885.8211   201.2487    -4.40   0.000    -1280.286   -491.3565

2  |   12.15895    189.503     0.06   0.949    -359.2831     383.601

4  |  -275.0225   226.7472    -1.21   0.225    -719.4662    169.4212

|

_cons |  -9215.538   347.9368   -26.49   0.000    -9897.523   -8533.552

------------------------------------------------------------------------------

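* Reversing the comparison category of female and male in the multiple regression context leaves all the other coefficients and the R-square unchanged, but the constant changes (because the constant reflects the predicted value for cases that are zero on all the variables). The new constant is the old constant plus the old male coefficient: -12039.34 + 2823.806 = -9215.53, which matches -9215.538 up to rounding of the displayed values.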

. regress  incwage yrsed female ib2.metro if age<=25 & age<=64 & metro~=0

Source |       SS       df       MS              Number of obs =   19649

-------------+------------------------------           F(  5, 19643) =  710.96

Model |  3.8515e+11     5  7.7029e+10           Prob > F      =  0.0000

Residual |  2.1282e+12 19643   108345204           R-squared     =  0.1532

Total |  2.5134e+12 19648   127919983           Root MSE      =   10409

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

yrsed |   1582.685    27.6587    57.22   0.000     1528.471    1636.898

female |  -2823.806   148.9492   -18.96   0.000    -3115.759   -2531.853

|

metro |

1  |    -897.98   214.8246    -4.18   0.000    -1319.055   -476.9055

3  |  -12.15895    189.503    -0.06   0.949     -383.601    359.2831

4  |  -287.1814   238.8192    -1.20   0.229    -755.2874    180.9245

|

_cons |  -9203.379   357.8541   -25.72   0.000    -9904.803   -8501.954

------------------------------------------------------------------------------

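* The model above is the same as the one before it. All we have changed is the comparison category of the metro variable (here category 2 instead of category 3), which in turn also changes the metro coefficients and the constant. The new constant is the old constant plus the old category 2 coefficient (-9215.538 + 12.15895 = -9203.379). The yrsed and female coefficients and the R-square are unchanged.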

. desmat: regress  incwage @yrsed female metro=ind(2) if age<=25 & age<=64 & metro~=0

---------------------------------------------------------------------------------------

Linear regression

---------------------------------------------------------------------------------------

Dependent variable                                                          incwage

Number of observations:                                                       19649

F statistic:                                                                710.963

Model degrees of freedom:                                                         5

Residual degrees of freedom:                                                  19643

R-squared:                                                                    0.153

Root MSE                                                                  10408.900

Prob:                                                                         0.000

---------------------------------------------------------------------------------------

nr Effect                                                            Coeff        s.e.

---------------------------------------------------------------------------------------

1  based on educrec                                               1582.685**    27.659

female

2    1                                                           -2823.806**   148.949

metro

3    Not identifiable                                                0.000           .

4    Central city                                                  897.980**   214.825

5    Outside central city                                          885.821**   201.249

6    Central city status unknown                                   610.799*    248.100

7  _cons                                                        -10101.359**   353.967

---------------------------------------------------------------------------------------

*  p < .05

** p < .01

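* Here I am using desmat so that I can recover the dummy variables and compare them. In this run the omitted metro category is "Not in metro area", so each metro coefficient is a contrast with that group. I will be comparing _x_6 (central city status unknown) to _x_5 (outside central city = suburbs). The difference 610.799 - 885.821 = -275.022 is the same as the metro category 4 coefficient in the ib3.metro regressions, where outside central city was the comparison category.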

. lincom _x_6- _x_5

( 1)  - _x_5 + _x_6 = 0

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

(1) |  -275.0225   226.7472    -1.21   0.225    -719.4662    169.4212

------------------------------------------------------------------------------

. desmat: regress  incwage @yrsed female metro=ind(3) if age<=25 & age<=64 & metro~=0

---------------------------------------------------------------------------------------

Linear regression

---------------------------------------------------------------------------------------

Dependent variable                                                          incwage

Number of observations:                                                       19649

F statistic:                                                                710.963

Model degrees of freedom:                                                         5

Residual degrees of freedom:                                                  19643

R-squared:                                                                    0.153

Root MSE                                                                  10408.900

Prob:                                                                         0.000

---------------------------------------------------------------------------------------

nr Effect                                                            Coeff        s.e.

---------------------------------------------------------------------------------------

1  based on educrec                                               1582.685**    27.659

female

2    1                                                           -2823.806**   148.949

metro

3    Not identifiable                                                0.000           .

4    Not in metro area                                            -897.980**   214.825

5    Outside central city                                          -12.159     189.503

6    Central city status unknown                                  -287.181     238.819

7  _cons                                                         -9203.379**   357.854

---------------------------------------------------------------------------------------

*  p < .05

** p < .01

* Note that in the above regression, the Not in metro area coefficient (-897.980) is the central city coefficient from the previous desmat regression (897.980) with the sign reversed, because the two categories have swapped roles as comparison category. And if we look at the same contrast, we always get the same result. So changing the comparison category of a categorical variable *seems* to change things, but in reality nothing changes if you compare apples to apples.

. lincom _x_6- _x_5

( 1)  - _x_5 + _x_6 = 0

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

(1) |  -275.0225   226.7472    -1.21   0.225    -719.4662    169.4212

------------------------------------------------------------------------------
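
* A sketch of the same apples-to-apples check, not part of the class session. It uses factor-variable notation with lincom in place of desmat, and it assumes the auto dataset that ships with Stata; price, mpg, and rep78 are that dataset's variables, not the CPS ones.

```stata
* Load the example dataset shipped with Stata (an assumption; not the CPS data)
sysuse auto, clear

* Comparison category 3 for the categorical variable
regress price mpg ib3.rep78
lincom 5.rep78 - 4.rep78
scalar d_base3  = r(estimate)
scalar se_base3 = r(se)

* Comparison category 4: the individual rep78 coefficients change
regress price mpg ib4.rep78
lincom 5.rep78 - 4.rep78
scalar d_base4  = r(estimate)
scalar se_base4 = r(se)

* The contrast between categories 5 and 4 is the same under either coding
assert reldif(d_base4, d_base3)   < 1e-6
assert reldif(se_base4, se_base3) < 1e-6
```

* In the second regression category 4 is the base level, so its coefficient is zero and the lincom contrast is simply the category 5 coefficient.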

. log close

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s

> oc_meth_proj3\2011_180B_logs\class8.log

log type:  text

closed on:  17 Feb 2011, 15:43:36

---------------------------------------------------------------------------------------