---------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_met
> h_proj3\fall_2010_s381_logs\class9.log
log type: text
opened on: 19 Oct 2010, 13:54:49
* Note: the early part of this log records commands I ran before class.
. label val union_adj union_adj
. label var union_adj "union, with missing or unknown values set to missing"
. tabulate union union_adj
                      | union, with missing
                      | or unknown values set
                      |      to missing
     Union membership | non Union      Union |     Total
----------------------+----------------------+----------
    No union coverage |    11,383          0 |    11,383
Member of labor union |         0      1,883 |     1,883
----------------------+----------------------+----------
                Total |    11,383      1,883 |    13,266
. tabulate union union_adj, miss
                      | union, with missing or unknown
                      |     values set to missing
     Union membership | non Union      Union          . |     Total
----------------------+---------------------------------+----------
                  NIU |         0          0    120,249 |   120,249
    No union coverage |    11,383          0          0 |    11,383
Member of labor union |         0      1,883          0 |     1,883
 Covered by union but |         0          0        195 |       195
----------------------+---------------------------------+----------
                Total |    11,383      1,883    120,444 |   133,710
* The union variable has a lot of missing values: 120,444 of the 133,710 cases (NIU plus covered but not a member) are missing on union_adj.
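* A minimal sketch of how a recode like union_adj could be built. The numeric codes of union are not shown in this log, so the codes used below (1 = no union coverage, 2 = member of labor union) are an assumption to verify with the nolabel table first, and union_chk is a hypothetical name.

```stata
* Show the numeric codes behind the value labels before recoding.
tabulate union, nolabel

* ASSUMED codes (verify above): 1 = no union coverage, 2 = member of labor union.
* Every other code (NIU, covered but not a member) stays missing.
generate byte union_chk = .
replace union_chk = 0 if union == 1
replace union_chk = 1 if union == 2

* Cross-check: each original category should land in exactly one column.
tabulate union union_chk, missing
```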
. regress incwage lawyer yrsed if age>24 & age<65
Source | SS df MS Number of obs = 69305
-------------+------------------------------ F( 2, 69302) = 4251.20
Model | 7.6321e+12 2 3.8161e+12 Prob > F = 0.0000
Residual | 6.2209e+13 69302 897646975 R-squared = 0.1093
-------------+------------------------------ Adj R-squared = 0.1093
Total | 6.9841e+13 69304 1.0077e+09 Root MSE = 29961
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36894.98 1490.42 24.75 0.000 33973.76 39816.2
yrsed | 3274.522 38.00591 86.16 0.000 3200.031 3349.014
_cons | -17352.28 519.5092 -33.40 0.000 -18370.51 -16334.04
------------------------------------------------------------------------------
. regress incwage i.lawyer yrsed i.union_adj if age>24 & age<65
Source | SS df MS Number of obs = 10833
-------------+------------------------------ F( 3, 10829) = 484.60
Model | 1.1919e+12 3 3.9729e+11 Prob > F = 0.0000
Residual | 8.8780e+12 10829 819837960 R-squared = 0.1184
-------------+------------------------------ Adj R-squared = 0.1181
Total | 1.0070e+13 10832 929643520 Root MSE = 28633
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.lawyers | 43120.76 3192.596 13.51 0.000 36862.68 49378.83
yrsed | 3454.145 102.2866 33.77 0.000 3253.645 3654.646
1.union_adj | 3512.319 745.304 4.71 0.000 2051.387 4973.252
_cons | -13502.27 1439.835 -9.38 0.000 -16324.61 -10679.93
------------------------------------------------------------------------------
* When we put union_adj into the regression, every case with a missing value on union_adj (if appropriately set to missing) drops out, and our sample size falls from 69,305 to 10,833.
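* One way to see which cases dropped out is to tabulate union inside and outside the estimation sample. This is a sketch that assumes the class data set from this log is in memory.

```stata
* Refit the model that includes union_adj.
regress incwage i.lawyer yrsed i.union_adj if age>24 & age<65

* e(sample) equals 1 for the cases the regression actually used.
tabulate union if e(sample), missing

* Cases in the age range that were dropped, by their original union category.
tabulate union if !e(sample) & age>24 & age<65, missing
```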
. summarize incwage if age>24 & age<65
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 69305 26602.35 31745.03 0 364302
. desmat: regress incwage lawyer @yrsed metro=ind(2) if age>24 & age<65
---------------------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
F statistic: 1558.061
Model degrees of freedom: 6
Residual degrees of freedom: 69298
R-squared: 0.119
Adjusted R-squared: 0.119
Root MSE 29799.949
Prob: 0.000
---------------------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------------------
lawyer
1 1 36261.543** 1483.334
2 based on educrec 3202.844** 37.906
metro
3 Not identifiable 5012.005* 2158.816
4 Central city 4875.786** 334.675
5 Outside central city 8277.317** 303.468
6 Central city status unknown 4317.507** 383.190
7 _cons -21458.367** 551.138
---------------------------------------------------------------------------------------------
* p < .05
** p < .01
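* Note that the constant term is negative, even though incwage is never negative for ages 25-64. The constant is the predicted incwage when every predictor is zero (a non-lawyer with yrsed = 0 in the reference metro category, Not in metro area), so it is an extrapolation of the fitted line rather than an observed income.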
. codebook metro
---------------------------------------------------------------------------------------------
metro Metropolitan central city status
---------------------------------------------------------------------------------------------
type: numeric (byte)
label: metrolbl
range: [0,4] units: 1
unique values: 5 missing .: 0/133710
tabulation: Freq. Numeric Label
340 0 Not identifiable
29658 1 Not in metro area
32481 2 Central city
51468 3 Outside central city
19763 4 Central city status unknown
. lincom _x_5- _x_4
( 1) - _x_4 + _x_5 = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 3401.531 293.0987 11.61 0.000 2827.058 3976.004
------------------------------------------------------------------------------
* The suburb versus central city comparison: Outside central city minus Central city, 8277.317 - 4875.786 = 3401.531.
* Now here is where class actually started.
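* The same comparison can be sketched with Stata's built-in factor-variable notation instead of desmat. This assumes the class data set from this log is in memory; the metro codes come from the codebook output above, and the result should match the lincom estimate if the two models are specified identically.

```stata
* ib1.metro makes code 1 (Not in metro area) the reference category,
* which is the category absent from the desmat coefficient list.
regress incwage i.lawyer yrsed ib1.metro if age>24 & age<65

* Outside central city (code 3) minus Central city (code 2).
lincom 3.metro - 2.metro
```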
. regress incwage yrsed if age>24 & age<65
Source | SS df MS Number of obs = 69305
-------------+------------------------------ F( 1, 69303) = 7820.56
Model | 7.0821e+12 1 7.0821e+12 Prob > F = 0.0000
Residual | 6.2759e+13 69303 905571287 R-squared = 0.1014
-------------+------------------------------ Adj R-squared = 0.1014
Total | 6.9841e+13 69304 1.0077e+09 Root MSE = 30093
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 3361.393 38.01022 88.43 0.000 3286.893 3435.893
_cons | -18294.31 520.3955 -35.15 0.000 -19314.28 -17274.34
------------------------------------------------------------------------------
. regress incwage monthsed if age>24 & age<65
Source | SS df MS Number of obs = 69305
-------------+------------------------------ F( 1, 69303) = 7820.56
Model | 7.0821e+12 1 7.0821e+12 Prob > F = 0.0000
Residual | 6.2759e+13 69303 905571287 R-squared = 0.1014
-------------+------------------------------ Adj R-squared = 0.1014
Total | 6.9841e+13 69304 1.0077e+09 Root MSE = 30093
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
monthsed | 280.1161 3.167519 88.43 0.000 273.9078 286.3244
_cons | -18294.31 520.3955 -35.15 0.000 -19314.28 -17274.34
------------------------------------------------------------------------------
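* Take a look at what changes and what doesn't change in regression, posted on my website. Changing the units of X1 from years to months divides the coefficient and its standard error by 12 (3361.393/12 = 280.116), but it does not change the goodness of fit of the model, the constant, or the relevant t-statistic.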
. gen twice_incwage=incwage*2
(30484 missing values generated)
* How about if we change the units of Y?
. regress twice_incwage yrsed if age>24 & age<65
Source | SS df MS Number of obs = 69305
-------------+------------------------------ F( 1, 69303) = 7820.56
Model | 2.8328e+13 1 2.8328e+13 Prob > F = 0.0000
Residual | 2.5104e+14 69303 3.6223e+09 R-squared = 0.1014
-------------+------------------------------ Adj R-squared = 0.1014
Total | 2.7936e+14 69304 4.0310e+09 Root MSE = 60185
------------------------------------------------------------------------------
twice_incw~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 6722.787 76.02045 88.43 0.000 6573.787 6871.787
_cons | -36588.62 1040.791 -35.15 0.000 -38628.57 -34548.67
------------------------------------------------------------------------------
* No difference to the t-statistics or to R-squared. Doubling Y does double B1, the constant, their standard errors, and the Root MSE, though.
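* A short sketch for checking both rescaling results side by side. It assumes the class data set from this log is in memory; yrsed_months and incwage_k are hypothetical variable names used only for this illustration.

```stata
* Rescaled copies of X (years to months) and Y (dollars to thousands).
generate double yrsed_months = 12*yrsed
generate double incwage_k = incwage/1000

* Baseline model: store the t-statistic and R-squared.
regress incwage yrsed if age>24 & age<65
scalar t_base = _b[yrsed]/_se[yrsed]
scalar r2_base = e(r2)

* Rescaled X: the coefficient changes, t and R-squared should not.
regress incwage yrsed_months if age>24 & age<65
display "t: " scalar(t_base) " vs " _b[yrsed_months]/_se[yrsed_months]
display "R-squared: " scalar(r2_base) " vs " e(r2)

* Rescaled Y: again only the units of the coefficients change.
regress incwage_k yrsed if age>24 & age<65
display "t: " scalar(t_base) " vs " _b[yrsed]/_se[yrsed]
display "R-squared: " scalar(r2_base) " vs " e(r2)
```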
. regress incwage yrsed if age>24 & age<65
Source | SS df MS Number of obs = 69305
-------------+------------------------------ F( 1, 69303) = 7820.56
Model | 7.0821e+12 1 7.0821e+12 Prob > F = 0.0000
Residual | 6.2759e+13 69303 905571287 R-squared = 0.1014
-------------+------------------------------ Adj R-squared = 0.1014
Total | 6.9841e+13 69304 1.0077e+09 Root MSE = 30093
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsed | 3361.393 38.01022 88.43 0.000 3286.893 3435.893
_cons | -18294.31 520.3955 -35.15 0.000 -19314.28 -17274.34
------------------------------------------------------------------------------
* What if we change the excluded category of one variable?
. desmat: regress incwage @yrsed sex=-ind(1) if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
F statistic: 6982.245
Model degrees of freedom: 2
Residual degrees of freedom: 69302
R-squared: 0.168
Adjusted R-squared: 0.168
Root MSE 28961.412
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3344.693** 36.582
sex
2 Female -16356.280** 220.128
3 _cons -9645.429** 514.180
---------------------------------------------------------------------------------
* p < .05
** p < .01
. summarize incwage if age>24 & age<65
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 69305 26602.35 31745.03 0 364302
. desmat: regress incwage @yrsed sex=-ind(2) if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
F statistic: 6982.245
Model degrees of freedom: 2
Residual degrees of freedom: 69302
R-squared: 0.168
Adjusted R-squared: 0.168
Root MSE 28961.412
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3344.693** 36.582
sex
2 Male 16356.280** 220.128
3 _cons -26001.709** 511.461
---------------------------------------------------------------------------------
* p < .05
** p < .01
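* The coefficient for yrsed doesn't care what the excluded category of gender is. The sex coefficient only flips sign (-16356.280 to 16356.280), and the constant absorbs the difference: -9645.429 - 16356.280 = -26001.709.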
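* The same check can be sketched with factor-variable notation rather than desmat. This assumes the class data set from this log is in memory and that sex has exactly two categories; ib(first) and ib(last) pick the base category without needing to know the numeric codes.

```stata
* Lowest-coded category of sex as the excluded category.
regress incwage yrsed ib(first).sex if age>24 & age<65
scalar b_first = _b[yrsed]

* Highest-coded category of sex as the excluded category.
regress incwage yrsed ib(last).sex if age>24 & age<65

* The two yrsed coefficients should be identical.
display "yrsed coefficient: " scalar(b_first) " vs " _b[yrsed]
```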
. desmat: regress incwage @yrsed sex=-ind(2) [aweight= perwt_rounded] if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
aweight: perwt_rounded
F statistic: 7165.374
Model degrees of freedom: 2
Residual degrees of freedom: 69302
R-squared: 0.171
Adjusted R-squared: 0.171
Root MSE 29527.875
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3594.318** 38.604
sex
2 Male 16742.425** 224.386
3 _cons -29212.258** 543.215
---------------------------------------------------------------------------------
* p < .05
** p < .01
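* Aweights give us slightly different coefficients (yrsed: 3344.693 to 3594.318), slightly different standard errors (36.582 to 38.604), and therefore slightly different t-statistics. R-squared is also a little different (0.168 to 0.171). The number of observations stays at 69,305.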
. desmat: regress incwage @yrsed sex=-ind(2) [fweight= perwt_rounded] if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 142609350
fweight: perwt_rounded
F statistic: 14744875.139
Model degrees of freedom: 2
Residual degrees of freedom: 142609347
R-squared: 0.171
Adjusted R-squared: 0.171
Root MSE 29527.237
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3594.318** 0.851
sex
2 Male 16742.425** 4.946
3 _cons -29212.258** 11.975
---------------------------------------------------------------------------------
* p < .05
** p < .01
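* Using perwt_rounded as an fweight does not change our R-squared or our coefficients from the aweight case, but it does drop the standard errors dramatically (yrsed: 38.604 to 0.851), which would increase our t-statistics by the same factor. And of course the number of observations is multiplied by approximately 2,000 (69,305 to 142,609,350), because fweights treat each case as that many identical observations.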
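* A quick arithmetic check on the numbers above: the ratio of the aweight to the fweight standard error should be close to the square root of the ratio of the two sample sizes, and both come out at about 45.4. The last command is a sketch under the assumption that perwt_rounded is a sampling (person) weight, in which case pweight is one alternative to consider.

```stata
* Ratio of the aweight to the fweight standard error for yrsed.
display 38.604/0.851

* Square root of the ratio of the two reported numbers of observations.
display sqrt(142609350/69305)

* ASSUMPTION: perwt_rounded is a sampling weight. With pweight, regress keeps N at
* the number of sampled cases and reports robust standard errors.
regress incwage yrsed i.sex [pweight=perwt_rounded] if age>24 & age<65
```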
. desmat: regress incwage @yrsed sex=-ind(2) union_adj if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 10833
F statistic: 770.079
Model degrees of freedom: 3
Residual degrees of freedom: 10829
R-squared: 0.176
Adjusted R-squared: 0.176
Root MSE 27683.913
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3710.291** 98.433
sex
2 Male 16460.064** 533.973
union_adj
3 Union 1544.136* 722.263
4 _cons -24829.410** 1422.855
---------------------------------------------------------------------------------
* p < .05
** p < .01
* Again, putting union_adj in reduces the sample size dramatically, from 69,305 to 10,833.
. desmat: regress incwage @yrsed sex=-ind(2) if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
F statistic: 6982.245
Model degrees of freedom: 2
Residual degrees of freedom: 69302
R-squared: 0.168
Adjusted R-squared: 0.168
Root MSE 28961.412
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3344.693** 36.582
sex
2 Male 16356.280** 220.128
3 _cons -26001.709** 511.461
---------------------------------------------------------------------------------
* p < .05
** p < .01
* Class ended here.
. desmat: regress incwage @yrsed sex=-ind(2) union if age>24 & age<65
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
F statistic: 2953.313
Model degrees of freedom: 5
Residual degrees of freedom: 69299
R-squared: 0.176
Adjusted R-squared: 0.176
Root MSE 28823.438
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
1 based on educrec 3283.470** 36.491
sex
2 Male 16210.792** 219.236
union
3 No union coverage 7519.901** 325.953
4 Member of labor union 9115.449** 697.180
5 Covered by union but not a member 3646.325 2146.293
6 _cons -26339.022** 509.243
---------------------------------------------------------------------------------
* p < .05
** p < .01
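* But if we don't properly account for the missing values of union, that is, if we treat the NIU code as just another union classification, we incorrectly keep the same sample size as the model without union (69,305), and NIU becomes the reference category for the union coefficients.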
. tabulate union
Union membership | Freq. Percent Cum.
----------------------------------+-----------------------------------
NIU | 120,249 89.93 89.93
No union coverage | 11,383 8.51 98.45
Member of labor union | 1,883 1.41 99.85
Covered by union but not a member | 195 0.15 100.00
----------------------------------+-----------------------------------
Total | 133,710 100.00
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_pr
> oj3\fall_2010_s381_logs\class9.log
log type: text
closed on: 19 Oct 2010, 16:12:08
-------------------------------------------------------------------------------------------------