-----------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38
> 1_logs\class5.log
log type: text
opened on: 8 Oct 2013, 13:37:43
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.31212 2.967666 9,027
Female | 13.55657 2.854472 9,511
-------------------------------------------------
* Our old friend: the table of average educational attainment for young men and women (ages 25 to 34).
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)
--------------------------------------------------------------
Sex | mean(yrsed) sd(yrsed) sem(yrsed) Freq.
----------+---------------------------------------------------
Male | 13.31212 2.967666 .0312351 9,027
Female | 13.55657 2.854472 .0292693 9,511
--------------------------------------------------------------
* The same table with the standard error of the mean added. Recall that semean = sd/sqrt(n).
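* The formula can be checked by hand with display. This is only a sketch, with the sd and n values typed in from the table above.

```stata
* Standard error of the mean = sd / sqrt(n); values copied from the table above
* Male: compare the result with sem(yrsed) = .0312351
display 2.967666/sqrt(9027)
* Female: compare the result with sem(yrsed) = .0292693
display 2.854472/sqrt(9511)
```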
. ttest yrsed if age>24 & age<35, by (sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* Here are the equal-variance (above) and unequal-variance (below) versions of the t test. The results are very similar because the standard deviations and sample sizes of the two groups (and therefore the standard errors of their means) are so similar.
. ttest yrsed if age>24 & age<35, by (sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0428057 -.32835 -.1605438
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7106
Ho: diff = 0 Satterthwaite's degrees of freedom = 18383.6
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
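* The two standard errors of the difference can be rebuilt from the group summaries. A rough sketch, with numbers typed in from the log; sp_pooled is a hypothetical scalar name chosen for illustration.

```stata
* Unequal-variance standard error of the difference: sqrt(se1^2 + se2^2)
* (compare with .0428057 in the unequal-variance output)
display sqrt(.0312351^2 + .0292693^2)

* Equal-variance version: pooled sd times sqrt(1/n1 + 1/n2)
* (compare with .0427623 in the equal-variance output)
scalar sp_pooled = sqrt(((9027-1)*2.967666^2 + (9511-1)*2.854472^2)/(9027 + 9511 - 2))
display scalar(sp_pooled)*sqrt(1/9027 + 1/9511)

* The immediate form of ttest reruns both tests from summary statistics alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```

* Because the inputs to ttesti are rounded, its output may differ from the log in the last digit or two.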
* First check: change the scale of the outcome variable, from years to months.
. gen months_ed=yrsed*12
(30484 missing values generated)
. ttest months_ed if age>24 & age<35, by (sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 159.7454 .3748215 35.61199 159.0107 160.4802
Female | 9511 162.6788 .3512319 34.25366 161.9903 163.3673
---------+--------------------------------------------------------------------
combined | 18538 161.2504 .2567052 34.95152 160.7472 161.7536
---------+--------------------------------------------------------------------
diff | -2.933363 .5131471 -3.939178 -1.927547
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
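* Under a change of scale, the means, standard deviations, and standard errors all change by the same factor (here 12), because they are in the units of Y, whatever Y is. The t-statistic stays the same because it is a ratio of two quantities in the same units, and so is unit-free.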
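* The cancellation can be seen with a little arithmetic. A minimal sketch, using the difference and standard error from the years-of-education t test above.

```stata
* Multiplying Y by 12 multiplies both the difference and its standard error by 12
* (compare with the diff row of the months_ed t test)
display -.2444469*12
display .0427623*12

* The factor of 12 cancels in the ratio, so both lines give the same t
display -.2444469/.0427623
display (-.2444469*12)/(.0427623*12)
```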
. gen byte male=0
. replace male=1 if sex==1
(64791 real changes made)
* The two commands above generate a new dummy variable for male (1 = male, 0 = female), which we will use in our regressions. The tabulation below confirms the coding.
. tabulate sex male
| male
Sex | 0 1 | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
. regress months_ed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 39850.9104 1 39850.9104 Prob > F = 0.0000
Residual | 22605108.7 18536 1219.52464 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 22644959.6 18537 1221.60865 Root MSE = 34.922
------------------------------------------------------------------------------
months_ed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -2.933363 .5131471 -5.72 0.000 -3.939178 -1.927547
_cons | 162.6788 .3580818 454.31 0.000 161.9769 163.3807
------------------------------------------------------------------------------
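* With a single dummy predictor, the two regressions above reproduce the equal-variance t tests: the constant is the mean for women (male=0), the constant plus the coefficient on male is the mean for men, and F is the square of t. A quick check, with numbers typed in from the log:

```stata
* Predicted mean for men = _cons + coefficient on male
* (compare with the Male mean of 13.31212 in the t test)
display 13.55657 + (-.2444469)

* Same check in months
* (compare with the Male mean of 159.7454)
display 162.6788 + (-2.933363)

* F(1, 18536) from regress should be close to the square of the ttest t
* (compare with F = 32.68)
display (-5.7164)^2
```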
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)
--------------------------------------------------------------
Sex | mean(yrsed) sd(yrsed) sem(yrsed) Freq.
----------+---------------------------------------------------
Male | 13.31212 2.967666 .0312351 9,027
Female | 13.55657 2.854472 .0292693 9,511
--------------------------------------------------------------
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed semean yrsed freq)
--------------------------------------------------------------
Sex | mean(yrsed) sd(yrsed) sem(yrsed) Freq.
----------+---------------------------------------------------
Male | 13.5574 2.819247 .029673 9,027
Female | 13.76295 2.720855 .0278992 9,511
--------------------------------------------------------------
* The aweighted data have similar (but not identical) means and standard deviations, and the weighted N is exactly the same as the unweighted N. This is because aweights (“analytic weights”) are rescaled so that the average weight is 1, which leaves the sample size unchanged.
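* The rescaling can be imitated by hand. This is only an illustration of the idea, not a replacement for the aweight option; w_norm is a hypothetical variable name, and the sketch assumes perwt_rounded is nonmissing for these cases.

```stata
* Average raw weight among the cases used in the table
summarize perwt_rounded if age>24 & age<35 & !missing(yrsed)

* Divide each weight by that average (r(mean) is left behind by summarize)
generate w_norm = perwt_rounded/r(mean) if age>24 & age<35 & !missing(yrsed)

* The rescaled weights should average 1, so their sum equals the number of cases
summarize w_norm
display r(sum)
```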
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
* The aweighted regression is similar to the unweighted regression, but not exactly the same, because applying the weights makes some cases relatively more important and others less important.
. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]
Source | SS df MS Number of obs = 37785945
-------------+------------------------------ F( 1, 37785943) = 52018.00
Model | 398979.047 1 398979.047 Prob > F = 0.0000
Residual | 289818910 37785943 7.67001924 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0014
Total | 290217889 37785944 7.68057796 Root MSE = 2.7695
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0009012 -228.07 0.000 -.2073109 -.2037782
_cons | 13.76294 .0006317 2.2e+04 0.000 13.76171 13.76418
------------------------------------------------------------------------------
* The regression with fweights has the same coefficients as the aweighted regression, but the sample size is inflated by a factor of about 2,000 (37,785,945/18,538), so the t-statistic grows by a factor of about sqrt(2000), or roughly 45, from -5.05 to -228. The key point is that the fweighted regression produces a wildly unrealistic t-statistic, because here we are pretending that we really have almost 38 million young people in our sample, instead of the 18.5 thousand we really do have. Fweights are useful and correct for some applications (we use them with the CPS to generate national totals), but used in this way the fweighted regression is wrong and misleading.
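* The inflation factor can be worked out directly from the two regression outputs. A short sketch, with numbers typed in from the log:

```stata
* Ratio of the fweighted N to the real N
display 37785945/18538

* Standard errors shrink, and t grows, by roughly the square root of that ratio
display sqrt(37785945/18538)

* Implied t-statistic (compare with -228.07 in the fweighted output)
display -5.05*sqrt(37785945/18538)

* The same factor seen directly in the two reported standard errors
display .0406899/.0009012
```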
. gen random_uniform=uniform()
. summarize random_uniform
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
random_uni~m | 133710 .5006203 .2884588 .0000219 .9999971
. display 0.2884588^2
.08320848
* Just to recall: we proved earlier that the uniform distribution on (0,1) has mean 0.5 and variance 1/12 (about 0.0833). It is nice to see that the sample mean (0.5006) and sample variance (0.0832) agree.
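* For comparison, here are the theoretical values, plus one way to make the random draws reproducible. The seed 12345 and the name random_uniform2 are arbitrary, hypothetical choices and are not part of the original session; in newer versions of Stata the function is named runiform().

```stata
* Theoretical variance and standard deviation of a uniform(0,1) variable
* (compare with .08320848 and .2884588 above)
display 1/12
display sqrt(1/12)

* Setting a seed before generating the draws makes the random subsamples repeatable
set seed 12345
generate random_uniform2 = uniform()
summarize random_uniform2
```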
* We cannot legitimately increase our sample size, but we can decrease the sample size arbitrarily.
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
. regress yrsed male if age>24 & age<35 & random_uniform <=0.25 [aweight= perwt_rounded]
(sum of wgt is 9.2846e+06)
Source | SS df MS Number of obs = 4578
-------------+------------------------------ F( 1, 4576) = 4.52
Model | 34.4468815 1 34.4468815 Prob > F = 0.0336
Residual | 34890.4634 4576 7.62466419 R-squared = 0.0010
-------------+------------------------------ Adj R-squared = 0.0008
Total | 34924.9102 4577 7.63052441 Root MSE = 2.7613
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.1735029 .0816285 -2.13 0.034 -.3335342 -.0134715
_cons | 13.72653 .057329 239.43 0.000 13.61413 13.83892
------------------------------------------------------------------------------
* If we arbitrarily limit ourselves to ¼ of the data in the CPS, we expect the standard error to double and the t-statistic to be about half as large (about -2.5 rather than -5.05). Since this is a random subsample, the actual value can be bigger or smaller than we expect; here it is -2.13.
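* The expected values come from the square root of the sampling fraction. A rough sketch, assuming the coefficient itself stays the same in the subsample, with numbers typed in from the log:

```stata
* Fraction of the full sample retained in this random subsample
display 4578/18538

* Expected t-statistic: full-sample t times the square root of that fraction
* (compare with the observed t of -2.13)
display -5.05*sqrt(4578/18538)

* Expected standard error: full-sample standard error divided by the same square root
* (compare with the observed .0816285)
display .0406899/sqrt(4578/18538)
```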
. regress yrsed male if age>24 & age<35 & random_uniform >=0.75 [aweight= perwt_rounded]
(sum of wgt is 9.6623e+06)
Source | SS df MS Number of obs = 4719
-------------+------------------------------ F( 1, 4717) = 11.48
Model | 87.7286053 1 87.7286053 Prob > F = 0.0007
Residual | 36055.7534 4717 7.64378914 R-squared = 0.0024
-------------+------------------------------ Adj R-squared = 0.0022
Total | 36143.482 4718 7.66076345 Root MSE = 2.7647
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2727883 .0805211 -3.39 0.001 -.4306472 -.1149294
_cons | 13.8423 .0561835 246.38 0.000 13.73216 13.95245
------------------------------------------------------------------------------
* And here is a different random ¼ sample. The results are somewhat different from the previous ones, but they still lead to the same substantive answer: young women in the US have significantly more education than young men.
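* One way to judge whether the two quarter-sample estimates really disagree is to compare their gap with its standard error. This is a sketch only: it treats the two subsamples as independent (they do not overlap, since one uses random_uniform<=0.25 and the other random_uniform>=0.75), and coef_gap and se_gap are hypothetical scalar names.

```stata
* Difference between the two subsample coefficients on male
scalar coef_gap = -.1735029 - (-.2727883)

* For independent estimates the variances add
scalar se_gap = sqrt(.0816285^2 + .0805211^2)

display scalar(coef_gap)
display scalar(se_gap)

* A ratio well under 2 in absolute value is consistent with ordinary sampling variation
display scalar(coef_gap)/scalar(se_gap)
```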
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38
> 1_logs\class5.log
log type: text
closed on: 8 Oct 2013, 16:03:12
-----------------------------------------------------------------------------------