-------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_logs\c
> lass5.log
log type: text
opened on: 9 Oct 2012, 13:30:28
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.31212 2.967666 9,027
Female | 13.55657 2.854472 9,511
* All the t-tests below, and the regression coefficients and their t-statistics, are computed entirely from the mean, SD, and N of the two samples.
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
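* A minimal Python sketch (not part of the Stata session) of the equal-variance t-test computed only from the mean, SD, and N printed by the table command above. The function name pooled_t is made up for illustration.

```python
import math

# Summary statistics copied from the table output above (ages 25-34).
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Equal-variance two-sample t-test from summary statistics only."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, se, df


t_stat, se_diff, df = pooled_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
print(f"diff = {mean_m - mean_f:.5f}  se = {se_diff:.5f}  t = {t_stat:.3f}  df = {df}")
```

* Up to rounding of the inputs, this should agree with the diff row and the t reported by ttest above.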
. ttest yrsed if age>24 & age<35, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0428057 -.32835 -.1605438
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7106
Ho: diff = 0 Satterthwaite's degrees of freedom = 18383.6
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* For education, the unequal-variance and equal-variance t-tests give nearly the same result because the variances of the two subsamples are so similar to begin with.
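* A Python sketch (not Stata) of the unequal-variance test from the same summary statistics, using the standard Welch standard error and Satterthwaite degrees-of-freedom formulas. The function name welch_t is made up for illustration.

```python
import math

# Summary statistics copied from the ttest output above (ages 25-34).
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female


def welch_t(n1, m1, s1, n2, m2, s2):
    """Unequal-variance t-test with Satterthwaite's degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, se, df


welch_t_stat, welch_se, welch_df = welch_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
sd_ratio = sd_m / sd_f   # close to 1, which is why the two tests barely differ
print(f"se = {welch_se:.5f}  t = {welch_t_stat:.3f}  df = {welch_df:.1f}  SD ratio = {sd_ratio:.3f}")
```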
. gen months_ed=yrsed*12
(30484 missing values generated)
. ttest months_ed if age>24 & age<35, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 159.7454 .3748215 35.61199 159.0107 160.4802
Female | 9511 162.6788 .3512319 34.25366 161.9903 163.3673
---------+--------------------------------------------------------------------
combined | 18538 161.2504 .2567052 34.95152 160.7472 161.7536
---------+--------------------------------------------------------------------
diff | -2.933363 .5136682 -3.9402 -1.926525
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7106
Ho: diff = 0 Satterthwaite's degrees of freedom = 18383.6
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* Note the effect of the change of scale on the means, SDs, and standard errors (each multiplied by 12), but not on the t-statistic, which is unit-free.
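* A Python sketch (not Stata) of why rescaling leaves t unchanged: multiplying the outcome by 12 multiplies both the difference in means and its standard error by 12, so the ratio is the same. The function name welch_se_t is made up for illustration.

```python
import math

# Years-of-education summary statistics from the ttest output above.
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female


def welch_se_t(n1, m1, s1, n2, m2, s2):
    """Standard error of the difference and t, unequal-variance form."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return se, (m1 - m2) / se


scale = 12  # months per year, as in gen months_ed = yrsed*12
se_years, t_years = welch_se_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
se_months, t_months = welch_se_t(n_m, scale * mean_m, scale * sd_m,
                                 n_f, scale * mean_f, scale * sd_f)
print(f"years:  se = {se_years:.5f}  t = {t_years:.4f}")
print(f"months: se = {se_months:.5f}  t = {t_months:.4f}")
```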
. tabulate sex male
| male
Sex | 0 1 | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
* male is a dummy variable for gender: 1 for Male, 0 for Female, as the tabulation confirms.
. ttest months_ed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 159.7454 .3748215 35.61199 159.0107 160.4802
Female | 9511 162.6788 .3512319 34.25366 161.9903 163.3673
---------+--------------------------------------------------------------------
combined | 18538 161.2504 .2567052 34.95152 160.7472 161.7536
---------+--------------------------------------------------------------------
diff | -2.933363 .5131471 -3.939178 -1.927547
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* Regressing yrsed on the male dummy is the same as the equal-variance t-test: same difference, same standard error, same t-statistic.
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
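* A Python sketch (not Stata) of the equivalence between a regression on a 0/1 dummy and the equal-variance t-test. The data here are simulated and purely hypothetical (not the CPS sample); the function names are made up for illustration.

```python
import math
import random

random.seed(381)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a 0/1 dummy x and an outcome y that depends on it.
x = [random.randint(0, 1) for _ in range(200)]
y = [13.5 - 0.25 * xi + random.gauss(0, 3) for xi in x]


def ols_dummy(x, y):
    """Slope, standard error, and t from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


def pooled_t(g1, g0):
    """Difference in means, standard error, and t with a pooled variance."""
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    se = math.sqrt(ss / (n1 + n0 - 2) * (1 / n1 + 1 / n0))
    return m1 - m0, se, (m1 - m0) / se


slope, se_ols, t_ols = ols_dummy(x, y)
diff, se_tt, t_tt = pooled_t([yi for xi, yi in zip(x, y) if xi == 1],
                             [yi for xi, yi in zip(x, y) if xi == 0])
print(f"OLS:    coef = {slope:.6f}  se = {se_ols:.6f}  t = {t_ols:.4f}")
print(f"t-test: diff = {diff:.6f}  se = {se_tt:.6f}  t = {t_tt:.4f}")
```

* The two printed lines should match to floating-point precision, because the residual sum of squares from the dummy regression is exactly the pooled within-group sum of squares.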
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.31212 2.967666 9,027
Female | 13.55657 2.854472 9,511
* Now weight by analytic weights, which yields the same sample size but slightly different means and SDs.
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819247 9,027
Female | 13.76295 2.720855 9,511
-------------------------------------------------
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
* Regression with aweights is similar to, but not exactly the same as, the unweighted regression.
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819247 9,027
Female | 13.76295 2.720855 9,511
-------------------------------------------------
* fweight with perwt_rounded gives the same means as aweight (and nearly the same SDs), but multiplies the N by about 2000.
. table sex if age>24 & age<35 [fweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819091 1.86e+07
Female | 13.76295 2.720712 1.92e+07
-------------------------------------------------
. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]
Source | SS df MS Number of obs = 37785945
-------------+------------------------------ F( 1, 37785943) = 52018.00
Model | 398979.047 1 398979.047 Prob > F = 0.0000
Residual | 289818910 37785943 7.67001924 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0014
Total | 290217889 37785944 7.68057796 Root MSE = 2.7695
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0009012 -228.07 0.000 -.2073109 -.2037782
_cons | 13.76294 .0006317 2.2e+04 0.000 13.76171 13.76418
------------------------------------------------------------------------------
* The fweighted regression yields a t-statistic larger by a factor of about sqrt(2000), or about 45 (-228.07 vs. -5.05), which is totally unrealistic: fweights treat each sampled person as roughly 2000 separate observations.
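* A Python arithmetic check (not Stata) of the inflation factor, using only numbers printed in the aweight and fweight regress outputs above. The variable names are made up for illustration.

```python
import math

# Values copied from the two regress outputs above.
n_aweight = 18538          # Number of obs with aweights (actual sample size)
n_fweight = 37785945       # Number of obs with fweights (sum of the weights)
se_aweight, t_aweight = 0.0406899, -5.05
se_fweight, t_fweight = 0.0009012, -228.07

n_inflation = n_fweight / n_aweight        # how many times N was multiplied
expected_factor = math.sqrt(n_inflation)   # standard errors shrink by sqrt of that
se_factor = se_aweight / se_fweight
t_factor = t_fweight / t_aweight

print(f"N multiplied by {n_inflation:.1f}")
print(f"sqrt of that:   {expected_factor:.2f}")
print(f"SE ratio:       {se_factor:.2f}")
print(f"t ratio:        {t_factor:.2f}")
```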
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819247 9,027
Female | 13.76295 2.720855 9,511
-------------------------------------------------
. gen random_uniform_2=uniform()
* generate a uniform random variable.
. summarize random_uniform_2
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
random_uni~2 | 133710 .5006203 .2884588 .0000219 .9999971
* Use that uniform random variable to reduce the sample to about ¼ of its prior size; note that the means and SDs change a little because of randomness.
. table sex if age>24 & age<35 & random_uniform_2 <=.25 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.55302 2.804218 2,248
Female | 13.72653 2.718835 2,330
-------------------------------------------------
. regress yrsed male if age>24 & age<35 & random_uniform_2<=.25 [aweight= perwt_rounded]
(sum of wgt is 9.2846e+06)
Source | SS df MS Number of obs = 4578
-------------+------------------------------ F( 1, 4576) = 4.52
Model | 34.4468815 1 34.4468815 Prob > F = 0.0336
Residual | 34890.4634 4576 7.62466419 R-squared = 0.0010
-------------+------------------------------ Adj R-squared = 0.0008
Total | 34924.9102 4577 7.63052441 Root MSE = 2.7613
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.1735029 .0816285 -2.13 0.034 -.3335342 -.0134715
_cons | 13.72653 .057329 239.43 0.000 13.61413 13.83892
------------------------------------------------------------------------------
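* The standard error roughly doubles (sqrt(4) = 2), so the t-statistic is roughly one half as large, i.e. sqrt(1/4) times as large as before; here it falls a bit further (-2.13 vs. -5.05) because the coefficient also shrank by chance.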
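* A Python arithmetic check (not Stata) of the sample-size effect, using only numbers printed in the full-sample and quarter-sample aweighted regress outputs. The variable names are made up for illustration.

```python
import math

# Values copied from the aweighted regress outputs (full sample vs. random quarter).
n_full, coef_full, se_full = 18538, -0.2055446, 0.0406899
n_quarter, coef_quarter, se_quarter = 4578, -0.1735029, 0.0816285

expected_se_ratio = math.sqrt(n_full / n_quarter)   # SE scales with 1/sqrt(N)
observed_se_ratio = se_quarter / se_full

# What t would have been had only the SE changed and not the coefficient.
t_if_coef_unchanged = coef_full / se_quarter
t_observed = coef_quarter / se_quarter

print(f"expected SE ratio = {expected_se_ratio:.3f}, observed = {observed_se_ratio:.3f}")
print(f"t with the full-sample coefficient = {t_if_coef_unchanged:.2f}, observed t = {t_observed:.2f}")
```

* The gap between the two t values comes from the coefficient itself moving in the random subsample, not from the standard error.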
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381
> _logs\class5.log
log type: text
closed on: 9 Oct 2012, 15:47:14
------------------------------------------------------------------------------------