---------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class5.log
log type: text
opened on: 24 Jan 2013, 12:00:36
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
. table sex if age>24 & age<35, contents(freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed) sem(yrsed)
----------+---------------------------------------------------
Male | 9,027 13.31212 2.967666 .0312351
Female | 9,511 13.55657 2.854472 .0292693
--------------------------------------------------------------
* Reviewing the mean, sd, and standard error of yrsed by gender.
. display 2.967666/(sqrt(9027))
.03123513
* Remember that the standard error of the mean is simply sd/sqrt(n). Here, 2.967666/sqrt(9027) reproduces the .0312351 reported for men in the table above.
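* The same check can be run for women. This is a minimal sketch that uses only the sd and n printed in the table above; the scalar names (sd_f, n_f, se_f) are made up for illustration.

```stata
* sd and n for women, copied from the table above
scalar sd_f = 2.854472
scalar n_f  = 9511

* standard error of the mean = sd / sqrt(n)
scalar se_f = sd_f/sqrt(n_f)
display %9.7f se_f
```

* The displayed value should agree with the sem(yrsed) of .0292693 reported for women.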
. table sex if age>24 & age<35 [aweight=perwt_rounded], contents(freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed) sem(yrsed)
----------+---------------------------------------------------
Male | 9,027 13.5574 2.819247 .029673
Female | 9,511 13.76295 2.720855 .0278992
--------------------------------------------------------------
* When we apply aweights (analytic weights), the reported sample size is unchanged, because Stata rescales aweights so that they sum to the number of observations. The weighted data nonetheless have a somewhat different mean and sd, and therefore a somewhat different standard error, because the weights put more emphasis on some observations than on others.
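* To see the rescaling at work, here is a small sketch on made-up data (the variables y, w, and w10 are hypothetical and have nothing to do with the CPS file). Multiplying every aweight by 10 should leave N, the mean, and the sd unchanged.

```stata
clear
input y w
10 2
12 3
14 1
16 4
end

* same weights, inflated by a constant factor
generate w10 = 10*w

summarize y [aweight=w]
scalar n1  = r(N)
scalar m1  = r(mean)
scalar sd1 = r(sd)

summarize y [aweight=w10]
scalar n2  = r(N)
scalar m2  = r(mean)
scalar sd2 = r(sd)

* each pair should match: only relative weights matter for aweights
display n1 "  " n2
display m1 "  " m2
display sd1 "  " sd2
```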
. table sex if age>24 & age<35 [fweight=perwt_rounded], contents(freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed) sem(yrsed)
----------+---------------------------------------------------
Male | 1.86e+07 13.5574 2.819091 .0006543
Female | 1.92e+07 13.76295 2.720712 .0006205
--------------------------------------------------------------
* When we apply fweights (frequency weights), we are telling Stata that each observation really stands for about 2000 identical observations. Our sample size therefore goes up dramatically (by a factor of roughly 2000 compared to the aweight version above), while the mean and sd are essentially unchanged. The standard error is reduced by a factor of about sqrt(2000), or roughly 45. Note that the mean and sd do not depend on sample size, but the standard error does.
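* The contrast between the two weight types can be sketched on a tiny made-up dataset (y and w are hypothetical, not CPS variables). With fweights the weights are counted as observations; with aweights they are not.

```stata
clear
input y w
10 2
12 3
14 1
16 4
end

* analytic weights: N stays at the number of rows
summarize y [aweight=w]
scalar n_a    = r(N)
scalar mean_a = r(mean)

* frequency weights: N becomes the sum of the weights
summarize y [fweight=w]
scalar n_f    = r(N)
scalar mean_f = r(mean)

display "N:    " n_a "  vs  " n_f
display "mean: " mean_a "  vs  " mean_f
```

* The means should agree, while N differs by the average weight. Since the standard error is sd/sqrt(N), it shrinks by the square root of that factor.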
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* Back to our favorite t-test. How do we interpret the t-statistic of -5.7164, and what probability do we attach to it? The relevant probability is the middle one above, Pr(|T| > |t|) = 0.0000, which corresponds to a two-tailed test. But the t-test output shows only four decimal places, which is not enough to quantify so small a probability. So, let's ask Stata to compute it for us.
. display normal(5.716)
.99999999
* That is the left-tail (cumulative) probability under the normal distribution. We want the right tail, doubled to cover both tails, so:
. display 2*(1-normal(5.716))
1.091e-08
. display ttail(18536,5.7164)
5.524e-09
* ttail gives the right-tail probability of the t distribution, that is, the probability from 5.7164 to infinity, which is the relevant tail for us. Doubling it gives the two-tailed probability.
. display 2*ttail(18536,5.7164)
1.105e-08
* And that, about 1 in 100 million, is our two-tailed probability of finding a t-statistic at least this large in absolute value by chance if the null hypothesis were true. So we reject the null hypothesis.
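* Another route to the same probability, sketched here under the assumption that the summary statistics printed by the t-test are all we have: ttesti runs the test from n, mean, and sd for each group, and the results it leaves behind in r() carry full precision. The scalar names are made up for illustration.

```stata
* immediate form of ttest: n, mean, sd for men, then for women
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* save the stored results before running anything else
scalar t_stat = r(t)
scalar df     = r(df_t)
scalar p_two  = r(p)

display "t = " %8.4f t_stat "   df = " %8.0f df
display "two-tailed p from r(p):   " %10.4e p_two
display "two-tailed p from ttail:  " %10.4e 2*ttail(df, abs(t_stat))
```

* The two displayed probabilities should agree with each other and with the hand calculation above.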
* Finding the key critical values of the normal distribution:
. display invnormal(1-.025)
1.959964
* And comparing to the t distribution: as the degrees of freedom increase, the t distribution approaches the normal.
. display invttail(10, .025)
2.2281389
. display invttail(100, .025)
1.9839715
. display invttail(20000, .025)
1.9600826
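* The convergence can also be tabulated in one pass. This is a small sketch; the list of df values is arbitrary (1000 is added only to fill in the gap).

```stata
* 97.5th percentile of the t distribution minus that of the normal
foreach df of numlist 10 100 1000 20000 {
    scalar t_crit = invttail(`df', .025)
    scalar gap    = t_crit - invnormal(.975)
    display "df = " %6.0f `df' "   t critical = " %9.6f t_crit "   gap = " %9.6f gap
}
```

* The gap should shrink toward zero as df grows.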
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate sex, nolab
Sex | Freq. Percent Cum.
------------+-----------------------------------
1 | 64,791 48.46 48.46
2 | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Now I am going to generate a dummy variable for male, to use in regression.
. gen byte male=0
. replace male=1 if sex==1
(64791 real changes made)
. tabulate sex male
| male
Sex | 0 1 | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
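* The gen/replace pair can also be written as a single line. The sketch below uses a made-up five-row dataset (not the CPS file) with one missing value, to show that the if qualifier keeps a missing sex from being coded as 0.

```stata
clear
input sex
1
2
2
.
1
end

* (sex == 1) evaluates to 1 when true and 0 when false
generate byte male = (sex == 1) if !missing(sex)

tabulate sex male, missing
```

* In the CPS file used here sex has no missing values, so both approaches give the same variable.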
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
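* Note that the t-statistic produced by the (equal-variance) t-test and the t-statistic produced by regression are the same, and that the coefficient on male (-.2444469) is exactly the difference in means from the t-test. The constant (13.55657) is the mean for women, the omitted category. Regression is just a generalization of the t-test.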
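* The equivalence is easy to confirm on any data. Here is a minimal sketch on a made-up seven-row dataset (y, sex, and the scalar names are hypothetical), coded like the CPS file with 1 = male and 2 = female.

```stata
clear
input y sex
12 1
14 1
11 1
16 2
13 2
15 2
17 2
end
generate byte male = (sex == 1)

* equal-variance two-sample t-test; diff = mean(sex 1) - mean(sex 2)
ttest y, by(sex)
scalar t_ttest = r(t)

* regression on the dummy; the coefficient is the same difference in means
regress y male
scalar t_reg = _b[male]/_se[male]

display "t from ttest:   " %9.5f t_ttest
display "t from regress: " %9.5f t_reg
```

* One caution: ttest subtracts the second group from the first, so running ttest with by(male) instead of by(sex) would flip the sign of t relative to the regression.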
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
* The aweighted regression yields a somewhat different coefficient and t-statistic, but they are of the same order of magnitude as the unweighted ones. Using aweights is one way of applying the weights while making sure that the standard errors reflect the actual sample size you have.
. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]
Source | SS df MS Number of obs = 37785945
-------------+------------------------------ F( 1, 37785943) = 52018.00
Model | 398979.047 1 398979.047 Prob > F = 0.0000
Residual | 289818910 37785943 7.67001924 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0014
Total | 290217889 37785944 7.68057796 Root MSE = 2.7695
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0009012 -228.07 0.000 -.2073109 -.2037782
_cons | 13.76294 .0006317 2.2e+04 0.000 13.76171 13.76418
------------------------------------------------------------------------------
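* The fweighted regression increases the sample size by a factor of about 2000, and so increases the t-statistic by a factor of about sqrt(2000), or roughly 45 (from -5.05 to -228.07). The coefficient itself is the same as in the aweighted regression.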
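* A quick check of that scaling, using only figures printed in the two weighted regressions above (a sketch; the scalar names are made up for illustration):

```stata
* ratio of the fweighted to the aweighted t-statistic on male
scalar t_ratio = 228.07/5.05

* square root of the ratio of the two reported sample sizes
scalar n_ratio_root = sqrt(37785945/18538)

display %8.3f t_ratio "   " %8.3f n_ratio_root
```

* The two values should be nearly equal, because the coefficient is the same in both regressions and the standard error shrinks with the square root of the sample size.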
. table occ1990 if occ1990==95| occ1990==125| occ1990==178, contents (freq mean inctot sd inctot)
----------------------------------------------------------------
Occupation, 1990 |
basis | Freq. mean(inctot) sd(inctot)
----------------------+-----------------------------------------
Registered nurses | 966 40787.1677 22941.43
Sociology instructors | 6 44363.33333 6497.989
Lawyers | 441 99242.58277 71860.66
----------------------------------------------------------------
. graph box inctot if occ1990==95| occ1990==125| occ1990==178, over( occ1990)
* A quick look at how to make box plots.
* One way to run a t-test for a difference in some variable between two occupations:
. ttest yrsed if occ1990==95| occ1990==125, by(occ1990)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Register | 966 15.54762 .0516706 1.605951 15.44622 15.64902
Sociolog | 6 17 0 0 17 17
---------+--------------------------------------------------------------------
combined | 972 15.55658 .0514811 1.605022 15.45556 15.65761
---------+--------------------------------------------------------------------
diff | -1.452381 .6559623 -2.73965 -.1651122
------------------------------------------------------------------------------
diff = mean(Register) - mean(Sociolog) t = -2.2141
Ho: diff = 0 degrees of freedom = 970
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0135 Pr(|T| > |t|) = 0.0271 Pr(T > t) = 0.9865
* generate a dummy variable for nurses.
. gen nurses=0
. replace nurses=1 if occ1990==95
(966 real changes made)
. regress yrsed nurses if occ1990==95| occ1990==125
Source | SS df MS Number of obs = 972
-------------+------------------------------ F( 1, 970) = 4.90
Model | 12.5783363 1 12.5783363 Prob > F = 0.0271
Residual | 2488.80952 970 2.56578301 R-squared = 0.0050
-------------+------------------------------ Adj R-squared = 0.0040
Total | 2501.38786 971 2.5760946 Root MSE = 1.6018
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nurses | -1.452381 .6559623 -2.21 0.027 -2.73965 -.1651122
_cons | 17 .6539346 26.00 0.000 15.71671 18.28329
------------------------------------------------------------------------------
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
* What if we changed the units of our variables? What if instead of years of education, we had months?
. gen monthsed=yrsed*12
(30484 missing values generated)
. regress monthsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 39850.9104 1 39850.9104 Prob > F = 0.0000
Residual | 22605108.7 18536 1219.52464 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 22644959.6 18537 1221.60865 Root MSE = 34.922
------------------------------------------------------------------------------
monthsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -2.933363 .5131471 -5.72 0.000 -3.939178 -1.927547
_cons | 162.6788 .3580818 454.31 0.000 161.9769 163.3807
------------------------------------------------------------------------------
* Note that the change of scale affects our coefficient and standard error (both are in the units of the dependent variable, so both are multiplied by 12), but the t-statistic is unchanged, because the t-statistic is unit free.
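* The arithmetic can be checked directly from the two regression tables above. This is a sketch using only the printed coefficients and standard errors; the scalar names are made up for illustration.

```stata
* t = coefficient / standard error, in years and in months
scalar t_years  = -.2444469/.0427623
scalar t_months = -2.933363/.5131471
display %9.4f t_years "   " %9.4f t_months

* the months coefficient is 12 times the years coefficient
scalar coef_ratio = 2.933363/.2444469
display %9.4f coef_ratio
```

* The factor of 12 appears in both the numerator and the denominator of t, so it cancels.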
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\cl
> ass5.log
log type: text
closed on: 24 Jan 2013, 15:54:55
------------------------------------------------------------------------------------------