------------------------------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_logs\class4.log

log type:  text

opened on:   4 Oct 2012, 12:10:27

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

* Note that the standard error of the mean is SD/sqrt(N).

. display 2.967666/(sqrt(9027))

.03123513
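* The same check can be repeated outside Stata. A minimal sketch in Python (not part of the original session), using the frequencies and standard deviations copied from the table above; the variable names are illustrative only.

```python
import math

# Summary statistics copied from the table output above (ages 25-34).
n_male, sd_male = 9027, 2.967666
n_female, sd_female = 9511, 2.854472

# Standard error of the mean = SD / sqrt(N)
se_male = sd_male / math.sqrt(n_male)
se_female = sd_female / math.sqrt(n_female)

print(f"male   SE: {se_male:.7f}")
print(f"female SE: {se_female:.7f}")
```

* Both values should agree with the sem(yrsed) column to the digits Stata displays.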

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
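* Where do the standard error of the difference and the t-statistic come from? A rough sketch in Python (not part of the original session) of the textbook pooled-variance formula, using the group sizes, means, and standard deviations printed by ttest above. Because the inputs are rounded, the results should be close to, but not digit-for-digit identical with, Stata's.

```python
import math

# Group summaries copied from the ttest output above.
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female

# Equal-variance (pooled) two-sample t test.
df = n_m + n_f - 2
pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))

diff = mean_m - mean_f          # same direction as Stata: Male minus Female
t_stat = diff / se_diff

print("df      =", df)
print("diff    =", round(diff, 5))
print("se_diff =", round(se_diff, 5))
print("t       =", round(t_stat, 3))
```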

* So, is this t-value of -5.7164 big? What is the probability of getting a value this far from zero by chance if the null hypothesis is true?

. display ttail(18356, -5.7164)

.99999999

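* ttail(df, t) returns the upper-tail probability, that is, the area under the t-distribution from t to positive infinity. Because our t is negative, that area is almost 1; the lower tail we want is one minus this value. (The degrees of freedom reported by the t-test are 18536; the display commands in this log use 18356, which makes no practical difference with df this large.) For the syntax of ttail, invttail, invnormal, and normal, see Stata help.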

. display 1-ttail(18356, -5.7164)

5.525e-09

. display 2*(1-ttail(18356, -5.7164))

1.105e-08

* We usually care about both tails at the same time: we make no prior assumption about which group should be larger, and we are interested in extreme values in either direction. So we take the p-value of one tail and multiply by 2.

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. label define female 0 "male" 1 "female"

label female already defined

r(110);

. label val female female

. tabulate sex female

           |        female
       Sex |      male     female |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710

. tabulate sex female, nolab

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
         1 |    64,791          0 |    64,791
         2 |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710

* We need to generate this new 0-1 dummy variable in order to put the variable into the regression.
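* The recode itself is simple enough to sketch in Python (not part of the original session). The function name and the sample codes below are hypothetical. One assumption worth flagging: the gen/replace pattern used above assigns 0 to any observation where sex is missing, whereas this sketch keeps missing values missing.

```python
def to_female_dummy(sex_code):
    """Map a 1/2 sex code (1 = male, 2 = female) to a 0/1 dummy.

    Missing codes (None) stay missing instead of silently becoming 0.
    """
    if sex_code is None:
        return None
    return 1 if sex_code == 2 else 0


# Hypothetical sample of codes, including one missing value.
sex = [1, 2, 2, None, 1]
female = [to_female_dummy(s) for s in sex]
print(female)
```

* In the data used here the cross-tabulation shows no cases off the diagonal, so the two approaches would give the same dummy.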

. regress yrsed female if age>=25 & age <=34

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

* Notice that the coefficient, standard error, and t-statistic in the regression match the (equal variance) t-test in magnitude. The signs are reversed because ttest reports mean(Male) - mean(Female), while the female coefficient is the female mean minus the male mean. The null hypothesis, H0, for the female coefficient is that the coefficient is zero (meaning the means of men and women are the same). This null hypothesis is rejected because the P value associated with this test is less than 0.05. Note also that there is a test for the constant. What is the null hypothesis of this second test? The constant is the mean for the group coded 0, so the null hypothesis is that the average education of men ages 25-34 in the US is zero, which is of course a ridiculous null hypothesis, and one that the data reject very dramatically.
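* Why does a regression on a 0-1 dummy reproduce the difference in means? A small sketch in Python (not part of the original session) with made-up numbers: the seven observations below are hypothetical and are not drawn from the CPS file. It computes the least-squares slope and intercept by hand and compares them with the two group means.

```python
from statistics import mean

# Hypothetical toy data: a 0/1 dummy and an outcome.
female = [0, 0, 0, 1, 1, 1, 1]
yrsed = [12, 14, 16, 12, 13, 16, 18]

# Ordinary least squares with one regressor:
#   slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
x_bar, y_bar = mean(female), mean(yrsed)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(female, yrsed))
    / sum((x - x_bar) ** 2 for x in female)
)
intercept = y_bar - slope * x_bar

# Group means computed directly.
mean_male = mean(y for x, y in zip(female, yrsed) if x == 0)
mean_female = mean(y for x, y in zip(female, yrsed) if x == 1)

print("intercept vs male mean:      ", intercept, mean_male)
print("slope vs female - male means:", slope, mean_female - mean_male)
```

* The intercept equals the mean of the group coded 0 and the slope equals the gap between the two group means, which is the pattern seen in the _cons and female rows above.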

. display 2*ttail(18356, 5.7164)

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

* The t and normal values for two-tailed tests when the statistic is 5.7164 are very similar because the t-distribution with df>25 is very similar to the normal distribution (and with df of about 18,000, the two are nearly identical).
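* The normal-based two-tailed p-value can be cross-checked without Stata. A minimal sketch in Python (not part of the original session) using only the standard library; it relies on the identity that the two-tailed normal probability beyond |z| equals erfc(|z|/sqrt(2)), which avoids subtracting a number very close to 1 from 1.

```python
import math

z = 5.7164  # absolute value of the t-statistic from the output above

# Two-tailed probability under the standard normal distribution.
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
p_one_tailed = p_two_tailed / 2

print(f"one tail : {p_one_tailed:.3e}")
print(f"two tails: {p_two_tailed:.3e}")
```

* The two-tailed value should be close to the result of display 2*(1-normal(5.7164)) shown above.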

. display invnormal(1-.025)

1.959964

* How large does df need to be for the critical value of the t-distribution to approach the critical value of the normal distribution? With df=10 it is already within about 0.27 of the normal value, and by df=100 it is within about 0.03. Note that the syntax is invttail(df, p), where p is the right-hand tail probability; the function returns the t value that leaves probability p in the upper tail.

. display invttail(2, .025)

4.3026527

. display invttail(10, .025)

2.2281389

. display invttail(100, .025)

1.9839715

. display invttail(1000, .025)

1.9623391

. display invttail(18000, .025)

1.9600958
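* To see the convergence at a glance, the critical values above can be lined up against the normal critical value. A short sketch in Python (not part of the original session): the t critical values are copied from the invttail output above rather than recomputed, and only the normal value is calculated, with the standard library's NormalDist.

```python
from statistics import NormalDist

# Normal critical value for a two-tailed 5% test (upper tail = .025).
z_crit = NormalDist().inv_cdf(1 - 0.025)

# t critical values copied from the invttail(df, .025) output above.
t_crit = {
    2: 4.3026527,
    10: 2.2281389,
    100: 1.9839715,
    1000: 1.9623391,
    18000: 1.9600958,
}

# Gap between each t critical value and the normal critical value.
gaps = {df: t - z_crit for df, t in t_crit.items()}

print(f"normal critical value: {z_crit:.6f}")
for df, gap in gaps.items():
    print(f"df={df:>5}  t_crit={t_crit[df]:.4f}  gap={gap:.4f}")
```

* The gap is always positive and shrinks as df grows, which is why the normal approximation is safe for samples of this size.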

. log close

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_logs\c

> lass4.log

log type:  text

closed on:   4 Oct 2012, 15:44:47

-------------------------------------------------------------------------------------------