class 4

------------------------------------------------------------------------------------------------------------------

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_logs\class4.log

log type: text

opened on: 4 Oct 2012, 12:10:27

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

* Note that the standard error of the mean is SD/sqrt(N).

. display 2.967666/(sqrt(9027))

.03123513
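
* The same check can be run from stored results instead of retyping the numbers. A minimal sketch, assuming the dataset is still loaded and that sex==1 codes men (as the tabulate, nolab output later in this log indicates):

```stata
* summarize leaves the SD in r(sd) and the sample size in r(N)
quietly summarize yrsed if sex==1 & age>=25 & age<=34
display r(sd)/sqrt(r(N))
```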

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
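
* The t statistic reported by ttest can be rebuilt by hand from the group summaries. A sketch under the equal-variance assumption; the scalar names (n_m, sd_f, var_pool and so on) are made up for illustration and are not part of the dataset:

```stata
* group sizes, means and SDs copied from the ttest table
scalar n_m    = 9027
scalar mean_m = 13.31212
scalar sd_m   = 2.967666
scalar n_f    = 9511
scalar mean_f = 13.55657
scalar sd_f   = 2.854472

* pooled variance, then the standard error of the difference in means
scalar var_pool = ((n_m-1)*sd_m^2 + (n_f-1)*sd_f^2)/(n_m+n_f-2)
scalar se_diff  = sqrt(var_pool*(1/n_m + 1/n_f))

* t = (male mean - female mean)/SE, with n_m+n_f-2 degrees of freedom
display "se = " se_diff "   t = " (mean_m-mean_f)/se_diff "   df = " n_m+n_f-2

* the immediate form of ttest works from the summaries alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
```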

* Is this t-value of -5.7164 big? What is the probability of getting a value this far from zero by chance if the true difference were zero?

. display ttail(18356, -5.7164)

.99999999

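* ttail(df, t) returns the upper-tail probability, P(T > t). Our t is negative, so that value is almost 1; the tail we want is the lower one, P(T < t), which is 1 minus this value. (The df typed here, 18356, should be the 18536 reported by ttest; with df this large the result is essentially unchanged.) For the syntax of ttail, invttail, invnormal and normal, see Stata help.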

. display 1-ttail(18356, -5.7164)

5.525e-09

. display 2*(1-ttail(18356, -5.7164))

1.105e-08

* We usually care about both tails at once: we make no prior assumption about which group's mean should be larger, so a difference in either direction counts as evidence against the null. We therefore take the P value for one tail and multiply by 2.
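
* Because the t distribution is symmetric, the two-sided P value can also be read directly from the upper tail at |t|, which avoids the rounding that 1-ttail() can pick up when the tail is this small. A sketch, assuming the dataset is still loaded; r(t), r(df_t) and r(p) are the results ttest stores:

```stata
* two-sided P value from the upper tail at |t|, using the df reported by ttest
display 2*ttail(18536, abs(-5.7164))

* the same quantity from ttest's stored results
quietly ttest yrsed if age>=25 & age<=34, by(sex)
display 2*ttail(r(df_t), abs(r(t)))
display r(p)
```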

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. label define female 0 "male" 1 "female"

label female already defined

r(110);

* The r(110) error means a value label named female already exists in this dataset, so nothing new was defined. The next command attaches that existing label, which the tabulate output shows maps 0 to male and 1 to female.

. label val female female

. tabulate sex female

           |        female
       Sex |      male     female |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710

. tabulate sex female, nolab

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
         1 |    64,791          0 |    64,791
         2 |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
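
* The dummy can also be built in one step, with a label name that cannot collide with an existing one. A sketch; female2 and femlbl are hypothetical names picked to avoid clashing with the variable and label already in the dataset:

```stata
* (sex==2) evaluates to 1 when true and 0 when false;
* the if qualifier keeps a missing sex as missing rather than coding it 0
generate byte female2 = (sex==2) if !missing(sex)

* define and attach a value label under a fresh name
label define femlbl 0 "male" 1 "female"
label values female2 femlbl
tabulate sex female2
```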

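* We create this new 0-1 dummy variable to use in the regression. With sex coded 1/2 the slope would be the same, but with 0/1 coding the constant is the mean for men (female=0) and the coefficient is the female-minus-male difference.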

. regress yrsed female if age>=25 & age <=34

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

* Notice that the coefficient, standard error, and t-statistic in the regression match those from the (equal-variance) t-test, apart from sign: ttest reports male minus female, while the female coefficient is female minus male. The null hypothesis, H0, for the female coefficient is that it is zero, meaning that men and women have the same mean. This null hypothesis is rejected because the P value for the test is less than 0.05.

* Note that there is also a test for the constant. Its null hypothesis is that the average education of men aged 25-34 in the US is zero, which is of course a ridiculous null hypothesis, and one the data reject very dramatically.
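
* A few quick checks on the regression. A sketch, assuming the dataset and the female dummy are still in memory and that the Stata version supports factor-variable notation (Stata 11 or later):

```stata
quietly regress yrsed female if age>=25 & age<=34

* t for the female coefficient is coefficient/standard error; with one regressor its square is the F statistic
display _b[female]/_se[female]
display (_b[female]/_se[female])^2

* the constant plus the coefficient recovers the female mean
display _b[_cons] + _b[female]

* factor-variable notation builds the dummy on the fly, with sex==1 (Male) as the base category
regress yrsed i.sex if age>=25 & age<=34
```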

. display 2*ttail(18356, 5.7164)

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

* The t and normal P values for a two-tailed test at 5.7164 are very similar because the t distribution with df>25 is close to the normal distribution, and with df around 18,000 it is very close indeed.

. display invnormal(1-.025)

1.959964

* How large does df need to be for the critical value of the t distribution to approach the critical value from the normal distribution? With df=10 it is already in the same range (2.23 versus 1.96), and by df=100 it is within about 1%. Note that invttail(df, p) takes the degrees of freedom and a right-tail probability p, and returns the t value with that much probability to its right.

. display invttail(2, .025)

4.3026527

. display invttail(10, .025)

2.2281389

. display invttail(100, .025)

1.9839715

. display invttail(1000, .025)

1.9623391

. display invttail(18000, .025)

1.9600958
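
* The same comparison can be produced in one pass. A sketch; the df values are the ones used above, and the last column is the ratio of the t critical value to the normal one:

```stata
* two-sided 5% critical values: t distribution versus normal
foreach df of numlist 2 10 100 1000 18000 {
    display %6.0f `df' "  " %9.6f invttail(`df', .025) "  " %7.4f invttail(`df', .025)/invnormal(.975)
}
```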

. log close

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2012_381_logs\class4.log

log type: text

closed on: 4 Oct 2012, 15:44:47

-------------------------------------------------------------------------------------------