class 4

--------------------------------------------------------------------------

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fal

> l_2013_381_logs\class4.log

log type: text

opened on: 3 Oct 2013, 13:37:20

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------

Sex | Freq. mean(yrsed) sd(yrsed) sem(yrsed)

----------+---------------------------------------------------

Male | 9,027 13.31212 2.967666 .0312351

Female | 9,511 13.55657 2.854472 .0292693

--------------------------------------------------------------

. display 2.967666/(sqrt(9027))

.03123513

* Note that the standard error of the mean is just the standard deviation divided by the square root of N.
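
* The same check can be repeated for women. A minimal sketch, using the sd and N copied by hand from the table above; the result should reproduce the sem(yrsed) reported for Female.

```stata
* Standard error of the mean = sd / sqrt(N), with values typed in from the table
display 2.854472/sqrt(9511)   // women: compare with sem(yrsed) in the Female row
```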

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335

Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394

---------+--------------------------------------------------------------------

combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946

---------+--------------------------------------------------------------------

diff | -.2444469 .0427623 -.3282649 -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female) t = -5.7164

Ho: diff = 0 degrees of freedom = 18536

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------

Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335

Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394

---------+--------------------------------------------------------------------

combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946

---------+--------------------------------------------------------------------

diff | -.2444469 .0428057 -.32835 -.1605438

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female) t = -5.7106

Ho: diff = 0 Satterthwaite's degrees of freedom = 18383.6

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

* There are two versions of the two-sample t-test: equal variance and unequal variance. Use the option “unequal” to get the unequal variance t-test; otherwise Stata gives you the equal variance t-test. Note that in this case the test statistics and the degrees of freedom are almost identical, because the variances of men’s and women’s education, and the Ns of the two samples, are so similar.
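
* To see where the two versions differ, the standard error of the difference can be rebuilt by hand. This is only a sketch: the scalar names (n1, s1, sp2, and so on) are made up for illustration, and the inputs are the rounded values copied from the ttest output above, so the results should agree with Stata's output only up to rounding.

```stata
* Inputs copied from the ttest output above (1 = Male, 2 = Female)
scalar n1 = 9027
scalar s1 = 2.967666
scalar n2 = 9511
scalar s2 = 2.854472

* Equal variances: pool the two variances, weighting each by its degrees of freedom
scalar sp2 = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2)
scalar se_equal = sqrt(sp2*(1/n1 + 1/n2))

* Unequal variances: add the two squared standard errors of the means
scalar se_unequal = sqrt(s1^2/n1 + s2^2/n2)

* Satterthwaite's approximate degrees of freedom for the unequal variance test
scalar df_satt = (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))

display se_equal      // compare with Std. Err. of diff, equal variance test
display se_unequal    // compare with Std. Err. of diff, unequal variance test
display df_satt       // compare with Satterthwaite's degrees of freedom
```

* The pooled variance sp2 should also be close to the residual mean square (MS) in a regression of yrsed on a gender dummy, since that regression is the same test.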

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335

Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394

---------+--------------------------------------------------------------------

combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946

---------+--------------------------------------------------------------------

diff | -.2444469 .0427623 -.3282649 -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female) t = -5.7164

Ho: diff = 0 degrees of freedom = 18536

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

. display ttail(18536,-5.7164)

.99999999

* The Stata function ttail(df, t) gives you the right-hand tail probability P, which in this case is the probability of all values larger than -5.7164. If you want the left-hand tail probability, you need 1-P, and if you want the two-tail probability, you need 2*(1-P). But note that if we had compared women to men, rather than men to women, we would have had a positive 5.7164 statistic, and we wouldn’t have had to do the “one minus P” part.

. display 2*(1-ttail(18356,-5.7164))

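1.105e-08

* Note that the df typed in the command above is 18356, a transposition of the 18536 reported by ttest (the same transposition recurs in the later ttail command). With df this large, the p-value is essentially unchanged.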
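
* Because the t-distribution is symmetric, the two-tail probability can be written in several equivalent ways. A short sketch, using the df of 18536 reported by ttest; all three lines should display the same value.

```stata
* Three equivalent ways to get the two-tail p-value for t = -5.7164
display 2*(1 - ttail(18536, -5.7164))   // flip the right-hand tail of the negative t
display 2*ttail(18536, 5.7164)          // use the positive statistic directly
display 2*ttail(18536, abs(-5.7164))    // general form: works for either sign of t
```

* The abs() form is the safest habit, since it does not depend on which group was subtracted from which.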

* In order to generate a regression version of the above t-test, we need first to generate a 0-1 dummy variable for gender.

. tabulate sex

Sex | Freq. Percent Cum.

------------+-----------------------------------

Male | 64,791 48.46 48.46

Female | 68,919 51.54 100.00

------------+-----------------------------------

Total | 133,710 100.00

. codebook sex

-----------------------------------------------------------------------------------

sex Sex

-----------------------------------------------------------------------------------

type: numeric (byte)

label: sexlbl

range: [1,2] units: 1

unique values: 2 missing .: 0/133710

tabulation: Freq. Numeric Label

64791 1 Male

68919 2 Female

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. tabulate sex female

| female

Sex | 0 1 | Total

-----------+----------------------+----------

Male | 64,791 0 | 64,791

Female | 0 68,919 | 68,919

-----------+----------------------+----------

Total | 64,791 68,919 | 133,710

. regress yrsed female if age>=25&age<=34

Source | SS df MS Number of obs = 18538

-------------+------------------------------ F( 1, 18536) = 32.68

Model | 276.742433 1 276.742433 Prob > F = 0.0000

Residual | 156979.922 18536 8.46892111 R-squared = 0.0018

-------------+------------------------------ Adj R-squared = 0.0017

Total | 157256.664 18537 8.48339343 Root MSE = 2.9101

------------------------------------------------------------------------------

yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

female | .2444469 .0427623 5.72 0.000 .1606289 .3282649

_cons | 13.31212 .0306297 434.62 0.000 13.25208 13.37216

------------------------------------------------------------------------------

* The t-test for female here is identical to the equal variance t-test above (check the standard error of the estimate to be sure it matches exactly, since t = coefficient / standard error). The only difference is the sign, because the regression measures women relative to men rather than men relative to women. The null hypothesis is that gender does not influence years of education. The t-statistic allows us to reject that null hypothesis because the probability associated with that test is very small, about 1 in 100 million. In other words, if men and women in the US had the same levels of education, the probability of getting a difference this large (0.244) just by chance in a sample this big would be about 1 in 100 million. Since that probability is so small, we reject the null hypothesis.
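
* The link between the regression table and the t-test can be checked with a few lines of arithmetic. A minimal sketch, with the coefficient, standard error, and constant typed in from the regression output above (so the results match only up to rounding).

```stata
display .2444469/.0427623        // t for female = coefficient / standard error
display (.2444469/.0427623)^2    // squared t: compare with the F statistic, F(1, 18536)
display 13.31212 + .2444469      // constant + coefficient: compare with women's mean yrsed
```

* With the regression still in memory, the first line could instead be written as display _b[female]/_se[female], which avoids retyping the numbers.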

* And note that in these regression results we have a second test, the test of the constant term (t=434). The null hypothesis of the second test is that the constant is zero. Since the constant here is men’s average education, that second null hypothesis is a dopey one we are happy to reject.

. display 2*(ttail(18356, 5.7164))

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

* The Normal two-tail probability associated with 5.7164 is a tiny bit smaller than the t-distribution probability. The t-distribution with about 18,000 df is very close to the Normal, but not exactly the same.

. display invnormal(1-.025)

1.959964

* The key value of the normal distribution is 1.96: that is the value beyond which each tail has P=0.025, meaning the two tails together yield P=5%. Anything that is less than 5% likely we deem (arbitrarily) to be too unlikely to have happened by chance.
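
* The relationship between 1.96 and the 5% level can be confirmed in both directions. A short sketch using the critical value displayed above:

```stata
display 1 - normal(1.959964)       // one tail beyond the critical value: about 0.025
display 2*(1 - normal(1.959964))   // both tails together: about 0.05
display invnormal(0.975)           // going the other way: recovers the critical value
```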

* The t critical value for a given tail probability is always larger than the corresponding Normal critical value, but the difference only matters for very small N.

. display invttail(2, 0.025)

4.3026527

. display invttail(10, 0.025)

2.2281389

. display invttail(25, 0.025)

2.0595386

. display invttail(1000, 0.025)

1.9623391
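
* The four commands above can be collapsed into one loop that prints each t critical value next to the Normal one. This is a sketch; the list of df values is the one used above plus 18536 (the df from the t-test), added here for comparison.

```stata
* Two-tail 5% critical values: t-distribution versus Normal
foreach df of numlist 2 10 25 1000 18536 {
    display "df = " `df' "   t critical = " invttail(`df', 0.025) "   normal critical = " invnormal(0.975)
}
```

* As df grows, the t critical value should shrink toward the Normal value of 1.96 without ever dropping below it.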

. log close

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38

> 1_logs\class4.log

log type: text

closed on: 3 Oct 2013, 15:56:06

-----------------------------------------------------------------------------------