class 4

------------------------------------------------------------------------------------

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class4.log

log type: text

opened on: 22 Jan 2013, 13:38:39

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>24 & age<35, contents(freq mean yrsed sd yrsed)

-------------------------------------------------
      Sex |      Freq.  mean(yrsed)    sd(yrsed)
----------+--------------------------------------
     Male |      9,027     13.31212     2.967666
   Female |      9,511     13.55657     2.854472
-------------------------------------------------

* Let’s refresh our memories about the educational attainment of young men and young women (ages 25 to 34) in the CPS.

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* The fairly modest difference between men’s and women’s education, 0.244 years, is associated with a t-statistic of -5.7164. I claim that this t-statistic is far from zero, too far to have plausibly occurred by chance if the true difference were zero. In order to understand why, we need to spend some time talking about the t-distribution, which we did (see Freedman’s tables, see my notes on the mean, and see my Excel sheet).
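
* The equal-variance t-statistic above can be rebuilt by hand from the summary statistics. The following is a minimal sketch, not part of the original session: the local macro names (n1, m1, s1, and so on) are my own, and the block should be run in one go so the locals stay defined. Because the means and standard deviations are copied from the rounded output, the results should match the ttest table only up to rounding.

```stata
* Summary statistics copied from the ttest output above
local n1 = 9027
local m1 = 13.31212
local s1 = 2.967666
local n2 = 9511
local m2 = 13.55657
local s2 = 2.854472

* Pooled variance, then the standard error of the difference in means
local sp2 = ((`n1'-1)*`s1'^2 + (`n2'-1)*`s2'^2) / (`n1' + `n2' - 2)
local se  = sqrt(`sp2' * (1/`n1' + 1/`n2'))

* t-statistic on n1 + n2 - 2 degrees of freedom
display "se = " `se' "   t = " (`m1' - `m2')/`se' "   df = " `n1' + `n2' - 2

* Cross-check with Stata's immediate form of ttest, which takes summary statistics
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
```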

. ttest yrsed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* In my notes, in the Excel sheet, and in HW2 we talk about two different t-tests: the equal variance t-test (which is what Stata produces by default), and the unequal variance t-test, which Stata will produce if you add the unequal option. The unequal variance t-test has the same difference of means, but a slightly different standard error of the difference and a slightly different number of degrees of freedom than the equal variance t-test. In this case, the two statistics are almost exactly the same, because men’s and women’s education have almost exactly the same variance (note the sample standard deviations above). So here it doesn’t matter whether we assume that the two samples have the same variance, because they happen to have almost the same variance. In other cases, where the samples have very different variances, the two tests can produce substantively different results.
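
* The unequal-variance standard error and Satterthwaite’s degrees of freedom can also be rebuilt by hand. This is a sketch under the same caveats as any hand calculation from rounded output: the local macro names (n1, s1, v1, and so on) are my own, the block should be run in one go, and the results should match the table above only up to rounding.

```stata
* Summary statistics copied from the ttest output above
local n1 = 9027
local s1 = 2.967666
local n2 = 9511
local s2 = 2.854472

* Each group's squared standard error of the mean (variance / n)
local v1 = `s1'^2 / `n1'
local v2 = `s2'^2 / `n2'

* Unequal-variance standard error of the difference in means
display "se = " sqrt(`v1' + `v2')

* Satterthwaite's approximate degrees of freedom
display "df = " (`v1' + `v2')^2 / (`v1'^2/(`n1'-1) + `v2'^2/(`n2'-1))

* Cross-check with the immediate form of ttest and the unequal option
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```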

. display ttail(18000, 5.7164)

5.526e-09

* One key thing to know is that the Stata function ttail gives you the right-hand cumulative tail probability of the t-distribution, meaning the probability of all values higher than 5.7164. The way to know this about the Stata function is to look it up (help function). The tail of the t-distribution in this case has a probability of about 5.5 parts in a billion, which is a very small probability.

. display ttail(18536, 5.7164)

5.524e-09

* For large df, the t-distribution barely changes as the df changes, but it does change a little bit, and there is no reason not to put the correct df here.
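
* One way to see how little the t-distribution changes at large df is to compare critical values. This sketch was not part of the original session. It uses invttail (the inverse of ttail) and invnormal, and the df of 10 is just an illustrative small value that I chose. With our df the two-sided 5% critical value should be very close to the normal value of about 1.96, while with 10 df it is noticeably larger, about 2.23.

```stata
* Two-sided 5% critical value for the t-distribution with our df
display invttail(18536, 0.025)

* Same critical value with a small df (10 is an arbitrary illustration)
display invttail(10, 0.025)

* Corresponding critical value for the standard normal distribution
display invnormal(0.975)
```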

. display 2*ttail(18536, 5.7164)

1.105e-08

* We usually want to double the tail probability in order to get the probability of two equal tails added together, that is, the probability of a t-statistic this far from zero in either direction.

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. display ttail(18536, -5.7164)

.99999999

* ttail gives the right-hand tail probability, and the t-statistic we got here was negative (because of the arbitrary choice to put the men first, so that the difference is men minus women rather than women minus men). So we wouldn’t want to use ttail on the actual negative t-statistic, because that gives us not the left-hand tail but everything to the right of -5.7164, which is almost the entire distribution. In order to get the tail, we have to subtract this from one (to get the left-tail probability) and then multiply by 2 (to get the two-tailed probability).

. display 1-ttail(18536, -5.7164)

5.524e-09

. display 2*(1-ttail(18536, -5.7164))

1.105e-08

* This is the same value we got above by calculating the right-hand tail probability with the positive value of the statistic.
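
* A shortcut that avoids worrying about the sign is to take the absolute value of the t-statistic before calling ttail. The sketch below was not part of the original session. It assumes the CPS data are still in memory, and it relies on the results that ttest leaves behind in r(), namely r(t), r(df_t), and the two-sided p-value r(p). Working with the small tail directly also avoids the loss of precision that can come from subtracting a number very close to 1 from 1.

```stata
* abs() makes the sign of the t-statistic irrelevant
display 2*ttail(18536, abs(-5.7164))

* Rerun the test quietly, then build the two-tailed probability from its stored results
quietly ttest yrsed if age>24 & age<35, by(sex)
display 2*ttail(r(df_t), abs(r(t)))

* ttest also stores the two-sided p-value directly
display r(p)
```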

. log close

name: <unnamed>

log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2

> 013\class4.log

log type: text

closed on: 22 Jan 2013, 15:47:22

------------------------------------------------------------------------------------