------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class4.log

  log type:  text

 opened on:  22 Jan 2013, 13:38:39

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

 

. table sex if age>24 & age<35, contents(freq mean yrsed sd yrsed)

 

-------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)

----------+--------------------------------------

     Male |       9,027     13.31212     2.967666

   Female |       9,511     13.55657     2.854472

-------------------------------------------------

 

* Let’s refresh our memories about young men’s and young women’s educational attainment (ages 25 to 34) in the CPS.

 

. ttest yrsed if age>24 & age<35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* The fairly modest difference between men’s and women’s education, 0.244 years, is associated with a t-statistic of -5.7164. I claim that this t-statistic is too far from zero to have plausibly occurred by chance if the true difference were zero. To understand why, we need to spend some time on the t-distribution, which we did in class (see Freedman’s tables, my notes on the mean, and my Excel sheet).
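
* As a quick hand check (a sketch, with the two numbers simply copied from the diff row of the ttest output above), the t-statistic is just the difference of means divided by the standard error of that difference:

```stata
* Difference of means divided by its standard error, both copied from the output above.
display -.2444469/.0427623     // should be close to the reported t of -5.7164
```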

 

. ttest yrsed if age>24 & age<35, by(sex) unequal

 

Two-sample t test with unequal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* In my notes, in the Excel sheet, and in HW2 we talk about two different t-tests: the equal variance t-test (which is what Stata produces by default), and the unequal variance t-test, which Stata will produce if you add the unequal option. The unequal variance t-test has the same difference of means, but a slightly different standard error of the difference and a slightly different number of degrees of freedom than the equal variance t-test. In this case, the two t-statistics are almost exactly the same (-5.7164 versus -5.7106), because men’s and women’s educations have almost exactly the same variance (note the sample standard deviations above, 2.97 and 2.85). So it doesn’t matter here whether we assume that the two samples have the same variance. In other cases, where the samples have very different variances, the two tests can produce substantively different results.
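
* To see where the two standard errors and the Satterthwaite degrees of freedom come from, here is a minimal sketch that rebuilds them from the group sizes and standard deviations printed above. The scalar names (n1, s1, sp2, se_p, and so on) are my own labels for illustration, not anything Stata defines, and the textbook formulas are assumed to be the ones Stata applies. The last line uses ttesti, the immediate form of ttest, which takes summary statistics instead of data.

```stata
* Summary statistics copied from the ttest output above (1 = Male, 2 = Female).
scalar n1 = 9027
scalar s1 = 2.967666
scalar n2 = 9511
scalar s2 = 2.854472

* Equal variance case: pool the two variances, then take the SE of the difference.
scalar sp2  = ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)
scalar se_p = sqrt(sp2*(1/n1 + 1/n2))

* Unequal variance case: each group keeps its own variance.
scalar v1   = s1^2/n1
scalar v2   = s2^2/n2
scalar se_u = sqrt(v1 + v2)

* Satterthwaite approximation to the degrees of freedom.
scalar df_u = (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1))

display scalar(se_p)    // compare with .0427623 in the equal variance output
display scalar(se_u)    // compare with .0428057 in the unequal variance output
display scalar(df_u)    // compare with 18383.6

* The same unequal variance test straight from the summary statistics.
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```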

 

. display ttail(18000, 5.7164)

5.526e-09

 

* One key thing to know is that the Stata function ttail gives you the right-hand cumulative tail probability of the t-distribution, meaning the probability of all values higher than 5.7164. The way to know this about a Stata function is to look it up (help functions). The tail of the t-distribution in this case has a probability of about 5.5 parts in a billion, which is a very small probability.
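
* If you want to convince yourself that ttail really is the right-hand tail, a few illustrative values help. This is only a sketch; the cutoffs 0 and 1.96 are my choices for illustration.

```stata
display ttail(18536, 0)        // half the distribution lies above zero, so this is .5
display ttail(18536, 1.96)     // roughly .025 when df is this large
display ttail(18536, 5.7164)   // the far right tail, as computed in the log above
```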

 

. display ttail(18536, 5.7164)

5.524e-09

 

* For large df, the t-distribution barely changes as df changes (compare the result for 18000 df above with 18536 df here), but it does change a little bit, and there is no reason not to put the correct df here.
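
* To see how much the df matters when it is small, and how little when it is large, one could compare the same tail across several df values and against the standard normal. A sketch: the df values 10 and 100 are arbitrary illustrations, and normal() is Stata's standard normal cumulative distribution function.

```stata
display ttail(10, 5.7164)       // small df: a much fatter tail
display ttail(100, 5.7164)
display ttail(18536, 5.7164)    // the df from our test
display normal(-5.7164)         // standard normal tail, the limit as df grows
```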

 

. display 2*ttail(18536, 5.7164)

1.105e-08

 

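* We usually want to double the tail probability in order to get the probability of two equal tails added together, that is, the probability of a t-statistic this far from zero in either direction. This is the two-tailed probability that Stata reports under Ha: diff != 0, where it is rounded to 0.0000.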
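
* Rather than retyping the t-statistic and df by hand, one can pull them from the results that ttest leaves behind. A sketch, assuming the CPS data are still in memory; r(t), r(df_t), and r(p) are results stored by ttest.

```stata
quietly ttest yrsed if age>24 & age<35, by(sex)
return list                            // lists r(t), r(df_t), r(p), and the other stored results
display r(p)                           // two-sided p-value, not rounded to 0.0000
display 2*ttail(r(df_t), abs(r(t)))    // the same quantity computed by hand
```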

 

. ttest yrsed if age>24 & age<35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

 

. display ttail(18536, -5.7164)

.99999999

 

* ttail gives the right-hand tail probability, and the t-statistic we got here was negative (because of the arbitrary choice to put the men first, so that the difference is men minus women rather than women minus men). So we would not want to apply ttail directly to the negative t-statistic: that gives not the left-hand tail but the probability of everything to the right of -5.7164, which is nearly 1. To get the left-hand tail, we subtract this from one, and then multiply by 2 to get the two-tailed probability.

 

. display 1-ttail(18536, -5.7164)

5.524e-09

 

. display 2*(1-ttail(18536, -5.7164))

1.105e-08

 

* This is the same result we got above by calculating the right-hand tail probability with the positive value of the statistic.
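
* Two shortcuts avoid the sign bookkeeping altogether. This is a sketch: abs() takes the absolute value, and t() is the cumulative t-distribution function, which I am assuming is available in your version of Stata (check help functions).

```stata
display 2*ttail(18536, abs(-5.7164))   // two-tailed probability, whatever the sign of t
display t(18536, -5.7164)              // left-hand tail directly, no subtracting from one
```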

 

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2

> 013\class4.log

  log type:  text

 closed on:  22 Jan 2013, 15:47:22

------------------------------------------------------------------------------------