--------------------------------------------------------------------------
      name:  <unnamed>
       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class4.log
  log type:  text
 opened on:   3 Oct 2013, 13:37:20

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

 

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

 

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

 

. display 2.967666/(sqrt(9027))

.03123513

 

* Note that the standard error of the mean is just the standard deviation divided by the square root of N.
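As a quick check of that formula, here is a minimal Python sketch using only the standard library. It reproduces both entries of the sem(yrsed) column from the SDs and Ns in the table above. The dictionary layout and the helper name standard_error are illustrative choices, not anything from Stata.

```python
import math

# Summary statistics copied from the table output above (ages 25-34)
groups = {
    "Male": {"n": 9027, "sd": 2.967666},
    "Female": {"n": 9511, "sd": 2.854472},
}


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


sem = {name: standard_error(g["sd"], g["n"]) for name, g in groups.items()}
for name, value in sem.items():
    print(f"{name}: {value:.7f}")
```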

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

 

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* There are two t-tests: the equal-variance and the unequal-variance versions. Use the option “unequal” to get the unequal-variance t-test; otherwise Stata gives you the equal-variance t-test. Note that in this case the test statistic and the degrees of freedom are almost identical, because the variances of men’s and women’s education, and the Ns of the two samples, are so similar.
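The two standard errors come from different formulas. The default test pools the two variances. The “unequal” test keeps them separate and uses Satterthwaite's degrees of freedom.

A minimal Python sketch (standard library only) recomputes both from the rounded group summaries printed above. Because the inputs are rounded, expect agreement with Stata to about four significant figures, not every digit.

```python
import math

# Group summaries copied from the ttest output above
n1, mean1, sd1 = 9027, 13.31212, 2.967666   # Male
n2, mean2, sd2 = 9511, 13.55657, 2.854472   # Female

diff = mean1 - mean2  # mean(Male) - mean(Female), as Stata reports it

# Equal-variance test: pooled variance, df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_equal = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_equal = diff / se_equal
df_equal = n1 + n2 - 2

# Unequal-variance test: separate variances, Satterthwaite's df
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_unequal = math.sqrt(v1 + v2)
t_unequal = diff / se_unequal
df_unequal = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"equal:   se={se_equal:.7f}  t={t_equal:.4f}  df={df_equal}")
print(f"unequal: se={se_unequal:.7f}  t={t_unequal:.4f}  df={df_unequal:.1f}")
```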

 


. display ttail(18536,-5.7164)

.99999999

 

* The Stata function ttail(df, t) gives the right-hand (upper) tail probability, P(T > t). In this case that is the probability of all values larger than -5.7164. If you want the left-hand tail probability, you need 1-P; if you want the two-tailed probability, you need 2*(1-P). But note that if we had compared women to men, rather than men to women, we would have had a positive 5.7164 statistic and wouldn’t have needed the “one minus P” step: the two-tailed probability would simply be 2*ttail(df, 5.7164).
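Stata's ttail() is built in. To make the tail arithmetic concrete, here is a hedged Python sketch (standard library only). It approximates the same upper-tail probability by integrating the t density numerically with Simpson's rule. This is a swapped-in numerical method, not how Stata computes ttail(), and the helpers t_pdf, simpson and ttail are hypothetical names for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ttail(df, t):
    """Upper-tail probability P(T > t), mimicking Stata's ttail(df, t).

    Uses symmetry: P(T > t) = 0.5 minus the area under the density from
    0 to t; for negative t that area is added instead of subtracted."""
    area = simpson(lambda x: t_pdf(x, df), 0.0, abs(t))
    return 0.5 - math.copysign(area, t)


df, t = 18536, -5.7164
right_tail = ttail(df, t)          # P(T > -5.7164), very close to 1
two_tailed = 2 * (1 - right_tail)  # the "one minus P" route
flipped = 2 * ttail(df, -t)        # women minus men: positive statistic
print(right_tail, two_tailed, flipped)
```

Both routes give the same two-tailed probability (about 1.1e-08), which is the point of the comment above.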

 

. display 2*(1-ttail(18536,-5.7164))

1.105e-08

 

* To produce a regression version of the above t-test, we first need to generate a 0-1 dummy variable for gender.

 

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

 

. codebook sex

 

-----------------------------------------------------------------------------------
sex                                                                             Sex
-----------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  sexlbl

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                         64791         1  Male
                         68919         2  Female

 

. gen byte female=0

 

. replace female=1 if sex==2

(68919 real changes made)

 

. tabulate sex female

 

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710

 

 

. regress yrsed female if age>=25&age<=34

 

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------

 

* The t-test for female here is identical to the equal-variance t-test above, apart from sign. The regression coefficient is women’s mean minus men’s (+.2444469), while ttest reported men’s minus women’s (-.2444469). Check that the standard errors match exactly (.0427623), since t = coefficient/standard error.

* The null hypothesis is that gender does not influence years of education. The t-statistic allows us to reject that null hypothesis because the probability associated with it is very small, about 1 in 100 million (two-tailed p of about 1.1e-08). In other words, suppose men and women in the US had the same average education. Then the chance of getting a difference at least this large (0.244 years, in either direction) purely by chance, in a sample this big, would be about 1 in 100 million. Since that chance is so small, we reject the null hypothesis.
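Why the two tests agree: with a single 0/1 regressor, OLS does three things. It puts the intercept at the mean of the group coded 0 (men) and the slope at the difference in group means. The slope's standard error then equals the pooled-variance standard error of the two-sample t-test.

A small Python sketch checks this on hypothetical toy data, made up for illustration and not drawn from the CPS file. It includes the same dummy-coding step done above with gen and replace.

```python
import math


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


# Hypothetical toy data (NOT the CPS sample); sex coded 1 = Male, 2 = Female
sex = [1, 1, 1, 2, 2, 2]
yrsed = [12, 14, 13, 16, 12, 15]

# Same idea as: gen byte female=0 / replace female=1 if sex==2
female = [1 if s == 2 else 0 for s in sex]

# OLS of yrsed on female: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
n = len(yrsed)
x_bar = sum(female) / n
y_bar = sum(yrsed) / n
sxx = sum((x - x_bar) ** 2 for x in female)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, yrsed))
slope = sxy / sxx
cons = y_bar - slope * x_bar

# t for the slope: coefficient / sqrt(residual MS / Sxx)
residuals = [y - (cons + slope * x) for x, y in zip(female, yrsed)]
residual_ms = sum(r * r for r in residuals) / (n - 2)
t_regression = slope / math.sqrt(residual_ms / sxx)

# Equal-variance two-sample t-test, women minus men, on the same data
men = [y for x, y in zip(female, yrsed) if x == 0]
women = [y for x, y in zip(female, yrsed) if x == 1]
pooled_var = (sum_sq_dev(men) + sum_sq_dev(women)) / (n - 2)
se_diff = math.sqrt(pooled_var * (1 / len(men) + 1 / len(women)))
t_ttest = (sum(women) / len(women) - sum(men) / len(men)) / se_diff

print(f"_cons={cons:.4f} slope={slope:.4f} t_reg={t_regression:.4f} t_test={t_ttest:.4f}")
```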

 

* And note that these regression results include a second test, the test of the constant term (t=434.62). Its null hypothesis is that the constant is zero. Since the constant here is men’s average education, that second null hypothesis is a dopey one we are happy to reject.
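The constant's standard error follows the same logic as the group standard errors. With one dummy, _cons is just men's mean, so its standard error is the regression's Root MSE divided by the square root of the number of men.

It differs slightly from the .0312351 in the ttest table because the regression uses the pooled SD (Root MSE) rather than men's own SD. A Python sketch using the printed output:

```python
import math

# Values copied from the regress output above
residual_ms = 8.46892111  # Residual MS (its square root is Root MSE)
n_male = 9027             # men are the group coded female == 0
cons = 13.31212           # _cons = men's mean years of education

# With a single 0/1 dummy, se(_cons) = Root MSE / sqrt(N of the 0 group)
se_cons = math.sqrt(residual_ms) / math.sqrt(n_male)
t_cons = cons / se_cons   # test of H0: _cons = 0
print(f"se(_cons)={se_cons:.7f}  t={t_cons:.2f}")
```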

 

. display 2*(ttail(18536, 5.7164))

1.105e-08

 

. display 2*(1-normal(5.7164))

1.088e-08

 

* The Normal two-tailed probability associated with 5.7164 is a tiny bit smaller than the t probability. A t-distribution with about 18,500 df is very close to the Normal, but not exactly the same: its tails are slightly heavier.
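The 1-normal() route works here, but for very large statistics, subtracting from 1 eventually loses all precision. A hedged Python sketch using the standard library's math.erfc computes the same two-tailed Normal probability without that subtraction. It relies on the identity 2*(1 - Phi(z)) = erfc(z/sqrt(2)). The helper name normal_two_tailed is hypothetical.

```python
import math


def normal_two_tailed(z):
    """Two-tailed Normal p-value, 2 * (1 - Phi(|z|)), computed as erfc(|z|/sqrt(2)).

    erfc avoids subtracting a number very close to 1."""
    return math.erfc(abs(z) / math.sqrt(2))


p = normal_two_tailed(5.7164)
print(p)  # compare with the display 2*(1-normal(5.7164)) result above
```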

 

. display invnormal(1-.025)

1.959964

 

* The key value of the Normal distribution is 1.96: the value beyond which the upper tail has P=0.025, so the two tails together give P=0.05 (5%). Anything less than 5% likely we deem (arbitrarily) too unlikely to have happened by chance.
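The same critical value is available in Python's standard library through statistics.NormalDist. This minimal sketch mirrors invnormal(1-.025) and then checks that the two tails beyond plus and minus 1.96 hold 5% between them:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Value with 2.5% in the upper tail (Stata: invnormal(1-.025))
z_crit = standard_normal.inv_cdf(1 - 0.025)

# The two tails beyond -z_crit and +z_crit together hold 5%
two_tail_area = 2 * (1 - standard_normal.cdf(z_crit))
print(z_crit, two_tail_area)
```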

 

* Critical t-values that cut off the same tail probability are always larger than the Normal value (1.96 here), but the difference matters only for very small N, i.e. few degrees of freedom:

 

. display invttail(2, 0.025)

4.3026527

 

. display invttail(10, 0.025)

2.2281389

 

. display invttail(25, 0.025)

2.0595386

 

. display invttail(1000, 0.025)

1.9623391
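The shrinking gap can be reproduced without Stata. The Python sketch below uses only the standard library. The function invttail here is a hypothetical stand-in for Stata's invttail(), not a reimplementation of it. It finds the critical value by bisection on a t tail integrated numerically with Simpson's rule, which is slow but transparent. Stata does not compute the function this way.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, n=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area from 0 to t."""
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def invttail(df, p, lo=0.0, hi=100.0, tol=1e-9):
    """Critical t with upper-tail probability p (0 < p < 0.5), by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > p:
            lo = mid  # tail beyond mid is still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2


critical = {df: invttail(df, 0.025) for df in (2, 10, 25, 1000)}
for df, value in critical.items():
    print(f"invttail({df}, 0.025) ~ {value:.7f}")
```

The values should match the Stata displays above to about six decimal places. They fall toward the Normal 1.959964 as the degrees of freedom grow.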

 

. log close

      name:  <unnamed>
       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_381_logs\class4.log
  log type:  text
 closed on:   3 Oct 2013, 15:56:06
-----------------------------------------------------------------------------------