----------------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3

> \2011_180B_logs\class4.log

  log type:  text

 opened on:   3 Feb 2011, 13:22:48

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

* a familiar t-test:

. ttest yrsed if age >24 & age <35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
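
* A minimal sketch of where the t statistic in the table above comes from: it is the difference in means divided by the pooled standard error of that difference. All numbers are copied from the ttest output; sp is a hypothetical scalar name, and ttesti is the immediate form of ttest, which takes n, mean, and sd for each group.

```stata
* difference in means divided by its standard error
display -.2444469/.0427623

* pooled standard deviation (sp is a hypothetical name), then the standard error of the difference
scalar sp = sqrt(((9027-1)*2.967666^2 + (9511-1)*2.854472^2)/(9027+9511-2))
display sp*sqrt(1/9027 + 1/9511)

* the same equal-variance test rebuilt from the summary statistics alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
```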

 

* a quick review of what the 5.716 statistic means:

 

* The command below gives the cumulative standard Normal probability up to z = 5.716, i.e. P(Z < 5.716).

. display normal(5.716)

.99999999

 

* The command below takes the upper-tail probability (i.e. 1 minus the cumulative probability above) and doubles it to get the two-tailed probability.

. display 2*(1-normal(5.716))

1.091e-08
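
* The Normal approximation works here because the degrees of freedom are so large. As a sketch, the same two-tailed probability can be taken from the t distribution itself with ttail(df, t), which returns the upper-tail probability; with df = 18536 the two results should be nearly indistinguishable.

```stata
* two-tailed probability from the t distribution with the t-test's degrees of freedom
display 2*ttail(18536, 5.716)

* two-tailed probability from the Normal distribution, as above
display 2*(1 - normal(5.716))
```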

 

* invnormal starts with a cumulative probability and returns the corresponding Z-score, i.e. a value on the x axis. As a reminder, the command below gives the critical value of the Normal distribution that leaves 2.5% in the upper tail, meaning 5% in the two tails combined.

. display invnormal(1-.025)

1.959964

 

* invttail returns the t-statistic (analogous to a Z-score) given (df, upper-tail probability). Note that as df goes up, the value approaches 1.96, i.e. the t distribution becomes almost exactly like the Normal distribution. You can use Stata's online help to pull up the definitions of functions like these. Try it!

. display invttail(20,.025)

2.0859634

 

. display invttail(100,.025)

1.9839715

 

. display invttail(20000, .025)

1.9600826
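
* A short sketch of the same convergence as a loop: for each df, print the 2.5% critical value of the t distribution and its gap from the Normal critical value. The list of df values is arbitrary and chosen only for illustration.

```stata
foreach df of numlist 20 100 1000 20000 {
    * critical t for a 2.5% upper tail, and how far it sits above invnormal(.975)
    display "df = " `df' "   critical t = " invttail(`df', .025) "   gap = " invttail(`df', .025) - invnormal(.975)
}
```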

 

. table sex if age>24 & age<35, contents(freq mean yrsed sd yrsed)

 

-------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)

----------+--------------------------------------

     Male |       9,027     13.31212     2.967666

   Female |       9,511     13.55657     2.854472

-------------------------------------------------

 

* First, above, the table without weights; then, below, the same table with frequency weights. Note that the sample size has been inflated by a factor of about 2000, but the mean and sd change only a little.

 

. table sex if age>24 & age<35 [fweight= perwt_rounded], contents(freq mean yrsed sd yrsed)

 

-------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)

----------+--------------------------------------

     Male |    1.86e+07      13.5574     2.819091

   Female |    1.92e+07     13.76295     2.720712

-------------------------------------------------

 

* aweights, below, use the same weights (so the mean is the same as with fweights), but aweights are rescaled to average one, so that the weighted sample size equals the unweighted sample size. This preserves important information about how many observations we actually have.

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents(freq mean yrsed sd yrsed)

 

-------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)

----------+--------------------------------------

     Male |       9,027      13.5574     2.819247

   Female |       9,511     13.76295     2.720855

-------------------------------------------------
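
* A sketch of what "rescaled to average one" means, assuming the rescaling is simply each weight divided by the mean weight in the subsample. aw_norm is a hypothetical variable name. The last line checks the small gap between the fweighted and aweighted sd for males, assuming both use the usual n-1 style denominator (about 18.6 million minus 1 with fweights, 9027 minus 1 with aweights).

```stata
* mean weight in the subsample, then weights rescaled to average one
quietly summarize perwt_rounded if age>24 & age<35
generate double aw_norm = perwt_rounded/r(mean) if age>24 & age<35
summarize aw_norm

* fweighted sd for males times sqrt(n/(n-1)); compare with the aweighted sd in the table above
display 2.819091*sqrt(9027/9026)
```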

 

* Generating a new 0-1 dummy variable for female, so that we can run regressions with gender as a predictor.

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate sex, nolab

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     64,791       48.46       48.46

          2 |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. gen female=0

 

. replace female=1 if sex==2

(68919 real changes made)

 

.

. label define female_lbl 0 "male" 1 "female"

 

. label val female female_lbl

 

. tabulate sex female

 

           |        female

       Sex |      male     female |     Total

-----------+----------------------+----------

      Male |    64,791          0 |    64,791

    Female |         0     68,919 |    68,919

-----------+----------------------+----------

     Total |    64,791     68,919 |   133,710

 

 

. tabulate sex female, nolab

 

           |        female

       Sex |         0          1 |     Total

-----------+----------------------+----------

         1 |    64,791          0 |    64,791

         2 |         0     68,919 |    68,919

-----------+----------------------+----------

     Total |    64,791     68,919 |   133,710
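
* The gen/replace pair above can also be written in one line, since a logical expression in Stata evaluates to 0 or 1. This is only a sketch: female2 is a hypothetical name, used so that the variable created above is left untouched, and assert stops with an error if the two versions ever disagree.

```stata
* (sex == 2) is 1 for women and 0 for men; the if qualifier keeps missing sex as missing
generate female2 = (sex == 2) if !missing(sex)
label values female2 female_lbl

* confirm that the one-line version matches the two-step version
assert female2 == female
```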

 

 

. ttest yrsed if age >24 & age <35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. regress yrsed female if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

 

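* Note that the regression above, a simple classic OLS regression with gender predicting yrsed, gives us the same coefficient, the same standard error, and the same t-statistic as the (equal variance) t-test above. Only the sign is reversed, because the t-test reports male minus female while the female coefficient is female minus male. The constant, 13.31212, is the mean of yrsed for males.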
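
* A sketch of how the regression results map back onto the group means in the t-test, using the stored coefficients _b[] and standard errors _se[] that regress leaves behind. No output is shown because these lines were not run in class.

```stata
quietly regress yrsed female if age>24 & age<35

* the constant is the mean of yrsed for the reference group (males)
display _b[_cons]

* constant plus the female coefficient is the mean of yrsed for females
display _b[_cons] + _b[female]

* coefficient divided by its standard error is the t statistic
display _b[female]/_se[female]
```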

 

. regress yrsed female [aweight= perwt_rounded] if age>24 & age<35

(sum of wgt is   3.7786e+07)

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   25.52

       Model |  195.741395     1  195.741395           Prob > F      =  0.0000

    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0013

       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      female |   .2055446   .0406899     5.05   0.000     .1257887    .2853005

       _cons |    13.5574   .0290221   467.14   0.000     13.50051    13.61429

------------------------------------------------------------------------------

* The aweighted regression gives a similar coefficient, t-statistic, and so on, but may be preferable to the unweighted regression because the weights are important for correcting sampling bias.

 

 

. regress yrsed female [fweight= perwt_rounded] if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =37785945

-------------+------------------------------           F(  1,37785943) =52018.00

       Model |  398979.047     1  398979.047           Prob > F      =  0.0000

    Residual |   289818910 37785943  7.67001924          R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0014

       Total |   290217889 37785944  7.68057796          Root MSE      =  2.7695

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      female |   .2055446   .0009012   228.07   0.000     .2037782    .2073109

       _cons |    13.5574   .0006428  2.1e+04   0.000     13.55614    13.55866

------------------------------------------------------------------------------

* The fweighted regression unfairly increases the sample size by a factor of about 2000, which decreases the standard error by a factor of about sqrt(2000), or roughly 45, and increases the t-statistic by the same factor of roughly 45 (228.07 versus 5.05).
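
* The size of the distortion can be checked with arithmetic on the two regression tables above. The three quantities below should come out close to one another: the square root of the inflation in sample size, the ratio of the standard errors, and the ratio of the t-statistics.

```stata
* square root of (fweighted N / actual N)
display sqrt(37785945/18538)

* aweighted standard error divided by fweighted standard error
display .0406899/.0009012

* fweighted t-statistic divided by aweighted t-statistic
display 228.07/5.05
```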

 

 

. table occ1990 if occ1990==178| occ1990==95| occ1990==125, contents(freq mean incwage p25 incwage p75 incwage)

 

----------------------------------------------------------------------------------

Occupation, 1990      |

basis                 |         Freq.  mean(incwage)   p25(incwage)   p75(incwage)

----------------------+-----------------------------------------------------------

    Registered nurses |           966    37536.85197          25000          48000

Sociology instructors |             6    41508.33333          35000          46000

              Lawyers |           441    74044.32653          17000         100960

----------------------------------------------------------------------------------

 

. ttest incwage if occ1990==178| occ1990==95, by(occ1990)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Register |     966    37536.85    702.6892    21839.96    36157.88    38915.83

 Lawyers |     441    74044.33    3287.284    69032.96     67583.6    80505.06

---------+--------------------------------------------------------------------

combined |    1407    48979.49    1223.363    45888.34    46579.68    51379.31

---------+--------------------------------------------------------------------

    diff |           -36507.47    2451.758               -41316.97   -31697.97

------------------------------------------------------------------------------

    diff = mean(Register) - mean(Lawyers)                         t = -14.8903

Ho: diff = 0                                     degrees of freedom =     1405

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* above: how to run a t-test on the occupational comparisons.
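
* The two standard deviations in the table above differ by a factor of about three (21839.96 versus 69032.96), so the equal-variance assumption is open to question for this comparison. As a sketch, the same test can be rerun with ttest's unequal option, which does not pool the two variances. No output is shown because this command was not run in class.

```stata
ttest incwage if occ1990==178 | occ1990==95, by(occ1990) unequal
```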

 

 

. graph box incwage if occ1990==178| occ1990==95| occ1990==125, over(occ1990)

* above: how to generate a box plot.

 

. gen months_ed=yrsed*12

(30484 missing values generated)

 

* If we rescale yrsed by multiplying by 12, would our results change?

 

. ttest yrsed if age >24 & age <35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. ttest  months_ed if age >24 & age <35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802

  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673

---------+--------------------------------------------------------------------

combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536

---------+--------------------------------------------------------------------

    diff |           -2.933363    .5131471               -3.939178   -1.927547

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* Answer: the units of educational attainment are changed (every mean, standard deviation, and standard error is multiplied by 12), but the t-statistic is not changed. The t-statistic is unit-free.
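
* A worked check of why the t-statistic is unit-free: multiplying yrsed by 12 multiplies both the difference in means and its standard error by 12, so the factor cancels in the ratio. The inputs are copied from the yrsed t-test above; small differences from the months_ed table in the last digits are rounding.

```stata
* difference in means, rescaled to months
display 12*(-.2444469)

* standard error of the difference, rescaled to months
display 12*.0427623

* the factor of 12 cancels, leaving the same t-statistic
display (12*(-.2444469))/(12*.0427623)
```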

 

 

. gen lawyers=0

 

. replace lawyers=1 if occ1990==178

(441 real changes made)

 

. gen nurses=0

 

. replace nurses=1 if occ1990==95

(966 real changes made)

 

. ttest incwage if occ1990==178| occ1990==95, by(occ1990)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Register |     966    37536.85    702.6892    21839.96    36157.88    38915.83

 Lawyers |     441    74044.33    3287.284    69032.96     67583.6    80505.06

---------+--------------------------------------------------------------------

combined |    1407    48979.49    1223.363    45888.34    46579.68    51379.31

---------+--------------------------------------------------------------------

    diff |           -36507.47    2451.758               -41316.97   -31697.97

------------------------------------------------------------------------------

    diff = mean(Register) - mean(Lawyers)                         t = -14.8903

Ho: diff = 0                                     degrees of freedom =     1405

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. regress incwage lawyers if occ1990==178| occ1990==95

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  221.72

       Model |  4.0354e+11     1  4.0354e+11           Prob > F      =  0.0000

    Residual |  2.5571e+12  1405  1.8200e+09           R-squared     =  0.1363

-------------+------------------------------           Adj R-squared =  0.1357

       Total |  2.9607e+12  1406  2.1057e+09           Root MSE      =   42662

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

     lawyers |   36507.47   2451.758    14.89   0.000     31697.97    41316.97

       _cons |   37536.85   1372.618    27.35   0.000     34844.25    40229.45

------------------------------------------------------------------------------

 

* Would it matter if we compared nurses to lawyers instead of the other way around? No: the coefficient and the t-statistic change sign, and the constant (with its standard error) now describes the other group, but nothing else changes. Since the t-distribution is symmetric, and since we are almost always looking at two-tailed tests, positive and negative coefficients of the same size have the same meaning.

 

. regress incwage  nurses if occ1990==178| occ1990==95

 

      Source |       SS       df       MS              Number of obs =    1407

-------------+------------------------------           F(  1,  1405) =  221.72

       Model |  4.0354e+11     1  4.0354e+11           Prob > F      =  0.0000

    Residual |  2.5571e+12  1405  1.8200e+09           R-squared     =  0.1363

-------------+------------------------------           Adj R-squared =  0.1357

       Total |  2.9607e+12  1406  2.1057e+09           Root MSE      =   42662

 

------------------------------------------------------------------------------

     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      nurses |  -36507.47   2451.758   -14.89   0.000    -41316.97   -31697.97

       _cons |   74044.33    2031.51    36.45   0.000     70059.21    78029.45

------------------------------------------------------------------------------
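
* A sketch of how the two regressions above describe the same pair of group means. In each regression the constant is the mean for the group coded 0, and the constant plus the coefficient is the mean for the group coded 1. The inputs are copied from the two tables, so the results match the t-test means only up to rounding.

```stata
* nurses regression: lawyers' mean plus the nurses coefficient gives the nurses' mean
display 74044.33 + (-36507.47)

* lawyers regression: nurses' mean plus the lawyers coefficient gives the lawyers' mean
display 37536.85 + 36507.47
```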

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\so

> c_meth_proj3\2011_180B_logs\class4.log

  log type:  text

 closed on:   3 Feb 2011, 15:29:11

----------------------------------------------------------------------------------------