-----------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38

> 1_logs\class5.log

  log type:  text

 opened on:   8 Oct 2013, 13:37:43

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

 

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)

 

-------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)        Freq.

----------+--------------------------------------

     Male |    13.31212     2.967666        9,027

   Female |    13.55657     2.854472        9,511

-------------------------------------------------

 

* Our old friend: the table of average educational attainment for young men and women (ages 25 to 34).

 

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)

 

--------------------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.

----------+---------------------------------------------------

     Male |    13.31212     2.967666     .0312351        9,027

   Female |    13.55657     2.854472     .0292693        9,511

--------------------------------------------------------------

 

* The same table with the standard error of the mean added. Recall that semean = sd/sqrt(n).
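* A quick check of that formula, as a sketch that was not part of the original session. The numbers are copied by hand from the table above, so agreement is only up to rounding.

```stata
* sd/sqrt(n) for each group, using the sd and Freq. columns printed above
display 2.967666/sqrt(9027)    // men
display 2.854472/sqrt(9511)    // women
```

* The two results should match the sem(yrsed) column.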

 

 

. ttest yrsed if age>24 & age<35, by (sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

 

* Here are the equal-variance (above) and unequal-variance (below) versions of the t-test. They give very similar results because the two groups have similar standard deviations and sample sizes, and therefore similar standard errors of the mean.

 

. ttest yrsed if age>24 & age<35, by (sex) unequal

 

Two-sample t test with unequal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
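* To see where the two standard errors of diff come from, here is a sketch that rebuilds them from the group summaries printed above. It was not part of the original session; the scalar name s2_pooled is made up for illustration, and the inputs are hand-copied, so expect agreement only up to rounding.

```stata
* Unequal variances: add the squared standard errors of the two group means
display sqrt(.0312351^2 + .0292693^2)

* Equal variances: pool the two variances first, then scale by sqrt(1/n1 + 1/n2)
scalar s2_pooled = ((9027-1)*2.967666^2 + (9511-1)*2.854472^2) / (9027 + 9511 - 2)
display sqrt(s2_pooled) * sqrt(1/9027 + 1/9511)

* The immediate form of ttest takes n, mean and sd for each group directly
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```

* The two display results should land close to the Std. Err. of diff in the unequal and equal tables, respectively.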

 

* First check: change the scale of the outcome variable.

 

. gen months_ed=yrsed*12

(30484 missing values generated)

 

. ttest months_ed if age>24 & age<35, by (sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802

  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673

---------+--------------------------------------------------------------------

combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536

---------+--------------------------------------------------------------------

    diff |           -2.933363    .5131471               -3.939178   -1.927547

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* Under a change of scale, the means and standard errors change (because they are in the units of Y, whatever Y is), but the t-statistic stays the same, because the t-statistic is unit-free: it is a ratio of two quantities measured in the same units.
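* The arithmetic behind that statement, as a sketch using values hand-copied from the two ttest tables above (not part of the original session):

```stata
* Multiplying Y by 12 multiplies both the difference and its standard error by 12
display 12 * (-.2444469)
display 12 * .0427623

* so the ratio (the t-statistic) is the same in years and in months
display (-.2444469) / .0427623
display (-2.933363) / .5131471
```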

 

. gen byte male=0

 

. replace male=1 if sex==1

(64791 real changes made)

 

* Generate a new dummy variable for male, which we will use in our regressions.

 

. tabulate sex male

 

           |         male

       Sex |         0          1 |     Total

-----------+----------------------+----------

      Male |         0     64,791 |    64,791

    Female |    68,919          0 |    68,919

-----------+----------------------+----------

     Total |    68,919     64,791 |   133,710
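* An alternative way to build the same dummy in one step, offered as a sketch rather than what was run in class. The name male2 is hypothetical, chosen so that it does not collide with male. Unlike the gen/replace pair, this version leaves the dummy missing whenever sex is missing, instead of silently coding those cases as 0.

```stata
* (sex == 1) evaluates to 1 when true and 0 when false
generate byte male2 = (sex == 1) if !missing(sex)

* Confirm the two versions agree; this stops with an error if they differ
* (which would happen only if sex had missing values)
assert male2 == male
```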

 

 

. regress yrsed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

 

. regress months_ed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  39850.9104     1  39850.9104           Prob > F      =  0.0000

    Residual |  22605108.7 18536  1219.52464           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  22644959.6 18537  1221.60865           Root MSE      =  34.922

 

------------------------------------------------------------------------------

   months_ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -2.933363   .5131471    -5.72   0.000    -3.939178   -1.927547

       _cons |   162.6788   .3580818   454.31   0.000     161.9769    163.3807

------------------------------------------------------------------------------
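* These two regressions reproduce the equal-variance t-tests above: _cons is the mean for women (male==0), and the coefficient on male is the difference between the men's and women's means. A sketch of the check, using hand-copied values (not part of the original session):

```stata
* _cons + coefficient on male = mean for men
display 13.55657 + (-.2444469)     // years
display 162.6788 + (-2.933363)     // months

* With a single predictor, the F statistic is the square of the t-statistic
display (-5.7164)^2
```

* Run right after either regress command, lincom _cons + male should give the men's mean together with its standard error.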

 

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)

 

--------------------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.

----------+---------------------------------------------------

     Male |    13.31212     2.967666     .0312351        9,027

   Female |    13.55657     2.854472     .0292693        9,511

--------------------------------------------------------------

 

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed semean yrsed freq)

 

--------------------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.

----------+---------------------------------------------------

     Male |     13.5574     2.819247      .029673        9,027

   Female |    13.76295     2.720855     .0278992        9,511

--------------------------------------------------------------

 

* The aweighted data have similar (but not exactly the same) means and sds, and the weighted N is exactly the same as the unweighted N. This is because aweights (“analytic weights”) are rescaled so that the average weight is 1, which leaves the sample size unchanged.
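* A sketch of what that rescaling amounts to, not part of the original session. The variable name w_norm is hypothetical, and the sketch assumes every case in the subsample has a positive, non-missing perwt_rounded.

```stata
* Average weight among the cases used in the table above
quietly summarize perwt_rounded if age>24 & age<35 & !missing(yrsed, sex)

* Divide each weight by that average, so the rescaled weights average 1
generate double w_norm = perwt_rounded / r(mean) if age>24 & age<35 & !missing(yrsed, sex)

* The mean of w_norm should be 1, and its sum should equal the number of cases
summarize w_norm
display r(sum)
```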

 

. regress yrsed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

 

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   25.52

       Model |  195.741395     1  195.741395           Prob > F      =  0.0000

    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0013

       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887

       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885

------------------------------------------------------------------------------

 

* The aweighted regression is similar to the unweighted regression, but not exactly the same, because applying the weights makes some cases relatively more important and others less important.

 

. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]

 

      Source |       SS       df       MS              Number of obs =37785945

-------------+------------------------------           F(  1,37785943) =52018.00

       Model |  398979.047     1  398979.047           Prob > F      =  0.0000

    Residual |   289818910 37785943  7.67001924           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0014

       Total |   290217889 37785944  7.68057796           Root MSE      =  2.7695

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2055446   .0009012  -228.07   0.000    -.2073109   -.2037782

       _cons |   13.76294   .0006317  2.2e+04   0.000     13.76171    13.76418

------------------------------------------------------------------------------

 

* The regression with fweights has the same coefficients as the aweighted regression, but the sample size is inflated by a factor of about 2,000 (37,785,945/18,538), so the t-statistic grows by a factor of about sqrt(2000), or roughly 45, to -228. The key thing to know is that the fweighted regression produces a wildly unrealistic t-statistic, because we are pretending that we really have almost 38 million young people in our sample instead of the 18 thousand we actually have. Fweights are useful and correct for some applications (we use them with the CPS to generate national totals), but used in this way the fweighted regression is wrong and misleading.
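* The inflation factor can be worked out from the two regression tables above. This is a sketch with hand-copied values, not part of the original session.

```stata
* Ratio of the fweighted N to the real N
display 37785945/18538

* Standard errors shrink, and t-statistics grow, by the square root of that ratio
display sqrt(37785945/18538)

* Applying the factor to the aweighted results approximates the fweighted ones
display -5.05 * sqrt(37785945/18538)       // compare with the fweighted t
display .0406899 / sqrt(37785945/18538)    // compare with the fweighted Std. Err.
```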

 

. gen random_uniform=uniform()

 

. summarize random_uniform

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

random_uni~m |    133710    .5006203    .2884588   .0000219   .9999971

 

. display 0.2884588^2

.08320848

 

* Recall that we proved earlier that the uniform distribution on (0,1) has mean 0.5 and variance 1/12 (about 0.0833). It is nice to see that the sample values agree.
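* A sketch of the same check done reproducibly, not part of the original session. The seed value 12345 and the variable name u_check are arbitrary choices for illustration; runiform() is the current name for the function called uniform() above.

```stata
* Theoretical variance of a uniform(0,1) variable
display 1/12

* Setting the seed makes the random draws repeat exactly from run to run
set seed 12345
generate double u_check = runiform()

* summarize leaves the sample variance behind in r(Var)
summarize u_check
display r(Var)
```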

 

* We cannot legitimately increase our sample size, but we can decrease the sample size arbitrarily.

 

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   25.52

       Model |  195.741395     1  195.741395           Prob > F      =  0.0000

    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0013

       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887

       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885

------------------------------------------------------------------------------

 

. regress yrsed male if age>24 & age<35 & random_uniform <=0.25 [aweight= perwt_rounded]

(sum of wgt is   9.2846e+06)

 

      Source |       SS       df       MS              Number of obs =    4578

-------------+------------------------------           F(  1,  4576) =    4.52

       Model |  34.4468815     1  34.4468815           Prob > F      =  0.0336

    Residual |  34890.4634  4576  7.62466419           R-squared     =  0.0010

-------------+------------------------------           Adj R-squared =  0.0008

       Total |  34924.9102  4577  7.63052441           Root MSE      =  2.7613

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.1735029   .0816285    -2.13   0.034    -.3335342   -.0134715

       _cons |   13.72653    .057329   239.43   0.000     13.61413    13.83892

------------------------------------------------------------------------------

 

* If we arbitrarily limit ourselves to ¼ of the CPS data, we expect the t-statistic to be about half as large (since sqrt(1/4) = 1/2), or roughly -2.5. Because this is a random subsample, the actual value can be bigger or smaller than that; here it is -2.13.
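* The expected values can be computed from the full-sample aweighted regression and the two sample sizes. This is a sketch with hand-copied values, not part of the original session; it assumes the coefficient and residual variance stay about the same in the subsample.

```stata
* Expected t-statistic when only 4578 of the 18538 cases are kept
display -5.05 * sqrt(4578/18538)

* Expected standard error of the male coefficient in the subsample
display .0406899 / sqrt(4578/18538)
```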

 

. regress yrsed male if age>24 & age<35 & random_uniform >=0.75 [aweight= perwt_rounded]

(sum of wgt is   9.6623e+06)

 

      Source |       SS       df       MS              Number of obs =    4719

-------------+------------------------------           F(  1,  4717) =   11.48

       Model |  87.7286053     1  87.7286053           Prob > F      =  0.0007

    Residual |  36055.7534  4717  7.64378914           R-squared     =  0.0024

-------------+------------------------------           Adj R-squared =  0.0022

       Total |   36143.482  4718  7.66076345           Root MSE      =  2.7647

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2727883   .0805211    -3.39   0.001    -.4306472   -.1149294

       _cons |    13.8423   .0561835   246.38   0.000     13.73216    13.95245

------------------------------------------------------------------------------

 

* And here is a different random ¼ sample. The results differ somewhat from the previous ones, but would still lead to the same substantive conclusion: young women in the US have significantly more education than young men.
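* One rough way to judge whether the two quarter samples really disagree is sketched below; it was not part of the original session. The scalar names diff_b and se_diff are hypothetical. The sketch treats the two subsamples as independent (they use non-overlapping ranges of random_uniform) and ignores any other features of the CPS design.

```stata
* Difference between the two male coefficients
scalar diff_b = (-.1735029) - (-.2727883)

* Standard error of that difference, for two independent estimates
scalar se_diff = sqrt(.0816285^2 + .0805211^2)

* A ratio well under 2 in absolute value is consistent with ordinary sampling variation
display diff_b / se_diff
```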

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38

> 1_logs\class5.log

  log type:  text

 closed on:   8 Oct 2013, 16:03:12

-----------------------------------------------------------------------------------