--------------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class5.log
log type: text
opened on: 5 Oct 2010, 14:05:17
. table sex if age>24 & age<35, contents (freq mean yrsed sd yrsed p25 yrsed p75 yrsed)
---------------------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   p25(yrsed)   p75(yrsed)
----------+----------------------------------------------------------------
     Male |       9,027     13.31212     2.967666           12           17
   Female |       9,511     13.55657     2.854472           12           17
---------------------------------------------------------------------------
* These are the summary statistics of years of education (yrsed) by sex for 25- to 34-year-olds, which we have seen before; see also my Excel file.
. gen random=runiform()
* Generate a uniform random variable between 0 and 1, which I called random.
. summarize random
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      random |    133710    .5006203    .2884588   .0000219   .9999971
* Max of (nearly) 1, min of (nearly) 0, mean of about 0.5.
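* Aside (a sketch, not part of the original Stata session): for a variable that is uniform on [0, 1], the theoretical mean is 1/2 and the theoretical sd is 1/sqrt(12), about 0.2887, so the .5006 and .2885 above are close to what theory predicts. The Python below assumes random.random() as a stand-in for Stata's runiform() and uses an arbitrary seed (2010); it simulates the same number of draws to show the sample values landing near the theoretical ones.

```python
import math
import random

# Theoretical moments of a Uniform(0, 1) random variable.
THEORETICAL_MEAN = 0.5
THEORETICAL_SD = 1 / math.sqrt(12)  # about 0.2887


def simulate_uniform(n, seed=2010):
    """Draw n Uniform(0, 1) values; return sample mean, sd, min and max."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    draws = [rng.random() for _ in range(n)]
    mean = sum(draws) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in draws) / (n - 1))
    return mean, sd, min(draws), max(draws)


# Same number of observations as the summarize output above.
mean, sd, lo, hi = simulate_uniform(133710)
print(f"theoretical: mean={THEORETICAL_MEAN:.4f}  sd={THEORETICAL_SD:.4f}")
print(f"simulated:   mean={mean:.4f}  sd={sd:.4f}  min={lo:.7f}  max={hi:.7f}")
```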
. histogram random
(bin=51, start=.00002188, width=.01960736)
* The histogram shows that our new variable is nice and flat, but not perfectly so.
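* Aside (a sketch): "flat but not perfectly so" is what sampling noise predicts. With N draws and k equal-width bins, each bin count is binomial with mean N/k and sd sqrt(N(1/k)(1 - 1/k)); for N = 133,710 and k = 51 that is about 2,622 +/- 51 per bin. The Python below is an illustrative simulation under stated assumptions: its bins run over [0, 1) rather than starting at the sample minimum as Stata's do, and the seed is arbitrary.

```python
import math
import random


def uniform_bin_counts(n, k, seed=2010):
    """Count n Uniform(0, 1) draws in k equal-width bins on [0, 1)."""
    rng = random.Random(seed)  # arbitrary seed
    counts = [0] * k
    for _ in range(n):
        # min() guards against a floating-point edge case at the top bin
        counts[min(int(rng.random() * k), k - 1)] += 1
    return counts


def expected_bin_stats(n, k):
    """Expected count per bin and its binomial standard deviation."""
    p = 1 / k
    return n * p, math.sqrt(n * p * (1 - p))


n, k = 133710, 51  # observations and bins, as in the histogram output above
expected, sd = expected_bin_stats(n, k)
counts = uniform_bin_counts(n, k)
print(f"expected per bin: {expected:.1f} +/- {sd:.1f}")
print(f"observed range: {min(counts)} to {max(counts)}")
```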
. table sex if age>24 & age<35 & random<.25, contents (freq mean yrsed sd yrsed p25 yrsed p75 yrsed)
---------------------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   p25(yrsed)   p75(yrsed)
----------+----------------------------------------------------------------
     Male |       2,249     13.36261     2.907726           12           17
   Female |       2,366     13.61855     2.829585           12           17
---------------------------------------------------------------------------
* The random sub-sample of one fourth of our data has a similar mean and sd, but not exactly the same. The 25th and 75th percentiles (12 and 17 years) are exactly the same… The sample size is roughly one fourth of what it is above.
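* Aside (a sketch): selecting random<.25 keeps each person with probability 1/4, so the sub-sample size is binomial with mean N/4 and sd sqrt(N(1/4)(3/4)). The Python below uses only the frequencies in the two tables above to check how far the observed sub-sample sizes fall from N/4; the helper name subsample_size_check is hypothetical.

```python
import math


def subsample_size_check(full_n, observed_n, share=0.25):
    """Expected size of a random `share` of full_n, its binomial sd, and the z-score of observed_n."""
    expected = full_n * share
    sd = math.sqrt(full_n * share * (1 - share))
    return expected, sd, (observed_n - expected) / sd


# Frequencies copied from the two tables above (ages 25 to 34).
results = {}
for sex, full_n, sub_n in [("Male", 9027, 2249), ("Female", 9511, 2366)]:
    expected, sd, z = subsample_size_check(full_n, sub_n)
    results[sex] = z
    print(f"{sex}: expected {expected:.1f} +/- {sd:.1f}, observed {sub_n}, z = {z:.2f}")
```

Both z-scores are well under 1 in absolute value, consistent with "roughly one fourth".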
. ttest yrsed if age>24 & age<35, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
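* Aside (a sketch): ttest with the unequal option divides the difference in means by the standard error sqrt(sd1^2/n1 + sd2^2/n2) and uses Satterthwaite's approximation for the degrees of freedom. The Python below recomputes t and the degrees of freedom from the rounded summary statistics printed above, so it should match Stata only to a few decimal places; the function name welch_t is my own label, not a Stata command.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance two-sample t, Satterthwaite df, and SE of the difference."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    se_diff = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se_diff
    # Satterthwaite's approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se_diff


# Male and Female rows copied from the full-sample ttest output above.
t, df, se = welch_t(13.31212, 2.967666, 9027, 13.55657, 2.854472, 9511)
print(f"se(diff) = {se:.7f}  t = {t:.4f}  Satterthwaite df = {df:.1f}")
```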
. ttest yrsed if age>24 & age<35 & random<.25, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    2249    13.36261    .0613139    2.907726    13.24237    13.48284
  Female |    2366    13.61855    .0581722    2.829585    13.50448    13.73263
---------+--------------------------------------------------------------------
combined |    4615    13.49382     .042254    2.870473    13.41099    13.57666
---------+--------------------------------------------------------------------
    diff |           -.2559489    .0845186               -.4216461   -.0902518
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -3.0283
Ho: diff = 0                     Satterthwaite's degrees of freedom =  4585.15
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0025          Pr(T > t) = 0.9988
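* Given that the t-statistic is proportional to the square root of N (holding the means and sds fixed), and that this sub-sample is one fourth the size, we would expect the second t-statistic to be about half as large as the full-sample one, since sqrt(1/4) = 1/2. It is in the neighborhood of half as large (-3.03 vs. -5.71), but not exactly half, because the random sub-sample introduces random variation in the means and sds.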
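* Aside (a sketch): if the mean difference and sds stayed fixed, a sub-sample containing a fraction f of the cases would have t equal to sqrt(f) times the full-sample t. The Python below applies that to the numbers from the two ttest outputs above; the helper name predicted_t is hypothetical. It also splits the gap between predicted and observed t into the two pieces of t = diff / se.

```python
import math


def predicted_t(t_full, n_full, n_sub):
    """t expected for a sub-sample if the means and sds stayed exactly the same."""
    return t_full * math.sqrt(n_sub / n_full)


# Values copied from the full-sample and random<.25 ttest outputs above.
t_full, n_full, diff_full, se_full = -5.7106, 18538, -0.2444469, 0.0428057
t_sub, n_sub, diff_sub, se_sub = -3.0283, 4615, -0.2559489, 0.0845186

predicted = predicted_t(t_full, n_full, n_sub)
ratio = t_sub / t_full
print(f"predicted t = {predicted:.3f}, observed t = {t_sub:.3f}")
print(f"observed/full t ratio = {ratio:.3f} vs sqrt(N ratio) = {math.sqrt(n_sub / n_full):.3f}")
# The gap reflects sampling noise in both pieces of t = diff / se.
print(f"diff ratio = {diff_sub / diff_full:.3f}, se ratio = {se_sub / se_full:.3f}")
```

Here the sub-sample's mean difference (-.256) happens to be a bit larger than the full sample's (-.244), which accounts for most of the gap between the predicted and observed t.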
. ttest yrsed if age>24 & age<35 & random>=.25 & random<.5, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 2312 13.20631 .0637414 3.064897 13.08132 13.33131
Female | 2366 13.49239 .0590895 2.874203 13.37652 13.60826
---------+--------------------------------------------------------------------
combined | 4678 13.351 .043469 2.973105 13.26578 13.43622
---------+--------------------------------------------------------------------
diff | -.2860773 .0869168 -.4564757 -.115679
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -3.2914
Ho: diff = 0 Satterthwaite's degrees of freedom = 4640.72
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0005 Pr(|T| > |t|) = 0.0010 Pr(T > t) = 0.9995
. graph hbox yrsed if age>24 & age<35, over(sex)
.
. graph hbox yrsed if age>24 & age<35 & random<.25, over(sex)
* These two box plots were identical, and the men's and women's boxes were identical to each other. This means either that a box plot is not a good way to compare variables with only a few categories, such as years of education (both sexes have a 25th percentile of 12 and a 75th percentile of 17), or else that the difference between men's and women's educational attainment is not large enough to matter…
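* Aside (a sketch): a box is drawn from the quartiles and median, so two groups with different means can produce identical boxes when the variable takes only a few distinct values. The Python below uses HYPOTHETICAL frequencies invented for illustration (not CPS data) and Python's default quantile method, which is not identical to Stata's; with this many tied values the quartiles should come out the same either way.

```python
import statistics


def expand(freqs):
    """Turn {years_of_education: count} into a flat list of observations."""
    return [years for years, count in sorted(freqs.items()) for _ in range(count)]


# HYPOTHETICAL frequencies, invented for illustration only (not CPS data).
group_a = expand({12: 40, 14: 20, 16: 10, 17: 30})
group_b = expand({12: 30, 14: 25, 16: 15, 17: 30})

for name, data in [("A", group_a), ("B", group_b)]:
    q1, median, q3 = statistics.quantiles(data, n=4)  # requires Python 3.8+
    print(f"group {name}: mean={statistics.mean(data):.2f}  Q1={q1}  median={median}  Q3={q3}")
```

Both hypothetical groups have quartiles of 12, 14 and 17, so their boxes would look the same even though their means differ by 0.3 years.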
. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace
file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class5.log
log type: text
closed on: 5 Oct 2010, 16:01:36
--------------------------------------------------------------------------------------------------------