----------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3
> \2011_180B_logs\class2.log
log type: text
opened on: 27 Jan 2011, 12:02:18
* Don't forget to start by opening a log!
* First, I walked through opening and reading in a CPS extract downloaded from IPUMS. Put the compressed download in its own folder, then uncompress its contents into that same folder. Then set Stata's working directory to that folder with the cd command.
. cd "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps"
C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps
* Then go to Stata's File menu, choose Do..., and select the do-file in that folder.
. do "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\cps_00008.do"
. /* Important: you need to put the .dat and .do files in one folder/
> directory and then set the working folder to that folder. */
.
. set more off
.
. clear
. infix ///
> int year 1-4 ///
> byte age 5-6 ///
> byte sex 7 ///
> using cps_00008.dat
(210648 observations read)
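* What infix is doing here: each line of the .dat file is a fixed-width record, and the column ranges (1-4, 5-6, 7) say where each variable sits on the line. The Python sketch below mimics that slicing. The two sample records are made up for illustration and are not taken from cps_00008.dat, and parse_record is a hypothetical helper, not part of Stata or IPUMS.

```python
# Column layout from the infix command: (first column, last column), 1-based and inclusive.
LAYOUT = {"year": (1, 4), "age": (5, 6), "sex": (7, 7)}


def parse_record(line):
    """Hypothetical helper: slice one fixed-width line into integer fields, like infix."""
    record = {}
    for name, (first, last) in LAYOUT.items():
        # Convert 1-based inclusive columns to a 0-based, end-exclusive Python slice.
        record[name] = int(line[first - 1:last])
    return record


# Made-up example records (not real CPS data):
# year 2005, age 34, sex 1 (Male); year 2005, age 7, sex 2 (Female).
sample_lines = ["2005341", "2005072"]
for line in sample_lines:
    print(parse_record(line))
```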
.
. label var year `"Survey year"'
. label var age `"Age"'
. label var sex `"Sex"'
.
. label define agelbl 00 `"Under 1 year"'
. label define agelbl 01 `"1"', add
. label define agelbl 02 `"2"', add
. label define agelbl 03 `"3"', add
. label define agelbl 04 `"4"', add
. label define agelbl 05 `"5"', add
. label define agelbl 06 `"6"', add
. label define agelbl 07 `"7"', add
. label define agelbl 08 `"8"', add
. label define agelbl 09 `"9"', add
. label define agelbl 10 `"10"', add
. label define agelbl 11 `"11"', add
. label define agelbl 12 `"12"', add
. label define agelbl 13 `"13"', add
. label define agelbl 14 `"14"', add
. label define agelbl 15 `"15"', add
. label define agelbl 16 `"16"', add
. label define agelbl 17 `"17"', add
. label define agelbl 18 `"18"', add
. label define agelbl 19 `"19"', add
. label define agelbl 20 `"20"', add
. label define agelbl 21 `"21"', add
. label define agelbl 22 `"22"', add
. label define agelbl 23 `"23"', add
. label define agelbl 24 `"24"', add
. label define agelbl 25 `"25"', add
. label define agelbl 26 `"26"', add
. label define agelbl 27 `"27"', add
. label define agelbl 28 `"28"', add
. label define agelbl 29 `"29"', add
. label define agelbl 30 `"30"', add
. label define agelbl 31 `"31"', add
. label define agelbl 32 `"32"', add
. label define agelbl 33 `"33"', add
. label define agelbl 34 `"34"', add
. label define agelbl 35 `"35"', add
. label define agelbl 36 `"36"', add
. label define agelbl 37 `"37"', add
. label define agelbl 38 `"38"', add
. label define agelbl 39 `"39"', add
. label define agelbl 40 `"40"', add
. label define agelbl 41 `"41"', add
. label define agelbl 42 `"42"', add
. label define agelbl 43 `"43"', add
. label define agelbl 44 `"44"', add
. label define agelbl 45 `"45"', add
. label define agelbl 46 `"46"', add
. label define agelbl 47 `"47"', add
. label define agelbl 48 `"48"', add
. label define agelbl 49 `"49"', add
. label define agelbl 50 `"50"', add
. label define agelbl 51 `"51"', add
. label define agelbl 52 `"52"', add
. label define agelbl 53 `"53"', add
. label define agelbl 54 `"54"', add
. label define agelbl 55 `"55"', add
. label define agelbl 56 `"56"', add
. label define agelbl 57 `"57"', add
. label define agelbl 58 `"58"', add
. label define agelbl 59 `"59"', add
. label define agelbl 60 `"60"', add
. label define agelbl 61 `"61"', add
. label define agelbl 62 `"62"', add
. label define agelbl 63 `"63"', add
. label define agelbl 64 `"64"', add
. label define agelbl 65 `"65"', add
. label define agelbl 66 `"66"', add
. label define agelbl 67 `"67"', add
. label define agelbl 68 `"68"', add
. label define agelbl 69 `"69"', add
. label define agelbl 70 `"70"', add
. label define agelbl 71 `"71"', add
. label define agelbl 72 `"72"', add
. label define agelbl 73 `"73"', add
. label define agelbl 74 `"74"', add
. label define agelbl 75 `"75"', add
. label define agelbl 76 `"76"', add
. label define agelbl 77 `"77"', add
. label define agelbl 78 `"78"', add
. label define agelbl 79 `"79"', add
. label define agelbl 80 `"80"', add
. label define agelbl 81 `"81"', add
. label define agelbl 82 `"82"', add
. label define agelbl 83 `"83"', add
. label define agelbl 84 `"84"', add
. label define agelbl 85 `"85"', add
. label define agelbl 86 `"86"', add
. label define agelbl 87 `"87"', add
. label define agelbl 88 `"88"', add
. label define agelbl 89 `"89"', add
. label define agelbl 90 `"90 (90+, 1988-2002)"', add
. label define agelbl 91 `"91"', add
. label define agelbl 92 `"92"', add
. label define agelbl 93 `"93"', add
. label define agelbl 94 `"94"', add
. label define agelbl 95 `"95"', add
. label define agelbl 96 `"96"', add
. label define agelbl 97 `"97"', add
. label define agelbl 98 `"98"', add
. label define agelbl 99 `"99+"', add
. label values age agelbl
.
. label define sexlbl 1 `"Male"'
. label define sexlbl 2 `"Female"', add
. label values sex sexlbl
.
.
end of do-file
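* The hundred or so label define lines for agelbl are mechanical: every age code from 0 to 99 is labeled with its own number, except codes 0, 90, and 99. A short Python sketch of the same mapping follows (build_age_labels is a hypothetical name for illustration); in Stata itself, a forvalues loop could generate the repetitive lines.

```python
def build_age_labels():
    """Return {age code: label} matching the agelbl definitions in the do-file."""
    labels = {code: str(code) for code in range(100)}  # default: the number itself
    # The three codes whose labels differ from their numeric value:
    labels[0] = "Under 1 year"
    labels[90] = "90 (90+, 1988-2002)"
    labels[99] = "99+"
    return labels


age_labels = build_age_labels()
print(age_labels[0], "|", age_labels[45], "|", age_labels[99])
```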
* When the do-file finishes, the new variables appear in the Variables window. Now it is time to save the dataset (the easiest way is the Save command under the File menu).
. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\trial_cps.dta"
file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\trial_cps.dta saved
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 102,202 48.52 48.52
Female | 108,446 51.48 100.00
------------+-----------------------------------
Total | 210,648 100.00
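* The Percent and Cum. columns of tabulate are simple arithmetic: each frequency divided by the total, plus a running sum of those percentages. A minimal Python sketch using the sex counts above (tabulate_counts is a hypothetical helper, not a Stata function):

```python
def tabulate_counts(counts):
    """Return (rows, total); each row is (value, freq, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value, freq in counts.items():
        percent = 100 * freq / total
        cumulative += percent
        rows.append((value, freq, round(percent, 2), round(cumulative, 2)))
    return rows, total


rows, total = tabulate_counts({"Male": 102202, "Female": 108446})
for row in rows:
    print(row)
print("Total", total)
```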
. tabulate year
Survey year | Freq. Percent Cum.
------------+-----------------------------------
2005 | 210,648 100.00 100.00
------------+-----------------------------------
Total | 210,648 100.00
. clear all
* OK, enough with the trial CPS dataset. Let's get back to our regular 2000 CPS dataset, which we open with the Open command under the File menu.
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
. tabulate union
Union membership | Freq. Percent Cum.
----------------------------------+-----------------------------------
NIU | 120,249 89.93 89.93
No union coverage | 11,383 8.51 98.45
Member of labor union | 1,883 1.41 99.85
Covered by union but not a member | 195 0.15 100.00
----------------------------------+-----------------------------------
Total | 133,710 100.00
* Union is a curious variable because there are so many NIU (not in universe) cases.
* The problem with NIU cases is that they throw the percentages way off: including them makes it look like only 1.4% of people are union members, which seems too low.
. tabulate union, nolab
Union |
membership | Freq. Percent Cum.
------------+-----------------------------------
0 | 120,249 89.93 89.93
1 | 11,383 8.51 98.45
2 | 1,883 1.41 99.85
3 | 195 0.15 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Let's make a new union variable.
. generate union_new=union
. tabulate union union_new
| union_new
Union membership | 0 1 2 3 | Total
----------------------+--------------------------------------------+----------
NIU | 120,249 0 0 0 | 120,249
No union coverage | 0 11,383 0 0 | 11,383
Member of labor union | 0 0 1,883 0 | 1,883
Covered by union but | 0 0 0 195 | 195
----------------------+--------------------------------------------+----------
Total | 120,249 11,383 1,883 195 | 133,710
* So far, the new union variable is just like the old one, but without the value labels.
. tabulate union union_new, nolab
Union | union_new
membership | 0 1 2 3 | Total
-----------+--------------------------------------------+----------
0 | 120,249 0 0 0 | 120,249
1 | 0 11,383 0 0 | 11,383
2 | 0 0 1,883 0 | 1,883
3 | 0 0 0 195 | 195
-----------+--------------------------------------------+----------
Total | 120,249 11,383 1,883 195 | 133,710
* Let's add some value labels to the new union variable. First we define the new label.
. label define new_union_lbl 1 "no union coverage" 2 "union member" 3 "covered by union but not a member"
* Then we associate the new labels with the values of variable union_new.
. label val union_new new_union_lbl
. tabulate union_new
union_new | Freq. Percent Cum.
----------------------------------+-----------------------------------
0 | 120,249 89.93 89.93
no union coverage | 11,383 8.51 98.45
union member | 1,883 1.41 99.85
covered by union but not a member | 195 0.15 100.00
----------------------------------+-----------------------------------
Total | 133,710 100.00
. replace union_new=. if union==0
(120249 real changes made, 120249 to missing)
* This is the key step: setting all those NIU values to missing. Stata's default missing-value code for numeric variables is the period. Note also the double equal sign after the if: == tests whether two values are equal, while a single = assigns a value.
* Now when we tabulate the new union variable, the missing values are left out and we get a more sensible tabulation that shows that 14% of persons who were asked the union question were in fact in a union.
. tabulate union_new
union_new | Freq. Percent Cum.
----------------------------------+-----------------------------------
no union coverage | 11,383 84.56 84.56
union member | 1,883 13.99 98.55
covered by union but not a member | 195 1.45 100.00
----------------------------------+-----------------------------------
Total | 13,461 100.00
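* Why does the union percentage jump from 1.41% to 13.99%? Only the denominator changes: with NIU coded as missing, tabulate divides by the 13,461 people actually asked the question instead of all 133,710. A Python sketch of the same recode, using the counts from tabulate union, nolab above. None stands in for Stata's missing value ".", and both helper names are hypothetical.

```python
# Counts from "tabulate union, nolab":
# 0 = NIU, 1 = no union coverage, 2 = member, 3 = covered but not a member.
union_counts = {0: 120249, 1: 11383, 2: 1883, 3: 195}


def set_niu_missing(counts):
    """Mimic 'replace union_new=. if union==0': move the NIU count to missing (None)."""
    recoded = {}
    for code, n in counts.items():
        key = None if code == 0 else code
        recoded[key] = recoded.get(key, 0) + n
    return recoded


def percentages(counts):
    """Like tabulate: percentages computed over non-missing categories only."""
    nonmissing = {code: n for code, n in counts.items() if code is not None}
    total = sum(nonmissing.values())
    return {code: round(100 * n / total, 2) for code, n in nonmissing.items()}, total


print(percentages(union_counts))                   # NIU still in the denominator
print(percentages(set_niu_missing(union_counts)))  # NIU treated as missing
```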
* In truth, we didn't need to create a new variable; we could have tabulated the old variable while excluding the NIU code of zero. "~=" means "not equal to" (Stata also accepts "!="). See my intro to Stata page.
. tabulate union if union~=0
Union membership | Freq. Percent Cum.
----------------------------------+-----------------------------------
No union coverage | 11,383 84.56 84.56
Member of labor union | 1,883 13.99 98.55
Covered by union but not a member | 195 1.45 100.00
----------------------------------+-----------------------------------
Total | 13,461 100.00
. tabulate educrec
Educational attainment |
recode | Freq. Percent Cum.
------------------------+-----------------------------------
NIU | 30,484 22.80 22.80
None or preschool | 457 0.34 23.14
Grades 1, 2, 3, or 4 | 1,187 0.89 24.03
Grades 5, 6, 7, or 8 | 6,847 5.12 29.15
Grade 9 | 4,161 3.11 32.26
Grade 10 | 4,695 3.51 35.77
Grade 11 | 4,721 3.53 39.30
Grade 12 | 33,461 25.03 64.33
1 to 3 years of college | 25,883 19.36 83.69
4+ years of college | 21,814 16.31 100.00
------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate educrec, nolab
Educational |
attainment |
recode | Freq. Percent Cum.
------------+-----------------------------------
0 | 30,484 22.80 22.80
1 | 457 0.34 23.14
2 | 1,187 0.89 24.03
3 | 6,847 5.12 29.15
4 | 4,161 3.11 32.26
5 | 4,695 3.51 35.77
6 | 4,721 3.53 39.30
7 | 33,461 25.03 64.33
8 | 25,883 19.36 83.69
9 | 21,814 16.31 100.00
------------+-----------------------------------
Total | 133,710 100.00
* The problem with educrec is that its codes do not correspond to actual years of education, so taking the average of educrec is meaningless. That is why I created yrsed, which assigns a real number of years of schooling to each educrec category.
. table educrec, contents (mean yrsed)
-------------------------------------
Educational attainment |
recode | mean(yrsed)
------------------------+------------
NIU |
None or preschool | 0
Grades 1, 2, 3, or 4 | 2.5
Grades 5, 6, 7, or 8 | 6.5
Grade 9 | 9
Grade 10 | 10
Grade 11 | 11
Grade 12 | 12
1 to 3 years of college | 14
4+ years of college | 17
-------------------------------------
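* The table above suggests yrsed is simply a lookup from each educrec category to a number of years, with NIU left missing. Assuming yrsed is constant within each category (the round means suggest this, but the log does not confirm it), here is a Python sketch of that lookup and of a mean that skips missing values the way Stata does. YRSED and mean_yrsed are hypothetical names.

```python
# Years of schooling per educrec code, read off the mean(yrsed) column above.
# Code 0 (NIU) has no yrsed value, so it maps to None (Stata's missing ".").
YRSED = {0: None, 1: 0, 2: 2.5, 3: 6.5, 4: 9, 5: 10, 6: 11, 7: 12, 8: 14, 9: 17}


def mean_yrsed(educrec_counts):
    """Weighted mean of yrsed given {educrec code: count}, skipping missing values."""
    total_years = 0.0
    n = 0
    for code, count in educrec_counts.items():
        years = YRSED[code]
        if years is None:  # missing values are left out, as summarize and table do
            continue
        total_years += years * count
        n += count
    return total_years / n


# Toy example: 5 NIU cases, one Grade 12 case, one 4+ years of college case.
print(mean_yrsed({0: 5, 7: 1, 9: 1}))
```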
* table is like tabulate, except that with table you can put other statistics (specified in the contents() option) into the cells.
. sort sex
. by sex: summarize yrsed if age>24 & age<35
---------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
---------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
* Is this difference between young men's and young women's educational attainment big? It is about .24 years, or about 3 months. The real question is whether we can be sure, based on this CPS sample, that there really is a difference between men's and women's education in the wider US population. That is, can we rule out the possibility that men and women in the US have the same level of education, and the women in this sample just happen to have a little more by chance? If we start with the assumption that men and women aged 25-34 in the US have the same average educational attainment, how likely are we to draw a random sample with a difference this big between men and women (and after all, this difference is not very big)? If that probability, computed under the null hypothesis of no difference, turns out to be small, say smaller than 5%, we may reject the null hypothesis and conclude that young women in the US have more educational attainment than young men.
. ttest yrsed if age>24 & age <35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
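* Where does t = -5.7164 come from? The equal-variances test pools the two sample variances, computes the standard error of the difference in means, and divides the difference by that standard error. A Python sketch that reproduces the output above from the Obs, Mean, and Std. Dev. columns (pooled_t is a hypothetical helper; tiny discrepancies come from the rounded inputs):

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference in means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df


diff, se, t, df = pooled_t(9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)
print(f"diff = {diff:.4f}, se = {se:.4f}, t = {t:.3f}, df = {df}")
```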
* I mentioned in class that the probability associated with a t-statistic of 5.7 (in absolute value) was small, certainly much smaller than needed to reject the null hypothesis of no difference between women and men. Stata lists that probability above as 0.0000: it is the middle one, Pr(|T| > |t|), the two-tailed test. We could quantify the probability more precisely by asking Stata for the upper-tail probability of the t-distribution beyond 5.7164, with 18,536 degrees of freedom, and then doubling it to get a two-tailed test, which I will explain in class. I didn't do this in class because I will be covering it later, but here it is now:
. display 2*ttail(18536, 5.7164)
1.105e-08
* So the probability of finding a difference this large, if the null hypothesis (no difference between women and men) were true, is about 1.1 × 10^-8 (0.000000011), or roughly 1 in 90 million. That is a small probability, to be sure. What really drives this probability to be so small is that although our difference was small, our sample size was large. Larger samples let us be confident about even small differences.
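* ttail(df, t) returns the upper-tail probability of Student's t beyond t, so doubling it gives the two-tailed p-value. With 18,536 degrees of freedom the t-distribution is nearly the standard normal, so a Python sketch can approximate the same number with only the standard library. This swaps in the normal approximation rather than Stata's exact t calculation, and it gives a slightly smaller value because normal tails are thinner; two_tailed_p_normal is a hypothetical name.

```python
import math


def two_tailed_p_normal(t):
    """Two-tailed p-value, approximating Student's t with the standard normal.

    For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


p = two_tailed_p_normal(5.7164)
print(f"p is about {p:.3e}, roughly 1 in {1 / p / 1e6:.0f} million")
```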
* Note that you can use table to generate the same kind of statistics that summarize generates.
. table sex if age>24 & age< 35, contents (freq mean yrsed sd yrsed min yrsed max yrsed)
---------------------------------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed) min(yrsed) max(yrsed)
----------+----------------------------------------------------------------
Male | 9,027 13.31212 2.967666 0 17
Female | 9,511 13.55657 2.854472 0 17
---------------------------------------------------------------------------
* Let's briefly go back to a male-female difference from last class that seemed like a big difference.
. by sex: summarize incwage if incwage>0 & age>25 & age<65
---------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 27097 42865.9 37545.68 1 364302
---------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 25198 25866.4 22697.5 1 333564
. ttest incwage if incwage>0 & age>25 & age<65, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 27097 42865.9 228.0864 37545.68 42418.84 43312.96
Female | 25198 25866.4 142.9865 22697.5 25586.13 26146.66
---------+--------------------------------------------------------------------
combined | 52295 34674.8 141.7524 32416.08 34396.97 34952.64
---------+--------------------------------------------------------------------
diff | 16999.51 273.7817 16462.89 17536.12
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = 62.0915
Ho: diff = 0 degrees of freedom = 52293
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 1.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 0.0000
* The difference here is so large, and the t-statistic so enormous, that the resulting p-value is too small for Stata to report as anything more specific than zero. The smallest positive number Stata can store in double precision is on the order of 10^-308, and this p-value is smaller still, so the calculation below returns 0. That is a really small number, when you consider that there are only roughly 10^80 atoms in the observable universe.
. display 2*ttail(52293, 62.09)
0
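* Why does display return exactly 0? The income t-statistic is about 62, and the tail probability beyond 62 is far below the smallest positive double-precision number, so the calculation underflows. The Python sketch below first reproduces t from the summary statistics above, then shows that a normal-approximation p-value also underflows to 0.0, and finally works on the log scale (using the leading term of the normal tail's asymptotic expansion) to estimate how small the number really is. The normal approximation and the helper names are assumptions for illustration; Stata's exact t-based value would differ, but it is also far below the representable range.

```python
import math
import sys


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se


def log10_two_tailed_p_normal(t):
    """log10 of the two-tailed normal p-value, using the leading asymptotic term
    P(Z > z) ~ phi(z) / z, computed in logs so that nothing underflows."""
    z = abs(t)
    log_upper_tail = -z * z / 2 - math.log(z) - 0.5 * math.log(2 * math.pi)
    return (log_upper_tail + math.log(2)) / math.log(10)


t = pooled_t(27097, 42865.9, 37545.68, 25198, 25866.4, 22697.5)
print(f"t = {t:.2f}")
print("direct p-value:", math.erfc(t / math.sqrt(2)))  # underflows to 0.0
print("smallest positive normal double:", sys.float_info.min)
print(f"log10(p) is about {log10_two_tailed_p_normal(t):.0f}")
```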
. describe yrsed
storage display value
variable name type format label variable label
---------------------------------------------------------------------------------------
yrsed float %9.0g based on educrec
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\
> soc_meth_proj3\2011_180B_logs\class2.log
log type: text
closed on: 27 Jan 2011, 15:49:59
--------------------------------------------------------------------------------------