------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2011_381_logs\class2.log
log type: text
opened on: 29 Sep 2011, 11:59:43
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new_unch
> anged.dta", clear
. sort sex
. by sex: summarize incwelfr
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 49353 11.35025 245.3368 0 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 53873 67.43862 618.6006 0 25000
* We always have to think about what the appropriate population frame is. The average welfare income across all people is too broad a frame: most people have zero welfare income, so the average is very low, about $11 for men and $67 for women.
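* One way to see how narrow the relevant frame is: flag the people with any welfare income and tabulate that flag by sex. This is only a sketch; anywelf is a hypothetical helper variable that was not part of this session.

```stata
* anywelf is a hypothetical name, not a variable in the original data.
* The "if !missing(incwelfr)" clause leaves missing cases missing
* instead of coding them as 1.
generate byte anywelf = (incwelfr > 0) if !missing(incwelfr)
* Row percentages show the share of each sex with any welfare income.
tabulate sex anywelf, row
```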
. by sex: summarize incwelfr if incwelfr>0
--------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 188 2979.622 2644.509 1 13800
--------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1101 3299.837 2839.866 1 25000
. by sex: summarize incwelfr if incwelfr>0 [fweight=perwt_rounded]
---------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 357702 2897.24 2577.316 1 13800
---------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 2193544 3100.608 2837.588 1 25000
. table sex if incwelfr>0 [fweight=perwt_rounded], contents (freq mean incwelfr max incwelfr)
----------------------------------------------------------
Sex | Freq. mean(incwelfr) max(incwelfr)
----------+-----------------------------------------------
Male | 3.12e+07 2897.240312 13800
Female | 3.17e+07 3100.608278 25000
----------------------------------------------------------
. table sex if incwelfr>0 & incwelfr ~=. [fweight=perwt_rounded], contents (freq mean incwelfr max incwelfr)
----------------------------------------------------------
Sex | Freq. mean(incwelfr) max(incwelfr)
----------+-----------------------------------------------
Male | 357,702 2897.240312 13800
Female | 2193544 3100.608278 25000
----------------------------------------------------------
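* table is a useful and versatile command, but notice that its frequency counts didn’t match what we got with summarize until we excluded all cases with missing values (i.e. incwelfr~=.). Stata treats a missing value as larger than any number, so the condition incwelfr>0 is also true when incwelfr is missing. summarize drops those cases automatically, but the Freq. column of table counts them.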
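* A quick way to check how many cases are affected, sketched with count (unweighted, so the numbers will not match the weighted frequencies above):

```stata
* Counts every case where the condition is true, including missing incwelfr.
count if incwelfr > 0
* Counts only cases with a real, positive welfare income.
count if incwelfr > 0 & !missing(incwelfr)
* The gap between the two counts above should equal this one.
count if missing(incwelfr)
```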
. by sex: summarize incwage if age >=25 & age<=34
---------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 9027 29510.62 26619.54 0 362302
---------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 9511 17728.95 20249.23 0 333564
. table sex if age>=25 & age<=34, contents (freq mean incwage mean yrsed)
-------------------------------------------------------
Sex | Freq. mean(incwage) mean(yrsed)
----------+--------------------------------------------
Male | 9,027 29510.61781 13.31212
Female | 9,511 17728.94764 13.55657
-------------------------------------------------------
. summarize age
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 133710 35.17964 22.21722 0 90
* By the way, why is the age maximum 90 years? Surely in a sample of 133K people, someone would be older than 90, right? The answer is that ages of 90 and above are topcoded at 90 to maintain confidentiality. You can see this if you tabulate age, or more easily by looking at the IPUMS documentation for the variable age.
. tabulate age
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 1.28 1.28
1 | 1,932 1.44 2.73
2 | 1,950 1.46 4.18
3 | 1,939 1.45 5.63
4 | 1,965 1.47 7.10
5 | 1,998 1.49 8.60
6 | 2,059 1.54 10.14
7 | 2,176 1.63 11.77
8 | 2,163 1.62 13.38
9 | 2,243 1.68 15.06
10 | 2,202 1.65 16.71
11 | 2,083 1.56 18.27
12 | 2,035 1.52 19.79
13 | 2,047 1.53 21.32
14 | 1,979 1.48 22.80
15 | 2,046 1.53 24.33
16 | 1,965 1.47 25.80
17 | 1,998 1.49 27.29
18 | 1,847 1.38 28.67
19 | 1,826 1.37 30.04
20 | 1,722 1.29 31.33
21 | 1,687 1.26 32.59
22 | 1,638 1.23 33.81
23 | 1,622 1.21 35.03
24 | 1,662 1.24 36.27
25 | 1,666 1.25 37.52
26 | 1,640 1.23 38.74
27 | 1,726 1.29 40.03
28 | 1,801 1.35 41.38
29 | 1,995 1.49 42.87
30 | 1,907 1.43 44.30
31 | 1,991 1.49 45.79
32 | 1,890 1.41 47.20
33 | 1,898 1.42 48.62
34 | 2,024 1.51 50.13
35 | 2,134 1.60 51.73
36 | 2,123 1.59 53.32
37 | 2,099 1.57 54.89
38 | 2,064 1.54 56.43
39 | 2,228 1.67 58.10
40 | 2,190 1.64 59.74
41 | 2,115 1.58 61.32
42 | 2,137 1.60 62.92
43 | 2,091 1.56 64.48
44 | 2,114 1.58 66.06
45 | 2,118 1.58 67.64
46 | 1,939 1.45 69.10
47 | 1,957 1.46 70.56
48 | 1,827 1.37 71.93
49 | 1,767 1.32 73.25
50 | 1,865 1.39 74.64
51 | 1,802 1.35 75.99
52 | 1,825 1.36 77.35
53 | 1,695 1.27 78.62
54 | 1,301 0.97 79.59
55 | 1,323 0.99 80.58
56 | 1,324 0.99 81.57
57 | 1,304 0.98 82.55
58 | 1,128 0.84 83.39
59 | 1,129 0.84 84.24
60 | 1,154 0.86 85.10
61 | 1,051 0.79 85.89
62 | 1,073 0.80 86.69
63 | 938 0.70 87.39
64 | 952 0.71 88.10
65 | 1,014 0.76 88.86
66 | 869 0.65 89.51
67 | 926 0.69 90.20
68 | 908 0.68 90.88
69 | 904 0.68 91.56
70 | 913 0.68 92.24
71 | 885 0.66 92.90
72 | 770 0.58 93.48
73 | 797 0.60 94.08
74 | 814 0.61 94.68
75 | 796 0.60 95.28
76 | 704 0.53 95.81
77 | 646 0.48 96.29
78 | 687 0.51 96.80
79 | 602 0.45 97.25
80 | 514 0.38 97.64
81 | 476 0.36 97.99
82 | 425 0.32 98.31
83 | 427 0.32 98.63
84 | 325 0.24 98.87
85 | 306 0.23 99.10
86 | 248 0.19 99.29
87 | 209 0.16 99.44
88 | 172 0.13 99.57
89 | 155 0.12 99.69
90 (90+, 1988-2002) | 416 0.31 100.00
--------------------+-----------------------------------
Total | 133,710 100.00
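* The full table is long. A shorter check of the topcode, as a sketch, looks only at the oldest ages:

```stata
* Show only the top of the age distribution; the value label on 90
* marks it as the topcoded category.
tabulate age if age >= 85
* Number of people assigned the topcoded value.
count if age == 90
* codebook summarizes the variable's range and labels.
codebook age
```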
. table sex if age>=25 & age<=34, contents (freq mean incwage mean yrsed)
-------------------------------------------------------
Sex | Freq. mean(incwage) mean(yrsed)
----------+--------------------------------------------
Male | 9,027 29510.61781 13.31212
Female | 9,511 17728.94764 13.55657
-------------------------------------------------------
. display 13.55657-13.31212
.24445
* display works like a small calculator: it prints the result without altering the variables in memory at all.
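* display can also work from results that Stata stores after a command, which avoids retyping numbers by hand. A sketch, where mean_female is a hypothetical scalar name:

```stata
* summarize leaves the mean behind in r(mean).
summarize yrsed if sex == 2 & age >= 25 & age <= 34
* Store the female mean before the next command overwrites r(mean).
scalar mean_female = r(mean)
summarize yrsed if sex == 1 & age >= 25 & age <= 34
* Female mean minus male mean, computed from stored results.
display mean_female - r(mean)
```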
. ttest yrsed if age >=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
* Let’s see what it looks like to generate a new 0-1 dummy variable for gender, and put that into a simple regression.
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate sex, nolab
Sex | Freq. Percent Cum.
------------+-----------------------------------
1 | 64,791 48.46 48.46
2 | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. generate male=0
. replace male=1 if sex==1
(64791 real changes made)
. label define male_lbl 0 "female" 1 "male"
. label val male male_lbl
* When generating a new variable: first generate, then replace the values until they are what you want, then create value labels, then attach those value labels to the variable.
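* The generate and replace steps can also be collapsed into one line, because a logical expression in Stata evaluates to 1 when true and 0 when false. A sketch, using the hypothetical name male_alt so the variable male above is left alone:

```stata
* (sex == 1) is 1 for men and 0 otherwise; the if clause keeps
* any case with missing sex as missing rather than coding it 0.
generate byte male_alt = (sex == 1) if !missing(sex)
* Reuse the value label defined above.
label values male_alt male_lbl
```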
. tabulate sex male
| male
Sex | female male | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
* Then cross tabulate the old and new variables, to make sure that the new variable does what you want, and that you haven’t miscoded or left any cases as missing.
. tabulate sex male, miss
| male
Sex | female male | Total
-----------+----------------------+----------
Male | 0 64,791 | 64,791
Female | 68,919 0 | 68,919
-----------+----------------------+----------
Total | 68,919 64,791 | 133,710
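* The same check can be written so that Stata stops with an error if the coding is wrong, which is handy inside a do-file. A minimal sketch:

```stata
* assert is silent when the condition holds for every observation
* and stops with an error otherwise.
assert male == (sex == 1)
* Should report 0 if no case was left missing.
count if missing(male)
```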
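* With a proper 0-1 dummy variable for gender, we can now plug it into the regression and run it. And guess what? It gives us exactly the same result as the t-test: the coefficient on male (-.2444469) is the difference in means, with the same standard error and t-statistic.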
. regress yrsed male if age >=25 & age<=34
      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101
------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289
       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506
------------------------------------------------------------------------------
. ttest yrsed if age >=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
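* To see how the regression reproduces the two group means, the stored coefficients can be added up with display. A sketch that reruns the regression first so that _b[] refers to it:

```stata
regress yrsed male if age >= 25 & age <= 34
* The constant is the mean for the group coded 0 (women).
display _b[_cons]
* Constant plus the coefficient on male is the mean for men.
display _b[_cons] + _b[male]
```

* The match holds because this t-test assumes equal variances, as the regression does; a t-test with the unequal option would give a slightly different standard error.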
* At this point, before clearing the data out of memory, if you wanted to keep the new “male” variable, you would have to save the dataset. I didn’t want to keep it, so I just cleared the data.
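* Had I wanted to keep the new variable, a save before the clear would have done it. A sketch with a hypothetical file name; saving under a new name leaves the original data file unchanged:

```stata
* The file name is hypothetical. Without the replace option, save
* refuses to overwrite an existing file of the same name.
save "cps_mar_2000_with_male.dta"
```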
. clear
* Then I copied the address of the folder where my downloaded data was stored and made it the default directory using Stata’s “cd” command.
. cd "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps"
C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps
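* A quick way to confirm that the working directory is set correctly and that both downloaded files are in it, as a sketch:

```stata
* Print the current working directory.
pwd
* List the raw data file(s) and do-file(s) in that directory.
dir *.dat
dir *.do
```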
* Then I used the File>Do menu to run the do-file from that default directory. Stata automatically found the .dat file in the same directory, and then it was off to the races.
. do "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\cps_00008.do"
* Since the do-file is just a text file listing Stata commands, its commands appear in the results window and in the log as they are run.
. /* Important: you need to put the .dat and .do files in one folder/
> directory and then set the working folder to that folder. */
.
. set more off
.
. clear
. infix ///
> int year 1-4 ///
> byte age 5-6 ///
> byte sex 7 ///
> using cps_00008.dat
(210648 observations read)
.
. label var year `"Survey year"'
. label var age `"Age"'
. label var sex `"Sex"'
.
. label define agelbl 00 `"Under 1 year"'
. label define agelbl 01 `"1"', add
. label define agelbl 02 `"2"', add
. label define agelbl 03 `"3"', add
. label define agelbl 04 `"4"', add
. label define agelbl 05 `"5"', add
. label define agelbl 06 `"6"', add
. label define agelbl 07 `"7"', add
. label define agelbl 08 `"8"', add
. label define agelbl 09 `"9"', add
. label define agelbl 10 `"10"', add
. label define agelbl 11 `"11"', add
. label define agelbl 12 `"12"', add
. label define agelbl 13 `"13"', add
. label define agelbl 14 `"14"', add
. label define agelbl 15 `"15"', add
. label define agelbl 16 `"16"', add
. label define agelbl 17 `"17"', add
. label define agelbl 18 `"18"', add
. label define agelbl 19 `"19"', add
. label define agelbl 20 `"20"', add
. label define agelbl 21 `"21"', add
. label define agelbl 22 `"22"', add
. label define agelbl 23 `"23"', add
. label define agelbl 24 `"24"', add
. label define agelbl 25 `"25"', add
. label define agelbl 26 `"26"', add
. label define agelbl 27 `"27"', add
. label define agelbl 28 `"28"', add
. label define agelbl 29 `"29"', add
. label define agelbl 30 `"30"', add
. label define agelbl 31 `"31"', add
. label define agelbl 32 `"32"', add
. label define agelbl 33 `"33"', add
. label define agelbl 34 `"34"', add
. label define agelbl 35 `"35"', add
. label define agelbl 36 `"36"', add
. label define agelbl 37 `"37"', add
. label define agelbl 38 `"38"', add
. label define agelbl 39 `"39"', add
. label define agelbl 40 `"40"', add
. label define agelbl 41 `"41"', add
. label define agelbl 42 `"42"', add
. label define agelbl 43 `"43"', add
. label define agelbl 44 `"44"', add
. label define agelbl 45 `"45"', add
. label define agelbl 46 `"46"', add
. label define agelbl 47 `"47"', add
. label define agelbl 48 `"48"', add
. label define agelbl 49 `"49"', add
. label define agelbl 50 `"50"', add
. label define agelbl 51 `"51"', add
. label define agelbl 52 `"52"', add
. label define agelbl 53 `"53"', add
. label define agelbl 54 `"54"', add
. label define agelbl 55 `"55"', add
. label define agelbl 56 `"56"', add
. label define agelbl 57 `"57"', add
. label define agelbl 58 `"58"', add
. label define agelbl 59 `"59"', add
. label define agelbl 60 `"60"', add
. label define agelbl 61 `"61"', add
. label define agelbl 62 `"62"', add
. label define agelbl 63 `"63"', add
. label define agelbl 64 `"64"', add
. label define agelbl 65 `"65"', add
. label define agelbl 66 `"66"', add
. label define agelbl 67 `"67"', add
. label define agelbl 68 `"68"', add
. label define agelbl 69 `"69"', add
. label define agelbl 70 `"70"', add
. label define agelbl 71 `"71"', add
. label define agelbl 72 `"72"', add
. label define agelbl 73 `"73"', add
. label define agelbl 74 `"74"', add
. label define agelbl 75 `"75"', add
. label define agelbl 76 `"76"', add
. label define agelbl 77 `"77"', add
. label define agelbl 78 `"78"', add
. label define agelbl 79 `"79"', add
. label define agelbl 80 `"80"', add
. label define agelbl 81 `"81"', add
. label define agelbl 82 `"82"', add
. label define agelbl 83 `"83"', add
. label define agelbl 84 `"84"', add
. label define agelbl 85 `"85"', add
. label define agelbl 86 `"86"', add
. label define agelbl 87 `"87"', add
. label define agelbl 88 `"88"', add
. label define agelbl 89 `"89"', add
. label define agelbl 90 `"90 (90+, 1988-2002)"', add
. label define agelbl 91 `"91"', add
. label define agelbl 92 `"92"', add
. label define agelbl 93 `"93"', add
. label define agelbl 94 `"94"', add
. label define agelbl 95 `"95"', add
. label define agelbl 96 `"96"', add
. label define agelbl 97 `"97"', add
. label define agelbl 98 `"98"', add
. label define agelbl 99 `"99+"', add
. label values age agelbl
.
. label define sexlbl 1 `"Male"'
. label define sexlbl 2 `"Female"', add
. label values sex sexlbl
.
.
end of do-file
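* After a do-file like this finishes, it is worth checking that the data were read and labeled as expected. A sketch using the three variables and the sexlbl value label created above:

```stata
* Variable names, storage types, and attached value labels.
describe
* Which survey years the extract contains.
tabulate year
* Sex should show the Male/Female labels rather than 1 and 2.
tabulate sex
* Print the numeric codes behind the labels.
label list sexlbl
```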
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web p
> ages\soc_meth_proj3\fall_2011_381_logs\class2.log
log type: text
closed on: 29 Sep 2011, 15:30:54
---------------------------------------------------------------------------------