---------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_logs\class1.log
log type: text
opened on: 25 Jan 2011, 14:22:08
*Comments in the log are preceded by an asterisk. The first thing you want to do is open a log (preferably in .log format), so that you have a place where your results are saved. The log is different from the CPS data file, which you need to download from my website, and which you only need to save if you add new variables.
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
*Opening the log and then opening the dataset are easiest to do from the File menu within Stata.
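* For anyone who prefers typing to menus, the same two steps can be entered on the command line. This is only a sketch: the relative file names below are placeholders, so substitute the folder where you actually keep your log and the downloaded data.

```stata
* Open a plain-text log; "replace" overwrites an existing log of the same name
log using "class1.log", text replace

* Load the CPS extract (placeholder path; point it at your own copy of the file)
use "cps_mar_2000_new.dta", clear
```

* Typing the commands has one advantage over the menus: they can be saved in a do-file and rerun later.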
. describe
* I will try to bold the commands that I enter on the command line.
Contains data from C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta
obs: 133,710
vars: 55 1 Feb 2009 13:36
size: 15,109,230 (71.2% of memory free)
---------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------------------
year int %8.0g yearlbl Survey year
serial long %12.0g seriallbl
Household serial number
hhwt float %9.0g hhwtlbl Household weight
region byte %27.0g regionlbl
Region and division
statefip byte %57.0g statefiplbl
State (FIPS code)
metro byte %27.0g metrolbl Metropolitan central city status
metarea int %50.0g metarealbl
Metropolitan area
ownershp byte %21.0g ownershplbl
Ownership of dwelling
hhincome long %12.0g hhincomelbl
Total household income
pubhous byte %8.0g pubhouslbl
Living in public housing
foodstmp byte %8.0g foodstmplbl
Food stamp recipiency
pernum byte %8.0g pernumlbl
Person number in sample unit
perwt float %9.0g perwtlbl Person weight
momloc byte %8.0g momloclbl
Mother's location in the household
poploc byte %8.0g poploclbl
Father's location in the household
sploc byte %8.0g sploclbl Spouse's location in household
famsize byte %25.0g famsizelbl
Number of own family members in hh
nchild byte %18.0g nchildlbl
Number of own children in household
nchlt5 byte %23.0g nchlt5lbl
Number of own children under age 5 in hh
nsibs byte %18.0g nsibslbl Number of own siblings in household
relate int %34.0g relatelbl
Relationship to household head
age byte %19.0g agelbl Age
sex byte %8.0g sexlbl Sex
race int %37.0g racelbl Race
marst byte %23.0g marstlbl Marital status
popstat byte %14.0g popstatlbl
Adult civilian, armed forces, or child
bpl long %27.0g bpllbl Birthplace
yrimmig int %11.0g yrimmiglbl
Year of immigration
citizen byte %31.0g citizenlbl
Citizenship status
mbpl long %27.0g mbpllbl Mother's birthplace
fbpl long %27.0g fbpllbl Father's birthplace
hispan int %29.0g hispanlbl
Hispanic origin
educ99 byte %38.0g educ99lbl
Educational attainment, 1990
educrec byte %23.0g educreclbl
Educational attainment recode
schlcoll byte %45.0g schlcolllbl
School or college attendance
empstat byte %30.0g empstatlbl
Employment status
occ1990 int %78.0g occ1990lbl
Occupation, 1990 basis
wkswork1 byte %8.0g wkswork1lbl
Weeks worked last year
hrswork byte %8.0g hrsworklbl
Hours worked last week
uhrswork byte %13.0g uhrsworklbl
Usual hours worked per week (last yr)
hourwage int %8.0g hourwagelbl
Hourly wage
union byte %33.0g unionlbl Union membership
inctot long %12.0g Total personal income
incwage long %12.0g Wage and salary income
incss long %12.0g Social Security income
incwelfr long %12.0g Welfare (public assistance) income
vetstat byte %10.0g vetstatlbl
Veteran status
vetlast byte %26.0g vetlastlbl
Veteran's most recent period of service
disabwrk byte %34.0g disabwrklbl
Work disability
health byte %9.0g healthlbl
Health status
inclugh byte %8.0g inclughlbl
Included in employer group health plan
last year
himcaid byte %8.0g himcaidlbl
Covered by Medicaid last year
ftotval double %10.0g ftotvallbl
Total family income
perwt_rounded float %9.0g integer perwt, negative values recoded to
0
yrsed float %9.0g based on educrec
---------------------------------------------------------------------------------------
Sorted by: race
. clear all
. set mem 50m
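*If you get a "not enough memory" error, you need to set mem to 50m or so. The dataset itself takes up about 15M. Stata will not change the memory allocation while data are loaded, which is why I ran clear all first. In Stata 12 and later, memory is managed automatically and this step is unnecessary.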
Current memory allocation
current memory usage
settable value description (1M = 1024k)
--------------------------------------------------------------------
set maxvar 5000 max. variables allowed 1.909M
set memory 50M max. data space 50.000M
set matsize 400 max. RHS vars in models 1.254M
-----------
53.163M
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
* Then I re-open the data.
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
* The number of individual cases in the March 2000 CPS is 133,710.
. tabulate sex [ fweight=perwt_rounded]
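* fweights are frequency weights, which we will be using a lot in this class. Each case is counted as if it appeared perwt_rounded times. Frequency weights must be integers, which is why we use perwt_rounded rather than perwt.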
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male |133,932,994 48.86 48.86
Female |140,154,827 51.14 100.00
------------+-----------------------------------
Total |274,087,821 100.00
* The non-institutional population of the US in March 2000 was about 274 million.
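* The variable label for perwt_rounded reads "integer perwt, negative values recoded to 0". A variable like that could be built roughly as follows. This is a guess at the construction, not the actual code used to make the dataset, and perwt_int is a made-up name chosen so that nothing existing is overwritten.

```stata
* Assumed construction: round the person weight to the nearest integer
generate perwt_int = round(perwt)

* Frequency weights cannot be negative, so recode any negative values to 0
replace perwt_int = 0 if perwt_int < 0

* Unweighted counts (sample cases) next to weighted counts (population estimates)
tabulate sex
tabulate sex [fweight=perwt_int]
```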
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate race [fweight= perwt_rounded]
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White |224,806,952 82.02 82.02
Black/Negro | 35,508,668 12.96 94.98
American Indian/Aleut/Eskimo | 2,847,473 1.04 96.01
Asian or Pacific Islander | 10,924,728 3.99 100.00
--------------------------------------+-----------------------------------
Total |274,087,821 100.00
* Note that the weights are not uniform. That is, some people and some groups have larger or smaller weights, so that blacks make up almost 13% of the weighted US population but only about 10% of the CPS sample. The weights are designed to correct for differences in sampling and response rates.
. tabulate race, nolabel
Race | Freq. Percent Cum.
------------+-----------------------------------
100 | 113,475 84.87 84.87
200 | 13,626 10.19 95.06
300 | 1,894 1.42 96.47
650 | 4,715 3.53 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Here is something to keep in mind: even nominal categorical variables like race are stored as numbers, with labels attached to the numeric codes of that variable.
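* Because the stored values are numbers, any condition on race has to use the numeric code rather than the label text. A short sketch, using the value label name racelbl shown by describe and the code 200 shown by the nolabel table:

```stata
* List every numeric code of race alongside its label
label list racelbl

* Restrict a command to one category by its numeric code (200 = Black/Negro)
tabulate sex if race == 200
```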
. summarize incwelfr
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 103226 40.62242 478.8231 0 25000
* You always need to apply the basic logic test to any result. Does it make sense that the average welfare income for 1999 would be $40? It makes sense when you consider that only a small fraction of the population has welfare income, so that the average is pulled down by many zeros.
. summarize incwelfr, detail
Welfare (public assistance) income
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 103226
25% 0 0 Sum of Wgt. 103226
50% 0 Mean 40.62242
Largest Std. Dev. 478.8231
75% 0 15600
90% 0 19999 Variance 229271.5
95% 0 23292 Skewness 16.98146
99% 804 25000 Kurtosis 403.6187
. summarize incwelfr if incwelfr>0
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwelfr | 1289 3253.134 2813.505 1 25000
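* A more reasonable average ($3,253) is the average welfare income for people whose welfare income is >0. Only 1,289 of the 103,226 cases with non-missing welfare income (about 1.2%) report any welfare income at all.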
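* One caution about conditions like incwelfr>0: Stata treats a missing value as larger than any number, so the condition is also true for cases where incwelfr is missing. summarize drops missing cases on its own, but commands such as count do not. The sketch below (n_recip is a made-up scalar name) counts recipients safely and computes their share.

```stata
* Count cases with positive welfare income, explicitly excluding missing values
count if incwelfr > 0 & !missing(incwelfr)
scalar n_recip = r(N)

* Count all cases with non-missing welfare income
count if !missing(incwelfr)

* Share of non-missing cases that report any welfare income
display n_recip / r(N)
```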
. tabulate age
Age | Freq. Percent Cum.
--------------------+-----------------------------------
Under 1 year | 1,713 1.28 1.28
1 | 1,932 1.44 2.73
2 | 1,950 1.46 4.18
3 | 1,939 1.45 5.63
4 | 1,965 1.47 7.10
5 | 1,998 1.49 8.60
6 | 2,059 1.54 10.14
7 | 2,176 1.63 11.77
8 | 2,163 1.62 13.38
9 | 2,243 1.68 15.06
10 | 2,202 1.65 16.71
11 | 2,083 1.56 18.27
12 | 2,035 1.52 19.79
13 | 2,047 1.53 21.32
14 | 1,979 1.48 22.80
15 | 2,046 1.53 24.33
16 | 1,965 1.47 25.80
17 | 1,998 1.49 27.29
18 | 1,847 1.38 28.67
19 | 1,826 1.37 30.04
20 | 1,722 1.29 31.33
21 | 1,687 1.26 32.59
22 | 1,638 1.23 33.81
23 | 1,622 1.21 35.03
24 | 1,662 1.24 36.27
25 | 1,666 1.25 37.52
26 | 1,640 1.23 38.74
27 | 1,726 1.29 40.03
28 | 1,801 1.35 41.38
29 | 1,995 1.49 42.87
30 | 1,907 1.43 44.30
31 | 1,991 1.49 45.79
32 | 1,890 1.41 47.20
33 | 1,898 1.42 48.62
34 | 2,024 1.51 50.13
35 | 2,134 1.60 51.73
36 | 2,123 1.59 53.32
37 | 2,099 1.57 54.89
38 | 2,064 1.54 56.43
39 | 2,228 1.67 58.10
40 | 2,190 1.64 59.74
41 | 2,115 1.58 61.32
42 | 2,137 1.60 62.92
43 | 2,091 1.56 64.48
44 | 2,114 1.58 66.06
45 | 2,118 1.58 67.64
46 | 1,939 1.45 69.10
47 | 1,957 1.46 70.56
48 | 1,827 1.37 71.93
49 | 1,767 1.32 73.25
50 | 1,865 1.39 74.64
51 | 1,802 1.35 75.99
52 | 1,825 1.36 77.35
53 | 1,695 1.27 78.62
54 | 1,301 0.97 79.59
55 | 1,323 0.99 80.58
56 | 1,324 0.99 81.57
57 | 1,304 0.98 82.55
58 | 1,128 0.84 83.39
59 | 1,129 0.84 84.24
60 | 1,154 0.86 85.10
61 | 1,051 0.79 85.89
62 | 1,073 0.80 86.69
63 | 938 0.70 87.39
64 | 952 0.71 88.10
65 | 1,014 0.76 88.86
66 | 869 0.65 89.51
67 | 926 0.69 90.20
68 | 908 0.68 90.88
69 | 904 0.68 91.56
70 | 913 0.68 92.24
71 | 885 0.66 92.90
72 | 770 0.58 93.48
73 | 797 0.60 94.08
74 | 814 0.61 94.68
75 | 796 0.60 95.28
76 | 704 0.53 95.81
77 | 646 0.48 96.29
78 | 687 0.51 96.80
79 | 602 0.45 97.25
80 | 514 0.38 97.64
81 | 476 0.36 97.99
82 | 425 0.32 98.31
83 | 427 0.32 98.63
84 | 325 0.24 98.87
85 | 306 0.23 99.10
86 | 248 0.19 99.29
87 | 209 0.16 99.44
88 | 172 0.13 99.57
89 | 155 0.12 99.69
90 (90+, 1988-2002) | 416 0.31 100.00
--------------------+-----------------------------------
Total | 133,710 100.00
* If you tabulate age, you find that 90 is the highest category. The CPS topcodes age to help protect the identity of the oldest respondents, so 90 here means 90 or older. Income is topcoded as well. You can find out about the topcodes and other relevant information at ipums.org.
. summarize incwage
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 103226 19462.59 28843.38 0 364302
. summarize race
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
race | 133710 132.4183 105.8387 100 650
* Income is a variable that it makes sense to summarize, because the mean of income means something. Race is a categorical variable stored as a number, so you *can* take the average of race, but you should not, because the result doesn't mean anything. Incwage has units (1999 US dollars); race does not.
. *don't do this!
* Tabulate is for categorical variables; summarize is for true numeric or continuous variables. Just as you don't want to summarize the categorical variables, you don't want to tabulate the true continuous variables like incwage, because you get a different row for every value of incwage, and the table would go on for a thousand pages. Not good!
. tabulate incwage
Wage and |
salary |
income | Freq. Percent Cum.
------------+-----------------------------------
0 | 35,825 34.71 34.71
1 | 7 0.01 34.71
5 | 15 0.01 34.73
7 | 1 0.00 34.73
8 | 1 0.00 34.73
10 | 1 0.00 34.73
12 | 2 0.00 34.73
18 | 1 0.00 34.73
20 | 10 0.01 34.74
21 | 2 0.00 34.74
28 | 2 0.00 34.75
30 | 5 0.00 34.75
31 | 1 0.00 34.75
34 | 4 0.00 34.76
35 | 5 0.00 34.76
36 | 1 0.00 34.76
40 | 8 0.01 34.77
44 | 1 0.00 34.77
45 | 4 0.00 34.77
46 | 3 0.00 34.78
47 | 1 0.00 34.78
50 | 19 0.02 34.80
52 | 3 0.00 34.80
53 | 1 0.00 34.80
55 | 1 0.00 34.80
56 | 1 0.00 34.80
--Break--
r(1);
. *don't do this (tabulate incwage) either!
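* If you do want a table for a continuous variable, one option is to group it into brackets first. The sketch below is an illustration only: the cut points and the variable name incwage_cat are arbitrary choices, not part of the class dataset.

```stata
* Group wage income into brackets; each bracket is labeled by its lower bound
egen incwage_cat = cut(incwage), at(0 1 10000 25000 50000 100000 1000000)

* The grouped variable has only a handful of rows, so tabulate is now readable
tabulate incwage_cat

* Alternatively, look at the distribution graphically among positive earners
histogram incwage if incwage > 0
```

* The first bracket runs from 0 up to (but not including) 1, so it isolates the people with zero wage income.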
. summarize incwage, detail
Wage and salary income
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 103226
25% 0 0 Sum of Wgt. 103226
50% 10000 Mean 19462.59
Largest Std. Dev. 28843.38
75% 30000 362302
90% 50000 362302 Variance 8.32e+08
95% 66500 362302 Skewness 3.583439
99% 125000 364302 Kurtosis 24.50639
* But when we are interested in average income, we are usually just interested in the people who have income.
. summarize incwage if incwage>0 [fweight=perwt_rounded], detail
Wage and salary income
-------------------------------------------------------------
Percentiles Smallest
1% 300 1
5% 1548 1
10% 3500 1 Obs 140107244
25% 11000 1 Sum of Wgt. 140107244
50% 23841 Mean 30524.67
Largest Std. Dev. 31676.73
75% 40000 362302
90% 60647 362302 Variance 1.00e+09
95% 80000 362302 Skewness 3.336273
99% 197387 364302 Kurtosis 20.47819
* And we might want to limit ourselves to people with positive incomes in the age groups in which people actually work for a living (here, ages 26 to 64).
. summarize incwage if incwage>0 & age>25 & age<65 [fweight=perwt_rounded], detail
Wage and salary income
-------------------------------------------------------------
Percentiles Smallest
1% 650 1
5% 4000 1
10% 8000 1 Obs 107670623
25% 16000 1 Sum of Wgt. 107670623
50% 28711 Mean 35756.95
Largest Std. Dev. 33031.75
75% 45000 362302
90% 68000 362302 Variance 1.09e+09
95% 87468 362302 Skewness 3.234811
99% 229339 364302 Kurtosis 18.86115
* And we might want to see how men's and women's incomes differ, which we accomplish in two steps. First, we sort by the variable or variables in question; then we summarize by those variables.
. sort sex
. by sex: summarize incwage if incwage>0 & age>25 & age<64
---------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 26909 42874.63 37494.11 1 364302
---------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
incwage | 25030 25901.4 22719.26 1 333564
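* $43K compared to $26K seems like a big difference to me. Note that this command differs from the previous summarize in two ways: it is unweighted, and it uses age<64 rather than age<65, so it covers ages 26 to 63. The Obs column therefore counts sample cases rather than estimated population.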
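* The sort and by steps can be combined into one with bysort. The sketch below also adds the frequency weights and the age<65 cutoff from the earlier weighted summarize. That is an assumption about which comparison is wanted, and its output is not shown in this log.

```stata
* bysort sorts the data by sex, then runs summarize separately for each sex
bysort sex: summarize incwage if incwage>0 & age>25 & age<65 [fweight=perwt_rounded]
```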
. summarize perwt_rounded
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
perwt_roun~d | 133710 2049.868 1083.244 93 14281
* The average person weight in the dataset is about 2000, because the CPS is roughly a 1-in-2000 survey, which means about 1 out of every 2000 persons in the US was surveyed. The weights are approximately the inverse of the sampling probability.
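* The mean weight can be checked against the earlier tables: it should equal the weighted population total (274,087,821) divided by the number of sample cases (133,710). A small sketch of that arithmetic:

```stata
* Weighted population total divided by the number of sample cases
display 274087821 / 133710

* The same total recovered from the data: summarize stores the sum in r(sum)
summarize perwt_rounded
display r(sum)
```

* The first result should agree with the mean of perwt_rounded reported above.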
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_logs\class1.log
log type: text
closed on: 25 Jan 2011, 15:27:00
---------------------------------------------------------------------------------------