---------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s

> oc_meth_proj3\2011_logs\class1.log

  log type:  text

 opened on:  25 Jan 2011, 14:22:08

 

*Comments in the log are preceded by an asterisk. The first thing to do is open a log (preferably in plain-text .log format) so that your results are saved somewhere. The log is different from the CPS data file, which you need to download from my website, and which you only need to re-save if you add new variables.
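
* A minimal sketch of opening the log from the command line instead of the File menu. The file name class1.log in the current working directory is an assumption for illustration, not the path used in this session.

```stata
* Close any log that is already open; capture suppresses the error if none is open.
capture log close
* Open a plain-text log; replace overwrites an existing file of the same name.
log using "class1.log", text replace
```

* Use append in place of replace if you would rather add to an existing log than overwrite it.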

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

*Opening the log and then opening the dataset are easiest to do from the File menu within Stata.

 

. describe

 

* I will try to bold the commands that I enter on the command line. In this text log, commands are the lines that begin with a period.

 

Contains data from C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta

  obs:       133,710                         

 vars:            55                          1 Feb 2009 13:36

 size:    15,109,230 (71.2% of memory free)

---------------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

---------------------------------------------------------------------------------------

year            int    %8.0g       yearlbl    Survey year

serial          long   %12.0g      seriallbl

                                              Household serial number

hhwt            float  %9.0g       hhwtlbl    Household weight

region          byte   %27.0g      regionlbl

                                              Region and division

statefip        byte   %57.0g      statefiplbl

                                              State (FIPS code)

metro           byte   %27.0g      metrolbl   Metropolitan central city status

metarea         int    %50.0g      metarealbl

                                              Metropolitan area

ownershp        byte   %21.0g      ownershplbl

                                              Ownership of dwelling

hhincome        long   %12.0g      hhincomelbl

                                              Total household income

pubhous         byte   %8.0g       pubhouslbl

                                              Living in public housing

foodstmp        byte   %8.0g       foodstmplbl

                                              Food stamp recipiency

pernum          byte   %8.0g       pernumlbl

                                              Person number in sample unit

perwt           float  %9.0g       perwtlbl   Person weight

momloc          byte   %8.0g       momloclbl

                                              Mother's location in the household

poploc          byte   %8.0g       poploclbl

                                              Father's location in the household

sploc           byte   %8.0g       sploclbl   Spouse's location in household

famsize         byte   %25.0g      famsizelbl

                                              Number of own family members in hh

nchild          byte   %18.0g      nchildlbl

                                              Number of own children in household

nchlt5          byte   %23.0g      nchlt5lbl

                                              Number of own children under age 5 in hh

nsibs           byte   %18.0g      nsibslbl   Number of own siblings in household

relate          int    %34.0g      relatelbl

                                              Relationship to household head

age             byte   %19.0g      agelbl     Age

sex             byte   %8.0g       sexlbl     Sex

race            int    %37.0g      racelbl    Race

marst           byte   %23.0g      marstlbl   Marital status

popstat         byte   %14.0g      popstatlbl

                                              Adult civilian, armed forces, or child

bpl             long   %27.0g      bpllbl     Birthplace

yrimmig         int    %11.0g      yrimmiglbl

                                              Year of immigration

citizen         byte   %31.0g      citizenlbl

                                              Citizenship status

mbpl            long   %27.0g      mbpllbl    Mother's birthplace

fbpl            long   %27.0g      fbpllbl    Father's birthplace

hispan          int    %29.0g      hispanlbl

                                              Hispanic origin

educ99          byte   %38.0g      educ99lbl

                                              Educational attainment, 1990

educrec         byte   %23.0g      educreclbl

                                              Educational attainment recode

schlcoll        byte   %45.0g      schlcolllbl

                                              School or college attendance

empstat         byte   %30.0g      empstatlbl

                                              Employment status

occ1990         int    %78.0g      occ1990lbl

                                              Occupation, 1990 basis

wkswork1        byte   %8.0g       wkswork1lbl

                                              Weeks worked last year

hrswork         byte   %8.0g       hrsworklbl

                                              Hours worked last week

uhrswork        byte   %13.0g      uhrsworklbl

                                              Usual hours worked per week (last yr)

hourwage        int    %8.0g       hourwagelbl

                                              Hourly wage

union           byte   %33.0g      unionlbl   Union membership

inctot          long   %12.0g                 Total personal income

incwage         long   %12.0g                 Wage and salary income

incss           long   %12.0g                 Social Security income

incwelfr        long   %12.0g                 Welfare (public assistance) income

vetstat         byte   %10.0g      vetstatlbl

                                              Veteran status

vetlast         byte   %26.0g      vetlastlbl

                                              Veteran's most recent period of service

disabwrk        byte   %34.0g      disabwrklbl

                                              Work disability

health          byte   %9.0g       healthlbl

                                              Health status

inclugh         byte   %8.0g       inclughlbl

                                              Included in employer group health plan

                                                last year

himcaid         byte   %8.0g       himcaidlbl

                                              Covered by Medicaid last year

ftotval         double %10.0g      ftotvallbl

                                              Total family income

perwt_rounded   float  %9.0g                  integer perwt, negative values recoded to

                                                0

yrsed           float  %9.0g                  based on educrec

---------------------------------------------------------------------------------------

Sorted by:  race

 

. clear all

 

. set mem 50m

 

*If you get a "not enough memory" error, set mem to 50m or so; the dataset itself takes up about 15M. Memory can only be reset when no data are loaded, which is why clear all comes first.

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.909M

    set memory           50M    max. data space                 50.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                                53.163M
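
* A hedged note for newer versions: as far as I know, Stata 12 and later manage memory automatically and ignore set memory. The sketch below only requests memory on older versions; the version cutoff of 12 is my assumption, so check your own installation.

```stata
* c(stata_version) is the version of Stata that is running.
if c(stata_version) < 12 {
    * On older versions this must be run with no data in memory.
    set memory 50m
}
* Report current memory usage.
memory
```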

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

* Then I re-open the data, because clear all removed them from memory.

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* The number of individual cases in the March, 2000 CPS is 133,710.

 

 

. tabulate sex [ fweight=perwt_rounded]

 

* fweights are frequency weights, which we will be using a lot in this class.

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |133,932,994       48.86       48.86

     Female |140,154,827       51.14      100.00

------------+-----------------------------------

      Total |274,087,821      100.00

 

* The number of people in the non-institutional population of the US in March, 2000 was 274 million.
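
* Frequency weights have to be non-negative integers, which is why the dataset carries perwt_rounded alongside perwt. Its variable label says "integer perwt, negative values recoded to 0", so a variable like it could be built roughly as below. This is a sketch under that assumption, not necessarily how perwt_rounded was actually created; perwt_int is a hypothetical name.

```stata
* Round the person weight to the nearest integer.
generate perwt_int = round(perwt)
* Recode any negative weights to zero, as the variable label describes.
replace perwt_int = 0 if perwt_int < 0
* The new variable can now be used as a frequency weight.
tabulate sex [fweight=perwt_int]
```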

 

. tabulate race

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |    113,475       84.87       84.87

                          Black/Negro |     13,626       10.19       95.06

         American Indian/Aleut/Eskimo |      1,894        1.42       96.47

            Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

                                Total |    133,710      100.00

 

. tabulate race [fweight= perwt_rounded]

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |224,806,952       82.02       82.02

                          Black/Negro | 35,508,668       12.96       94.98

         American Indian/Aleut/Eskimo |  2,847,473        1.04       96.01

            Asian or Pacific Islander | 10,924,728        3.99      100.00

--------------------------------------+-----------------------------------

                                Total |274,087,821      100.00

 

* Note that the weights are not uniform. That is, some people and some groups have larger or smaller weights. For example, blacks make up almost 13% of the weighted US population but only about 10% of the unweighted CPS sample. The weights are designed to correct for differences in response rates.
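
* One way to see the non-uniform weights directly is to compare the average weight across groups. A minimal sketch using variables already in the dataset:

```stata
* Mean and count of the rounded person weight within each race category.
tabstat perwt_rounded, by(race) statistics(mean n)
```

* Judging from the two race tables above (weighted frequency divided by unweighted frequency), the mean weight should come out near 2,600 for blacks and near 1,980 for whites.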

 

. tabulate race, nolabel

 

       Race |      Freq.     Percent        Cum.

------------+-----------------------------------

        100 |    113,475       84.87       84.87

        200 |     13,626       10.19       95.06

        300 |      1,894        1.42       96.47

        650 |      4,715        3.53      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* Here is something to keep in mind: even nominal categorical variables like race are stored as numbers, with labels appended to the categories for that variable.
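
* A short sketch of how to look up the codes and use them. The label name racelbl comes from the describe output above, and code 200 is the "Black/Negro" row in the nolabel table.

```stata
* Show each numeric code and the text attached to it by the value label.
label list racelbl
* Conditions must use the numeric code, not the label text.
tabulate sex if race == 200
```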

 

. summarize incwelfr

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    103226    40.62242    478.8231          0      25000

 

* You always need to apply the basic logic test to any result. Does it make sense that the average welfare income for 1999 would be $40? It makes sense when you consider that only a small fraction of the population has welfare income, so that the average is pulled down by many zeros.

 

. summarize incwelfr, detail

 

             Welfare (public assistance) income

-------------------------------------------------------------

      Percentiles      Smallest

 1%            0              0

 5%            0              0

10%            0              0       Obs              103226

25%            0              0       Sum of Wgt.      103226

 

50%            0                      Mean           40.62242

                        Largest       Std. Dev.      478.8231

75%            0          15600

90%            0          19999       Variance       229271.5

95%            0          23292       Skewness       16.98146

99%          804          25000       Kurtosis       403.6187

 

. summarize incwelfr if incwelfr>0

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |      1289    3253.134    2813.505          1      25000

 

* A more reasonable average ($3,253) is the average welfare income for people whose welfare income is >0.
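
* The two averages can be reconciled with a little arithmetic. One caution: Stata treats missing values as larger than any number, so a condition like incwelfr>0 is also true for missing cases. summarize drops them anyway, but count does not. I am assuming here that the cases absent from the summarize output are stored as missing.

```stata
* Count people with positive, non-missing welfare income.
count if incwelfr > 0 & !missing(incwelfr)
* Share of the 103,226 non-missing cases with any welfare income (counts from the output above).
display 1289/103226
* That share times the recipients' mean should reproduce the overall mean of about 40.6.
display (1289/103226)*3253.134
```

* The share is a little over 1%, which is why the overall mean is so low.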

 

. tabulate age

 

                Age |      Freq.     Percent        Cum.

--------------------+-----------------------------------

       Under 1 year |      1,713        1.28        1.28

                  1 |      1,932        1.44        2.73

                  2 |      1,950        1.46        4.18

                  3 |      1,939        1.45        5.63

                  4 |      1,965        1.47        7.10

                  5 |      1,998        1.49        8.60

                  6 |      2,059        1.54       10.14

                  7 |      2,176        1.63       11.77

                  8 |      2,163        1.62       13.38

                  9 |      2,243        1.68       15.06

                 10 |      2,202        1.65       16.71

                 11 |      2,083        1.56       18.27

                 12 |      2,035        1.52       19.79

                 13 |      2,047        1.53       21.32

                 14 |      1,979        1.48       22.80

                 15 |      2,046        1.53       24.33

                 16 |      1,965        1.47       25.80

                 17 |      1,998        1.49       27.29

                 18 |      1,847        1.38       28.67

                 19 |      1,826        1.37       30.04

                 20 |      1,722        1.29       31.33

                 21 |      1,687        1.26       32.59

                 22 |      1,638        1.23       33.81

                 23 |      1,622        1.21       35.03

                 24 |      1,662        1.24       36.27

                 25 |      1,666        1.25       37.52

                 26 |      1,640        1.23       38.74

                 27 |      1,726        1.29       40.03

                 28 |      1,801        1.35       41.38

                 29 |      1,995        1.49       42.87

                 30 |      1,907        1.43       44.30

                 31 |      1,991        1.49       45.79

                 32 |      1,890        1.41       47.20

                 33 |      1,898        1.42       48.62

                 34 |      2,024        1.51       50.13

                 35 |      2,134        1.60       51.73

                 36 |      2,123        1.59       53.32

                 37 |      2,099        1.57       54.89

                 38 |      2,064        1.54       56.43

                 39 |      2,228        1.67       58.10

                 40 |      2,190        1.64       59.74

                 41 |      2,115        1.58       61.32

                 42 |      2,137        1.60       62.92

                 43 |      2,091        1.56       64.48

                 44 |      2,114        1.58       66.06

                 45 |      2,118        1.58       67.64

                 46 |      1,939        1.45       69.10

                 47 |      1,957        1.46       70.56

                 48 |      1,827        1.37       71.93

                 49 |      1,767        1.32       73.25

                 50 |      1,865        1.39       74.64

                 51 |      1,802        1.35       75.99

                 52 |      1,825        1.36       77.35

                 53 |      1,695        1.27       78.62

                 54 |      1,301        0.97       79.59

                 55 |      1,323        0.99       80.58

                 56 |      1,324        0.99       81.57

                 57 |      1,304        0.98       82.55

                 58 |      1,128        0.84       83.39

                 59 |      1,129        0.84       84.24

                 60 |      1,154        0.86       85.10

                 61 |      1,051        0.79       85.89

                 62 |      1,073        0.80       86.69

                 63 |        938        0.70       87.39

                 64 |        952        0.71       88.10

                 65 |      1,014        0.76       88.86

                 66 |        869        0.65       89.51

                 67 |        926        0.69       90.20

                 68 |        908        0.68       90.88

                 69 |        904        0.68       91.56

                 70 |        913        0.68       92.24

                 71 |        885        0.66       92.90

                 72 |        770        0.58       93.48

                 73 |        797        0.60       94.08

                 74 |        814        0.61       94.68

                 75 |        796        0.60       95.28

                 76 |        704        0.53       95.81

                 77 |        646        0.48       96.29

                 78 |        687        0.51       96.80

                 79 |        602        0.45       97.25

                 80 |        514        0.38       97.64

                 81 |        476        0.36       97.99

                 82 |        425        0.32       98.31

                 83 |        427        0.32       98.63

                 84 |        325        0.24       98.87

                 85 |        306        0.23       99.10

                 86 |        248        0.19       99.29

                 87 |        209        0.16       99.44

                 88 |        172        0.13       99.57

                 89 |        155        0.12       99.69

90 (90+, 1988-2002) |        416        0.31      100.00

--------------------+-----------------------------------

              Total |    133,710      100.00

 

* If you tabulate age, you find that 90 is the highest category. The CPS topcodes age to help protect the identity of the oldest respondents. Income is topcoded as well. You can find out about the topcodes and other relevant information at ipums.org.

 

. summarize incwage

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |    103226    19462.59    28843.38          0     364302

 

 

 

. summarize race

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

        race |    133710    132.4183    105.8387        100        650

 

* Income is a variable that makes sense to summarize, because the mean of income means something. Race is a categorical variable stored as a number, so you *can* take the average of race, but you should not, because the result doesn't mean anything. Incwage has units (1999 US dollars); race does not.

 

. *don't do this!

 

* Tabulate is for categorical variables, summarize is for true numeric or continuous variables. Just as you don't want to summarize the categorical variables, you don't want to tabulate the true continuous variables like incwage, because you get a different row for every value of incwage, and the table would go on for a thousand pages… Not good!

 

. tabulate incwage

 

   Wage and |

     salary |

     income |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |     35,825       34.71       34.71

          1 |          7        0.01       34.71

          5 |         15        0.01       34.73

          7 |          1        0.00       34.73

          8 |          1        0.00       34.73

         10 |          1        0.00       34.73

         12 |          2        0.00       34.73

         18 |          1        0.00       34.73

         20 |         10        0.01       34.74

         21 |          2        0.00       34.74

         28 |          2        0.00       34.75

         30 |          5        0.00       34.75

         31 |          1        0.00       34.75

         34 |          4        0.00       34.76

         35 |          5        0.00       34.76

         36 |          1        0.00       34.76

         40 |          8        0.01       34.77

         44 |          1        0.00       34.77

         45 |          4        0.00       34.77

         46 |          3        0.00       34.78

         47 |          1        0.00       34.78

         50 |         19        0.02       34.80

         52 |          3        0.00       34.80

         53 |          1        0.00       34.80

         55 |          1        0.00       34.80

         56 |          1        0.00       34.80

--Break--

r(1);

 

. *don't do this (tabulate incwage) either!
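
* If you do want a table for a continuous variable, one option is to group it into brackets first. A minimal sketch: incwage_cat is a hypothetical new variable and the cutpoints are arbitrary choices of mine, not categories used in this class.

```stata
* Each bracket runs from one cutpoint up to (but not including) the next.
egen incwage_cat = cut(incwage), at(0,1,10000,25000,50000,100000,1000000)
* The table now has one row per bracket, labeled by its lower bound.
tabulate incwage_cat
```

* The first bracket (0 up to 1) isolates people with zero wage income, and the last cutpoint is set above the maximum of 364,302 so that no one is dropped.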

 

. summarize incwage, detail

 

                   Wage and salary income

-------------------------------------------------------------

      Percentiles      Smallest

 1%            0              0

 5%            0              0

10%            0              0       Obs              103226

25%            0              0       Sum of Wgt.      103226

 

50%        10000                      Mean           19462.59

                        Largest       Std. Dev.      28843.38

75%        30000         362302

90%        50000         362302       Variance       8.32e+08

95%        66500         362302       Skewness       3.583439

99%       125000         364302       Kurtosis       24.50639

 

* But when we are interested in average income, we are usually just interested in the people who have income.

 

. summarize incwage if incwage>0 [fweight=perwt_rounded], detail

 

                   Wage and salary income

-------------------------------------------------------------

      Percentiles      Smallest

 1%          300              1

 5%         1548              1

10%         3500              1       Obs           140107244

25%        11000              1       Sum of Wgt.   140107244

 

50%        23841                      Mean           30524.67

                        Largest       Std. Dev.      31676.73

75%        40000         362302

90%        60647         362302       Variance       1.00e+09

95%        80000         362302       Skewness       3.336273

99%       197387         364302       Kurtosis       20.47819

 

* And we might want to limit ourselves to people with positive incomes in the age groups in which people actually work for a living.

 

. summarize incwage if incwage>0 & age>25 & age<65 [fweight=perwt_rounded], detail

 

                   Wage and salary income

-------------------------------------------------------------

      Percentiles      Smallest

 1%          650              1

 5%         4000              1

10%         8000              1       Obs           107670623

25%        16000              1       Sum of Wgt.   107670623

 

50%        28711                      Mean           35756.95

                        Largest       Std. Dev.      33031.75

75%        45000         362302

90%        68000         362302       Variance       1.09e+09

95%        87468         362302       Skewness       3.234811

99%       229339         364302       Kurtosis       18.86115

 

* And we might want to see how men's income and women's income differ, which we accomplish in two steps. First, we sort by the variable or variables in question, then we summarize by those variables.

 

. sort sex

 

 

. by sex: summarize incwage if incwage>0 & age>25 & age<64

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     26909    42874.63    37494.11          1     364302

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     25030     25901.4    22719.26          1     333564

 

* $43K compared to $26K seems like a big difference to me.
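
* Note that the by-sex command above is unweighted and uses age<64, whereas the earlier weighted summary used age<65. A sketch of a version that matches the earlier command, with bysort doing the sort and the by in one step:

```stata
* Weighted wage income by sex for ages 26 through 64 with positive wages.
bysort sex: summarize incwage if incwage>0 & age>25 & age<65 [fweight=perwt_rounded]
```

* The means from this version will differ somewhat from the $43K and $26K shown above, since the sample and the weighting are not the same.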

 

 

. summarize  perwt_rounded

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

perwt_roun~d |    133710    2049.868    1083.244         93      14281

 

* The average person weight in the dataset is about 2000, because the CPS is a 1-in-2000 survey, which means 1 out of every 2000 persons in the US was surveyed. The weights are the inverse of the sampling frequency.
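
* The link between the weights and the population total can be checked with the results that summarize stores. A minimal sketch:

```stata
* Run summarize without printing, just to fill in the stored results.
quietly summarize perwt_rounded
* r(sum) is the total of the weights, i.e. the estimated population.
display r(sum)
* Dividing by the number of sampled persons gives the mean weight.
display r(sum)/r(N)
```

* The total should match the 274,087,821 in the weighted tables above, and dividing it by the 133,710 sampled persons gives the mean weight of about 2,050.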

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s

> oc_meth_proj3\2011_logs\class1.log

  log type:  text

 closed on:  25 Jan 2011, 15:27:00

---------------------------------------------------------------------------------------