Some housekeeping I did on the March 2000 CPS file, which you may also want to do on your own CPS extracts:

* We want to use the CPS variable perwt as a frequency weight to generate weighted counts for the US population. Unfortunately, Stata requires frequency weights to be nonnegative integers, while perwt comes with two decimals and has a few negative values. So I created a new weight, rounded to the nearest integer, with negative values recoded to zero.

. tabulate race if year==2000 [fweight=perwt]
may not use noninteger frequency weights
r(401);

. summarize perwt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       perwt |    896445    1522.901    838.8933   -3033.03   14280.64

. gen perwt_rounded=round(perwt)

. replace perwt_rounded=0 if perwt<0

. label var perwt_rounded "integer perwt, negative values recoded to 0"
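The same recoding can be sketched as a short do-file with checks that the result is a valid frequency weight. This is only an illustration: the toy values and the variable name perwt_int are made up here so the block runs on its own, and the commented-out tabulate line assumes the real extract is loaded.

```stata
// Toy data standing in for the CPS extract (made-up values, for illustration only)
clear
input double perwt
1522.90
838.49
-3033.03
14280.64
2.25
end

// How many weights are negative before recoding?
count if perwt < 0

// Round to the nearest integer and recode negatives to zero in one step;
// the if-qualifier keeps missing weights missing, as in the commands above
generate long perwt_int = max(round(perwt), 0) if !missing(perwt)
label variable perwt_int "integer perwt, negative values recoded to 0"

// Frequency weights must be nonnegative integers; stop with an error otherwise
assert perwt_int == floor(perwt_int)
assert perwt_int >= 0

// With the real extract loaded, the weighted table should then run:
// tabulate race if year==2000 [fweight=perwt_int]
```

Note that an observation with a frequency weight of zero contributes nothing to the weighted counts, so recoding the negative weights to zero effectively leaves those observations out.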

* Some of the income variables use very high values (such as 999999) as codes for “missing” rather than as real incomes (see the IPUMS documentation). I replaced these codes with the missing value Stata understands, the period.

. summarize inctot if year==2000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      inctot |    133710    248066.9    409591.8     -24998     999999

. summarize inctot if year==2000, detail

                    Total personal income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0         -24998
 5%            0         -18582
10%         1000         -13300       Obs              133710
25%         9440         -12949       Sum of Wgt.      133710

50%      26097.5                      Mean           248066.9
                        Largest       Std. Dev.      409591.8
75%       100000         999999
90%       999999         999999       Variance       1.68e+11
95%       999999         999999       Skewness       1.281213
99%       999999         999999       Kurtosis       2.662364

. replace inctot=. if inctot==999999
(219626 real changes made, 219626 to missing)

. replace inctot=. if inctot==999998
(53 real changes made, 53 to missing)

. summarize inctot if year==2000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      inctot |    103226     26011.4    32061.48     -24998     425510
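As an alternative to one replace per code, Stata's mvdecode command can recode both codes in a single step. The sketch below maps them to the extended missing values .a and .b so the two codes stay distinguishable. The toy data are made up, and the codes 999999 and 999998 are the ones used above for inctot; other income variables may use different codes, so check the IPUMS documentation before applying this to them.

```stata
// Toy data standing in for the CPS extract (made-up values, for illustration only)
clear
input long inctot
25000
999999
999998
-24998
0
end

// Map each special code to its own extended missing value
mvdecode inctot, mv(999999=.a \ 999998=.b)

// Both codes now count as missing
count if missing(inctot)
assert r(N) == 2

// summarize skips missing values, so the special codes no longer inflate the mean
summarize inctot
assert r(max) < 999998
```

Extended missing values are treated like the period by summarize and by missing(), so the later analysis is unchanged; the only difference is that the original code can still be recovered.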

* The income variables come with value labels attached, but continuous variables should not have value labels, and the mostly empty value labels sometimes confuse graphing functions. So detach the value label from each of those variables by giving a period in place of a label name:

. describe incwage

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------
incwage         long   %12.0g      incwagelbl
                                              Wage and salary income

. label val incwage .

. describe incwage

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------
incwage         long   %12.0g                 Wage and salary income

. describe inctot

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------
inctot          long   %12.0g      inctotlbl
                                              Total personal income

. label val inctot .

. describe incss

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------
incss           long   %12.0g      incsslbl   Social Security income

. label val incss .

. label val incwelfr .
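With several income variables, the detaching step can also be written as a loop. The sketch below is a minimal example on made-up data: the label demo_lbl is hypothetical and only stands in for the labels that come with the extract, and the final loop checks that no value label is attached any more.

```stata
// Toy data standing in for the CPS extract (made-up values, for illustration only)
clear
input long(incwage inctot incss incwelfr)
30000 32000 0 0
0 9000 9000 0
end

// Hypothetical value label, attached so there is something to detach
label define demo_lbl 0 "demo label for zero"
foreach v of varlist incwage inctot incss incwelfr {
    label values `v' demo_lbl
}

// Detach the value label from each income variable
foreach v of varlist incwage inctot incss incwelfr {
    label values `v' .
}

// Confirm that no value label is attached any more
foreach v of varlist incwage inctot incss incwelfr {
    local attached : value label `v'
    assert "`attached'" == ""
}
```

Detaching a label does not delete its definition from the dataset; label drop would remove a definition that is no longer wanted.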