Evaluation of Evidence, Quantitative Analysis: Analyzing the Current Population Survey, Assignment 1

This assignment is designed to make you reasonably proficient in some of the basic skills of data analysis with Stata, and to give you some comfort with the CPS and the IPUMS website. This assignment is worth 8 points.

Include:

* Who, if anyone, you worked with.

For all of the following questions, please show some work from your Stata log file, so we can tell how you arrived at your answer. Remember to use weights in order to get real numbers for the whole US population. All the questions here apply *only* to the 2000 Current Population Survey, so either use the data file that contains only the 2000 CPS, or use the multiyear data and specify year==2000.
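
As a rough illustration of the two points above (weighting and restricting to 2000), a minimal sketch might look like the following. It assumes the person weight is named wtsupp and the survey year is named year, as they are in an IPUMS extract; the weight in the course dataset may have a different name, and wtsupp_rounded is just an illustrative name.

```stata
* Sketch only: wtsupp and year are assumed variable names; check your dataset.
* Frequency weights must be integers, so round the weight first.
generate wtsupp_rounded = round(wtsupp)

* Weighted tabulation, restricted to the 2000 CPS:
* the counts estimate people in the US non-institutional population.
tabulate sex if year == 2000 [fweight = wtsupp_rounded]

* Unweighted tabulation: the counts are the number of people interviewed.
tabulate sex if year == 2000
```

Comparing the two tables shows the difference between population estimates and sample sizes, which matters for the questions below.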

1) How many immigrants were there in the US at the time of the survey? (Use the variable citizen.) What does the IPUMS documentation have to say about the first category, “NIU”? How do you interpret this? [Note that by “in the US” I mean “in the non-institutional population of the US,” which is the survey frame of the CPS, as of March 2000.]

2) Create a new variable that breaks the population into two categories: US born and immigrant. Then attach value labels to the two categories. What percentage of immigrants in the US were under age 10? What percentage of immigrants were over age 75? How does this compare to the percentages for US natives? Note: the percentage of immigrants who were under age 10 is NOT the same as the percentage of people under age 10 who are immigrants. Think about numerators and denominators.
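
The numerator and denominator distinction in question 2 can be seen by asking Stata for row versus column percentages. The sketch below is only an illustration under stated assumptions: immigrant is a hypothetical 0/1 variable that you would first have to build from citizen (the recoding itself is left to you), under10 and immlbl are names invented here, and wtsupp_rounded is an integer-rounded person weight.

```stata
* Hypothetical names: immigrant (0 = US born, 1 = immigrant) is assumed to exist already.
label define immlbl 0 "US born" 1 "Immigrant"
label values immigrant immlbl

* Indicator for being under age 10 (left missing when age is missing)
generate under10 = (age < 10) if !missing(age)

* Row percentages: within each immigrant category, the share who are under 10
tabulate immigrant under10 [fweight = wtsupp_rounded], row nofreq

* Column percentages: within each age group, the share who are immigrants
tabulate immigrant under10 [fweight = wtsupp_rounded], column nofreq
```

The two tables use the same cell counts as numerators but divide by different totals, which is why the percentages differ.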

3) What is the mean educational attainment (use the variable yrsed, a variable I added to the dataset) for immigrants and for natives? What if we examine only persons age 30 to 49? Does educational attainment for adults differ by gender and immigrant status? How large is this difference, in years of formal education?

4) How many veterans are there in the U.S.? (Use the variable vetlast.) Do veterans earn more money than non-veterans? Consider the fact that veterans tend to be male, and they tend to be older. Figure out what the average age is for veterans of each of the past wars (WWII, Korea, Vietnam), and then compare 1999 earnings (variable inctot) for male veterans from each war to male non-veterans of approximately the same age. Now how do veterans compare? This is a rough way of comparing the earnings of veterans to non-veterans by 'controlling for gender and age'. How many veterans from each war were interviewed in the CPS? (That is, look at the unweighted data.) Is this a large enough sample of interviews to justify the comparison, in your opinion?

5) What is the average personal income (inctot) for men and women in the 2000 CPS? Register yourself for IPUMS-CPS. Make a data extract including at least the variables inctot, sex, age, and wtsupp. Download the March CPS data from 1995, read the data into Stata, and compare the 1995 income results to the 2000 results. (If you are going to use wtsupp as a frequency weight, you will need to round the weight to integer form, using a command like gen wtsupp_rounded=round(wtsupp).) [Note: for Soc 381, you need to download the data as a fixed-width text file rather than as a Stata file directly, and use the Stata command file and data dictionary to read the data in from the fixed-width text file; your HW1 Stata log should reflect that you have done this.] What do you find when comparing income from 1995 and 2000? Check IPUMS to find out what the missing and Not-In-Universe values are. What do you think is the appropriate way to deal with those values? (I have already done this for the 2000 data.) How would the missing and Not-In-Universe values skew the results?
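
One possible outline of the workflow for the 1995 extract is sketched below. The file names hw1.log and cps_00001.do are hypothetical (IPUMS names the command file after your extract number), and the sketch assumes the decompressed fixed-width data file sits in the current working directory alongside the command file.

```stata
* Sketch only: file names are hypothetical; substitute your own.
log using hw1.log, text replace

* Run the IPUMS-supplied command file, which reads the fixed-width text file
do cps_00001.do

* Frequency weights must be integers
generate wtsupp_rounded = round(wtsupp)

* Inspect the raw distribution first: look for implausible extreme values
* that may be missing or Not-In-Universe codes (see the IPUMS documentation)
summarize inctot, detail

* After deciding how to handle those codes, compute weighted means by sex
bysort sex: summarize inctot [fweight = wtsupp_rounded]

log close
```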