-----------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class2.log
log type: text
opened on: 15 Jan 2013, 13:40:17
* Always start with a log!
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. update all
(contacting http://www.stata.com)
ado-files already up to date
(contacting http://www.stata.com)
executable already up to date
(contacting http://www.stata.com)
utilities already up to date
* You should run update all at some point.
. *class starts here
. tabulate race
Race | Freq. Percent Cum.
--------------------------------------+-----------------------------------
White | 113,475 84.87 84.87
Black/Negro | 13,626 10.19 95.06
American Indian/Aleut/Eskimo | 1,894 1.42 96.47
Asian or Pacific Islander | 4,715 3.53 100.00
--------------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate hispan
Hispanic origin | Freq. Percent Cum.
------------------------------+-----------------------------------
Not Hispanic | 108,641 81.25 81.25
Mexican American | 6,447 4.82 86.07
Chicano/Chicana | 384 0.29 86.36
Mexican (Mexicano) | 8,155 6.10 92.46
Puerto Rican | 2,280 1.71 94.16
Cuban | 943 0.71 94.87
Other Spanish | 1,863 1.39 96.26
Central/South American | 3,487 2.61 98.87
Do not know | 471 0.35 99.22
N/A (and no response 1985-87) | 1,039 0.78 100.00
------------------------------+-----------------------------------
Total | 133,710 100.00
. tabulate hispan race
| Race
Hispanic origin | White Black/Neg American Asian or | Total
----------------------+--------------------------------------------+----------
Not Hispanic | 89,551 12,885 1,646 4,559 | 108,641
Mexican American | 6,337 29 73 8 | 6,447
Chicano/Chicana | 360 0 17 7 | 384
Mexican (Mexicano) | 7,970 55 109 21 | 8,155
Puerto Rican | 2,057 169 19 35 | 2,280
Cuban | 905 34 0 4 | 943
Other Spanish | 1,652 171 15 25 | 1,863
Central/South America | 3,206 238 12 31 | 3,487
Do not know | 461 2 0 8 | 471
N/A (and no response | 976 43 3 17 | 1,039
----------------------+--------------------------------------------+----------
Total | 113,475 13,626 1,894 4,715 | 133,710
* Cross-tabulation is useful. Note how many of the Hispanics are classified as white.
. tabulate hispan race [fweight=perwt_rounded], row col
+-------------------+
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
+-------------------+
| Race
Hispanic origin | White Black/Neg American Asian or | Total
----------------------+--------------------------------------------+----------
Not Hispanic | 190202767 34,311,878 2,468,188 10,660,220 | 237643053
| 80.04 14.44 1.04 4.49 | 100.00
| 84.61 96.63 86.68 97.58 | 86.70
----------------------+--------------------------------------------+----------
Mexican American | 9,352,842 54,490 128,576 13,017 | 9,548,925
| 97.95 0.57 1.35 0.14 | 100.00
| 4.16 0.15 4.52 0.12 | 3.48
----------------------+--------------------------------------------+----------
Chicano/Chicana | 460,968 0 12,971 15,150 | 489,089
| 94.25 0.00 2.65 3.10 | 100.00
| 0.21 0.00 0.46 0.14 | 0.18
----------------------+--------------------------------------------+----------
Mexican (Mexicano) |11,375,001 93,507 168,902 25,762 |11,663,172
| 97.53 0.80 1.45 0.22 | 100.00
| 5.06 0.26 5.93 0.24 | 4.26
----------------------+--------------------------------------------+----------
Puerto Rican | 2,632,291 256,312 23,363 47,387 | 2,959,353
| 88.95 8.66 0.79 1.60 | 100.00
| 1.17 0.72 0.82 0.43 | 1.08
----------------------+--------------------------------------------+----------
Cuban | 1,246,246 46,644 0 7,068 | 1,299,958
| 95.87 3.59 0.00 0.54 | 100.00
| 0.55 0.13 0.00 0.06 | 0.47
----------------------+--------------------------------------------+----------
Other Spanish | 1,785,411 252,617 18,833 43,874 | 2,100,735
| 84.99 12.03 0.90 2.09 | 100.00
| 0.79 0.71 0.66 0.40 | 0.77
----------------------+--------------------------------------------+----------
Central/South America | 4,320,291 363,675 22,379 36,440 | 4,742,785
| 91.09 7.67 0.47 0.77 | 100.00
| 1.92 1.02 0.79 0.33 | 1.73
----------------------+--------------------------------------------+----------
Do not know | 1,140,433 9,240 0 19,611 | 1,169,284
| 97.53 0.79 0.00 1.68 | 100.00
| 0.51 0.03 0.00 0.18 | 0.43
----------------------+--------------------------------------------+----------
N/A (and no response | 2,290,702 120,305 4,261 56,199 | 2,471,467
| 92.69 4.87 0.17 2.27 | 100.00
| 1.02 0.34 0.15 0.51 | 0.90
----------------------+--------------------------------------------+----------
Total | 224806952 35,508,668 2,847,473 10,924,728 | 274087821
| 82.02 12.96 1.04 3.99 | 100.00
| 100.00 100.00 100.00 100.00 | 100.00
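* If we want to know what percentage of blacks in the US are Puerto Rican, or what percentage of Cubans are white, we use the frequency weights (fweight) so that the counts represent the US population, and we ask tabulate for row and column percentages. The column percentages answer the first question (0.72% of blacks are Puerto Rican); the row percentages answer the second (95.87% of Cubans are white).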
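* As a rough sketch of what the row and col options compute, the Python snippet below (Python stands in for Stata here, for illustration only) copies four weighted cells from the table above and divides each by its row or column total. The variable names are hypothetical.

```python
# Weighted counts copied from the fweight table above.
# The variable names are hypothetical; only these four cells are used.
puerto_rican_black = 256_312     # Puerto Rican row, Black column
black_total = 35_508_668         # Total row, Black column
cuban_white = 1_246_246          # Cuban row, White column
cuban_total = 1_299_958          # Cuban row, Total column

# Column percentage: share of the weighted Black population that is Puerto Rican.
pct_black_pr = 100 * puerto_rican_black / black_total
# Row percentage: share of the weighted Cuban population that is white.
pct_cuban_white = 100 * cuban_white / cuban_total

print(f"{pct_black_pr:.2f}% of blacks are Puerto Rican")   # 0.72% of blacks are Puerto Rican
print(f"{pct_cuban_white:.2f}% of Cubans are white")      # 95.87% of Cubans are white
```

* A cell divided by its row total gives a row percentage; divided by its column total, a column percentage. Which one you want depends on which group is the denominator of your question.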
. codebook race
-------------------------------------------------------------------------------------
race Race
-------------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 100 White
13626 200 Black/Negro
1894 300 American Indian/Aleut/Eskimo
4715 650 Asian or Pacific Islander
. codebook hispan
-------------------------------------------------------------------------------------
hispan Hispanic origin
-------------------------------------------------------------------------------------
type: numeric (int)
label: hispanlbl
range: [0,902] units: 1
unique values: 10 missing .: 0/133710
examples: 0 Not Hispanic
0 Not Hispanic
0 Not Hispanic
0 Not Hispanic
* To get codebook to show all the values of hispan, we need to specify tab(#) with # of at least 10, because hispan has 10 unique values and the default is 9.
. codebook hispan, tab(20)
-------------------------------------------------------------------------------------
hispan Hispanic origin
-------------------------------------------------------------------------------------
type: numeric (int)
label: hispanlbl
range: [0,902] units: 1
unique values: 10 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 0 Not Hispanic
6447 102 Mexican American
384 104 Chicano/Chicana
8155 108 Mexican (Mexicano)
2280 200 Puerto Rican
943 300 Cuban
1863 400 Other Spanish
3487 410 Central/South American
471 901 Do not know
1039 902 N/A (and no response 1985-87)
. codebook race
-------------------------------------------------------------------------------------
race Race
-------------------------------------------------------------------------------------
type: numeric (int)
label: racelbl
range: [100,650] units: 10
unique values: 4 missing .: 0/133710
tabulation: Freq. Numeric Label
1.1e+05 100 White
13626 200 Black/Negro
1894 300 American Indian/Aleut/Eskimo
4715 650 Asian or Pacific Islander
* Now we are going to create a new variable that takes account of both race and Hispanic origin, which is how the government usually likes to tabulate things.
. generate race_hisp=1 if race==100
(20235 missing values generated)
. replace race_hisp=2 if race==200
(13626 real changes made)
. replace race_hisp=3 if race==300
(1894 real changes made)
. replace race_hisp=4 if race==650
(4715 real changes made)
. replace race_hisp=5 if hisp>100 & hisp<500
(23559 real changes made)
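* One student pointed out that Stata seems to understand “hisp” as short for “hispan.” Stata accepts abbreviated variable names as long as the abbreviation is unambiguous; if more than one variable started with “hisp,” Stata would complain that the abbreviation is ambiguous.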
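* The generate/replace sequence above can be sketched as a single Python function (an illustration, not Stata code; the function and dictionary names are hypothetical). The codes come from codebook race and codebook hispan. Because the condition is hisp>100 & hisp<500, the codes 0 (Not Hispanic), 901 (Do not know) and 902 (N/A) are not counted as Hispanic, so those people keep their race-based code, as the cross-tabulation of hispanic by race_hisp further down confirms.

```python
# Race codes (from codebook race) mapped to the first four race_hisp codes.
RACE_TO_GROUP = {100: 1, 200: 2, 300: 3, 650: 4}  # white, black, Am. Indian, Asian

def race_hisp(race: int, hispan: int) -> int:
    """Hypothetical Python version of the generate/replace steps above."""
    # replace race_hisp=5 if hisp>100 & hisp<500
    # (codes 102-410 are the Hispanic groups; 0, 901 and 902 fall outside)
    if 100 < hispan < 500:
        return 5
    # generate/replace race_hisp=1..4 based on race
    return RACE_TO_GROUP[race]

print(race_hisp(100, 0))    # 1: NH white
print(race_hisp(200, 300))  # 5: Cuban, whatever the race
print(race_hisp(650, 901))  # 4: "Do not know" is not coded as Hispanic
```

* Putting the Hispanic test first gives the same result as running it last in Stata, since the last replace overrides the earlier race-based codes.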
. label define race_hisp_lbl 1 "NH white" 2 "NH black" 3 "NH Native Amer Indian" 4 " NH Asian" 5 "Hispanic"
* First, I define a label that associates text with numbers. At this point Stata does not yet know which variable this label is for.
. label values race_hisp race_hisp_lbl
* Now I attach the label to my new variable “race_hisp” as its value label.
. label var race_hisp "race and ethnicity"
* I also attach a different kind of label to the variable itself; this variable label just identifies what the variable is.
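* To make the difference between the two kinds of labels concrete, here is a loose Python analogy (not how Stata stores labels internally; all names are hypothetical). A value label is a mapping from numeric codes to text; a variable label is one description of the whole variable. The stored values stay numeric either way.

```python
# "label define": a mapping from codes to text (hypothetical stand-in).
race_hisp_lbl = {1: "NH white", 2: "NH black", 3: "NH Native Amer Indian",
                 4: "NH Asian", 5: "Hispanic"}
# "label var": one description for the variable as a whole.
variable_label = "race and ethnicity"

codes = [1, 5, 1, 2, 4]  # a few made-up values of race_hisp

# "label values": display text in place of codes, without changing the codes.
labeled = [race_hisp_lbl[code] for code in codes]

print(variable_label)
for code, text in zip(codes, labeled):
    print(code, text)
```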
. tabulate race_hisp
race and ethnicity | Freq. Percent Cum.
----------------------+-----------------------------------
NH white | 90,988 68.05 68.05
NH black | 12,930 9.67 77.72
NH Native Amer Indian | 1,649 1.23 78.95
NH Asian | 4,584 3.43 82.38
Hispanic | 23,559 17.62 100.00
----------------------+-----------------------------------
Total | 133,710 100.00
* Note the variable label at the top of the table, and the value labels in the rows.
* Without the value labels, race_hisp would look like the table below, which is much less informative.
. tabulate race_hisp, nolab
race and |
ethnicity | Freq. Percent Cum.
------------+-----------------------------------
1 | 90,988 68.05 68.05
2 | 12,930 9.67 77.72
3 | 1,649 1.23 78.95
4 | 4,584 3.43 82.38
5 | 23,559 17.62 100.00
------------+-----------------------------------
Total | 133,710 100.00
* Also note that, because we have added a new variable, it is time to consider saving the dataset. You don’t need to worry about saving your log: once you open the log, Stata saves it as you go. But if you add variables to the dataset, you should save the dataset if you want the new variables to be available the next time you start a Stata session.
. rename hispan hispanic
* You could also rename variables if you want (but I don’t suggest doing this with the CPS download, because I will be referring to those variables by name in my logs).
. tabulate hispanic race_hisp
| race and ethnicity
Hispanic origin | NH white NH black NH Native NH Asian | Total
----------------------+--------------------------------------------+----------
Not Hispanic | 89,551 12,885 1,646 4,559 | 108,641
Mexican American | 0 0 0 0 | 6,447
Chicano/Chicana | 0 0 0 0 | 384
Mexican (Mexicano) | 0 0 0 0 | 8,155
Puerto Rican | 0 0 0 0 | 2,280
Cuban | 0 0 0 0 | 943
Other Spanish | 0 0 0 0 | 1,863
Central/South America | 0 0 0 0 | 3,487
Do not know | 461 2 0 8 | 471
N/A (and no response | 976 43 3 17 | 1,039
----------------------+--------------------------------------------+----------
Total | 90,988 12,930 1,649 4,584 | 133,710
| race and
| ethnicity
Hispanic origin | Hispanic | Total
----------------------+-----------+----------
Not Hispanic | 0 | 108,641
Mexican American | 6,447 | 6,447
Chicano/Chicana | 384 | 384
Mexican (Mexicano) | 8,155 | 8,155
Puerto Rican | 2,280 | 2,280
Cuban | 943 | 943
Other Spanish | 1,863 | 1,863
Central/South America | 3,487 | 3,487
Do not know | 0 | 471
N/A (and no response | 0 | 1,039
----------------------+-----------+----------
Total | 23,559 | 133,710
. tabulate race_hisp race
| Race
race and ethnicity | White Black/Neg American Asian or | Total
----------------------+--------------------------------------------+----------
NH white | 90,988 0 0 0 | 90,988
NH black | 0 12,930 0 0 | 12,930
NH Native Amer Indian | 0 0 1,649 0 | 1,649
NH Asian | 0 0 0 4,584 | 4,584
Hispanic | 22,487 696 245 131 | 23,559
----------------------+--------------------------------------------+----------
Total | 113,475 13,626 1,894 4,715 | 133,710
* When you make a new variable, it is always good to cross-tabulate that new variable with the old variables, to make sure everything works as intended.
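* The same check can be sketched in Python (illustration only; the rows below are made up, and the consistency rule is our own restatement of the recode): count each (race, race_hisp) combination, then flag any cell where a non-Hispanic code 1 to 4 does not match the original race code. In the Stata table above, this corresponds to the zeros off the diagonal in the four NH rows.

```python
from collections import Counter

# Made-up (race, race_hisp) pairs; in practice, one pair per observation.
rows = [(100, 1), (200, 2), (100, 5), (650, 4), (300, 3), (200, 5)]

# Cross-tabulate: count each combination, like "tabulate race_hisp race".
crosstab = Counter(rows)

# Rule implied by the recode: codes 1-4 must match the original race code.
expected_race = {1: 100, 2: 200, 3: 300, 4: 650}
problems = {cell: n for cell, n in crosstab.items()
            if cell[1] in expected_race and expected_race[cell[1]] != cell[0]}

print(crosstab)
print("inconsistent cells:", problems)  # an empty dict means the recode looks right
```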
. sort sex
. by sex: summarize yrsed if age>=25 & age<=34
-------------------------------------------------------------------------------------
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
-------------------------------------------------------------------------------------
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
. table sex if age>=25 & age<=34, contents(freq mean yrsed sd yrsed min yrsed max yrsed)
---------------------------------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed) min(yrsed) max(yrsed)
----------+----------------------------------------------------------------
Male | 9,027 13.31212 2.967666 0 17
Female | 9,511 13.55657 2.854472 0 17
---------------------------------------------------------------------------
* The above are two different syntaxes for comparing the educational attainment (yrsed, in years) of men and women aged 25 to 34. I asked the question: how sure are we that the women in the CPS in this age group have higher average education than the men? The answer is that we are 100% sure: within the CPS dataset itself, there is no statistical uncertainty. The more usual question is whether the small difference between women’s and men’s average education (13.56 - 13.31 = 0.24 years) allows us to be sure that women in the US in this age group have more education than men in the same age group. More formally: do we accept or reject the hypothesis that women’s and men’s average education is the same? We run a t-test for this.
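* What both commands do can be sketched in Python (illustration only; the records are invented, not CPS data): keep people aged 25 to 34, split by sex, and report count, mean, standard deviation, minimum and maximum. statistics.stdev divides by n - 1, as Stata's summarize does.

```python
import statistics

# Invented (sex, age, yrsed) records standing in for CPS rows.
people = [("Male", 30, 12), ("Male", 27, 16), ("Female", 33, 14),
          ("Female", 25, 12), ("Male", 45, 10), ("Female", 29, 17)]

results = {}
for sex in ("Male", "Female"):
    # Like: by sex: summarize yrsed if age>=25 & age<=34
    yrs = [ed for s, age, ed in people if s == sex and 25 <= age <= 34]
    results[sex] = (len(yrs), statistics.mean(yrs), statistics.stdev(yrs),
                    min(yrs), max(yrs))
    n, mean, sd, lo, hi = results[sex]
    print(f"{sex}: n={n} mean={mean:.2f} sd={sd:.3f} min={lo} max={hi}")
```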
. ttest yrsed if age>=25 & age <=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* The t-test gives a t-statistic of -5.7. Its probability is so small that Stata displays it as 0.0000 (rounded to four decimal places); the actual two-sided value is roughly 0.00000001, about 1 in 100 million. We will talk more about t-tests in future classes. For now, all I will say is that the data allow us to reject the null hypothesis that men and women in the US in this age group have the same average educational attainment.
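* The t-statistic in the output can be reproduced from the group summaries, sketched here in Python (illustration only). The sketch uses the pooled-variance formula behind Stata's "Two-sample t test with equal variances". For the p-value it swaps in a normal approximation (math.erfc), which is reasonable with 18,536 degrees of freedom but slightly understates the exact t-based p-value.

```python
import math

# Summary statistics copied from the ttest output above.
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female

# Pooled variance and standard error of the difference (equal variances).
df = n_m + n_f - 2
sp2 = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df
se = math.sqrt(sp2 * (1 / n_m + 1 / n_f))

diff = mean_m - mean_f   # diff = mean(Male) - mean(Female)
t = diff / se

# Normal approximation to the two-sided p-value Pr(|T| > |t|).
p_two_sided = math.erfc(abs(t) / math.sqrt(2))

print(f"diff={diff:.4f} se={se:.4f} t={t:.3f} df={df} p={p_two_sided:.2e}")
```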
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class2.log
log type: text
closed on: 15 Jan 2013, 15:49:00
-------------------------------------------------------------------------------------