-----------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class2.log

  log type:  text

 opened on:  15 Jan 2013, 13:40:17

 

*always start with a log!
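
* A minimal sketch of opening and closing a log, for reference. The file name class2_practice.log is hypothetical; substitute your own path. The capture prefix keeps Stata from stopping with an error if no log is currently open.

```stata
* Hypothetical file name; adjust the path for your own machine.
capture log close
log using "class2_practice.log", text replace

* ... your commands go here ...

log close
```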

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

 

. update all

(contacting http://www.stata.com)

ado-files already up to date

 

(contacting http://www.stata.com)

executable already up to date

 

(contacting http://www.stata.com)

utilities already up to date

 

* You should run update all at some point.

 

 

. *class starts here

 

. tabulate race

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |    113,475       84.87       84.87

                          Black/Negro |     13,626       10.19       95.06

         American Indian/Aleut/Eskimo |      1,894        1.42       96.47

            Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

                                Total |    133,710      100.00

 

. tabulate hispan

 

              Hispanic origin |      Freq.     Percent        Cum.

------------------------------+-----------------------------------

                 Not Hispanic |    108,641       81.25       81.25

             Mexican American |      6,447        4.82       86.07

              Chicano/Chicana |        384        0.29       86.36

           Mexican (Mexicano) |      8,155        6.10       92.46

                 Puerto Rican |      2,280        1.71       94.16

                        Cuban |        943        0.71       94.87

                Other Spanish |      1,863        1.39       96.26

       Central/South American |      3,487        2.61       98.87

                  Do not know |        471        0.35       99.22

N/A (and no response 1985-87) |      1,039        0.78      100.00

------------------------------+-----------------------------------

                        Total |    133,710      100.00

 

. tabulate hispan race

 

                      |                    Race

      Hispanic origin |     White  Black/Neg  American   Asian or  |     Total

----------------------+--------------------------------------------+----------

         Not Hispanic |    89,551     12,885      1,646      4,559 |   108,641

     Mexican American |     6,337         29         73          8 |     6,447

      Chicano/Chicana |       360          0         17          7 |       384

   Mexican (Mexicano) |     7,970         55        109         21 |     8,155

         Puerto Rican |     2,057        169         19         35 |     2,280

                Cuban |       905         34          0          4 |       943

        Other Spanish |     1,652        171         15         25 |     1,863

Central/South America |     3,206        238         12         31 |     3,487

          Do not know |       461          2          0          8 |       471

N/A (and no response  |       976         43          3         17 |     1,039

----------------------+--------------------------------------------+----------

                Total |   113,475     13,626      1,894      4,715 |   133,710

 

* Cross-tabulation is useful. Note that the large majority of respondents in every Hispanic-origin category are coded as white on the race variable.

 

 

. tabulate hispan race [fweight=perwt_rounded], row col

 

+-------------------+

| Key               |

|-------------------|

|     frequency     |

|  row percentage   |

| column percentage |

+-------------------+

 

                      |                    Race

      Hispanic origin |     White  Black/Neg  American   Asian or  |     Total

----------------------+--------------------------------------------+----------

         Not Hispanic | 190202767 34,311,878  2,468,188 10,660,220 | 237643053

                      |     80.04      14.44       1.04       4.49 |    100.00

                      |     84.61      96.63      86.68      97.58 |     86.70

----------------------+--------------------------------------------+----------

     Mexican American | 9,352,842     54,490    128,576     13,017 | 9,548,925

                      |     97.95       0.57       1.35       0.14 |    100.00

                      |      4.16       0.15       4.52       0.12 |      3.48

----------------------+--------------------------------------------+----------

      Chicano/Chicana |   460,968          0     12,971     15,150 |   489,089

                      |     94.25       0.00       2.65       3.10 |    100.00

                      |      0.21       0.00       0.46       0.14 |      0.18

----------------------+--------------------------------------------+----------

   Mexican (Mexicano) |11,375,001     93,507    168,902     25,762 |11,663,172

                      |     97.53       0.80       1.45       0.22 |    100.00

                      |      5.06       0.26       5.93       0.24 |      4.26

----------------------+--------------------------------------------+----------

         Puerto Rican | 2,632,291    256,312     23,363     47,387 | 2,959,353

                      |     88.95       8.66       0.79       1.60 |    100.00

                      |      1.17       0.72       0.82       0.43 |      1.08

----------------------+--------------------------------------------+----------

                Cuban | 1,246,246     46,644          0      7,068 | 1,299,958

                      |     95.87       3.59       0.00       0.54 |    100.00

                      |      0.55       0.13       0.00       0.06 |      0.47

----------------------+--------------------------------------------+----------

        Other Spanish | 1,785,411    252,617     18,833     43,874 | 2,100,735

                      |     84.99      12.03       0.90       2.09 |    100.00

                      |      0.79       0.71       0.66       0.40 |      0.77

----------------------+--------------------------------------------+----------

Central/South America | 4,320,291    363,675     22,379     36,440 | 4,742,785

                      |     91.09       7.67       0.47       0.77 |    100.00

                      |      1.92       1.02       0.79       0.33 |      1.73

----------------------+--------------------------------------------+----------

          Do not know | 1,140,433      9,240          0     19,611 | 1,169,284

                      |     97.53       0.79       0.00       1.68 |    100.00

                      |      0.51       0.03       0.00       0.18 |      0.43

----------------------+--------------------------------------------+----------

N/A (and no response  | 2,290,702    120,305      4,261     56,199 | 2,471,467

                      |     92.69       4.87       0.17       2.27 |    100.00

                      |      1.02       0.34       0.15       0.51 |      0.90

----------------------+--------------------------------------------+----------

                Total | 224806952 35,508,668  2,847,473 10,924,728 | 274087821

                      |     82.02      12.96       1.04       3.99 |    100.00

                      |    100.00     100.00     100.00     100.00 |    100.00

 

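* To find what percentage of blacks in the US are Puerto Rican, or what percentage of Cubans are white, we weight the observations with fweights so that the counts represent the US population, and we ask tabulate for row and column percentages. The column percentage in the Puerto Rican row answers the first question (0.72% of blacks are Puerto Rican); the row percentage in the Cuban row answers the second (95.87% of Cubans are white).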
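
* As a check on how to read the table, the two percentages can be recomputed by hand from the weighted frequencies shown above. This is only a sketch using display as a calculator; the numbers are copied from the table.

```stata
* Column percentage: weighted Puerto Rican blacks divided by all weighted blacks
display 100 * 256312 / 35508668

* Row percentage: weighted white Cubans divided by all weighted Cubans
display 100 * 1246246 / 1299958
```

* The results should round to the 0.72 and 95.87 that tabulate printed in those cells.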

 

 

. codebook race

 

-------------------------------------------------------------------------------------

race                                                                             Race

-------------------------------------------------------------------------------------

 

                  type:  numeric (int)

                 label:  racelbl

 

                 range:  [100,650]                    units:  10

         unique values:  4                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                        1.1e+05      100  White

                         13626       200  Black/Negro

                          1894       300  American Indian/Aleut/Eskimo

                          4715       650  Asian or Pacific Islander

 

. codebook hispan

 

-------------------------------------------------------------------------------------

hispan                                                                Hispanic origin

-------------------------------------------------------------------------------------

 

                  type:  numeric (int)

                 label:  hispanlbl

 

                 range:  [0,902]                      units:  1

         unique values:  10                       missing .:  0/133710

 

              examples:  0     Not Hispanic

                         0     Not Hispanic

                         0     Not Hispanic

                         0     Not Hispanic

 

* By default, codebook tabulates a variable only if it has 9 or fewer unique values. hispan has 10, so to see all of its values we need to specify tab(#) with # of at least 10.

 

. codebook hispan, tab(20)

 

-------------------------------------------------------------------------------------

hispan                                                                Hispanic origin

-------------------------------------------------------------------------------------

 

                  type:  numeric (int)

                 label:  hispanlbl

 

                 range:  [0,902]                      units:  1

         unique values:  10                       missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                        1.1e+05        0  Not Hispanic

                          6447       102  Mexican American

                           384       104  Chicano/Chicana

                          8155       108  Mexican (Mexicano)

                          2280       200  Puerto Rican

                           943       300  Cuban

                          1863       400  Other Spanish

                          3487       410  Central/South American

                           471       901  Do not know

                          1039       902  N/A (and no response 1985-87)

 

. codebook race

 

-------------------------------------------------------------------------------------

race                                                                             Race

-------------------------------------------------------------------------------------

 

                  type:  numeric (int)

                 label:  racelbl

 

                 range:  [100,650]                    units:  10

         unique values:  4                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                        1.1e+05      100  White

                         13626       200  Black/Negro

                          1894       300  American Indian/Aleut/Eskimo

                          4715       650  Asian or Pacific Islander

 

* Now we are going to create a new variable that combines race and Hispanic origin, which is usually how the government likes to tabulate things.

 

. generate race_hisp=1 if race==100

(20235 missing values generated)

 

. replace race_hisp=2 if race==200

(13626 real changes made)

 

. replace race_hisp=3 if race==300

(1894 real changes made)

 

. replace race_hisp=4 if race==650

(4715 real changes made)

 

. replace race_hisp=5 if hisp>100 & hisp<500

(23559 real changes made)

 

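* One student pointed out that Stata seems to understand "hisp" as short for "hispan". That is correct: Stata accepts any unambiguous abbreviation of a variable name. If more than one variable started with "hisp", Stata would complain that the abbreviation is ambiguous.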
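
* If you would rather have Stata insist on full variable names, abbreviation can be switched off. A short sketch; the setting lasts for the current session unless you add the permanently option.

```stata
* Require variable names to be typed in full
set varabbrev off

* With this setting, hisp is no longer accepted in place of hispan

* Restore the default behavior
set varabbrev on
```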

 

 

. label define race_hisp_lbl 1 "NH white" 2 "NH black" 3 "NH Native Amer Indian" 4 "NH Asian" 5 "Hispanic"

 

* First, I define a label that associates text with numbers. Stata does not know yet which variable this label is for.

 

. label values  race_hisp race_hisp_lbl

 

* Now I attach that label, as a value label, to my new variable race_hisp.

 

. label var race_hisp "race and ethnicity"

 

* I also attach a different kind of label to the variable itself. This variable label just describes what the variable is.

 

. tabulate  race_hisp

 

   race and ethnicity |      Freq.     Percent        Cum.

----------------------+-----------------------------------

             NH white |     90,988       68.05       68.05

             NH black |     12,930        9.67       77.72

NH Native Amer Indian |      1,649        1.23       78.95

             NH Asian |      4,584        3.43       82.38

             Hispanic |     23,559       17.62      100.00

----------------------+-----------------------------------

                Total |    133,710      100.00

 

* Note the variable label at the top of the table, and the value labels in the rows.

 

* Without the value labels, race_hisp would show only its numeric codes, as below, which is not as informative.

 

. tabulate  race_hisp, nolab

 

   race and |

  ethnicity |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     90,988       68.05       68.05

          2 |     12,930        9.67       77.72

          3 |      1,649        1.23       78.95

          4 |      4,584        3.43       82.38

          5 |     23,559       17.62      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* Also note that, because we have added a new variable, it is time to consider saving the dataset. You don't need to worry about saving your log: once you open the log, Stata saves it as you go. But if you add variables to the dataset, you must save the dataset if you want the new variables to be available the next time you start a Stata session.
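
* A sketch of the save step, assuming the dataset with the new variable is in memory. The file name is hypothetical; saving under a new name leaves the original download unchanged.

```stata
* Hypothetical file name; replace overwrites an earlier copy with the same name
save "cps_mar_2000_with_race_hisp.dta", replace
```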

 

. rename hispan hispanic

 

* You could also rename variables if you want, but I don't suggest doing this with the CPS download, because I will be referring to those variables by their original names in my logs.

 

 

. tabulate hispanic race_hisp

 

                      |             race and ethnicity

      Hispanic origin |  NH white   NH black  NH Native   NH Asian |     Total

----------------------+--------------------------------------------+----------

         Not Hispanic |    89,551     12,885      1,646      4,559 |   108,641

     Mexican American |         0          0          0          0 |     6,447

      Chicano/Chicana |         0          0          0          0 |       384

   Mexican (Mexicano) |         0          0          0          0 |     8,155

         Puerto Rican |         0          0          0          0 |     2,280

                Cuban |         0          0          0          0 |       943

        Other Spanish |         0          0          0          0 |     1,863

Central/South America |         0          0          0          0 |     3,487

          Do not know |       461          2          0          8 |       471

N/A (and no response  |       976         43          3         17 |     1,039

----------------------+--------------------------------------------+----------

                Total |    90,988     12,930      1,649      4,584 |   133,710

 

 

                      |  race and

                      | ethnicity

      Hispanic origin |  Hispanic |     Total

----------------------+-----------+----------

         Not Hispanic |         0 |   108,641

     Mexican American |     6,447 |     6,447

      Chicano/Chicana |       384 |       384

   Mexican (Mexicano) |     8,155 |     8,155

         Puerto Rican |     2,280 |     2,280

                Cuban |       943 |       943

        Other Spanish |     1,863 |     1,863

Central/South America |     3,487 |     3,487

          Do not know |         0 |       471

N/A (and no response  |         0 |     1,039

----------------------+-----------+----------

                Total |    23,559 |   133,710

 

 

. tabulate  race_hisp race

 

                      |                    Race

   race and ethnicity |     White  Black/Neg  American   Asian or  |     Total

----------------------+--------------------------------------------+----------

             NH white |    90,988          0          0          0 |    90,988

             NH black |         0     12,930          0          0 |    12,930

NH Native Amer Indian |         0          0      1,649          0 |     1,649

             NH Asian |         0          0          0      4,584 |     4,584

             Hispanic |    22,487        696        245        131 |    23,559

----------------------+--------------------------------------------+----------

                Total |   113,475     13,626      1,894      4,715 |   133,710

 

 

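* When you make a new variable, it is always good to cross-tabulate it with the variables it was built from, to make sure everything works as intended. Here, every Hispanic-origin category from Mexican American through Central/South American falls under Hispanic, while respondents coded Do not know or N/A are classified by race alone.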
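
* The same checks can be written as assertions, which stay silent when they hold and stop with an error when they do not. This is a sketch that assumes the dataset from this session is in memory, with hispan already renamed to hispanic and with codes 102 through 410 as the Hispanic-origin categories, as codebook showed.

```stata
* Every observation should have been assigned a category
assert !missing(race_hisp)

* All Hispanic-origin codes should map to category 5
assert race_hisp == 5 if inrange(hispanic, 102, 410)

* Everyone else should have been classified by race alone
assert race_hisp != 5 if !inrange(hispanic, 102, 410)
```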

 

 

. sort sex

 

. by sex: summarize yrsed if age>=25 & age<=34

 

-------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9027    13.31212    2.967666          0         17

 

-------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9511    13.55657    2.854472          0         17

 

 

. table sex if age>=25 & age<=34, contents(freq mean yrsed sd yrsed min yrsed max yrsed)

 

---------------------------------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)   min(yrsed)   max(yrsed)

----------+----------------------------------------------------------------

     Male |       9,027     13.31212     2.967666            0           17

   Female |       9,511     13.55657     2.854472            0           17

---------------------------------------------------------------------------
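
* Two alternative spellings, sketched for reference and assuming the same dataset is in memory. bysort sorts and groups in one step, so the separate sort command is not needed. The statistic() form is an assumption about your Stata version: as far as I know it applies to Stata 17 and later, where the table command was redesigned and no longer takes contents().

```stata
* Sort by sex and summarize within each group in a single command
bysort sex: summarize yrsed if age>=25 & age<=34

* Stata 17 or later only: one statistic() option per requested statistic
table sex if age>=25 & age<=34, statistic(frequency) statistic(mean yrsed) statistic(sd yrsed) statistic(min yrsed) statistic(max yrsed)
```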

 

* The by/summarize and table commands above are two different syntaxes for comparing the educational attainment (yrsed, in years) of men and women aged 25 to 34. I asked the question: how sure are we that the women in the CPS in this age group have higher average education than the men? The answer is 100% sure, because within the CPS dataset itself there is no statistical uncertainty. The more usual question is whether the small difference between women's and men's average education (0.24 years) allows us to be sure that women in the US in this age group have more education than men in the same age group. Or, more formally, can we reject the hypothesis that women's and men's average education is the same? We run a t-test for this.

 

 

. ttest yrsed if age>=25 & age <=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* The t-test yields a t-statistic of -5.7. The associated probability is very small: Stata reports it here as 0.0000, but it is more accurately about 0.0000000105. We will talk more about t-tests later in the class. For now, all I will say is that the data allow us to reject the null hypothesis that men and women in the US in this age group have the same average educational attainment.
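
* To see where these numbers come from, the t-statistic and its two-sided probability can be recomputed from the ttest output using display as a calculator. A sketch only; the inputs are copied from the table above, and ttail() is Stata's upper-tail function for the t distribution.

```stata
* t statistic: difference in means divided by its standard error
display -.2444469 / .0427623

* Two-sided p-value for |t| = 5.7164 with 18,536 degrees of freedom
display 2 * ttail(18536, 5.7164)
```

* The first line should reproduce the t of about -5.7164. The second should print a value on the order of 1e-8, far too small to show up in the four decimal places that ttest prints.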

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win20

> 13\class2.log

  log type:  text

 closed on:  15 Jan 2013, 15:49:00

-------------------------------------------------------------------------------------