----------------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3

> \2011_180B_logs\class2.log

  log type:  text

 opened on:  27 Jan 2011, 12:02:18

 

* Don't forget to start by opening a log!
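
* A minimal sketch of opening and closing a log from the command line. The file name is hypothetical; any name will do.

```stata
* "text" writes a plain-text log; "replace" overwrites an existing file of the same name
log using "my_class2_practice.log", text replace

* ... your commands go here ...

* Close the log when you are done so everything is written to disk
log close
```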

 

* First, I went through the process of opening and reading in a CPS file downloaded from IPUMS. Put the compressed download in its own folder, then uncompress the contents into that same folder. Then set Stata's working directory to that folder using the cd command.

. cd "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps"

C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps
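
* A short sketch of checking that the working directory is set up correctly. The path is hypothetical; substitute the folder where you uncompressed your own extract.

```stata
* Hypothetical path, for illustration only
cd "C:\data\my_cps_extract"

* Confirm which folder Stata is now working in
pwd

* List the files in that folder; the .dat and .do files should both appear
dir
```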

 

* Then go to the File menu in Stata, choose Do..., and select the do-file in the folder you just set up.

. do "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\cps_00008.do"

 

. /* Important: you need to put the .dat and .do files in one folder/

>    directory and then set the working folder to that folder. */

.

. set more off

 

.

. clear

 

. infix ///

>  int     year                                 1-4 ///

>  byte    age                                  5-6 ///

>  byte    sex                                  7 ///

>  using cps_00008.dat

(210648 observations read)

 

.

. label var year `"Survey year"'

 

. label var age `"Age"'

 

. label var sex `"Sex"'

 

.

. label define agelbl 00 `"Under 1 year"'

 

. label define agelbl 01 `"1"', add

 

. label define agelbl 02 `"2"', add

 

. label define agelbl 03 `"3"', add

 

. label define agelbl 04 `"4"', add

 

. label define agelbl 05 `"5"', add

 

. label define agelbl 06 `"6"', add

 

. label define agelbl 07 `"7"', add

 

. label define agelbl 08 `"8"', add

 

. label define agelbl 09 `"9"', add

 

. label define agelbl 10 `"10"', add

 

. label define agelbl 11 `"11"', add

 

. label define agelbl 12 `"12"', add

 

. label define agelbl 13 `"13"', add

 

. label define agelbl 14 `"14"', add

 

. label define agelbl 15 `"15"', add

 

. label define agelbl 16 `"16"', add

 

. label define agelbl 17 `"17"', add

 

. label define agelbl 18 `"18"', add

 

. label define agelbl 19 `"19"', add

 

. label define agelbl 20 `"20"', add

 

. label define agelbl 21 `"21"', add

 

. label define agelbl 22 `"22"', add

 

. label define agelbl 23 `"23"', add

 

. label define agelbl 24 `"24"', add

 

. label define agelbl 25 `"25"', add

 

. label define agelbl 26 `"26"', add

 

. label define agelbl 27 `"27"', add

 

. label define agelbl 28 `"28"', add

 

. label define agelbl 29 `"29"', add

 

. label define agelbl 30 `"30"', add

 

. label define agelbl 31 `"31"', add

 

. label define agelbl 32 `"32"', add

 

. label define agelbl 33 `"33"', add

 

. label define agelbl 34 `"34"', add

 

. label define agelbl 35 `"35"', add

 

. label define agelbl 36 `"36"', add

 

. label define agelbl 37 `"37"', add

 

. label define agelbl 38 `"38"', add

 

. label define agelbl 39 `"39"', add

 

. label define agelbl 40 `"40"', add

 

. label define agelbl 41 `"41"', add

 

. label define agelbl 42 `"42"', add

 

. label define agelbl 43 `"43"', add

 

. label define agelbl 44 `"44"', add

 

. label define agelbl 45 `"45"', add

 

. label define agelbl 46 `"46"', add

 

. label define agelbl 47 `"47"', add

 

. label define agelbl 48 `"48"', add

 

. label define agelbl 49 `"49"', add

 

. label define agelbl 50 `"50"', add

 

. label define agelbl 51 `"51"', add

 

. label define agelbl 52 `"52"', add

 

. label define agelbl 53 `"53"', add

 

. label define agelbl 54 `"54"', add

 

. label define agelbl 55 `"55"', add

 

. label define agelbl 56 `"56"', add

 

. label define agelbl 57 `"57"', add

 

. label define agelbl 58 `"58"', add

 

. label define agelbl 59 `"59"', add

 

. label define agelbl 60 `"60"', add

 

. label define agelbl 61 `"61"', add

 

. label define agelbl 62 `"62"', add

 

. label define agelbl 63 `"63"', add

 

. label define agelbl 64 `"64"', add

 

. label define agelbl 65 `"65"', add

 

. label define agelbl 66 `"66"', add

 

. label define agelbl 67 `"67"', add

 

. label define agelbl 68 `"68"', add

 

. label define agelbl 69 `"69"', add

 

. label define agelbl 70 `"70"', add

 

. label define agelbl 71 `"71"', add

 

. label define agelbl 72 `"72"', add

 

. label define agelbl 73 `"73"', add

 

. label define agelbl 74 `"74"', add

 

. label define agelbl 75 `"75"', add

 

. label define agelbl 76 `"76"', add

 

. label define agelbl 77 `"77"', add

 

. label define agelbl 78 `"78"', add

 

. label define agelbl 79 `"79"', add

 

. label define agelbl 80 `"80"', add

 

. label define agelbl 81 `"81"', add

 

. label define agelbl 82 `"82"', add

 

. label define agelbl 83 `"83"', add

 

. label define agelbl 84 `"84"', add

 

. label define agelbl 85 `"85"', add

 

. label define agelbl 86 `"86"', add

 

. label define agelbl 87 `"87"', add

 

. label define agelbl 88 `"88"', add

 

. label define agelbl 89 `"89"', add

 

. label define agelbl 90 `"90 (90+, 1988-2002)"', add

 

. label define agelbl 91 `"91"', add

 

. label define agelbl 92 `"92"', add

 

. label define agelbl 93 `"93"', add

 

. label define agelbl 94 `"94"', add

 

. label define agelbl 95 `"95"', add

 

. label define agelbl 96 `"96"', add

 

. label define agelbl 97 `"97"', add

 

. label define agelbl 98 `"98"', add

 

. label define agelbl 99 `"99+"', add

 

. label values age agelbl

 

.

. label define sexlbl 1 `"Male"'

 

. label define sexlbl 2 `"Female"', add

 

. label values sex sexlbl

 

.

.

end of do-file

 

* At the end of the do-file, you will see the variables in your Variables window. It is time to save the dataset (easiest to do using the Save command under the File menu).

 

. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\trial_cps.dta"

file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\intro soc methods\trial cps\trial_cps.dta saved
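
* The same save can be typed as a command. A sketch, assuming the working directory is still the folder set with cd earlier:

```stata
* "replace" lets save overwrite trial_cps.dta if it already exists
save trial_cps, replace

* Reload the file later; "clear" discards whatever data are in memory first
use trial_cps, clear
```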

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |    102,202       48.52       48.52

     Female |    108,446       51.48      100.00

------------+-----------------------------------

      Total |    210,648      100.00

 

. tabulate year

 

Survey year |      Freq.     Percent        Cum.

------------+-----------------------------------

       2005 |    210,648      100.00      100.00

------------+-----------------------------------

      Total |    210,648      100.00

 

. clear all

 

* OK, enough with the trial CPS dataset. Let's get back to our regular 2000 CPS dataset, which we select using the Open command under the File menu.

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

. tabulate union

 

                 Union membership |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                              NIU |    120,249       89.93       89.93

                No union coverage |     11,383        8.51       98.45

            Member of labor union |      1,883        1.41       99.85

Covered by union but not a member |        195        0.15      100.00

----------------------------------+-----------------------------------

                            Total |    133,710      100.00

 

* Union is a curious variable, because there are so many NIU (not in universe) cases.
* The problem with the NIU cases is that they throw the percentages way off. That is, they make it look like only 1.4% of people are in unions, which seems too low.

 

. tabulate union, nolab

 

      Union |

 membership |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |    120,249       89.93       89.93

          1 |     11,383        8.51       98.45

          2 |      1,883        1.41       99.85

          3 |        195        0.15      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* Let's make a new union variable.

. generate union_new=union

 

. tabulate union union_new

 

* So far, the new union variable is just like the old one, but without the value labels.

                      |                  union_new

     Union membership |         0          1          2          3 |     Total

----------------------+--------------------------------------------+----------

                  NIU |   120,249          0          0          0 |   120,249

    No union coverage |         0     11,383          0          0 |    11,383

Member of labor union |         0          0      1,883          0 |     1,883

Covered by union but  |         0          0          0        195 |       195

----------------------+--------------------------------------------+----------

                Total |   120,249     11,383      1,883        195 |   133,710

 

 

. tabulate union union_new, nolab

 

     Union |                  union_new

membership |         0          1          2          3 |     Total

-----------+--------------------------------------------+----------

         0 |   120,249          0          0          0 |   120,249

         1 |         0     11,383          0          0 |    11,383

         2 |         0          0      1,883          0 |     1,883

         3 |         0          0          0        195 |       195

-----------+--------------------------------------------+----------

     Total |   120,249     11,383      1,883        195 |   133,710

 

 

* Let's add some value labels to the new union variable. First we define the new label.

. label define new_union_lbl 1 "no union coverage" 2 "union member" 3 "covered by union but not a member"

 

* Then we associate the new labels with the values of variable union_new.

. label val  union_new new_union_lbl

 

. tabulate  union_new

 

                        union_new |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                                0 |    120,249       89.93       89.93

                no union coverage |     11,383        8.51       98.45

                     union member |      1,883        1.41       99.85

covered by union but not a member |        195        0.15      100.00

----------------------------------+-----------------------------------

                            Total |    133,710      100.00

 

. replace union_new=. if union==0

(120249 real changes made, 120249 to missing)

 

* The replace command above is the key step: it sets all those NIU values to missing. Stata's default missing value code for numeric variables is the period. Note also the double equal sign after the if: == tests for equality, while a single = assigns a value.

 

* Now when we tabulate the new union variable, the missing values are left out and we get a more sensible tabulation that shows that 14% of persons who were asked the union question were in fact in a union.

. tabulate  union_new

 

                        union_new |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                no union coverage |     11,383       84.56       84.56

                     union member |      1,883       13.99       98.55

covered by union but not a member |        195        1.45      100.00

----------------------------------+-----------------------------------

                            Total |     13,461      100.00
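
* The generate-then-replace steps can also be done in one command with recode. A sketch, where union_new2 is a hypothetical variable name:

```stata
* Copy union into a new variable, turning the NIU code 0 into missing (.)
recode union (0 = .), generate(union_new2)

* Attach the value labels defined earlier in this log
label values union_new2 new_union_lbl

* The "missing" option shows the missing cases as their own row
tabulate union_new2, missing

* Check that the two versions agree on every observation
assert union_new2 == union_new
```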

 

* In truth, we didn't need to create a new variable; we could have tabulated the old variable while excluding the NIU code of zero. "~=" means "not equal to". See my intro to Stata page.

 

. tabulate union if union~=0

 

                 Union membership |      Freq.     Percent        Cum.

----------------------------------+-----------------------------------

                No union coverage |     11,383       84.56       84.56

            Member of labor union |      1,883       13.99       98.55

Covered by union but not a member |        195        1.45      100.00

----------------------------------+-----------------------------------

                            Total |     13,461      100.00
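
* A few equivalent ways to write the same restriction, offered as a sketch:

```stata
* != and ~= are interchangeable "not equal" operators in Stata
tabulate union if union != 0

* Because the NIU code is 0 and the real answers are 1-3, this also works here.
* Caution: Stata treats missing as larger than any number, so "> 0" would
* also keep missing values if the variable had any.
tabulate union if union > 0

* count reports how many observations satisfy a condition
count if union != 0
```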

 

. tabulate educrec

 

 Educational attainment |

                 recode |      Freq.     Percent        Cum.

------------------------+-----------------------------------

                    NIU |     30,484       22.80       22.80

      None or preschool |        457        0.34       23.14

   Grades 1, 2, 3, or 4 |      1,187        0.89       24.03

   Grades 5, 6, 7, or 8 |      6,847        5.12       29.15

                Grade 9 |      4,161        3.11       32.26

               Grade 10 |      4,695        3.51       35.77

               Grade 11 |      4,721        3.53       39.30

               Grade 12 |     33,461       25.03       64.33

1 to 3 years of college |     25,883       19.36       83.69

    4+ years of college |     21,814       16.31      100.00

------------------------+-----------------------------------

                  Total |    133,710      100.00

 

. tabulate educrec, nolab

 

Educational |

 attainment |

     recode |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |     30,484       22.80       22.80

          1 |        457        0.34       23.14

          2 |      1,187        0.89       24.03

          3 |      6,847        5.12       29.15

          4 |      4,161        3.11       32.26

          5 |      4,695        3.51       35.77

          6 |      4,721        3.53       39.30

          7 |     33,461       25.03       64.33

          8 |     25,883       19.36       83.69

          9 |     21,814       16.31      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* The problem with educrec is that the numbers do not correspond to actual years of education, so you can't take the average of educrec. That is why I created yrsed, to have real numbers representing the years of educational attainment of educrec.

 

. table educrec, contents (mean yrsed)

 

-------------------------------------

Educational attainment  |

recode                  | mean(yrsed)

------------------------+------------

                    NIU |           

      None or preschool |           0

   Grades 1, 2, 3, or 4 |         2.5

   Grades 5, 6, 7, or 8 |         6.5

                Grade 9 |           9

               Grade 10 |          10

               Grade 11 |          11

               Grade 12 |          12

1 to 3 years of college |          14

    4+ years of college |          17

-------------------------------------
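
* For illustration only, here is one way a variable like yrsed could be built. This is a reconstruction that reads the mapping off the table above, not necessarily how yrsed was actually created; yrsed_check is a hypothetical variable name. Run it from a do-file, since /// line continuation does not work in the Command window.

```stata
* Map each educrec code to approximate years of schooling; NIU (0) becomes missing
recode educrec (0 = .) (1 = 0) (2 = 2.5) (3 = 6.5) (4 = 9) (5 = 10) ///
    (6 = 11) (7 = 12) (8 = 14) (9 = 17), generate(yrsed_check)

* Now an average is meaningful, unlike an average of the educrec codes
summarize yrsed_check
```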

 

* table is like tabulate, except that with table you choose which statistics fill the cells, using the contents() option.
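
* A note on versions, as a sketch: the contents() syntax in this log is from older releases of Stata. To my knowledge, Stata 17 redesigned the table command and uses statistic() instead.

```stata
* Stata 16 and earlier (the syntax used in this log)
table educrec, contents(mean yrsed)

* Stata 17 and later
table educrec, statistic(mean yrsed)
```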

 

. sort sex

 

. by sex: summarize yrsed if age>24 & age<35

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9027    13.31212    2.967666          0         17

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9511    13.55657    2.854472          0         17
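
* The sort and by steps can be combined. A small sketch of two shortcuts:

```stata
* bysort sorts the data and runs the by-group command in one step
bysort sex: summarize yrsed if age > 24 & age < 35

* For whole-number ages, inrange(age, 25, 34) is the same as age>24 & age<35
bysort sex: summarize yrsed if inrange(age, 25, 34)
```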

 

* Is this difference between young men's and women's educational attainment a big difference? It is about 0.24 years, or about 3 months. The real question is whether we can be sure, based on this CPS sample, that there really is a difference between men's and women's education in the wider US society. That is, can we exclude the possibility that men and women have the same level of education in the US, and the women in this sample just happen to have a little bit more by chance? If we start with the assumption (the null hypothesis) that women and men age 25-34 in the US have the same average educational attainment, how likely are we to draw a random sample with a difference at least this big between men and women (and after all, this difference is not very big)? If that probability turns out to be small, let's say smaller than 5%, we may reject the null hypothesis and conclude that young women in the US have more educational attainment than young men.

 

. ttest yrsed if age>24 & age <35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
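
* To see where the t-statistic comes from, here is a sketch that rebuilds it from the numbers in the table above. ttesti is the immediate form of ttest, which works from summary statistics instead of data.

```stata
* t = (difference in means) / (standard error of the difference)
display (13.31212 - 13.55657) / .0427623

* Same test from summary statistics: n, mean, and sd for group 1, then group 2
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
```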

 

 

* I mentioned in class that the probability associated with a t-statistic of 5.7 was small, certainly much smaller than needed to reject the null hypothesis of no difference between women and men. Stata lists this probability above as 0.0000 (the middle probability, under Ha: diff != 0). We can quantify it more precisely by asking Stata for the upper-tail probability of the t-distribution at 5.7164 with 18,536 degrees of freedom, and then doubling that probability to get a two-tailed test, which I will explain in class. I didn't do this in class because I will be covering it later, but here it is now:

 

. display 2*ttail(18536, 5.7164)

1.105e-08
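
* ttest also leaves its results behind in r(), so the t-statistic and degrees of freedom need not be typed by hand. A sketch, assuming the same dataset is still in memory:

```stata
* Re-run the test quietly so the r() results are available
quietly ttest yrsed if age > 24 & age < 35, by(sex)

* Two-tailed p-value computed from the stored t and degrees of freedom
display 2 * ttail(r(df_t), abs(r(t)))

* ttest stores the two-tailed p-value directly as well
display r(p)

* See everything ttest stored
return list
```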

 

* So, if the null hypothesis (of no difference between women and men) were true, the probability of seeing a difference at least this large would be about 1.1e-08, or about 0.00000001, roughly 1 in 90 million. That is a small probability, to be sure. What really drives this probability to be so small is that although our difference was small, our sample size was large. Larger sample sizes allow one to be more certain of even a small difference.
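
* To see how much the sample size matters, here is a sketch that keeps the same means and standard deviations but uses hypothetical group sizes 100 times smaller. With samples that small, the standard error of the difference is roughly ten times larger, so the same 0.24-year gap is much less convincing.

```stata
* Hypothetical sample sizes (90 men, 95 women); means and sds as in the table above
ttesti 90 13.31212 2.967666 95 13.55657 2.854472
```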

 

* Note that you can use table to generate the same kind of statistics that summarize generates.

 

. table sex if age>24 & age< 35, contents (freq mean yrsed sd yrsed min yrsed max yrsed)

 

---------------------------------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)   min(yrsed)   max(yrsed)

----------+----------------------------------------------------------------

     Male |       9,027     13.31212     2.967666            0           17

   Female |       9,511     13.55657     2.854472            0           17

---------------------------------------------------------------------------

 

* Let's briefly go back to a male-female difference from last class that seemed like a big difference.

. by sex: summarize incwage if incwage>0 & age>25 & age<65

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     27097     42865.9    37545.68          1     364302

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |     25198     25866.4     22697.5          1     333564

 

 

. ttest incwage if  incwage>0 & age>25  & age<65, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |   27097     42865.9    228.0864    37545.68    42418.84    43312.96

  Female |   25198     25866.4    142.9865     22697.5    25586.13    26146.66

---------+--------------------------------------------------------------------

combined |   52295     34674.8    141.7524    32416.08    34396.97    34952.64

---------+--------------------------------------------------------------------

    diff |            16999.51    273.7817                16462.89    17536.12

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  62.0915

Ho: diff = 0                                     degrees of freedom =    52293

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
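
* The test above assumes equal variances, but the two standard deviations are quite different here (about 37,546 for men and 22,698 for women). As a sketch of a robustness check, the unequal option drops that assumption and uses Satterthwaite's approximation for the degrees of freedom:

```stata
* Same comparison without assuming the two groups have equal variances
ttest incwage if incwage > 0 & age > 25 & age < 65, by(sex) unequal
```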

 

* The difference here is so large, and the t-statistic so enormous, that the resulting p-value is too small for Stata to report as anything other than zero. Stata stores numbers in double precision, where the smallest positive value held at full precision is about 2.2e-308. That is a really small number when you consider that the number of atoms in the universe is commonly estimated at around 1e80.

. display 2*ttail(52293, 62.09)

0
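
* When a p-value underflows to zero, one can still get a rough sense of its scale by working in logs. The sketch below is an approximation, not an exact answer: it swaps the t-distribution for the standard normal, which is close at 52,293 degrees of freedom but has a somewhat lighter tail this far out.

```stata
* log10 of the approximate two-tailed p-value, using the log of the normal tail
display (ln(2) + lnnormal(-62.09)) / ln(10)

* For comparison: the smallest positive full-precision double Stata can store
display smallestdouble()
```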

 

 

. describe yrsed

 

              storage  display     value

variable name   type   format      label      variable label

---------------------------------------------------------------------------------------

yrsed           float  %9.0g                  based on educrec

 

. log close

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\

> soc_meth_proj3\2011_180B_logs\class2.log

  log type:  text

 closed on:  27 Jan 2011, 15:49:59

--------------------------------------------------------------------------------------