---------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class2.log

  log type:  text

 opened on:  23 Sep 2010, 14:17:39

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

 

. summarize incwelfr

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    103226    40.62242    478.8231          0      25000

 

* If you average a variable that has a lot of zeros, the mean is pulled toward zero and gives a skewed view of the data.

 

. summarize incwelfr if incwelfr>0

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |      1289    3253.134    2813.505          1      25000

 

* What we really want to know is the average welfare income among those who receive welfare. About $3,250 per recipient makes more sense than $40.
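
* The two means are linked by simple arithmetic: because everyone else contributes zero to the total, the overall mean equals the share of respondents with any welfare income times the mean among recipients. A quick check with display, using only the Obs and Mean figures printed above (a sketch of the arithmetic, not part of the original session):

```stata
* share of respondents with incwelfr>0 (1,289 of 103,226)
display 1289/103226
* share times the recipients' mean; this should land close to the overall mean of 40.62
display (1289/103226)*3253.134
```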

 

. sort sex

 

. by sex: summarize incwelfr if incwelfr>0

 

--------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |       188    2979.622    2644.509          1      13800

 

--------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |      1101    3299.837    2839.866          1      25000

 

* Most welfare recipients are female.

 

 

. by sex: summarize incwelfr if incwelfr>0 [fweight= perwt_rounded]

 

--------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    357702     2897.24    2577.316          1      13800

 

--------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |   2193544    3100.608    2837.588          1      25000

 

* Applying the person weights, there are about 2.2 million women and about 360,000 men on welfare.
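
* Why does Obs jump from about 1,300 to about 2.5 million? A frequency weight tells Stata that each record stands for that many identical people. A minimal sketch on made-up toy data (the variables y and w are hypothetical, not from the CPS file), showing that fweight gives the same answer as physically duplicating the records:

```stata
clear
input y w
10 1
20 3
end

* weighted: Stata treats this as 1 + 3 = 4 observations
summarize y [fweight = w]

* same thing by brute force: make w copies of each record, then summarize unweighted
expand w
summarize y
```

* Both summarize commands should report the same Obs and the same Mean.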

 

 

. by sex: summarize incwelfr if incwelfr>0 &age>20 [fweight= perwt_rounded]

 

--------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    256209    2972.657    2636.861          1      13800

 

--------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |   1886278    3237.094    2906.139          1      25000

 

* Are some of the welfare recipients younger than 21? Apparently yes (note the number of observations is lower here than above).

 

 

. by sex: summarize incwelfr if incwelfr>0 &age>12 [fweight= perwt_rounded]

 

--------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |    357702     2897.24    2577.316          1      13800

 

--------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

    incwelfr |   2193544    3100.608    2837.588          1      25000

 

* All welfare recipients are over 12 (the weighted counts are identical to the counts without any age restriction).

 

 

. table sex if incwelfr>0, contents (freq mean incwelfr)

 

------------------------------------------

      Sex |          Freq.  mean(incwelfr)

----------+-------------------------------

     Male |         15,626      2979.62234

   Female |         16,147     3299.837421

------------------------------------------

 

* You can use the table command to generate statistics and averages, just like the summarize command. Here, however, the unweighted count of men and women with welfare income greater than zero is much larger than the unweighted count we got from summarize above, even though the means are exactly the same. What is the problem? Stata treats a missing value (incwelfr==.) as larger than any number, so people with missing incwelfr satisfy incwelfr>0 and are counted in freq. The summarize command, and the mean statistic in table, drop missing values of incwelfr automatically, which is why those results were unaffected.

 


. table sex if incwelfr>0 & incwelfr~=. , contents (freq mean incwelfr)

 

------------------------------------------

      Sex |          Freq.  mean(incwelfr)

----------+-------------------------------

     Male |            188      2979.62234

   Female |          1,101     3299.837421

------------------------------------------

* So if we exclude the missing values by hand, we get exactly the same unweighted count as we got with summarize.
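
* The missing-value behavior is easy to reproduce on a tiny made-up dataset (the variable x is hypothetical). This sketch shows three ways of writing the condition; the last two are equivalent ways to exclude missing values:

```stata
clear
input x
0
5
.
end

* missing counts as larger than any number, so this counts 2 (the 5 and the .)
count if x > 0

* excluding missing explicitly counts 1 (just the 5)
count if x > 0 & !missing(x)

* same result, using the fact that every real number is less than missing
count if x > 0 & x < .
```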

 

 

 

 

. table sex if incwelfr>0 & incwelfr~=. [fweight= perwt_rounded] , contents (freq mean incwelfr)

 

------------------------------------------

      Sex |          Freq.  mean(incwelfr)

----------+-------------------------------

     Male |        357,702     2897.240312

   Female |        2193544     3100.608278

------------------------------------------

 

* Again, exactly like the summarize command.

 

 

*Now let's look at wage income, which is more broadly relevant.

 

. by sex: summarize incwage if age>25 & age<35

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |      8229    30226.95    27174.71          0     362302

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |      8643     18039.8    20632.34          0     333564

 

 

. by sex: summarize incwage if age>24 & age<35

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

     incwage |      9027    29510.62    26619.54          0     362302

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

 

     incwage |      9511    17728.95    20249.23          0     333564

 

 

. table sex if age>24 & age<35, contents(freq mean incwage mean yrsed)

 

-------------------------------------------------------

      Sex |         Freq.  mean(incwage)    mean(yrsed)

----------+--------------------------------------------

     Male |         9,027    29510.61781       13.31212

   Female |         9,511    17728.94764       13.55657

-------------------------------------------------------

* Women in this age group have slightly more education (13.56 vs. 13.31 years) but earn much less in wages (about $17,700 vs. $29,500).

 

. table sex if age>24 & age<35&  occ1990==178, contents(freq mean incwage mean yrsed)

 

-------------------------------------------------------

      Sex |         Freq.  mean(incwage)    mean(yrsed)

----------+--------------------------------------------

     Male |            60    56928.93333             17

   Female |            41    59430.68293       16.92683

-------------------------------------------------------

 

* occ1990==178 identifies the lawyers (you can look up the codes on the IPUMS website, tabulate the variable, run codebook on the variable, or list the value label attached to the variable). It looks like young women lawyers make a bit more money than young male lawyers, although the sample is small (60 men and 41 women).

 

. table sex if age>24 & age<35&  occ1990==178 [fweight= perwt_rounded], contents(freq mean incwage mean yrsed)

 

-------------------------------------------------------

      Sex |         Freq.  mean(incwage)    mean(yrsed)

----------+--------------------------------------------

     Male |       137,314    58326.18129             17

   Female |       110,119    62426.92046       16.92127

-------------------------------------------------------

 

. summarize age

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

         age |    133710    35.17964    22.21722          0         90

 

* Why is 90 the highest age in this dataset? The answer is that age is topcoded, to protect the identity of the people who happen to be old-age outliers.

 

. table sex if age>24 & age<35&  occ1990==178 [fweight= perwt_rounded], contents(freq mean incwage max incwage mean yrsed)

 

----------------------------------------------------------------------

      Sex |         Freq.  mean(incwage)   max(incwage)    mean(yrsed)

----------+-----------------------------------------------------------

     Male |       137,314    58326.18129         229339             17

   Female |       110,119    62426.92046         150000       16.92127

----------------------------------------------------------------------

 

* Is it possible that the similarity of earnings for young male and young female lawyers arises because the highest-earning men are topcoded, which would skew our comparison? It turns out not to be the case. The maximum wage incomes among the young lawyers in our sample ($229,339 for men, $150,000 for women) are below the topcode income of $362,302.

 

. summarize incwage, detail

 

                   Wage and salary income

-------------------------------------------------------------

      Percentiles      Smallest

 1%            0              0

 5%            0              0

10%            0              0       Obs              103226

25%            0              0       Sum of Wgt.      103226

 

50%        10000                      Mean           19462.59

                        Largest       Std. Dev.      28843.38

75%        30000         362302

90%        50000         362302       Variance       8.32e+08

95%        66500         362302       Skewness       3.583439

99%       125000         364302       Kurtosis       24.50639

 

. table sex if age>24 & age<35, contents(freq mean yrsed)

 

------------------------------------

      Sex |       Freq.  mean(yrsed)

----------+-------------------------

     Male |       9,027     13.31212

   Female |       9,511     13.55657

------------------------------------

 

. display 13.55657-13.31212

.24445

 

* You can use the display command as a calculator.

 

. ttest yrsed if age>24 & age<35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* This repeats our t-test of yrsed, but with the age group (ages 25 to 34) consistent with my Excel file.

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate sex, nolabel

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     64,791       48.46       48.46

          2 |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

* Now we generate a new dummy variable, which we will use in regression.

 

. generate male=0

 

. replace male=1 if sex==1

(64791 real changes made)

 

. label define male_lbl 0 "female" 1 "male"

 

* making a value label

 

. label val male male_lbl

 

* attaching that value label to the variable.

 

. tabulate sex male

 

           |         male

       Sex |    female       male |     Total

-----------+----------------------+----------

      Male |         0     64,791 |    64,791

    Female |    68,919          0 |    68,919

-----------+----------------------+----------

     Total |    68,919     64,791 |   133,710

 

 

. tabulate sex male, nolab

 

           |         male

       Sex |         0          1 |     Total

-----------+----------------------+----------

         1 |         0     64,791 |    64,791

         2 |    68,919          0 |    68,919

-----------+----------------------+----------

     Total |    68,919     64,791 |   133,710

 

 

. tabulate sex male, nolab miss

 

           |         male

       Sex |         0          1 |     Total

-----------+----------------------+----------

         1 |         0     64,791 |    64,791

         2 |    68,919          0 |    68,919

-----------+----------------------+----------

     Total |    68,919     64,791 |   133,710

 

* There are no missing values for sex in this dataset. The reason is that if the respondent left it missing, the Census Bureau imputed it. Imputation flags are available from IPUMS.
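
* The generate/replace steps above are safe here because sex is never missing. With a variable that can be missing, generate male=0 would silently code the missing cases as 0. A one-line alternative that leaves them missing, sketched on made-up toy data (this is not the command used in class):

```stata
clear
input sex
1
2
.
end

* (sex == 1) evaluates to 1 or 0; the if qualifier leaves male missing where sex is missing
generate byte male = (sex == 1) if !missing(sex)

label define male_lbl 0 "female" 1 "male"
label values male male_lbl

* the missing option shows that the missing case stayed missing
tabulate sex male, missing
```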

 

 

. regress yrsed male if age>24 & ager<35

ager not found

r(111);

 

. regress yrsed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

 

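* Note how this gives you the same coefficient (the coefficient on male, -.2444469, is the male minus female difference in means), the same standard error, and the same t-statistic, and therefore the same answer as the t-test above. The constant, 13.55657, is the mean for women.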
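
* The same equivalence can be checked on any dataset. A minimal sketch using the auto example dataset that ships with Stata (mpg and foreign are variables in that file, not in our CPS data; the scalar name t_ttest is made up):

```stata
sysuse auto, clear

* two-sample t-test with equal variances (the default)
ttest mpg, by(foreign)
scalar t_ttest = r(t)

* regression of the same outcome on the 0/1 dummy
regress mpg foreign

* compare the two t-statistics
display "t from ttest = " t_ttest "   t from regress = " _b[foreign]/_se[foreign]
```

* In this example the two t-statistics have the same size but opposite signs, because ttest reports the first group minus the second (Domestic minus Foreign) while the regression coefficient is the foreign==1 group minus the foreign==0 group. In our CPS run the signs agree because Male is the first group in sex and male==1 marks the same group.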

 

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

 

* I added a new variable (male) and I want to keep it, so I saved the dataset.

 

. clear

 

* Then I clear the dataset to make way for the new one.

 

. cd "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps"

C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps

 

* You have to execute the cd command so that Stata knows where to find your uncompressed dataset.

 

* The following is what it looks like in Stata when you import data. Stata echoes the do-file and runs it step by step. You can pick the do-file to run from the menus: File > Do

 

. do "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\cps_00008.do"

 

. /* Important: you need to put the .dat and .do files in one folder/

>    directory and then set the working folder to that folder. */

.

. set more off

 

.

. clear

 

. infix ///

>  int     year                                 1-4 ///

>  byte    age                                  5-6 ///

>  byte    sex                                  7 ///

>  using cps_00008.dat

(210648 observations read)

 

.

. label var year `"Survey year"'

 

. label var age `"Age"'

 

. label var sex `"Sex"'

 

.

. label define agelbl 00 `"Under 1 year"'

 

. label define agelbl 01 `"1"', add

 

. label define agelbl 02 `"2"', add

 

. label define agelbl 03 `"3"', add

 

. label define agelbl 04 `"4"', add

 

. label define agelbl 05 `"5"', add

 

. label define agelbl 06 `"6"', add

 

. label define agelbl 07 `"7"', add

 

. label define agelbl 08 `"8"', add

 

. label define agelbl 09 `"9"', add

 

. label define agelbl 10 `"10"', add

 

. label define agelbl 11 `"11"', add

 

. label define agelbl 12 `"12"', add

 

. label define agelbl 13 `"13"', add

 

. label define agelbl 14 `"14"', add

 

. label define agelbl 15 `"15"', add

 

. label define agelbl 16 `"16"', add

 

. label define agelbl 17 `"17"', add

 

. label define agelbl 18 `"18"', add

 

. label define agelbl 19 `"19"', add

 

. label define agelbl 20 `"20"', add

 

. label define agelbl 21 `"21"', add

 

. label define agelbl 22 `"22"', add

 

. label define agelbl 23 `"23"', add

 

. label define agelbl 24 `"24"', add

 

. label define agelbl 25 `"25"', add

 

. label define agelbl 26 `"26"', add

 

. label define agelbl 27 `"27"', add

 

. label define agelbl 28 `"28"', add

 

. label define agelbl 29 `"29"', add

 

. label define agelbl 30 `"30"', add

 

. label define agelbl 31 `"31"', add

 

. label define agelbl 32 `"32"', add

 

. label define agelbl 33 `"33"', add

 

. label define agelbl 34 `"34"', add

 

. label define agelbl 35 `"35"', add

 

. label define agelbl 36 `"36"', add

 

. label define agelbl 37 `"37"', add

 

. label define agelbl 38 `"38"', add

 

. label define agelbl 39 `"39"', add

 

. label define agelbl 40 `"40"', add

 

. label define agelbl 41 `"41"', add

 

. label define agelbl 42 `"42"', add

 

. label define agelbl 43 `"43"', add

 

. label define agelbl 44 `"44"', add

 

. label define agelbl 45 `"45"', add

 

. label define agelbl 46 `"46"', add

 

. label define agelbl 47 `"47"', add

 

. label define agelbl 48 `"48"', add

 

. label define agelbl 49 `"49"', add

 

. label define agelbl 50 `"50"', add

 

. label define agelbl 51 `"51"', add

 

. label define agelbl 52 `"52"', add

 

. label define agelbl 53 `"53"', add

 

. label define agelbl 54 `"54"', add

 

. label define agelbl 55 `"55"', add

 

. label define agelbl 56 `"56"', add

 

. label define agelbl 57 `"57"', add

 

. label define agelbl 58 `"58"', add

 

. label define agelbl 59 `"59"', add

 

. label define agelbl 60 `"60"', add

 

. label define agelbl 61 `"61"', add

 

. label define agelbl 62 `"62"', add

 

. label define agelbl 63 `"63"', add

 

. label define agelbl 64 `"64"', add

 

. label define agelbl 65 `"65"', add

 

. label define agelbl 66 `"66"', add

 

. label define agelbl 67 `"67"', add

 

. label define agelbl 68 `"68"', add

 

. label define agelbl 69 `"69"', add

 

. label define agelbl 70 `"70"', add

 

. label define agelbl 71 `"71"', add

 

. label define agelbl 72 `"72"', add

 

. label define agelbl 73 `"73"', add

 

. label define agelbl 74 `"74"', add

 

. label define agelbl 75 `"75"', add

 

. label define agelbl 76 `"76"', add

 

. label define agelbl 77 `"77"', add

 

. label define agelbl 78 `"78"', add

 

. label define agelbl 79 `"79"', add

 

. label define agelbl 80 `"80"', add

 

. label define agelbl 81 `"81"', add

 

. label define agelbl 82 `"82"', add

 

. label define agelbl 83 `"83"', add

 

. label define agelbl 84 `"84"', add

 

. label define agelbl 85 `"85"', add

 

. label define agelbl 86 `"86"', add

 

. label define agelbl 87 `"87"', add

 

. label define agelbl 88 `"88"', add

 

. label define agelbl 89 `"89"', add

 

. label define agelbl 90 `"90 (90+, 1988-2002)"', add

 

. label define agelbl 91 `"91"', add

 

. label define agelbl 92 `"92"', add

 

. label define agelbl 93 `"93"', add

 

. label define agelbl 94 `"94"', add

 

. label define agelbl 95 `"95"', add

 

. label define agelbl 96 `"96"', add

 

. label define agelbl 97 `"97"', add

 

. label define agelbl 98 `"98"', add

 

. label define agelbl 99 `"99+"', add

 

. label values age agelbl

 

.

. label define sexlbl 1 `"Male"'

 

. label define sexlbl 2 `"Female"', add

 

. label values sex sexlbl

 

.

.

end of do-file

 

. save "C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\my fun cps dataset trial.dta"

 

file C:\Documents and Settings\Michael Rosenfeld\My Documents\current class files\Soc 381\trial cps\my fun cps dataset trial.dta saved

* We have just created a new Stata dataset, so we have to save it.

 

. clear all

 

. exit, clear