---------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\first_class.log

  log type:  text

 opened on:  21 Sep 2010, 14:44:57

 

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

* Generally, it is easiest to open Stata first, then set mem, then open your log, and then use the menus to open the data file you want.

 

. memory

                                                  bytes

--------------------------------------------------------------------

Details of set memory usage

    overhead (pointers)                         534,840        1.02%

    data                                     14,574,390       27.80%

                                        ----------------------------

    data + overhead                          15,109,230       28.82%

    free                                     37,319,562       71.18%

                                        ----------------------------

    Total allocated                          52,428,792      100.00%

--------------------------------------------------------------------

Other memory usage

    set maxvar usage                          2,001,730

    set matsize usage                         1,315,200

    programs, saved results, etc.                51,954

                                        ---------------

    Total                                     3,368,884

-------------------------------------------------------

Grand total                                  55,797,676

 

. *you might have to set mem to make enough room.

* In the logs, my comments are preceded by an asterisk. This way Stata knows not to try to execute my comments.

 

. set mem 45m

no; data in memory would be lost

r(4);
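
* The r(4) error above occurs because the memory allocation cannot be changed while a dataset is loaded. A minimal sketch of an order that avoids it, assuming an older Stata (version 11 or earlier, where set memory still matters; Stata 12 and later manage memory automatically) and reusing the file path from the use command above:

```stata
* drop the data currently in memory so the allocation can change
clear
* request 45 megabytes, as in the failed command above
set memory 45m
* reload the dataset
use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
```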

 

. describe

 

Contains data from C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new

> .dta

  obs:       133,710                         

 vars:            55                          1 Feb 2009 13:36

 size:    15,109,230 (71.2% of memory free)

---------------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

---------------------------------------------------------------------------------------

year            int    %8.0g       yearlbl    Survey year

serial          long   %12.0g      seriallbl

                                              Household serial number

hhwt            float  %9.0g       hhwtlbl    Household weight

region          byte   %27.0g      regionlbl

                                              Region and division

statefip        byte   %57.0g      statefiplbl

                                              State (FIPS code)

metro           byte   %27.0g      metrolbl   Metropolitan central city status

metarea         int    %50.0g      metarealbl

                                              Metropolitan area

ownershp        byte   %21.0g      ownershplbl

                                              Ownership of dwelling

hhincome        long   %12.0g      hhincomelbl

                                              Total household income

pubhous         byte   %8.0g       pubhouslbl

                                              Living in public housing

foodstmp        byte   %8.0g       foodstmplbl

                                              Food stamp recipiency

pernum          byte   %8.0g       pernumlbl

                                              Person number in sample unit

perwt           float  %9.0g       perwtlbl   Person weight

momloc          byte   %8.0g       momloclbl

                                              Mother's location in the household

poploc          byte   %8.0g       poploclbl

                                              Father's location in the household

sploc           byte   %8.0g       sploclbl   Spouse's location in household

famsize         byte   %25.0g      famsizelbl

                                              Number of own family members in hh

nchild          byte   %18.0g      nchildlbl

                                              Number of own children in household

nchlt5          byte   %23.0g      nchlt5lbl

                                              Number of own children under age 5 in hh

nsibs           byte   %18.0g      nsibslbl   Number of own siblings in household

relate          int    %34.0g      relatelbl

                                              Relationship to household head

age             byte   %19.0g      agelbl     Age

sex             byte   %8.0g       sexlbl     Sex

race            int    %37.0g      racelbl    Race

marst           byte   %23.0g      marstlbl   Marital status

popstat         byte   %14.0g      popstatlbl

                                              Adult civilian, armed forces, or child

bpl             long   %27.0g      bpllbl     Birthplace

yrimmig         int    %11.0g      yrimmiglbl

                                              Year of immigration

citizen         byte   %31.0g      citizenlbl

                                              Citizenship status

mbpl            long   %27.0g      mbpllbl    Mother's birthplace

fbpl            long   %27.0g      fbpllbl    Father's birthplace

hispan          int    %29.0g      hispanlbl

                                              Hispanic origin

educ99          byte   %38.0g      educ99lbl

                                              Educational attainment, 1990

educrec         byte   %23.0g      educreclbl

                                              Educational attainment recode

schlcoll        byte   %45.0g      schlcolllbl

                                              School or college attendance

empstat         byte   %30.0g      empstatlbl

                                              Employment status

occ1990         int    %78.0g      occ1990lbl

                                              Occupation, 1990 basis

wkswork1        byte   %8.0g       wkswork1lbl

                                              Weeks worked last year

hrswork         byte   %8.0g       hrsworklbl

                                              Hours worked last week

uhrswork        byte   %13.0g      uhrsworklbl

                                              Usual hours worked per week (last yr)

hourwage        int    %8.0g       hourwagelbl

                                              Hourly wage

union           byte   %33.0g      unionlbl   Union membership

inctot          long   %12.0g                 Total personal income

incwage         long   %12.0g                 Wage and salary income

incss           long   %12.0g                 Social Security income

incwelfr        long   %12.0g                 Welfare (public assistance) income

vetstat         byte   %10.0g      vetstatlbl

                                              Veteran status

vetlast         byte   %26.0g      vetlastlbl

                                              Veteran's most recent period of service

disabwrk        byte   %34.0g      disabwrklbl

                                              Work disability

health          byte   %9.0g       healthlbl

                                              Health status

inclugh         byte   %8.0g       inclughlbl

                                              Included in employer group health plan

                                                last year

himcaid         byte   %8.0g       himcaidlbl

                                              Covered by Medicaid last year

ftotval         double %10.0g      ftotvallbl

                                              Total family income

perwt_rounded   float  %9.0g                  integer perwt, negative values recoded to

                                                0

yrsed           float  %9.0g                  based on educrec

---------------------------------------------------------------------------------------

Sorted by:  race

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate sex, nolab

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     64,791       48.46       48.46

          2 |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate sex [fweight= perwt_rounded]

* Note the square brackets and the "fweight= weight name" syntax. There are other kinds of weights as well, and other ways to tell Stata to use the same weights. fweights are frequency weights: you are telling Stata to treat each observation as if it appeared in the data as many times as its weight, which gives the proper weighted frequency.

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |133,932,994       48.86       48.86

     Female |140,154,827       51.14      100.00

------------+-----------------------------------

      Total |274,087,821      100.00
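
* A sketch of other ways to specify weights: Stata accepts abbreviated weight keywords, and tabulate also accepts analytic weights (aweight), which do not have to be integers. The second command assumes the unrounded perwt has no negative values and is an appropriate weight here; its percentages should be close to those in the fweight table, but its frequencies are scaled differently.

```stata
* fw is an accepted abbreviation of fweight
tabulate sex [fw=perwt_rounded]
* analytic weights may be non-integer, so the original perwt can be used
tabulate sex [aweight=perwt]
```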

 

. summarize  perwt_rounded

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

perwt_roun~d |    133710    2049.868    1083.244         93      14281

 

. tabulate  perwt_rounded

--Break--

r(1);

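* I stopped this command with Break. If you try to tabulate a continuous variable, you get one row for every distinct value (here the weights range from 93 to 14,281, so potentially thousands of rows), and the table would not be very informative.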
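
* One possible alternative for a variable with many distinct values, shown as a sketch; the choice of commands is an illustration and was not part of the original session:

```stata
* report the range, number of unique values, and missing count
codebook perwt_rounded
* draw the distribution instead of listing every value
histogram perwt_rounded
```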

 

. taabulate race

unrecognized command:  taabulate

r(199);

*I misspelled tabulate, and Stata didn't like it.

 

. tabulate race

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |    113,475       84.87       84.87

                          Black/Negro |     13,626       10.19       95.06

         American Indian/Aleut/Eskimo |      1,894        1.42       96.47

            Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

                                Total |    133,710      100.00

 

. tabulate race [fweight= perwt_rounded]

 

                                 Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

                                White |224,806,952       82.02       82.02

                          Black/Negro | 35,508,668       12.96       94.98

         American Indian/Aleut/Eskimo |  2,847,473        1.04       96.01

            Asian or Pacific Islander | 10,924,728        3.99      100.00

--------------------------------------+-----------------------------------

                                Total |274,087,821      100.00

 

. summarize  perwt_rounded

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

perwt_roun~d |    133710    2049.868    1083.244         93      14281

 

. summarize  perwt_rounded, detail

 

         integer perwt, negative values recoded to 0

-------------------------------------------------------------

      Percentiles      Smallest

 1%          284             93

 5%          428             93

10%          603             93       Obs              133710

25%         1188             96       Sum of Wgt.      133710

 

50%         2049                      Mean           2049.868

                        Largest       Std. Dev.      1083.244

75%         2649          11824

90%         3534          12547       Variance        1173417

95%         3967          12905       Skewness       .6144906

99%         4893          14281       Kurtosis       4.006292

 

* So the average weight is about 2,050, meaning the CPS is roughly a 1-in-2,000 survey: about one out of every 2,000 noninstitutionalized persons was included in the CPS.
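
* The arithmetic behind the average weight can be checked with display, using the weighted total (274,087,821) and the number of observations (133,710) from the tables above:

```stata
* weighted population total divided by the number of sampled persons
display 274087821/133710
* the implied sampling fraction, roughly 1 in 2,000
display 133710/274087821
```

* The first result should agree with the mean weight of about 2049.87 reported by summarize.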

 

. sort sex

 

. by sex: summarize yrsed

 

* If you are going to use the by: syntax, you need to sort the data first. You can sort on more than one variable at a time.

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |     49353    12.79632    3.217925          0         17

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |     53873    12.75218    3.098084          0         17
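
* Two variations on the by: syntax, shown as a sketch. bysort combines the sort and by steps into one command, and the second pair of commands sorts on two variables (sex and race are both in this dataset) before summarizing within each combination.

```stata
* sort and summarize by group in a single step
bysort sex: summarize yrsed
* sort on two variables, then summarize within each sex-by-race group
sort sex race
by sex race: summarize yrsed
```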

 

* That is all well and good, but the educational attainment of very young and of very old people might not be relevant.

 

 

. by sex: summarize yrsed if age>30 & age<40

 

---------------------------------------------------------------------------------------

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9001     13.3749    2.929584          0         17

 

---------------------------------------------------------------------------------------

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9450    13.50429    2.848776          0         17

 

* In the unweighted CPS sample, there are more than 18,000 individuals in this age range. Among these individuals, the women have higher educational attainment than the men. Not by a lot, by .13 years or so. And even if the difference is a small one, there is no uncertainty about it within the sample: 13.50 is more than 13.37. The question we are going to be interested in answering is whether the data suggest that women in their 30s in the US as a whole have higher educational attainment, on average, compared to men in their 30s.
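
* Note that the condition age>30 & age<40 keeps ages 31 through 39. If ages 30 through 39 are wanted instead, inrange() is one way to write the condition. The weighted version below is a sketch that assumes perwt_rounded is the appropriate weight for a population estimate.

```stata
* same comparison, now including 30-year-olds
bysort sex: summarize yrsed if inrange(age, 30, 39)
* weighted means, closer to an estimate for the US population
bysort sex: summarize yrsed if inrange(age, 30, 39) [fweight=perwt_rounded]
```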

 

. ttest yrsed if age>30 & age<40, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9001     13.3749    .0308788    2.929584    13.31437    13.43543

  Female |    9450    13.50429     .029305    2.848776    13.44684    13.56173

---------+--------------------------------------------------------------------

combined |   18451    13.44117    .0212695    2.889124    13.39948    13.48286

---------+--------------------------------------------------------------------

    diff |           -.1293829     .042542               -.2127692   -.0459967

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -3.0413

Ho: diff = 0                                     degrees of freedom =    18449

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0024          Pr(T > t) = 0.9988

 

* The difference between the two groups is .129 years, which is just what we would get by subtracting the two means. What we want to know is how sure we are that thirty-something men and women in the US do not have the same average level of education. The t-test, which we will explain in further detail in class, answers a slightly different question: if men and women in the US in their 30s actually had the same average level of education, how likely is it that a sample of about 18,000 individuals would reveal a difference at least as big as the one we found? The probability of finding such a difference by chance is 0.0024, or about 2.4 in a thousand, which is quite small, though of course larger than zero. Strictly speaking, this is not the probability that the null hypothesis is true; it is the probability of data this extreme if the null hypothesis were true. In this case the "null hypothesis" is that men and women have the same average educational attainment, and our ttest suggests that the data would be quite unlikely under the null hypothesis, so it probably ought to be discarded. Usually we are willing to discard a null hypothesis when this probability (the p-value) is less than 0.05, or 5%, but that cutoff is arbitrary.
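
* As a check on the ttest output, the t statistic can be rebuilt from the group sizes, means, and standard deviations in the table. This is a sketch of the standard pooled-variance formula; the scalar names sd_pooled, se_diff, and t_stat are illustrative choices. ttesti is Stata's immediate form of ttest, which takes summary statistics instead of data.

```stata
* pooled standard deviation from the two group standard deviations
scalar sd_pooled = sqrt(((9001-1)*2.929584^2 + (9450-1)*2.848776^2)/(9001+9450-2))
* standard error of the difference in means
scalar se_diff = sd_pooled*sqrt(1/9001 + 1/9450)
* t statistic: difference in means divided by its standard error
scalar t_stat = (13.3749 - 13.50429)/se_diff
display se_diff
display t_stat
* two-sided p-value from the t distribution with 18449 degrees of freedom
display 2*ttail(18449, abs(t_stat))
* the same test computed from summary statistics alone
ttesti 9001 13.3749 2.929584 9450 13.50429 2.848776
```

* The displayed values should agree, up to rounding, with the standard error, t, and two-sided probability in the ttest table.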

 

. exit, clear

 

* I quit the program using the menus.

 

* Because we have not made any changes to the dataset (we have not added any new variables) we don't need to save the dataset, and we can just quit. The log is automatically saved, and will be available for inspection if you remember where you saved it!
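
* If you prefer commands to menus, a log can be opened and closed as sketched below. The file name is only an example; replace overwrites an existing log of the same name, and append would add to it instead.

```stata
* open a plain-text log in the current working directory
log using "first_class.log", text replace
* ... run your commands here ...
* close the log so the file is complete on disk
log close
```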