first log

---------------------------------------------------------------------------------------

name: <unnamed>

log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\first_class.log

log type: text

opened on: 21 Sep 2010, 14:44:57

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

* Generally, it is easiest to open Stata first, then set mem, then open your log, and then use the menus to open your data file.

. memory

                                             bytes
--------------------------------------------------------------------
Details of set memory usage
    overhead (pointers)                       534,840      1.02%
    data                                   14,574,390     27.80%
                                ----------------------------
    data + overhead                        15,109,230     28.82%
    free                                   37,319,562     71.18%
                                ----------------------------
    Total allocated                        52,428,792    100.00%
--------------------------------------------------------------------
Other memory usage
    set maxvar usage                        2,001,730
    set matsize usage                       1,315,200
    programs, saved results, etc.              51,954
                                   ---------------
    Total                                   3,368,884
    -------------------------------------------------------
Grand total                                55,797,676

. * You might have to set mem to make enough room.

* In the logs, my comments will be preceded by an asterisk. This way Stata knows not to try to execute my comments.

. set mem 45m

no; data in memory would be lost

r(4);
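* A quick arithmetic check of the memory table, done in Python rather than Stata (a sketch; every number is copied from the memory output above). The pieces add up: overhead plus data gives data + overhead, adding free gives Total allocated, and the three "other" items give the second Total. The sketch also assumes that Stata's "m" suffix means 1,048,576 bytes (1024 * 1024); that assumption fits a Total allocated of 52,428,792 bytes, which is just under 50 such units. Under that assumption, 45m would have held the 15,109,230 bytes of data, so the r(4) error above comes from the data already in memory, not from a lack of room.

```python
# Numbers copied from the `memory` output (bytes).
overhead = 534_840
data = 14_574_390
free = 37_319_562
other = {"set maxvar usage": 2_001_730,
         "set matsize usage": 1_315_200,
         "programs, saved results, etc.": 51_954}

data_plus_overhead = overhead + data
total_allocated = data_plus_overhead + free
other_total = sum(other.values())
grand_total = total_allocated + other_total

def pct(part, whole):
    """Share of `whole` in percent, rounded to 2 decimals as in the table."""
    return round(100 * part / whole, 2)

# Assumption: Stata's "m" suffix means 1024 * 1024 bytes.
MB = 1024 * 1024
requested = 45 * MB  # what `set mem 45m` would ask for

print(data_plus_overhead, total_allocated, other_total, grand_total)
print(pct(data_plus_overhead, total_allocated), pct(free, total_allocated))
print(total_allocated / MB)            # just under 50
print(data_plus_overhead < requested)  # the data would have fit in 45m
```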

. describe

Contains data from C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta
  obs:       133,710
 vars:            55                          1 Feb 2009 13:36
 size:    15,109,230 (71.2% of memory free)
---------------------------------------------------------------------------------------
                 storage  display   value
variable name    type     format    label        variable label
---------------------------------------------------------------------------------------
year             int      %8.0g     yearlbl      Survey year
serial           long     %12.0g    seriallbl    Household serial number
hhwt             float    %9.0g     hhwtlbl      Household weight
region           byte     %27.0g    regionlbl    Region and division
statefip         byte     %57.0g    statefiplbl  State (FIPS code)
metro            byte     %27.0g    metrolbl     Metropolitan central city status
metarea          int      %50.0g    metarealbl   Metropolitan area
ownershp         byte     %21.0g    ownershplbl  Ownership of dwelling
hhincome         long     %12.0g    hhincomelbl  Total household income
pubhous          byte     %8.0g     pubhouslbl   Living in public housing
foodstmp         byte     %8.0g     foodstmplbl  Food stamp recipiency
pernum           byte     %8.0g     pernumlbl    Person number in sample unit
perwt            float    %9.0g     perwtlbl     Person weight
momloc           byte     %8.0g     momloclbl    Mother's location in the household
poploc           byte     %8.0g     poploclbl    Father's location in the household
sploc            byte     %8.0g     sploclbl     Spouse's location in household
famsize          byte     %25.0g    famsizelbl   Number of own family members in hh
nchild           byte     %18.0g    nchildlbl    Number of own children in household
nchlt5           byte     %23.0g    nchlt5lbl    Number of own children under age 5 in hh
nsibs            byte     %18.0g    nsibslbl     Number of own siblings in household
relate           int      %34.0g    relatelbl    Relationship to household head
age              byte     %19.0g    agelbl       Age
sex              byte     %8.0g     sexlbl       Sex
race             int      %37.0g    racelbl      Race
marst            byte     %23.0g    marstlbl     Marital status
popstat          byte     %14.0g    popstatlbl   Adult civilian, armed forces, or child
bpl              long     %27.0g    bpllbl       Birthplace
yrimmig          int      %11.0g    yrimmiglbl   Year of immigration
citizen          byte     %31.0g    citizenlbl   Citizenship status
mbpl             long     %27.0g    mbpllbl      Mother's birthplace
fbpl             long     %27.0g    fbpllbl      Father's birthplace
hispan           int      %29.0g    hispanlbl    Hispanic origin
educ99           byte     %38.0g    educ99lbl    Educational attainment, 1990
educrec          byte     %23.0g    educreclbl   Educational attainment recode
schlcoll         byte     %45.0g    schlcolllbl  School or college attendance
empstat          byte     %30.0g    empstatlbl   Employment status
occ1990          int      %78.0g    occ1990lbl   Occupation, 1990 basis
wkswork1         byte     %8.0g     wkswork1lbl  Weeks worked last year
hrswork          byte     %8.0g     hrsworklbl   Hours worked last week
uhrswork         byte     %13.0g    uhrsworklbl  Usual hours worked per week (last yr)
hourwage         int      %8.0g     hourwagelbl  Hourly wage
union            byte     %33.0g    unionlbl     Union membership
inctot           long     %12.0g                 Total personal income
incwage          long     %12.0g                 Wage and salary income
incss            long     %12.0g                 Social Security income
incwelfr         long     %12.0g                 Welfare (public assistance) income
vetstat          byte     %10.0g    vetstatlbl   Veteran status
vetlast          byte     %26.0g    vetlastlbl   Veteran's most recent period of service
disabwrk         byte     %34.0g    disabwrklbl  Work disability
health           byte     %9.0g     healthlbl    Health status
inclugh          byte     %8.0g     inclughlbl   Included in employer group health plan last year
himcaid          byte     %8.0g     himcaidlbl   Covered by Medicaid last year
ftotval          double   %10.0g    ftotvallbl   Total family income
perwt_rounded    float    %9.0g                  integer perwt, negative values recoded to
yrsed            float    %9.0g                  based on educrec
---------------------------------------------------------------------------------------

Sorted by: race

. tabulate sex

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. tabulate sex, nolab

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     64,791       48.46       48.46
          2 |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. tabulate sex [fweight= perwt_rounded]

* Note the square brackets and the [fweight = weightvar] syntax. There are other kinds of weights as well (aweights, pweights, and iweights), and other ways to tell Stata to use the same weights. fweights are frequency weights, which means you are telling Stata to count each observation as many times as its weight to get the proper weighted frequency. Stata requires fweights to be whole numbers, which is why we use perwt_rounded here rather than perwt.

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |133,932,994       48.86       48.86
     Female |140,154,827       51.14      100.00
------------+-----------------------------------
      Total |274,087,821      100.00
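* What [fweight=...] does can be sketched outside Stata (Python here; the toy records are hypothetical, not CPS data): counting each observation as many times as its weight gives the same table as literally duplicating each row weight-many times. That is also why frequency weights have to be whole numbers. The last lines recompute the weighted sex percentages from the frequencies in the table above.

```python
from collections import Counter

# Hypothetical toy records: (sex, integer frequency weight).
records = [("Male", 3), ("Female", 2), ("Male", 1), ("Female", 4)]

def weighted_tab(rows):
    """Each row counts `weight` times, the way [fweight=...] treats it."""
    counts = Counter()
    for category, weight in rows:
        counts[category] += weight
    return counts

def expanded_tab(rows):
    """The same table, built by physically repeating each row `weight` times."""
    return Counter(category for category, weight in rows for _ in range(weight))

def percents(counts):
    """Percent of the total in each category, like tabulate's Percent column."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

print(weighted_tab(records))                           # Male 4, Female 6
print(weighted_tab(records) == expanded_tab(records))  # True

# Weighted frequencies copied from: tabulate sex [fweight= perwt_rounded]
cps = Counter({"Male": 133_932_994, "Female": 140_154_827})
print(sum(cps.values()))                               # 274087821
print({k: round(v, 2) for k, v in percents(cps).items()})
```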

. summarize perwt_rounded

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
perwt_roun~d |    133710    2049.868    1083.244         93      14281

. tabulate perwt_rounded

--Break--

r(1);

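* If you try to tabulate a continuous variable, you get one row for every distinct value, which can mean a table thousands of lines long, and the table would not be very informative. The --Break-- and r(1) above mean that I stopped the command partway through by pressing the Break button.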
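* Because tabulate produces one row per distinct value, the length of the table is limited by how many distinct values a variable can take. According to the summarize output, perwt_rounded is a whole number between 93 and 14,281, so a one-way table of it could have up to 14,189 rows. A small Python sketch of that arithmetic, with a hypothetical handful of weights to show the one-row-per-value rule:

```python
from collections import Counter

# Min and max of perwt_rounded, copied from the summarize output.
min_w, max_w = 93, 14281
max_rows = max_w - min_w + 1   # most distinct whole numbers in that range
print(max_rows)                # 14189

# Hypothetical toy weights: one table row per distinct value, not per observation.
toy_weights = [93, 1188, 2049, 2049, 2649, 14281]
table = Counter(toy_weights)
print(len(toy_weights), len(table))   # 6 observations, 5 rows
```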

. taabulate race

unrecognized command: taabulate

r(199);

* I misspelled tabulate, and Stata didn't recognize the command (return code r(199)).

. tabulate race

                                 Race |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
                                White |    113,475       84.87       84.87
                          Black/Negro |     13,626       10.19       95.06
         American Indian/Aleut/Eskimo |      1,894        1.42       96.47
            Asian or Pacific Islander |      4,715        3.53      100.00
--------------------------------------+-----------------------------------
                                Total |    133,710      100.00

. tabulate race [fweight= perwt_rounded]

                                 Race |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
                                White |224,806,952       82.02       82.02
                          Black/Negro | 35,508,668       12.96       94.98
         American Indian/Aleut/Eskimo |  2,847,473        1.04       96.01
            Asian or Pacific Islander | 10,924,728        3.99      100.00
--------------------------------------+-----------------------------------
                                Total |274,087,821      100.00

. summarize perwt_rounded

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
perwt_roun~d |    133710    2049.868    1083.244         93      14281

. summarize perwt_rounded, detail

                integer perwt, negative values recoded to 0
-------------------------------------------------------------
      Percentiles      Smallest
 1%          284             93
 5%          428             93
10%          603             93       Obs              133710
25%         1188             96       Sum of Wgt.      133710

50%         2049                      Mean           2049.868
                        Largest       Std. Dev.      1083.244
75%         2649          11824
90%         3534          12547       Variance        1173417
95%         3967          12905       Skewness       .6144906
99%         4893          14281       Kurtosis       4.006292

* So the average weight is about 2,050, meaning the CPS is roughly a 1-in-2,000 survey: about one out of every 2,000 noninstitutionalized persons was included in the CPS.
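* The mean weight can be checked against the tabulate output (a Python sketch using only numbers from this log): with frequency weights, the weighted Total of 274,087,821 is the sum of perwt_rounded over all 133,710 observations, so dividing one by the other should reproduce the mean of 2049.868 from summarize. This assumes no observation was left out of the weighted tables because of a missing value, which fits the sex and race tables showing the same total.

```python
n_obs = 133_710               # unweighted observations (summarize, tabulate)
weighted_total = 274_087_821  # Total row of the fweighted tabulations

mean_weight = weighted_total / n_obs
sampling_fraction = n_obs / weighted_total   # share of the population sampled

print(round(mean_weight, 3))              # 2049.868
print(f"about 1 in {mean_weight:,.0f}")   # about 1 in 2,050
print(f"{sampling_fraction:.6f}")
```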

. sort sex

. by sex: summarize yrsed

* If you are going to use the by: syntax, you need to sort the data first. You can sort on more than one variable at a time.

---------------------------------------------------------------------------------------
-> sex = Male

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yrsed |     49353    12.79632    3.217925          0         17

---------------------------------------------------------------------------------------
-> sex = Female

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yrsed |     53873    12.75218    3.098084          0         17
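* The reason by: needs sorted data can be sketched with Python's itertools.groupby, which, like Stata's by:, only groups rows that sit next to each other (the toy records below are hypothetical). Sorting first makes each group one contiguous run; without sorting, groupby quietly returns the same sex several times, whereas Stata refuses to run by: on unsorted data. As in summarize, the standard deviation uses the n - 1 denominator.

```python
from itertools import groupby
from statistics import mean, stdev

# Hypothetical toy records: (sex, years of education).
people = [("Female", 12), ("Male", 16), ("Female", 14), ("Male", 12), ("Female", 13)]

def by_group_summary(rows):
    """Sort on the group key, then summarize each run: like `sort sex` then `by sex:`."""
    key = lambda row: row[0]
    summary = {}
    for group, members in groupby(sorted(rows, key=key), key=key):
        values = [years for _, years in members]
        # stdev() uses the n - 1 denominator, matching summarize's Std. Dev.
        summary[group] = {"obs": len(values), "mean": mean(values),
                          "sd": stdev(values), "min": min(values), "max": max(values)}
    return summary

print(by_group_summary(people))

# Without sorting, groupby sees runs of adjacent rows, so groups repeat.
print([group for group, _ in groupby(people, key=lambda row: row[0])])
```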

* That is all well and good, but the educational attainment of very young and of very old people might not be relevant, so let's restrict the comparison to people in their 30s.

. by sex: summarize yrsed if age>30 & age<40

---------------------------------------------------------------------------------------
-> sex = Male

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yrsed |      9001     13.3749    2.929584          0         17

---------------------------------------------------------------------------------------
-> sex = Female

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       yrsed |      9450    13.50429    2.848776          0         17

* In the unweighted CPS sample, there are 18,451 individuals aged 31 to 39 in this comparison. Among these individuals, the women have higher average educational attainment than the men. Not by a lot: about .13 years. And even if the difference is a small one, there is no uncertainty about it within the sample: 13.50 is more than 13.37. The question we are going to be interested in answering is whether the data suggest that women in their 30s in the US as a whole have higher educational attainment, on average, than men in their 30s.
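* Because age is stored as a whole number, a condition like age>30 & age<40 keeps ages 31 through 39 and leaves out 30-year-olds; age>=30 & age<40 would keep the whole decade. A tiny Python sketch of the two filters (the 0 to 90 range of ages is only for illustration):

```python
ages = range(0, 91)   # illustrative whole-number ages

strict = [a for a in ages if a > 30 and a < 40]      # same test as age>30 & age<40
inclusive = [a for a in ages if a >= 30 and a < 40]  # same test as age>=30 & age<40

print(strict[0], strict[-1], len(strict))            # 31 39 9
print(inclusive[0], inclusive[-1], len(inclusive))   # 30 39 10
```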

. ttest yrsed if age>30 & age<40, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9001     13.3749    .0308788    2.929584    13.31437    13.43543
  Female |    9450    13.50429     .029305    2.848776    13.44684    13.56173
---------+--------------------------------------------------------------------
combined |   18451    13.44117    .0212695    2.889124    13.39948    13.48286
---------+--------------------------------------------------------------------
    diff |           -.1293829     .042542               -.2127692   -.0459967
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -3.0413
Ho: diff = 0                                     degrees of freedom =    18449

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0024          Pr(T > t) = 0.9988

* The difference between the two groups is -.1294 years (the male mean minus the female mean), which is just what we would get by subtracting the two means. What we want to know is how sure we are that 30-something men and women in the US do not have the same average level of education. The two-sided p-value from this t-test, Pr(|T| > |t|), which we will explain in more detail in class, is 0.0024, or about 2 in a thousand.
* The right way to read that number is conditional: if men and women in the US in their 30s actually had the same average level of education, how likely is it that a sample of about 18,000 individuals would show a difference between men and women at least as big as the one we found? The probability of finding such a difference by chance is 0.0024, which is quite small, though of course larger than zero. Strictly speaking, 0.0024 is not the probability that men and women have equal educational attainment; it is the probability of data this extreme if they did.
* In this case the "null hypothesis" is that men and women have the same educational attainment, and our t-test suggests that the null hypothesis is hard to reconcile with the data and probably ought to be rejected. Usually we are willing to reject a null hypothesis when the p-value is less than 0.05, or 5%, but that cutoff is arbitrary.
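* The t statistic above can be reproduced from the summary table alone. The Python sketch below does the equal-variance (pooled) two-sample t-test named in the output header: pool the two variances, get the standard error of the difference, and divide the difference in means by it. Two assumptions make its numbers differ slightly from Stata's: it uses the rounded means printed in the table, and it uses the normal distribution for the p-values and the 1.96 multiplier for the interval. With 18,449 degrees of freedom that is a close approximation, but it is not exactly what ttest computes.

```python
import math

# Summary statistics copied from the ttest output (ages 31-39).
n_m, mean_m, sd_m = 9001, 13.3749, 2.929584    # Male
n_f, mean_f, sd_f = 9450, 13.50429, 2.848776   # Female

# Pooled variance: "equal variances" assumes both groups share one variance.
df = n_m + n_f - 2
pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))

diff = mean_m - mean_f          # diff = mean(Male) - mean(Female)
t = diff / se_diff

# Assumption: with ~18,000 df the t distribution is close to normal.
def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))

p_two_sided = 2 * normal_cdf(-abs(t))   # Ha: diff != 0
p_lower = normal_cdf(t)                 # Ha: diff < 0
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)   # approximate 95% interval

print(df, round(se_diff, 6), round(t, 3))
print(round(p_two_sided, 4), round(p_lower, 4))
print([round(x, 4) for x in ci])
```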

. exit, clear

* I quit the program using the menus.

* Because we have not made any substantive changes to the dataset (we have not added any new variables), we don't need to save it, and we can just quit; the clear option on exit tells Stata it is OK to quit even though the data in memory have not been saved. The log is saved automatically and will be available for inspection, if you remember where you saved it!