HERE IS THE BASIC DRILL FOR MATCHING HUSBANDS TO WIVES. FIRST, MY DATASET CONTAINED ONLY MARRIED (SPOUSE PRESENT) INDIVIDUALS, SO WE EXPECT EVERYONE TO HAVE A SPOUSE MATCH. (THIS IS DIFFERENT FROM THE HOUSEHOLDER-PARTNER MATCH DESCRIBED BELOW, WHERE MOST HOUSEHOLDERS DO NOT HAVE A PARTNER.)

THE PROCEDURE IS THIS:

1) DOWNLOAD THE DATA AND SAVE AN INDIVIDUAL-LEVEL DATASET OF MARRIED PERSONS

2) KEEP ONE SEX, RENAME ALL THE VARIABLES APPROPRIATELY, AND SORT ON A UNIQUE COUPLE ID

3) OPEN THE INDIVIDUALS DATASET, KEEP THE OTHER SEX, RENAME THE VARIABLES APPROPRIATELY, THEN SORT ON UNIQUE COUPLE ID

4) MERGE THE TWO DATASETS TOGETHER ON THE UNIQUE COUPLE ID

5) CHECK RESULTS
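
AS A COMPACT ILLUSTRATION OF THE FIVE STEPS, HERE IS A MINIMAL SKETCH ON A MADE-UP FOUR-PERSON DATASET. THE DATA VALUES, THE TEMPORARY FILE NAMES, AND THE ZERO-PADDED ID FORMAT ARE ASSUMPTIONS FOR ILLUSTRATION ONLY; statefip IS LEFT OUT TO KEEP THE TOY SMALL. IT USES THE merge 1:1 SYNTAX OF STATA 11 AND LATER RATHER THAN THE OLDER merge SYNTAX THAT APPEARS IN THE LOG BELOW.

```stata
* Step 1: a toy individual-level file (hypothetical values, two couples)
clear
input year serial pernum sploc sex age
2000 1 1 2 1 40
2000 1 2 1 2 38
2000 2 1 2 1 55
2000 2 2 1 2 57
end
tempfile individuals husbands
save `individuals'

* Step 2: keep husbands, rename, and build the couple ID from their own pernum
keep if sex == 1
rename age mage
gen str14 cupid = string(year, "%04.0f") + string(serial, "%08.0f") + string(pernum, "%02.0f")
keep cupid year serial mage
sort cupid
save `husbands'

* Step 3: keep wives, rename, and build the couple ID from sploc,
* which holds the person number of the spouse
use `individuals', clear
keep if sex == 2
rename age fage
gen str14 cupid = string(year, "%04.0f") + string(serial, "%08.0f") + string(sploc, "%02.0f")
keep cupid fage
sort cupid

* Step 4: merge the two files on the couple ID
merge 1:1 cupid using `husbands'

* Step 5: check results (every row should show as matched)
tabulate _merge
```

THE KEY POINT IS THAT THE HUSBAND FILE KEYS ON pernum WHILE THE WIFE FILE KEYS ON sploc, SO THE TWO IDS AGREE ONLY FOR PEOPLE WHO POINT AT EACH OTHER AS SPOUSES.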

 

-------------------------------------------------------------------------------

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\New stata

>  files\20th cent compare ed and race intermar\an ed endogamy 1pct redo checke

> r for SF.log

  log type:  text

 opened on:   5 Sep 2007, 23:19:35

 

. cd "F:\AAA Miker Data folder\1940-2000 1% married with race and ed"

F:\AAA Miker Data folder\1940-2000 1% married with race and ed

 

. do "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\mrosenfe_s

> tanford_edu_092.do"

 

. /* Important: you need to put the .dat and .do files in one folder/

>    directory and then set the working folder to that folder. */

.

. set more off

 

.

. clear

 

. infix ///

>  byte    year                  1-2 ///

>  double  serial                3-10 ///

>  int     hhwt                 11-14 ///

>  byte    statefip             15-16 ///

>  byte    gq                   17 ///

>  byte    pernum               18-19 ///

>  byte    sploc                20-21 ///

>  int     age                  22-24 ///

>  byte    sex                  25 ///

>  byte    marst                26 ///

>  int     race                 27 ///

>  long    bpl                  28-30 ///

>  int     hispan               31 ///

>  byte    educrec              32 ///

>  int     higrade              33-34 ///

>  byte    educ99               35-36 ///

>  using mrosenfe_stanford_edu_092.dat

(5495702 observations read)

 

.

 

. tabulate year

 

Census year |      Freq.     Percent        Cum.

------------+-----------------------------------

       2000 |  1,152,040       20.96       20.96

       1940 |    570,800       10.39       31.35

       1960 |    809,206       14.72       46.07

       1970 |    886,714       16.13       62.21

       1980 |    993,668       18.08       80.29

       1990 |  1,083,274       19.71      100.00

------------+-----------------------------------

      Total |  5,495,702      100.00

 

. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta"

file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta saved

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |  2,747,851       50.00       50.00

     Female |  2,747,851       50.00      100.00

------------+-----------------------------------

      Total |  5,495,702      100.00

 

. tabulate sex, nolab

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |  2,747,851       50.00       50.00

          2 |  2,747,851       50.00      100.00

------------+-----------------------------------

      Total |  5,495,702      100.00

 

. keep if sex==1

(2747851 observations deleted)

 

. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta"

file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta saved

 

. rename age mage

 

. rename race mrace

 

. rename hispan mhispan

 

. rename bpl mbpl

 

. rename educrec

varname required

r(100);

 

. rename educrec meducrec

 

. rename higrade mhigrade

 

. rename educ99 meduc99

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |  2,747,851      100.00      100.00

------------+-----------------------------------

      Total |  2,747,851      100.00

 

. drop sex

 

. tabulate year, nolab

 

Census year |      Freq.     Percent        Cum.

------------+-----------------------------------

          0 |    576,020       20.96       20.96

         94 |    285,400       10.39       31.35

         96 |    404,603       14.72       46.07

         97 |    443,357       16.13       62.21

         98 |    496,834       18.08       80.29

         99 |    541,637       19.71      100.00

------------+-----------------------------------

      Total |  2,747,851      100.00

 

. gen str14 cupid=string(year)+string(statefip)+string(serial)+string(pernum)

 

/*NOTE THAT APPLIES TO ALL COUPLE-ID GENERATING STATEMENTS: IN NEWER VERSIONS OF STATA, OR WITH LARGE DATASETS AND CORRESPONDINGLY LARGER VALUES OF SERIAL, YOU HAVE TO BE CAREFUL LEST STATA BUILD THE STRING FROM A NON-FIXED-WIDTH VERSION OF EACH NUMBER AND LEAVE YOU WITH DUPLICATE COUPLE IDENTIFIERS. FOR EXAMPLE, A SERIAL OF 251 WITH A SPLOC OF 1 AND A SERIAL OF 25 WITH A SPLOC OF 11 WOULD BOTH COMBINE TO 2511. WHAT YOU NEED TO DO IN RECENT VERSIONS OF STATA IS INSIST UPON FIXED-LENGTH FORMATS FOR THE NUMBERS, WITH LEFT-JUSTIFIED NUMBERS SO THAT TRAILING SPACES GET LEFT IN THE STRING. FOR INSTANCE:

gen str17 cupid=string(year, "%4.0f")+string(datanum, "%1.0f")+string(statefip, "%-2.0f")+string(serial, "%-8.0f")+string(pernum, "%-2.0f")

I REALIZE THAT THE ABOVE SYNTAX IS MORE UNWIELDY, BUT IF THE SIMPLER SYNTAX RESULTS IN ANY DUPLICATE IDS, YOU HAVE TO RESORT TO THE MORE COMPLICATED SYNTAX.

*/
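
TO SEE THE COLLISION CONCRETELY, HERE IS A SMALL SKETCH USING THE TWO CASES FROM THE NOTE. THE VARIABLE NAMES naive AND padded ARE HYPOTHETICAL, AND THE ZERO-PADDED FORMATS ("%08.0f", "%02.0f") ARE A SWAPPED-IN ALTERNATIVE TO THE LEFT-JUSTIFIED FORMATS SHOWN ABOVE, NOT THE SYNTAX USED IN THIS LOG. EITHER WAY THE GOAL IS THE SAME: EVERY COMPONENT OCCUPIES A FIXED NUMBER OF CHARACTERS.

```stata
* Two made-up records that collide under simple concatenation
clear
input serial pernum
251 1
25 11
end

* Simple concatenation: both rows become "2511"
gen str4 naive = string(serial) + string(pernum)

* Zero-padded fixed widths: "0000025101" versus "0000002511"
gen str10 padded = string(serial, "%08.0f") + string(pernum, "%02.0f")

list

* Reports the surplus copy of the naive ID
duplicates report naive

* isid exits with an error if the variable does not uniquely
* identify the observations; here it should pass silently
isid padded
```

RUNNING duplicates report OR isid ON THE COUPLE ID BEFORE MERGING IS A CHEAP WAY TO FIND OUT WHETHER THE SIMPLER SYNTAX IS SAFE FOR A GIVEN EXTRACT.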

 

 

. drop sploc

 

. sort cupid

 

. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta", replace

file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\husbands.dta saved

 

. clear all

 

. use "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\individuals.dta", clear

 

. keep if sex==2

(2747851 observations deleted)

 

. rename age fage

 

. rename marst fmarst

 

. tabulate marst

variable marst not found

r(111);

 

. tabulate fmarst

 

         Marital status |      Freq.     Percent        Cum.

------------------------+-----------------------------------

Married, spouse present |  2,747,851      100.00      100.00

------------------------+-----------------------------------

                  Total |  2,747,851      100.00

 

. drop fmarst

 

. rename race frace

 

. rename bpl fbpl

 

. rename hispan fhispan

 

. rename educrec feducrec

 

. rename higrade fhigrade

 

. rename educ99 feduc99

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

     Female |  2,747,851      100.00      100.00

------------+-----------------------------------

      Total |  2,747,851      100.00

 

. drop sex

 

. gen str14 cupid=string(year)+string(statefip)+string(serial)+string(sploc)

 

 

. rename pernum fpernum

 

. rename sploc fsploc

 

. drop year serial hhwt statefip gq

 

. sort cupid

 

. merge cupid using husbands

(label educ99lbl already defined)

(label higradelbl already defined)

(label educreclbl already defined)

(label hispanlbl already defined)

(label bpllbl already defined)

(label racelbl already defined)

(label marstlbl already defined)

(label agelbl already defined)

(label gqlbl already defined)

(label statefiplbl already defined)

(label yearlbl already defined)

 

. tabulate _merge

 

     _merge |      Freq.     Percent        Cum.

------------+-----------------------------------

          3 |  2,747,851      100.00      100.00

------------+-----------------------------------

      Total |  2,747,851      100.00

 

. drop _merge

 

* _MERGE==3 FOR ALL OBSERVATIONS MEANS A PERFECT 1-TO-1 MATCH OF HUSBANDS TO WIVES, WHICH IS WHAT WE WANT.
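
IN STATA 11 AND LATER, THE SAME CHECK CAN BE BUILT INTO THE MERGE ITSELF. THE SKETCH BELOW USES MADE-UP IDS AND AGES AND A TEMPORARY FILE, ALL OF WHICH ARE PLACEHOLDERS RATHER THAN PART OF THIS LOG.

```stata
* Hypothetical husbands file with two couples
clear
input str4 cupid mage
"c001" 40
"c002" 55
end
tempfile husbands
save `husbands'

* Hypothetical wives file with the same two couple IDs
clear
input str4 cupid fage
"c001" 38
"c002" 57
end

* assert(match) makes merge stop with an error unless every observation
* in both files finds a partner; nogenerate skips creating _merge
merge 1:1 cupid using `husbands', assert(match) nogenerate
list
```

WITH THIS FORM A DO-FILE HALTS AT THE MERGE IF ANY HUSBAND OR WIFE IS LEFT UNMATCHED, INSTEAD OF RELYING ON SOMEONE READING THE TABULATION.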

 

. drop cupid

 

. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta"

file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta saved

 

. tabulate feducrec

 

 Educational attainment |

                 recode |      Freq.     Percent        Cum.

------------------------+-----------------------------------

      None or preschool |     26,902        0.98        0.98

    Grade 1, 2, 3, or 4 |     60,168        2.19        3.17

    Grade 5, 6, 7, or 8 |    412,543       15.01       18.18

                Grade 9 |    122,785        4.47       22.65

               Grade 10 |    159,040        5.79       28.44

               Grade 11 |    128,554        4.68       33.12

               Grade 12 |    983,211       35.78       68.90

1 to 3 years of college |    497,851       18.12       87.02

    4+ years of college |    356,797       12.98      100.00

------------------------+-----------------------------------

                  Total |  2,747,851      100.00

 

. tabulate feducrec, nolab miss

 

Educational |

 attainment |

     recode |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     26,902        0.98        0.98

          2 |     60,168        2.19        3.17

          3 |    412,543       15.01       18.18

          4 |    122,785        4.47       22.65

          5 |    159,040        5.79       28.44

          6 |    128,554        4.68       33.12

          7 |    983,211       35.78       68.90

          8 |    497,851       18.12       87.02

          9 |    356,797       12.98      100.00

------------+-----------------------------------

      Total |  2,747,851      100.00

 

. gen byte mBAplus=0

 

. replace mBAplus=1 if meducrec=9

invalid syntax

r(198);

 

. replace mBAplus=1 if meducrec==9

(492258 real changes made)

 

. gen byte fBAplus=0

 

. replace fBAplus=1 if feducrec==9

(356797 real changes made)

 

. save "F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta", replace

file F:\AAA Miker Data folder\1940-2000 1% married with race and ed\couples.dta saved

 

. sort year

 

. tabulate  mBAplus fBAplus if mage>19 & mage<30 & fage>19 & fage<30 & mbpl<100 & fbpl<100, ma

> tcell(matrix)

 

           |        fBAplus

   mBAplus |         0          1 |     Total

-----------+----------------------+----------

         0 |   232,327     11,961 |   244,288

         1 |    22,317     22,364 |    44,681

-----------+----------------------+----------

     Total |   254,644     34,325 |   288,969

 

 

. tabulate  mBAplus fBAplus if year==94 & mage>19 & mage<30 & fage>19 & fage<30 & mbpl<100 & f

> bpl<100, matcell(matrix)

 

           |        fBAplus

   mBAplus |         0          1 |     Total

-----------+----------------------+----------

         0 |    33,840        421 |    34,261

         1 |     1,267        544 |     1,811

-----------+----------------------+----------

     Total |    35,107        965 |    36,072

 

 

. orse matrix

odds ratio is      34.512033

log odds ratio is  3.541308

SE of Log OR=      .07093905

inverse OR (intermar OR)=  .0289754

off-diag (or intermar) log OR=  -3.541308

 

. display 33840*544/(421*1267)

34.512033

 

. *very close to what I had before..

* ORSE IS A LITTLE STATA PROGRAM I WROTE MYSELF TO DISPLAY THE ODDS RATIO, LOG ODDS RATIO AND STANDARD ERROR OF THE LOR FROM A 2X2 TABLE.
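
THE SOURCE OF ORSE IS NOT SHOWN IN THIS LOG. AS A ROUGH SKETCH OF WHAT SUCH A PROGRAM COULD LOOK LIKE, THE VERSION BELOW APPLIES THE STANDARD FORMULAS FOR A 2X2 TABLE WITH CELLS a, b, c, d: OR = (a*d)/(b*c) AND SE OF LOG OR = SQRT(1/a + 1/b + 1/c + 1/d). THE NAME orse_sketch AND ALL OF ITS INTERNALS ARE ASSUMPTIONS, NOT THE ACTUAL ORSE CODE.

```stata
* Hypothetical stand-in for orse: takes the name of a 2x2 matrix
capture program drop orse_sketch
program define orse_sketch
    args m
    tempname a b c d or
    scalar `a' = `m'[1,1]
    scalar `b' = `m'[1,2]
    scalar `c' = `m'[2,1]
    scalar `d' = `m'[2,2]
    * Cross-product ratio of the diagonal to the off-diagonal cells
    scalar `or' = (`a' * `d') / (`b' * `c')
    display "odds ratio is      " `or'
    display "log odds ratio is  " ln(`or')
    * Standard error of the log odds ratio
    display "SE of log OR is    " sqrt(1/`a' + 1/`b' + 1/`c' + 1/`d')
    display "inverse OR is      " 1/`or'
end

* The 1940 table from the log above, entered by hand
matrix T = (33840, 421 \ 1267, 544)
orse_sketch T
```

RUN ON THE 1940 TABLE, THIS SHOULD GIVE AN ODDS RATIO OF ABOUT 34.51 AND A STANDARD ERROR OF ABOUT .0709, IN LINE WITH WHAT ORSE REPORTS ABOVE.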

 

---------------------------------------------------------------------

WHAT FOLLOWS IS A SAMPLE LOG FOR IMPORTING CENSUS DATA AND MATCHING HEADS OF HOUSEHOLD TO PARTNERS. REMEMBER THAT EVERY PERSON IN THE HOUSEHOLD GIVES THEIR RELATIONSHIP TO THE HEAD OF THE HOUSEHOLD, SO ONLY PARTNERS OF THE HEAD OF HOUSEHOLD GET RECORDED. IN THIS PROCEDURE WE SEPARATE OUT HEADS OF HOUSEHOLD (RELATED==101) AND PARTNERS OF THE HEAD OF HH (RELATED==1114) AND THEN MATCH THEM. THE ID FOR MATCHING HEADS OF HH TO PARTNERS IS SIMPLY YEAR+SERIAL#, OR YEAR+STATE+SERIAL# TO BE MORE CERTAIN. IN THE LOG BELOW, WHICH COVERS ONLY THE 2000 CENSUS, THE ID IS STATE+SERIAL#. EVERY HH CAN CONTAIN AT MOST ONE UNMARRIED-PARTNER COUPLE. HEADS OF HH WITHOUT PARTNERS ARE DROPPED, AND WHAT IS LEFT IS HEADS OF HH AND THEIR PARTNERS. NOTE THAT THIS DATA EXTRACTION CREATES ONLY THE HOUSEHOLDER-PARTNER COUPLES. A DIFFERENT DATA EXTRACTION WOULD BE REQUIRED TO MATCH HUSBANDS TO WIVES.
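
THE HOUSEHOLDER-PARTNER PROCEDURE CAN BE SKETCHED ON A TOY FILE OF THREE HOUSEHOLDS, ONE OF WHICH HAS NO PARTNER. THE SERIAL NUMBERS, AGES, AND TEMPORARY FILE NAMES ARE MADE UP FOR ILLUSTRATION, AND THE merge 1:1 SYNTAX ASSUMES STATA 11 OR LATER.

```stata
* Toy person file: heads (related==101) and unmarried partners (related==1114)
clear
input str8 serial int related byte age
"00000001" 101 45
"00000001" 1114 43
"00000002" 101 30
"00000003" 101 62
"00000003" 1114 60
end
tempfile people partners
save `people'

* Partners: prefix their characteristics with p
keep if related == 1114
rename age page
drop related
save `partners'

* Heads: prefix their characteristics with h, then match on household
use `people', clear
keep if related == 101
rename age hage
drop related
merge 1:1 serial using `partners'

* _merge==1 are heads with no partner; keep only the matched couples
tabulate _merge
keep if _merge == 3
drop _merge
list
```

UNLIKE THE HUSBAND-WIFE MERGE, A LARGE SHARE OF _merge==1 ROWS IS EXPECTED HERE, BECAUSE MOST HEADS OF HOUSEHOLD HAVE NO UNMARRIED PARTNER.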

 

-----------------------------------------------------------------------------------------------------

       log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\New stata files\family structur

> e\5% 2000 HH and partners.log

  log type:  text

 opened on:   2 Feb 2004, 15:30:29

 

. set mem 600m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.733M

    set memory          600M    max. data space                600.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               602.987M

 

. cd "C:\AAA Miker Data folder\2000 5% for hh and partner"

C:\AAA Miker Data folder\2000 5% for hh and partner

 

 

 

. do "C:\AAA Miker Data folder\2000 5% for hh and partner\mrose009.do"

 

. infix using mrose009.dct if (related==101 | related==1114)

infix dictionary using mrose009.dat {

str8    serial     1-  8

int     hhwt       9- 12

byte    statefip  13- 14

byte    metro     15- 15

int     metareag  16- 18

*       pernum    19- 20

int     related   21- 24

byte    age       25- 27

byte    sex       28- 28

byte    raceg     29- 29

byte    marst     30- 30

int     bplg      31- 33

byte    hispang   34- 34

byte    educrec   35- 35

}

(5527209 observations read)

 

.

. /*Important: you need to put the .dat, .do, and .dct files all in one folder/directory. */

 

.

.

end of do-file

 

. describe

 

Contains data

  obs:     5,527,209                         

 vars:            13                         

 size:   154,761,852 (75.4% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

serial          str8   %9s                    Household serial number

hhwt            int    %8.0g       hhwtlbl    Household weight

statefip        byte   %57.0g      statefiplbl

                                              State (FIPS code)

metro           byte   %36.0g      metrolbl   Metropolitan status

metareag        int    %43.0g      metareaglbl

                                              Metropolitan area -- General

related         int    %60.0g      relatedlbl

                                              Relationship to household head

                                                -- Detailed

age             byte   %8.0g       agelbl     Age

sex             byte   %8.0g       sexlbl     Sex

raceg           byte   %23.0g      raceglbl   Race -- General

marst           byte   %26.0g      marstlbl   Marital status

bplg            int    %27.0g      bplglbl    Birthplace -- General

hispang         byte   %19.0g      hispanglbl

                                              Hispanic origin -- General

educrec         byte   %23.0g      educreclbl

                                              Educational attainment recode

-------------------------------------------------------------------------------

Sorted by: 

     Note:  dataset has changed since last saved

 

. save "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta"

file C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta sav

> ed

 

. tabulate related

 

      Relationship to household head -- |

                               Detailed |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

                       Head/Householder |  5,273,998       95.42       95.42

                      Unmarried Partner |    253,211        4.58      100.00

----------------------------------------+-----------------------------------

                                  Total |  5,527,209      100.00

 

 

 

 

. set mem 400m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.733M

    set memory          400M    max. data space                400.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               402.987M

 

. use "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta",

>  clear

 

. tabulate related

 

      Relationship to household head -- |

                               Detailed |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

                       Head/Householder |  5,273,998       95.42       95.42

                      Unmarried Partner |    253,211        4.58      100.00

----------------------------------------+-----------------------------------

                                  Total |  5,527,209      100.00

 

. tabulate related, nolab

 

Relationshi |

       p to |

  household |

    head -- |

   Detailed |      Freq.     Percent        Cum.

------------+-----------------------------------

        101 |  5,273,998       95.42       95.42

       1114 |    253,211        4.58      100.00

------------+-----------------------------------

      Total |  5,527,209      100.00

 

. keep if related==1114

(5273998 observations deleted)

THESE ARE THE PARTNERS

 

. save "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta"

file C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta saved

 

. tabulate related

 

      Relationship to household head -- |

                               Detailed |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

                      Unmarried Partner |    253,211      100.00      100.00

----------------------------------------+-----------------------------------

                                  Total |    253,211      100.00

 

. drop related

 

. rename age page

 

. rename sex psex

 

. rename raceg praceg

 

. rename marst pmarst

 

. rename hispang phispang

 

. rename educrec peducrec

RENAME VARIABLES TO BE SURE THAT YOU CAN IDENTIFY PARTNER'S CHARACTERISTICS

 

. describe

 

Contains data from C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta

  obs:       253,211                         

 vars:            12                          3 Feb 2004 11:21

 size:     6,583,486 (98.4% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

serial          str8   %9s                    Household serial number

hhwt            int    %8.0g       hhwtlbl    Household weight

statefip        byte   %57.0g      statefiplbl

                                              State (FIPS code)

metro           byte   %36.0g      metrolbl   Metropolitan status

metareag        int    %43.0g      metareaglbl

                                              Metropolitan area -- General

page            byte   %8.0g       agelbl     Age

psex            byte   %8.0g       sexlbl     Sex

praceg          byte   %23.0g      raceglbl   Race -- General

pmarst          byte   %26.0g      marstlbl   Marital status

bplg            int    %27.0g      bplglbl    Birthplace -- General

phispang        byte   %19.0g      hispanglbl

                                              Hispanic origin -- General

peducrec        byte   %23.0g      educreclbl

                                              Educational attainment recode

-------------------------------------------------------------------------------

Sorted by: 

     Note:  dataset has changed since last saved

 

. gen str10 cupid=string(statefip)+serial

MAKE THE COUPLE ID

 

. sort cupid

 

. drop serial

 

. rename bplg pbplg

 

SORT BY COUPLE ID, THEN SAVE, THEN GO BACK AND MAKE THE OTHER DATASET OF HOUSEHOLDERS

 

. save "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta", replace

file C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta saved

 

. clear all

 

. use "C:\AAA Miker Data folder\2000 5% for hh and partner\all heads and partners individual.dta",

>  clear

 

. keep if related==101

(253211 observations deleted)

RELATED==101 ARE THE HEADS OF HH

 

. drop  hhwt metro metareag

 

. gen str10 cupid=string(statefip)+serial

 

. drop  serial statefip related

 

. rename age hage

 

. rename sex hsex

 

. rename raceg hraceg

 

. rename marst hmarst

 

. rename bplg hbplg

 

. rename hispang hhispang

 

. rename educrec heducrec

 

. sort cupid

 

. merge cupid using "C:\AAA Miker Data folder\2000 5% for hh and partner\partners.dta"

(label statefiplbl already defined)

(label metrolbl already defined)

(label metareaglbl already defined)

(label sexlbl already defined)

(label raceglbl already defined)

(label marstlbl already defined)

(label bplglbl already defined)

(label hispanglbl already defined)

(label educreclbl already defined)

 

. tabulate _merge

 

     _merge |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |  5,020,787       95.20       95.20

          3 |    253,211        4.80      100.00

------------+-----------------------------------

      Total |  5,273,998      100.00

 

. keep if _merge==3

(5020787 observations deleted)

 

_MERGE==3 ARE THE HEADS OF HOUSEHOLD WHO HAVE PARTNERS MATCHED TO THEM. ALL THE OTHER HEADS OF HH (_MERGE==1) WE THROW AWAY.

 

. tabulate _merge

 

     _merge |      Freq.     Percent        Cum.

------------+-----------------------------------

          3 |    253,211      100.00      100.00

------------+-----------------------------------

      Total |    253,211      100.00

 

. drop _merge cupid

 

. save "C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples.dta"

file C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples.dta saved

 

. describe

 

Contains data from C:\AAA Miker Data folder\2000 5% for hh and partner\2000 hh and partner couples

> .dta

  obs:       253,211                         

 vars:            18                          3 Feb 2004 11:28

 size:     6,583,486 (98.4% of memory free)

-------------------------------------------------------------------------------

              storage  display     value

variable name   type   format      label      variable label

-------------------------------------------------------------------------------

hage            byte   %8.0g       agelbl     Age

hsex            byte   %8.0g       sexlbl     Sex

hraceg          byte   %23.0g      raceglbl   Race -- General

hmarst          byte   %26.0g      marstlbl   Marital status

hbplg           int    %27.0g      bplglbl    Birthplace -- General

hhispang        byte   %19.0g      hispanglbl

                                              Hispanic origin -- General

heducrec        byte   %23.0g      educreclbl

                                              Educational attainment recode

hhwt            int    %8.0g       hhwtlbl    Household weight

statefip        byte   %57.0g      statefiplbl

                                              State (FIPS code)

metro           byte   %36.0g      metrolbl   Metropolitan status

metareag        int    %43.0g      metareaglbl

                                              Metropolitan area -- General

page            byte   %8.0g       agelbl     Age

psex            byte   %8.0g       sexlbl     Sex

praceg          byte   %23.0g      raceglbl   Race -- General

pmarst          byte   %26.0g      marstlbl   Marital status

pbplg           int    %27.0g      bplglbl    Birthplace -- General

phispang        byte   %19.0g      hispanglbl

                                              Hispanic origin -- General

peducrec        byte   %23.0g      educreclbl

                                              Educational attainment recode

-------------------------------------------------------------------------------

Sorted by: 

 

. clear all