---------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web p

> ages\soc_meth_proj3\2010_logs\class_eleven.log

log type:  text

opened on:   2 Mar 2010, 14:59:04

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",

>  clear

. regress incwage vietnam_vet male age  age_sq yrsed if age>=25 & age<=64 [aweight= perwt_rounded]

(sum of wgt is   1.4261e+08)

Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  5, 69299) = 3127.96

Model |  1.3427e+13     5  2.6853e+12           Prob > F      =  0.0000

Residual |  5.9492e+13 69299   858488914           R-squared     =  0.1841

Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29300

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

vietnam_vet |    1035.18   532.7493     1.94   0.052    -9.007979    2079.367

male |   16607.58   228.9415    72.54   0.000     16158.85     17056.3

age |   2848.096   87.34381    32.61   0.000     2676.902     3019.29

age_sq |  -31.92762   .9924702   -32.17   0.000    -33.87286   -29.98238

yrsed |   3540.933   38.50133    91.97   0.000      3465.47    3616.395

_cons |   -88294.8   1901.336   -46.44   0.000    -92021.42   -84568.19

------------------------------------------------------------------------------

. *A hopefully familiar M5, from HW3. I bring it up because several students in the class made the following mistake when adding their own variables: they treated a categorical variable (in this case race) as if it were a continuous variable whose numeric values really meant something. Stata doesn't know the difference, so it fits a single slope for race and you get the output below.

. regress incwage vietnam_vet male age  age_sq yrsed race if age>=25 & age<=64 [aweight= perwt_rounded]

(sum of wgt is   1.4261e+08)

Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  6, 69298) = 2612.57

Model |  1.3452e+13     6  2.2419e+12           Prob > F      =  0.0000

Residual |  5.9467e+13 69298   858139311           R-squared     =  0.1845

Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29294

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

vietnam_vet |   945.3285      532.9     1.77   0.076    -99.15456    1989.812

male |   16596.75   228.9036    72.51   0.000      16148.1     17045.4

age |   2844.023   87.32927    32.57   0.000     2672.858    3015.189

age_sq |   -31.9051   .9922768   -32.15   0.000    -33.84996   -29.96024

yrsed |   3546.516   38.50734    92.10   0.000     3471.042     3621.99

race |  -5.353826   .9902242    -5.41   0.000    -7.294664   -3.412988

_cons |   -87499.2   1906.635   -45.89   0.000     -91236.2    -83762.2

------------------------------------------------------------------------------

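. *The key error that many people made was treating a categorical variable as if it were a continuous variable whose numbers really meant something. The race coefficient above (-5.35) is the predicted change in income per one-unit increase in the race code, which is meaningless because the codes are arbitrary labels, as the two tabulations below show.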

. tabulate race

Race |      Freq.     Percent        Cum.

--------------------------------------+-----------------------------------

White |    113,475       84.87       84.87

Black/Negro |     13,626       10.19       95.06

American Indian/Aleut/Eskimo |      1,894        1.42       96.47

Asian or Pacific Islander |      4,715        3.53      100.00

--------------------------------------+-----------------------------------

Total |    133,710      100.00

. tabulate race, nolab

Race |      Freq.     Percent        Cum.

------------+-----------------------------------

100 |    113,475       84.87       84.87

200 |     13,626       10.19       95.06

300 |      1,894        1.42       96.47

650 |      4,715        3.53      100.00

------------+-----------------------------------

Total |    133,710      100.00
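*To see why the single race slope is misleading, here is a minimal sketch, assuming the same data are still in memory. It refits the model with race entered as continuous and prints the gap from White (code 100) that the slope implies for each of the other codes. With the logged slope of -5.353826 these work out to about -535, -1,071 and -2,945 dollars, driven entirely by the spacing of the codes 200, 300 and 650.

```stata
* Assumes cps_mar_2000_new.dta is loaded, as at the top of this log.
* Refit the model that (wrongly) treats race as a continuous variable.
quietly regress incwage vietnam_vet male age age_sq yrsed race if age>=25 & age<=64 [aweight=perwt_rounded]

* The implied gap from White (code 100) is the slope times the code difference.
display "Black/Negro vs White (200-100):         " %10.2f _b[race]*(200-100)
display "Am. Indian/Aleut/Eskimo vs White (300-100): " %10.2f _b[race]*(300-100)
display "Asian or Pacific Islander vs White (650-100): " %10.2f _b[race]*(650-100)
```

*Because 650 is the largest code, the continuous model is forced to put Asian or Pacific Islander furthest from White, whatever the data say.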

*The proper syntax for a categorical variable is to put the "i." prefix in front of the variable name, which tells Stata to make the dummy variables. There are 4 categories, so we get 3 dummy variables; the omitted reference category is White (code 100), so each race coefficient below is a difference from White. The syntax below is correct. If a variable is categorical and is not already coded 0-1, then you need to tell Stata to make dummy variables.

. regress incwage vietnam_vet male age  age_sq yrsed i.race if age>=25 & age<=64 [aweight= perwt_rounded]

(sum of wgt is   1.4261e+08)

Source |       SS       df       MS              Number of obs =   69305

-------------+------------------------------           F(  8, 69296) = 1982.42

Model |  1.3580e+13     8  1.6976e+12           Prob > F      =  0.0000

Residual |  5.9339e+13 69296   856305835           R-squared     =  0.1862

Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29263

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

vietnam_vet |   1009.395   532.4176     1.90   0.058     -34.1431    2052.932

male |    16524.6   228.7384    72.24   0.000     16076.28    16972.93

age |   2844.569   87.23672    32.61   0.000     2673.585    3015.553

age_sq |   -31.9709   .9912431   -32.25   0.000    -33.91373   -30.02807

yrsed |     3510.6   38.58087    90.99   0.000     3434.981    3586.218

|

race |

200  |  -4288.975   342.9176   -12.51   0.000    -4961.093   -3616.857

300  |  -6117.347   1183.335    -5.17   0.000    -8436.682   -3798.013

650  |  -1226.535   562.7613    -2.18   0.029    -2329.546   -123.5237

|

_cons |  -86985.25   1901.461   -45.75   0.000    -90712.11   -83258.39

------------------------------------------------------------------------------
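*What "i.race" does can also be done by hand. The sketch below, assuming the same data are in memory, builds the four 0-1 indicators with tabulate's generate() option and enters three of them, leaving out White. It should reproduce the race coefficients above. The stub name race_d is a hypothetical choice, not something from the homework. The last line shows how to pick a different omitted category with the "ib" prefix, here Black/Negro (code 200).

```stata
* Assumes cps_mar_2000_new.dta is loaded. race_d is a hypothetical stub name.
* tabulate, generate() creates one 0-1 variable per category, in code order:
*   race_d1 = 100 (White), race_d2 = 200, race_d3 = 300, race_d4 = 650
tabulate race, generate(race_d)

* Enter three of the four dummies; the omitted one (race_d1, White) is the reference.
regress incwage vietnam_vet male age age_sq yrsed race_d2 race_d3 race_d4 if age>=25 & age<=64 [aweight=perwt_rounded]

* Same model, but with code 200 as the reference category instead of code 100.
regress incwage vietnam_vet male age age_sq yrsed ib200.race if age>=25 & age<=64 [aweight=perwt_rounded]
```

*Changing the reference category changes which differences the race coefficients report, but not the fit of the model.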

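*Alternatively, we could use desmat, a user-written command. With desmat, a variable is treated as categorical unless it is prefixed with "@", which marks it as continuous. The coefficients and standard errors below match the i.race output above.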

. desmat: regress incwage vietnam_vet male @age  @age_sq @yrsed race if age>=25 & age<=64 [aweight= perwt_rounded]

---------------------------------------------------------------------------------

Linear regression

---------------------------------------------------------------------------------

Dependent variable                                                    incwage

Number of observations:                                                 69305

aweight:                                                        perwt_rounded

F statistic:                                                         1982.417

Model degrees of freedom:                                                   8

Residual degrees of freedom:                                            69296

R-squared:                                                              0.186

Root MSE                                                            29262.704

Prob:                                                                   0.000

---------------------------------------------------------------------------------

nr Effect                                                      Coeff        s.e.

---------------------------------------------------------------------------------

vietnam_vet

1    1                                                      1009.395     532.418

male

2    male                                                  16524.603**   228.738

3  Age                                                      2844.569**    87.237

4  age_sq                                                    -31.971**     0.991

5  based on educrec                                         3510.600**    38.581

race

6    Black/Negro                                           -4288.975**   342.918

7    American Indian/Aleut/Eskimo                          -6117.347**  1183.335

8    Asian or Pacific Islander                             -1226.535*    562.761

9  _cons                                                  -86985.250**  1901.461

---------------------------------------------------------------------------------

*  p < .05

** p < .01
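*Both tables above test each race category against White one at a time. To test whether race matters as a whole, the three dummies can be tested jointly. This is a minimal sketch, assuming the same data are in memory; it refits the i.race model first because testparm works on the most recent estimation results.

```stata
* Assumes cps_mar_2000_new.dta is loaded, as at the top of this log.
quietly regress incwage vietnam_vet male age age_sq yrsed i.race if age>=25 & age<=64 [aweight=perwt_rounded]

* Joint F test that all three race coefficients are zero (3 numerator df).
testparm i.race
```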

.

. exit, clear