---------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web p
> ages\soc_meth_proj3\2010_logs\class_eleven.log
log type: text
opened on: 2 Mar 2010, 14:59:04
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",
> clear
. regress incwage vietnam_vet male age age_sq yrsed if age>=25 & age<=64 [aweight= perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  5, 69299) = 3127.96
       Model |  1.3427e+13     5  2.6853e+12           Prob > F      =  0.0000
    Residual |  5.9492e+13 69299   858488914           R-squared     =  0.1841
-------------+------------------------------           Adj R-squared =  0.1841
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29300

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 vietnam_vet |    1035.18   532.7493     1.94   0.052    -9.007979    2079.367
        male |   16607.58   228.9415    72.54   0.000     16158.85     17056.3
         age |   2848.096   87.34381    32.61   0.000     2676.902     3019.29
      age_sq |  -31.92762   .9924702   -32.17   0.000    -33.87286   -29.98238
       yrsed |   3540.933   38.50133    91.97   0.000      3465.47    3616.395
       _cons |   -88294.8   1901.336   -46.44   0.000    -92021.42   -84568.19
------------------------------------------------------------------------------
. *A hopefully familiar M5, from HW3. I bring it back because several students in the class made the following mistake when adding their own variables: they treated a categorical variable (in this case race) as if it were a continuous variable whose numeric values really meant something. Stata can't tell the difference, so it fits a single slope for race and you get the output below.
. regress incwage vietnam_vet male age age_sq yrsed race if age>=25 & age<=64 [aweight= perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  6, 69298) = 2612.57
       Model |  1.3452e+13     6  2.2419e+12           Prob > F      =  0.0000
    Residual |  5.9467e+13 69298   858139311           R-squared     =  0.1845
-------------+------------------------------           Adj R-squared =  0.1844
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29294

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 vietnam_vet |   945.3285      532.9     1.77   0.076    -99.15456    1989.812
        male |   16596.75   228.9036    72.51   0.000      16148.1     17045.4
         age |   2844.023   87.32927    32.57   0.000     2672.858    3015.189
      age_sq |   -31.9051   .9922768   -32.15   0.000    -33.84996   -29.96024
       yrsed |   3546.516   38.50734    92.10   0.000     3471.042     3621.99
        race |  -5.353826   .9902242    -5.41   0.000    -7.294664   -3.412988
       _cons |   -87499.2   1906.635   -45.89   0.000     -91236.2    -83762.2
------------------------------------------------------------------------------
. *The key error that many people made was treating a categorical variable as if it were a continuous variable whose numbers really meant something. The tabulations below show that the race codes are arbitrary labels, not quantities.
. tabulate race
                                 Race |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
                                White |    113,475       84.87       84.87
                          Black/Negro |     13,626       10.19       95.06
         American Indian/Aleut/Eskimo |      1,894        1.42       96.47
            Asian or Pacific Islander |      4,715        3.53      100.00
--------------------------------------+-----------------------------------
                                Total |    133,710      100.00
. tabulate race, nolab
       Race |      Freq.     Percent        Cum.
------------+-----------------------------------
        100 |    113,475       84.87       84.87
        200 |     13,626       10.19       95.06
        300 |      1,894        1.42       96.47
        650 |      4,715        3.53      100.00
------------+-----------------------------------
      Total |    133,710      100.00
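. *A quick check of what the race coefficient in the regression above implies. This is a sketch that was not run in the original session, so no output is shown. With race entered as a number, the model forces the gap between any two groups to equal -5.353826 times the difference in their codes.
. *Implied Black-White gap, about -535:
. display -5.353826 * (200 - 100)
. *Implied American Indian-White gap, about -1,071:
. display -5.353826 * (300 - 100)
. *Implied Asian/Pacific Islander-White gap, about -2,945:
. display -5.353826 * (650 - 100)
. *These gaps come entirely from the codes 100, 200, 300, and 650. Recoding the categories would change them, which is why the coefficient has no substantive meaning.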
. *The proper syntax for a categorical variable is to put "i." in front of the variable name, which tells Stata to make the dummy variables. There are 4 categories, so we get 3 dummy variables; the omitted category (100, White) is the reference group, and each race coefficient below is a difference from it. If a variable is categorical and is not already coded 0-1, you need to tell Stata to make dummy variables.
. regress incwage vietnam_vet male age age_sq yrsed i.race if age>=25 & age<=64 [aweight= perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  8, 69296) = 1982.42
       Model |  1.3580e+13     8  1.6976e+12           Prob > F      =  0.0000
    Residual |  5.9339e+13 69296   856305835           R-squared     =  0.1862
-------------+------------------------------           Adj R-squared =  0.1861
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   29263

------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 vietnam_vet |   1009.395   532.4176     1.90   0.058     -34.1431    2052.932
        male |    16524.6   228.7384    72.24   0.000     16076.28    16972.93
         age |   2844.569   87.23672    32.61   0.000     2673.585    3015.553
      age_sq |   -31.9709   .9912431   -32.25   0.000    -33.91373   -30.02807
       yrsed |     3510.6   38.58087    90.99   0.000     3434.981    3586.218
             |
        race |
        200  |  -4288.975   342.9176   -12.51   0.000    -4961.093   -3616.857
        300  |  -6117.347   1183.335    -5.17   0.000    -8436.682   -3798.013
        650  |  -1226.535   562.7613    -2.18   0.029    -2329.546   -123.5237
             |
       _cons |  -86985.25   1901.461   -45.75   0.000    -90712.11   -83258.39
------------------------------------------------------------------------------
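. *For comparison, a sketch of building the dummy variables by hand. It was not run in the original session, so no output is shown, and the stub name race_d is hypothetical. tabulate with the generate() option creates one 0-1 variable per category, in order of the codes: race_d1 for 100, race_d2 for 200, race_d3 for 300, race_d4 for 650.
. tabulate race, generate(race_d)
. *Leaving out race_d1 makes White the reference group, so this should reproduce the i.race coefficients above:
. regress incwage vietnam_vet male age age_sq yrsed race_d2 race_d3 race_d4 if age>=25 & age<=64 [aweight= perwt_rounded]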
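. *Alternatively, we could use desmat. desmat treats variables as categorical by default, so race needs no "i." prefix, and the "@" prefix marks age, age_sq, and yrsed as continuous. The estimates match the i.race regression above.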
. desmat: regress incwage vietnam_vet male @age @age_sq @yrsed race if age>=25 & age<=64 [aweight= perwt_rounded]
---------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 69305
aweight: perwt_rounded
F statistic: 1982.417
Model degrees of freedom: 8
Residual degrees of freedom: 69296
R-squared: 0.186
Adjusted R-squared: 0.186
Root MSE 29262.704
Prob: 0.000
---------------------------------------------------------------------------------
nr Effect Coeff s.e.
---------------------------------------------------------------------------------
vietnam_vet
1 1 1009.395 532.418
male
2 male 16524.603** 228.738
3 Age 2844.569** 87.237
4 age_sq -31.971** 0.991
5 based on educrec 3510.600** 38.581
race
6 Black/Negro -4288.975** 342.918
7 American Indian/Aleut/Eskimo -6117.347** 1183.335
8 Asian or Pacific Islander -1226.535* 562.761
9 _cons -86985.250** 1901.461
---------------------------------------------------------------------------------
* p < .05
** p < .01
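. *desmat is a user-written add-on rather than part of official Stata. A sketch for checking whether it is installed and, if it is not, searching for it (not run in the original session, so no output is shown):
. which desmat
. findit desmat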
.
. exit, clear