--------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_pro
> j3\2011_180B_logs\class6.log
log type: text
opened on: 10 Feb 2011, 12:12:49
. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear
*take a look at the worksheet "understanding dummy variables" in my Excel file.
. table metro if age>29 & age<65 & sex==1, contents(freq mean incwage)
----------------------------------------------------------
Metropolitan central city   |
status                      |      Freq.  mean(incwage)
----------------------------+-----------------------------
           Not identifiable |         94    31743.04255
          Not in metro area |      6,628     27189.6465
               Central city |      6,727    34445.35841
       Outside central city |     11,639     43203.0348
Central city status unknown |      4,247    35557.95997
----------------------------------------------------------
. codebook metro
---------------------------------------------------------------------------------------
metro                                                  Metropolitan central city status
---------------------------------------------------------------------------------------
                  type:  numeric (byte)
                 label:  metrolbl
                 range:  [0,4]                        units:  1
         unique values:  5                        missing .:  0/133710
            tabulation:  Freq.   Numeric  Label
                           340         0  Not identifiable
                         29658         1  Not in metro area
                         32481         2  Central city
                         51468         3  Outside central city
                         19763         4  Central city status unknown
. regress incwage ib3.metro if age>29 & age<65 & sex==1& metro!=0
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |
          1  |  -16013.39   593.9852   -26.96   0.000    -17177.63   -14849.15
          2  |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91
          4  |  -7645.075   691.9894   -11.05   0.000    -9001.405   -6288.744
             |
       _cons |   43203.03   357.7942   120.75   0.000     42501.74    43904.33
------------------------------------------------------------------------------
* Here metro==3 (Outside central city, i.e. the suburbs) is the comparison category, set by ib3. The constant, 43203, is therefore the mean of incwage for metro==3, and each coefficient is the difference between that category's mean and the suburban mean.
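* A quick arithmetic check of that reading (a sketch, not part of the original session, so no output is shown): each coefficient should equal that category's mean from the table command above minus the mean for metro==3. For example, 27189.6465 - 43203.0348 is about -16013.39, the coefficient on metro==1.
. display 27189.6465 - 43203.0348
. display 34445.35841 - 43203.0348
. display 35557.95997 - 43203.0348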
. regress incwage ib2.metro if age>29 & age<65 & sex==1& metro!=0
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |
          1  |  -7255.712   668.0533   -10.86   0.000    -8565.127   -5946.297
          3  |   8757.676   591.1938    14.81   0.000      7598.91    9916.443
          4  |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419
             |
       _cons |   34445.36   470.6309    73.19   0.000      33522.9    35367.82
------------------------------------------------------------------------------
* If we change the comparison category we get a different constant, different coefficients, and different t-values, but the R-squared (2.53%) and the overall F statistic are unchanged because it is the same model.
* A second syntax for dummy variables is the xi: prefix, which requires us to set the omitted category beforehand:
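* One way to see that it is the same model (a sketch, not run in the original session, so no output is shown): after either regression, testparm should give the same joint test of the metro dummies, and margins should give the same predicted mean of incwage for each metro category, whichever comparison category was chosen.
. regress incwage ib2.metro if age>29 & age<65 & sex==1 & metro!=0
. testparm i.metro
. margins metro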
. char metro[omit] 3
. xi: regress incwage i.metro if age>29 & age<65 & sex==1 & metro~=0
i.metro _Imetro_0-4 (naturally coded; _Imetro_3 omitted)
note: _Imetro_0 omitted because of collinearity
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Imetro_0 |  (omitted)
   _Imetro_1 |  -16013.39   593.9852   -26.96   0.000    -17177.63   -14849.15
   _Imetro_2 |  -8757.676   591.1938   -14.81   0.000    -9916.443    -7598.91
   _Imetro_4 |  -7645.075   691.9894   -11.05   0.000    -9001.405   -6288.744
       _cons |   43203.03   357.7942   120.75   0.000     42501.74    43904.33
------------------------------------------------------------------------------
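* Same results as the ib3.metro regression above. (_Imetro_0 is dropped because the metro~=0 condition leaves it equal to zero for every observation in the sample.) But with xi: Stata generates the dummy variables and adds them to your variable list, so you can examine and manipulate them.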
. table metro, contents(mean _Imetro_1 mean _Imetro_2 mean _Imetro_4)
----------------------------------------------------------------------------
Metropolitan central city   |
status                      | mean(_Imetr~1)  mean(_Imetr~2)  mean(_Imetr~4)
----------------------------+-----------------------------------------------
           Not identifiable |              0               0               0
          Not in metro area |              1               0               0
               Central city |              0               1               0
       Outside central city |              0               0               0
Central city status unknown |              0               0               1
----------------------------------------------------------------------------
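* The same kind of dummy variables can also be built by hand (a sketch, not run in the original session, so no output is shown; the names rural and metro_d are made up for illustration). A logical expression evaluates to 1 or 0, and tabulate with the generate() option creates one dummy per category, numbered in category order, so metro_d2 should correspond to metro==1 and match rural.
. generate byte rural = (metro==1) if !missing(metro)
. tabulate metro, generate(metro_d)
. table metro, contents(mean rural mean metro_d2)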
* You can also get the t-statistic for a contrast between any two categories, even one that the regression did not report directly given the comparison category you picked.
. lincom _Imetro_4- _Imetro_2
( 1) - _Imetro_2 + _Imetro_4 = 0
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419
------------------------------------------------------------------------------
* Here we get the contrast between metro==4 and metro==2, with the same coefficient and t-statistic that we got above when metro==2 was the comparison category. lincom is a post-estimation command, meaning it can only be used after an estimation command such as regress.
* One last syntax for generating dummy variables is desmat, which you would have to install (for free). Unlike the xi: and i. syntaxes, which assume that predictors are continuous unless you put i. in front of the variable to mark it as categorical, desmat assumes all predictors are categorical, and you need to put @ in front of any continuous predictor. See Stata help for more details.
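* With the factor-variable (ib3.metro) syntax the same contrast can be requested without xi: (a sketch, not run in the original session, so no output is shown): refer to the coefficients as 4.metro and 2.metro. test reports the equivalent F test of the same hypothesis, and with one restriction its F should equal the square of the lincom t-statistic.
. regress incwage ib3.metro if age>29 & age<65 & sex==1 & metro!=0
. lincom 4.metro - 2.metro
. test 4.metro = 2.metro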
. *net install desmat
. desmat: regress incwage metro=ind(2) if age>29 & age<65 & sex==1 & metro~=0
---------------------------------------------------------------------------------------
Linear regression
---------------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 29241
F statistic: 252.703
Model degrees of freedom: 3
Residual degrees of freedom: 29237
R-squared: 0.025
Adjusted R-squared: 0.025
Root MSE 38600.339
Prob: 0.000
---------------------------------------------------------------------------------------
nr  Effect                                                          Coeff        s.e.
---------------------------------------------------------------------------------------
    metro
1     Not identifiable                                              0.000           .
2     Central city                                               7255.712**   668.053
3     Outside central city                                      16013.388**   593.985
4     Central city status unknown                                8368.313**   758.706
5   _cons                                                       27189.646**   474.133
---------------------------------------------------------------------------------------
* p < .05
** p < .01
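* Note that the constant here, 27189.646, matches the mean of incwage for "Not in metro area" (metro==1) in the first table, and each coefficient is a difference from that mean, so the comparison category in this output appears to be metro==1 rather than metro==2. Presumably ind(2) refers to the second category in order (0, 1, 2, ...) rather than to the value 2, but that is an assumption to check against the desmat help file. A sketch of a check with the factor-variable syntax (not run in the original session, so no output is shown), which should reproduce these coefficients if that reading is right:
. regress incwage ib1.metro if age>29 & age<65 & sex==1 & metro!=0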
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_180B_logs\class6.log
log type: text
closed on: 10 Feb 2011, 15:05:37
---------------------------------------------------------------------------------------