--------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class12.log
log type: text
opened on: 19 Feb 2013, 13:36:58
* We start with Anscombe’s data. To load it yourself, copy it from Excel and paste it into the Stata Data Editor (the icon on the Stata toolbar that looks like a spreadsheet), then save it. I have already saved the Anscombe data as a Stata file, so:
. use "C:\Users\Michael\Documents\current class files\intro soc methods\anscombe.dta", clear
. regress y2 x2
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.97
       Model |  27.5000024     1  27.5000024           Prob > F      =  0.0022
    Residual |   13.776294     9  1.53069933           R-squared     =  0.6662
-------------+------------------------------           Adj R-squared =  0.6292
       Total |  41.2762964    10  4.12762964           Root MSE      =  1.2372
------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |         .5   .1179638     4.24   0.002     .2331475    .7668526
       _cons |   3.000909   1.125303     2.67   0.026     .4552978     5.54652
------------------------------------------------------------------------------
. twoway (scatter y2 x2) (lfit y2 x2)
* In order to really follow what went on in class, you are going to need to generate the plots yourself.
. predict M2_dfbeta, dfbeta(x2)
* predict is a post-estimation command, meaning it works from the results of the most recent estimation command, here the regression we ran above.
. twoway (scatter y2 x2) (scatter M2_dfbeta x2) (lfit y2 x2)
* The above command shows the scatter plot, the best-fit line, and a plot of the dfbetas. The dfbetas turn out to be largest in absolute value at the extremes of the X distribution, because X outliers have the most influence over the slope. A dfbeta measures how much each point changes the slope, i.e. how different the slope would be if that point were missing, scaled by the slope's standard error.
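* A minimal sketch of the leave-one-out idea behind dfbeta, assuming anscombe.dta is still in memory and the regression above has been run. This was not part of the class session; slope_change and b_full are hypothetical names chosen for illustration.

```stata
* Sketch only: raw change in the x2 slope when each observation is dropped.
* slope_change and b_full are illustrative names, not from the class log.
quietly regress y2 x2
scalar b_full = _b[x2]              // slope using all observations
generate slope_change = .
forvalues i = 1/`=_N' {
    quietly regress y2 x2 if _n != `i'
    // full-sample slope minus the slope with observation i left out
    quietly replace slope_change = scalar(b_full) - _b[x2] in `i'
}
quietly regress y2 x2               // restore the full-sample results
list x2 y2 slope_change
```

* Stata's dfbeta divides this raw difference by a standard error estimated with the observation left out, so slope_change should not equal M2_dfbeta, but the signs should agree and the largest changes should again appear at the extremes of x2.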
. gen abs_m2_dfbeta=abs( M2_dfbeta)
. gsort - abs_m2_dfbeta
. list abs_m2_dfbeta M2_dfbeta x2 y2
     +----------------------------------+
     | abs_m2~a   M2_dfbeta   x2     y2 |
     |----------------------------------|
  1. | 1.291224    1.291224    4    3.1 |
  2. | 1.291224   -1.291224   14    8.1 |
  3. | .2979074   -.2979074   13   8.74 |
  4. | .2979073    .2979073    5   4.74 |
  5. | .1295366   -.1295366    7   7.26 |
     |----------------------------------|
  6. | .1295366    .1295366   11   9.26 |
  7. | .0971856   -.0971856    8   8.14 |
  8. | .0971856    .0971856   10   9.14 |
  9. | .0340383   -.0340383    6   6.13 |
 10. | .0340383    .0340383   12   9.13 |
     |----------------------------------|
 11. |        0           0    9   8.77 |
     +----------------------------------+
* In the commands above, we generate the absolute value of the dfbetas, sort the observations from largest to smallest on that new variable, and list all points in order from the largest absolute dfbeta to the smallest.
. clear all
* OK, now on to the 50 state dataset, which is posted on my website in Stata format.
. use "C:\Users\Michael\Documents\current class files\intro soc methods\fifty_state_dataset.dta", clear
. describe
Contains data from C:\Users\Michael\Documents\current class files\intro soc methods\fifty_state_dataset.dta
  obs:            51
 vars:            11                          7 Nov 2010 14:16
 size:         2,703 (99.9% of memory free)
-------------------------------------------------------------------------------------
                 storage  display     value
variable name      type   format      label         variable label
-------------------------------------------------------------------------------------
statefip           byte   %57.0g      statefiplbl
                                                    State (FIPS code)
US_born_propo~n    float  %8.0g                     mean(US_b~n)
seniors_propo~n    float  %8.0g                     mean(ove~65)
children_prop~n    float  %8.0g                     mean(chil~n)
NH_White_prop~n    float  %8.0g                     mean(NH_w~e)
inctot             double %12.0g                    mean(inctot)
CPS_population     long   %9.0gc                    Freq.
incwage            double %12.0g                    mean(incw~e)
male_proportion    float  %8.0g                     mean(male)
urban_proport~n    float  %8.0g                     mean(urban)
yrsed              float  %9.0g                     mean(yrsed)
-------------------------------------------------------------------------------------
Sorted by:
. twoway (scatter incwage US_born_proportion, mlabel(statefip)) (lfit incwage US_born_proportion)
* this scatter plot is much like one you will have to do yourself in HW4.
. regress incwage US_born_proportion
      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  1,    49) =   14.84
       Model |   104897990     1   104897990           Prob > F      =  0.0003
    Residual |   346387525    49  7069133.17           R-squared     =  0.2324
-------------+------------------------------           Adj R-squared =  0.2168
       Total |   451285515    50   9025710.3           Root MSE      =  2658.8
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
US_born_pr~n |  -24075.79       6250    -3.85   0.000    -36635.63   -11515.94
       _cons |   41551.87   5790.425     7.18   0.000     29915.58    53188.17
------------------------------------------------------------------------------
. predict m1_50st_predicted
(option xb assumed; fitted values)
. gen m1_residuals=incwage- m1_50st_predicted
. gen abs_resid=abs( m1_residuals)
* We generate residuals, then a new variable with the absolute value of the residuals, because we want to see which states have the largest residuals.
* And by the way: what is the average of all residuals? That’s right, it has to be zero. (The mean reported below, -.0000266, differs from zero only because of rounding in the stored values.)
. summarize m1_residuals
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
m1_residuals |        51   -.0000266    2632.062  -4678.628   4750.814
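* A minimal sketch of a shortcut, assuming the regression of incwage on US_born_proportion is still the most recent estimation: predict can produce the residuals directly with its residuals option, without computing fitted values first. This was not run in class; resid_check is a hypothetical variable name.

```stata
* Sketch only: residuals in one step (resid_check is an illustrative name).
predict resid_check, residuals
* The mean should again be zero apart from rounding,
* and the values should match m1_residuals up to rounding.
summarize resid_check m1_residuals
```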
. predict m1_dfbeta,dfbeta( US_born_proportion)
. gen abs_dfbeta=abs( m1_dfbeta)
. gsort - abs_dfbeta
* we generate absolute values for the dfbeta variable, then we sort from largest to smallest, and list the first 9 observations (the 9 largest abs_dfbetas):
. list statefip incwage US_born_proportion abs_dfbeta m1_dfbeta m1_residuals if _n<10
     +---------------------------------------------------------------------------+
     |      statefip       incwage   US_bor~n   abs_df~a   m1_dfbeta   m1_resi~s |
     |---------------------------------------------------------------------------|
  1. |    California   20573.98456    .730234   .7651464    .7651464   -3396.939 |
  2. |       Florida   17874.36641    .789128   .6546844    .6546844   -4678.628 |
  3. |      New York    20716.6877    .777071   .3214945    .3214945   -2126.592 |
  4. |    New Jersey   24990.40441    .829979   .3097655   -.3097655    3420.941 |
  5. | West Virginia      13760.09    .988432   .2389347   -.2389347   -3994.506 |
     |---------------------------------------------------------------------------|
  6. |       Montana   13746.25247    .986109   .2340724   -.2340724   -4064.271 |
  7. |        Hawaii   19547.63379    .818878   .2323307    .2323307   -2289.112 |
  8. |       Arizona    17986.0649    .856372   .1866927    .1866927   -2947.986 |
  9. | Massachusetts   23697.95964    .849691    .181481    -.181481    2603.065 |
     +---------------------------------------------------------------------------+
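* California, Florida, and New York have the largest abs_dfbetas, but the most influential points are not necessarily the ones with the largest residuals: of these three, only Florida appears among the nine largest residuals listed below. _n is Stata's built-in variable that holds each observation's position in the current sort order.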
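* A minimal sketch of an equivalent way to pick out the top rows, assuming the variables created above are still in memory: an "in" range selects observations by position, so after sorting it lists the same nine rows as "if _n<10". This was not run in class.

```stata
* Sketch only: list the nine largest absolute dfbetas by position.
gsort -abs_dfbeta
list statefip abs_dfbeta m1_dfbeta m1_residuals in 1/9
```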
. gsort - abs_resid
. list statefip incwage US_born_proportion abs_dfbeta m1_dfbeta m1_residuals if _n<10
     +----------------------------------------------------------------------------------+
     |             statefip       incwage   US_bor~n   abs_df~a   m1_dfbeta   m1_resi~s |
     |----------------------------------------------------------------------------------|
  1. |          Connecticut   24803.26155     .89299   .1393247   -.1393247    4750.814 |
  2. |              Florida   17874.36641    .789128   .6546844    .6546844   -4678.628 |
  3. |             Maryland   24575.59697    .898431   .1125866   -.1125866    4654.167 |
  4. |               Alaska   23241.05063    .938505   .0549285    .0549285    4284.434 |
  5. | District of Columbia   24481.98743    .879239   .1720585   -.1720585    4098.485 |
     |----------------------------------------------------------------------------------|
  6. |              Montana   13746.25247    .986109   .2340724   -.2340724   -4064.271 |
  7. |        West Virginia      13760.09    .988432   .2389347   -.2389347   -3994.506 |
  8. |            Minnesota   22967.69873    .935135    .038025     .038025    3929.941 |
  9. |           New Mexico   15344.73038    .933279   .0297481   -.0297481   -3737.723 |
     +----------------------------------------------------------------------------------+
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\soc_180B_win2013\class12.log
log type: text
closed on: 19 Feb 2013, 15:37:49
------------------------------------------------------------------------------------------