What changes in regression when you change the inputs

A new document on what changes and what remains the same in a regression when you change the inputs

Draft, Feb 19, 2010

Given a model

Y = Const + B₁X₁ + B₂X₂ + ... + B_nX_n + Residuals

Type of Change	Effect on Coefficients (Bs)	Effect on T-statistic of that coefficient	Effect on sample size of the model	Effect on goodness of fit of the model
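1) Change of units of one variable, X₁	B₁ is rescaled to the new units (for example, measuring X₁ in thousands of dollars instead of dollars multiplies B₁ by 1,000); the fitted relationship itself is unchanged	No change to the T-statistic; T-statistics are unit-free	None	None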
2) Inclusion of a new, formerly excluded category of variable X₁	May or may not change B₁. If B₁ was a comparison between nurses and lawyers, and the newly added group is sociologists (with their own indicator variable), B₁ won’t change, provided there are no other predictor variables. If there are other predictor variables, all coefficients will generally change.	The T-statistic will change, if for no other reason than that the variance of the dependent variable Y, and with it the residual variance used in the standard errors, is now different.	Including new cases changes the N of the model	Yes
3) Inclusion of a new predictor variable, X_m	All the coefficients are jointly estimated, so a new variable generally changes all the other coefficients already in the model (they stay the same only if the new variable is uncorrelated, in the sample, with the existing predictors). This is one reason we do multiple regression: to estimate coefficient B₁ net of the effect of variable X_m.	Yes	Usually no change. The inclusion of a new predictor variable changes the sample size of the model only if the new predictor variable has missing values, because cases with missing values on any predictor variable are dropped automatically.	Yes. R-square cannot go down: any new term with a nonzero coefficient improves the fit. Adjusted R-square goes up only if the new term improves the fit by enough to offset the lost degree of freedom, and goes down if the new term makes little or no difference.
4) Changing the excluded category of some variable already entered	No. The initial output reported by the software will be different, but all of the same comparisons as before can be recovered by combining the reported Bs, and when recovered they are the same.	No changes, when looking at the same comparisons	No change	No change
5) Weighting with analytic weights	Unless the weights are uniform, the weights will change the coefficients	Yes	No change to sample size using analytic weights, because analytic weights are weights rescaled to leave the sample size unchanged	Yes
6) Weighting with frequency weights	Coefficients will behave the same as with analytic weights	Dramatic changes here, because the changed N changes the standard errors, and therefore also the T-statistics	Dramatic changes: N becomes the sum of the frequency weights	Yes
7) Changing the sample size, N, of the dataset	In theory, the expected value of B is not affected by changes in sample size. In practice, if you have a different sample (larger or smaller), B will be different because of sampling variation. You can easily take a random subset of any dataset and you will find that the Bs in the random subset differ from the overall Bs.	The expected values of T-statistics are roughly proportional to the square root of N. If you quadruple the sample size, you would expect T-statistics to double, giving you greater power to reject null hypotheses. Of course, in a different sample, the actual T-statistics will not change by exactly the square root of the change in N, because sampling variation comes into play.	Yes (duh).	Yes, largely because of the sampling variation.
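
Rows 1, 6 and 7 of the table can be checked numerically. The following is a minimal sketch in Python, not part of the original handout. It assumes made-up data, a single predictor (income, an invented example variable), and a hypothetical helper called simple_ols that fits Y = Const + B·X by ordinary least squares. A frequency weight of 4 on every case is mimicked by repeating the data four times.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = const + b*x by least squares.

    Returns (const, b, t_statistic_of_b, r_squared).
    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    const = mean_y - b * mean_x
    rss = sum((yi - const - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return const, b, b / se_b, 1 - rss / syy


# Made-up data: income in dollars and an outcome that depends on it plus noise.
random.seed(0)
n = 50
income_dollars = [random.uniform(20_000, 90_000) for _ in range(n)]
y = [5 + 0.0001 * xi + random.gauss(0, 2) for xi in income_dollars]

# Row 1: change the units of X from dollars to thousands of dollars.
income_thousands = [xi / 1000 for xi in income_dollars]
fit_dollars = simple_ols(income_dollars, y)
fit_thousands = simple_ols(income_thousands, y)

# Rows 6 and 7: a frequency weight of 4 on every case, written out as
# four copies of the data, so N goes from 50 to 200.
fit_freq4 = simple_ols(income_dollars * 4, y * 4)

for label, fit in [("dollars", fit_dollars),
                   ("thousands", fit_thousands),
                   ("freq weight 4", fit_freq4)]:
    const, b, t, r2 = fit
    print(f"{label:>14}: B = {b:.6g}  T = {t:.3f}  R-square = {r2:.4f}")
```

In this sketch the slope in thousands of dollars is 1,000 times the slope in dollars, while the T-statistic and R-square are identical. With the fourfold frequency weight the slope is unchanged and the T-statistic grows by the square root of (4n - 2)/(n - 2), about 2.03 for n = 50, which is close to the doubling that row 7 leads you to expect.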