Ghost Story of Heteroskedasticity

The word itself was so scary to me that I could never gather the courage to explore it. But then I gradually overcame my fear and decided to study the concept of Heteroskedasticity.

I have tried to make this concept as simple as possible, so that the next time you meet a person with a Heteroskedasticity phobia you are able to calm them down.

So, are you ready to dare?



What is Heteroskedasticity?


Heteroskedasticity simply means "not having constant variance".

Consider a few families with low income and a few families with high income spending on vacations. Families with low income will spend relatively little on vacations, and the variation in their expenditures will be small. High income families will spend more on vacations on average, but there will also be greater variability among them: some will spend a lot, others very little. This unequal spread across income levels is Heteroskedasticity.
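To see the vacation example as data, here is a minimal sketch that simulates it. Everything in it is an assumption made for illustration: the dataset name work.vacation, the variable names, the seed and all the numbers. The only point is that the error standard deviation is made to grow with income.

```sas
/* Simulated data only: names and numbers are illustrative assumptions. */
data work.vacation;
  call streaminit(2024);                 /* fix the seed so the run is repeatable */
  do family = 1 to 200;
    income = 20 + 180 * rand('uniform'); /* income between 20 and 200 (thousands) */
    /* error standard deviation is proportional to income,
       so the variance of spend is not constant */
    spend = 0.05 * income + rand('normal', 0, 0.01 * income);
    output;
  end;
run;

/* Scatter plot: the cloud of points should fan out as income increases */
proc sgplot data=work.vacation;
  scatter x=income y=spend;
run;
```

The fan shape in this plot is the same pattern you look for later in a residual plot.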

Recall the OLS (ordinary least squares) assumption that states "there should be constant variance in the residuals" (a residual is the difference between the actual and the predicted value of Y). In the case of Heteroskedasticity, this variance is not constant.
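Written in symbols, the assumption and its violation look roughly like this. The notation is assumed here and is not taken from the article: the error of observation i is written as \varepsilon_i and its regressors as x_i.

```latex
\begin{aligned}
\text{Homoskedasticity:}   \quad & \operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2   && \text{for every observation } i \\
\text{Heteroskedasticity:} \quad & \operatorname{Var}(\varepsilon_i \mid x_i) = \sigma_i^2 && \text{with } \sigma_i^2 \text{ differing across } i
\end{aligned}
```

In the vacation example, \sigma_i^2 would be small for low income families and large for high income families.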

So what if Heteroskedasticity is there?


Now that we know a bit about this scary term, let us discuss how Heteroskedasticity affects a regression. Just to let you know, Heteroskedasticity is often a byproduct of other violations of the Linear Regression assumptions. For this article, however, we assume that all other assumptions have been met except the assumption of no Heteroskedasticity.

1. The very first thing I want to make clear is that Heteroskedasticity does not result in biased parameter estimates.

2. When there is Heteroskedasticity, the usual standard errors are biased, which in turn makes the test statistics and confidence intervals unreliable. The OLS estimators are also no longer BLUE (best linear unbiased estimators): they stay unbiased, but they are no longer the most efficient.
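Both points can be read off the textbook variance formula for the OLS estimator. This is a sketch in assumed matrix notation (X for the regressor matrix, \Omega for the diagonal matrix of error variances), not something derived in the article.

```latex
\begin{aligned}
\mathbb{E}[\hat\beta \mid X] &= \beta
  && \text{(holds whatever } \Omega \text{ is, so no bias)} \\
\operatorname{Var}(\hat\beta \mid X) &= (X^\top X)^{-1} X^\top \Omega X \,(X^\top X)^{-1},
  && \Omega = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) \\
\operatorname{Var}(\hat\beta \mid X) &= \sigma^2 (X^\top X)^{-1}
  && \text{only when } \Omega = \sigma^2 I
\end{aligned}
```

Standard regression output reports standard errors based on the last line. When the variances are not all equal, that shortcut no longer matches the true variance in the second line, which is why the reported standard errors mislead.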

How to detect Heteroskedasticity?


Heteroskedasticity is usually visible in scatter plots of the residuals. A scatter plot with Heteroskedasticity will look somewhat like these.






Whereas with no Heteroskedasticity, i.e. with Homoskedasticity among the residuals, the scatter plot would look like this.





If you want to see these scatter plots in SAS, you can use the statement Plot residual.*predicted.; with Proc Reg (note the trailing periods on the keywords). This statement plots the residuals against the predicted values. With ODS Graphics enabled, Proc Reg also produces residual plots against each independent variable you have listed in the Model statement.
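If you prefer to build the residual plot yourself, one possible route is to save the residuals and plot them with Proc Sgplot. This is a minimal sketch: the output dataset work.resid_out and the variable names pred and resid are names chosen here for illustration, and the model is the same sashelp.cars model used later in this article.

```sas
/* Fit the model quietly and save predicted values and residuals.
   work.resid_out, pred and resid are illustrative names. */
proc reg data=sashelp.cars noprint;
  model mpg_city = weight horsepower length;
  output out=work.resid_out predicted=pred residual=resid;
run;
quit;

/* Residuals against predicted values, with a reference line at zero.
   A funnel or fan shape suggests non-constant variance. */
proc sgplot data=work.resid_out;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```

A roughly even band of points around the zero line is what Homoskedasticity looks like.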


There is also a formal statistical test for Heteroskedasticity, called the White test. The White test uses hypothesis testing to check for it.

H0: Residuals are Homoskedastic
H1: Residuals are Heteroskedastic
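For intuition, a common textbook form of the White test regresses the squared residuals on the regressors, their squares and their cross products. The sketch below uses assumed notation (e_i for residuals, z_{ij} for the auxiliary regressors), and the exact computation SAS performs may differ in detail.

```latex
\begin{aligned}
e_i^2 &= \gamma_0 + \sum_{j=1}^{q} \gamma_j z_{ij} + u_i,
  && z_{ij} \in \{\text{regressors, their squares, their cross products}\} \\
n R^2 &\;\sim\; \chi^2_q \quad \text{approximately, under } H_0,
  && q = \tfrac{k(k+3)}{2} \text{ for } k \text{ regressors}
\end{aligned}
```

With k = 3 regressors this gives q = 3 + 3 + 3 = 9 degrees of freedom. A large statistic (small P value) means the squared residuals can be predicted from the regressors, which is evidence against constant variance.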

You can add the spec option to the Model statement of Proc Reg to test for Heteroskedasticity statistically. To demonstrate this, we use the sashelp.cars data.


/* The spec option requests the White test; note there is no period after spec */
Proc reg data=sashelp.cars;
Model mpg_city=weight horsepower length / spec;
run;
quit;

After running the above program, SAS will output several statistics, but the one we care about is the chi-square statistic in the "Test of First and Second Moment Specification" table.


The P value of the ChiSq statistic should be greater than 0.05 in order to NOT reject our null hypothesis, which states that the residuals are Homoskedastic. With a P value of 0.2515 we do not reject our null hypothesis, so the data give us no evidence against the assumption of Homoskedasticity.


How to Deal with Heteroskedasticity?

Transforming the variables will solve most of your problems when it comes to Heteroskedasticity.



The Box-Cox transformation analysis of the data recommends a transformation for the X variable on the basis of the lambda value.
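For reference, the standard Box-Cox family for a positive variable x is defined as follows. The formula is the usual textbook definition and is not reproduced from the article's figures.

```latex
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\log x, & \lambda = 0
\end{cases}
\qquad x > 0
```

Plugging in a few values shows where the familiar transformations come from: \lambda = 2 gives (x^2 - 1)/2, a rescaled square; \lambda = 0.5 gives 2(\sqrt{x} - 1), a rescaled square root; \lambda = 0 gives the log; and \lambda = -1 gives 1 - 1/x, a rescaled reciprocal.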






The table (right) shows recommended Box-Cox transformations which can be used for the X variable.

     
So, in this particular case we should ideally apply a square transformation to the X variable, as the lambda value is 2.
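If you want SAS to search for a lambda, one possible route is Proc Transreg. Treat the following as a sketch under assumptions: the lambda grid is a choice made here, and the sashelp.cars model is reused only for illustration. Note also that the BoxCox transform in Proc Transreg applies to the dependent variable, so this example looks for a lambda for mpg_city rather than for a regressor.

```sas
/* Box-Cox search over a grid of lambda values from -2 to 2.
   BoxCox() in Proc Transreg transforms the dependent variable;
   the grid and the model are illustrative assumptions. */
proc transreg data=sashelp.cars;
  model boxcox(mpg_city / lambda=-2 to 2 by 0.5)
        = identity(weight horsepower length);
run;
```

After applying the suggested transformation, refit the model and repeat the residual plot and the White test to check whether the spread of the residuals has evened out.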


I hope this scary story of Heteroskedasticity is not so scary anymore!
 


