If plots and/or tests show that the error variances depend on i, there are several standard approaches to fixing the problem, depending on the nature of the dependence.
In practice this usually arises in the following situations:
If
Algebraically it is easy to see how to do the minimization. Rewrite
the quantity to be minimized as
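One standard way to write this is sketched here. The notation is an assumption rather than something taken from the text above: the error variances are taken to have the form Var(ε_i) = σ²/w_i with known weights w_i, x_i is the vector of covariates for case i, and W is the diagonal matrix of the w_i.

```latex
\[
\sum_i w_i \left( Y_i - x_i^T \beta \right)^2
  = \sum_i \left( \sqrt{w_i}\, Y_i - \sqrt{w_i}\, x_i^T \beta \right)^2
  = \sum_i \left( Y_i^* - (x_i^*)^T \beta \right)^2 ,
\qquad
Y_i^* = \sqrt{w_i}\, Y_i , \quad x_i^* = \sqrt{w_i}\, x_i .
\]
\[
\hat\beta = \left( X^T W X \right)^{-1} X^T W Y ,
\qquad W = \mathrm{diag}(w_1, \ldots, w_n) .
\]
```

In this form the weighted problem is an ordinary least squares problem in the starred variables, whose errors all have variance σ². Note that the intercept column of ones becomes a column of √w_i, so the transformed regression has no constant column.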
It is fairly easy to do weighted least squares in SAS. As an example we consider the SENIC data set, taking the variance of RISK to be proportional to 1/CENSUS. (Motivation: RISK is an estimated proportion; the variance of a Binomial proportion is inversely proportional to the sample size.) This makes the weight just CENSUS.
  proc reg data=scenic;
    model Risk = Culture Stay Nratio Chest Facil;
    weight Census;
  run;
EDITED OUTPUT (Complete output)
Dependent Variable: RISK

                      Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square      F Value       Prob>F

Model            5  12876.94280   2575.38856       17.819       0.0001
Error          107  15464.46721    144.52773
C Total        112  28341.41001

    Root MSE      12.02197     R-square       0.4544
    Dep Mean       4.76215     Adj R-sq       0.4289
    C.V.         252.44833

                      Parameter Estimates

                 Parameter      Standard    T for H0:
Variable  DF      Estimate         Error  Parameter=0    Prob > |T|

INTERCEP   1      0.468108    0.62393433        0.750        0.4547
CULTURE    1      0.030005    0.00891714        3.365        0.0011
STAY       1      0.237420    0.04444810        5.342        0.0001
NRATIO     1      0.623850    0.34803271        1.793        0.0759
CHEST      1      0.003547    0.00444160        0.799        0.4263
FACIL      1      0.008854    0.00603368        1.467        0.1452

EDITED OUTPUT FOR UNWEIGHTED CASE (Complete output)

Dependent Variable: RISK

                      Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square      F Value       Prob>F

Model            5    108.32717     21.66543       24.913       0.0001
Error          107     93.05266      0.86965
C Total        112    201.37982

    Root MSE       0.93255     R-square       0.5379
    Dep Mean       4.35487     Adj R-sq       0.5163
    C.V.          21.41399

                      Parameter Estimates

                 Parameter      Standard    T for H0:
Variable  DF      Estimate         Error  Parameter=0    Prob > |T|

INTERCEP   1     -0.768043    0.61022741       -1.259        0.2109
CULTURE    1      0.043189    0.00984976        4.385        0.0001
STAY       1      0.233926    0.05741114        4.075        0.0001
NRATIO     1      0.672403    0.29931440        2.246        0.0267
CHEST      1      0.009179    0.00540681        1.698        0.0925
FACIL      1      0.018439    0.00629673        2.928        0.0042
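To make concrete what a weighted fit computes, here is a minimal sketch in Python (not SAS) for the one-covariate case. It is an illustration under stated assumptions, not a description of what PROC REG does internally. The function names and the toy data are hypothetical and have nothing to do with the SENIC data. The first function minimizes the weighted sum of squares directly; the second runs ordinary least squares, with no intercept, on variables multiplied by the square root of the weight. The two should agree.

```python
from math import sqrt


def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and weighted sum of squares about the means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def transformed_ols(x, y, w):
    """Same fit via no-intercept OLS on sqrt(w)-scaled variables."""
    a = [sqrt(wi) for wi in w]                    # scaled column of ones
    b = [sqrt(wi) * xi for wi, xi in zip(w, x)]   # scaled covariate
    z = [sqrt(wi) * yi for wi, yi in zip(w, y)]   # scaled response
    # Entries of the 2x2 normal equations
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    saz = sum(ai * zi for ai, zi in zip(a, z))
    sbz = sum(bi * zi for bi, zi in zip(b, z))
    det = saa * sbb - sab * sab
    # Solve by Cramer's rule
    b0 = (saz * sbb - sab * sbz) / det
    b1 = (saa * sbz - sab * saz) / det
    return b0, b1


# Hypothetical toy data: the weights play the role of CENSUS in the SAS run.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.5]
w_demo = [10.0, 40.0, 25.0, 80.0, 5.0]

print(weighted_least_squares(x_demo, y_demo, w_demo))
print(transformed_ols(x_demo, y_demo, w_demo))
```

Cases with large weights pull the fitted line toward themselves, which is one reason the weighted and unweighted estimates in the two SAS listings above can differ noticeably.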
Sometimes the response variable will have a distribution which makes it likely that the errors will be far from normal and will not be homoscedastic. Typical examples:
Example: At each of several doses, a number of animals are treated with that dose of some drug. The number, Y, dying at dose d is Binomial with success probability h(d).
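A small Python sketch shows why this example is heteroscedastic: the variance of the observed proportion dying, h(d)(1 - h(d))/n, changes with the dose. The logistic form of h, its parameters, the doses and the group size of 20 are all assumptions made up for illustration; the text does not specify h.

```python
from math import exp


def death_probability(dose, intercept=-3.0, slope=1.0):
    """Hypothetical logistic dose-response curve h(d); parameters are made up."""
    return 1.0 / (1.0 + exp(-(intercept + slope * dose)))


def proportion_variance(p, n):
    """Variance of Y/n when Y is Binomial(n, p)."""
    return p * (1.0 - p) / n


# Print dose, h(dose) and the variance of the observed proportion
# for a hypothetical group of 20 animals at each dose.
for dose in (1, 2, 3, 4, 5):
    p = death_probability(dose)
    print(dose, round(p, 3), round(proportion_variance(p, 20), 5))
```

The variance is largest where h(d) is near 1/2 and shrinks toward zero as h(d) approaches 0 or 1, so no single error variance can describe all the doses.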
The traditional analysis method is to try a transformation:
BIGGEST PROBLEM: If the model was linear before transformation then it will generally not be linear after transformation.
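A short worked example makes the point. The choice of the log transformation, the single covariate x and the symbols β_0, β_1 are assumptions chosen for illustration. Suppose the mean of the response is linear in x with β_0 > 0. Applying the log to the mean gives

```latex
\[
\mu(x) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\log \mu(x) = \log\left( \beta_0 + \beta_1 x \right)
            = \log \beta_0 + \log\left( 1 + \frac{\beta_1}{\beta_0}\, x \right),
\]
```

which is a linear function of x only when β_1 = 0. A straight-line model fitted on the log scale is therefore a different model from the original one, not just the same model with a more convenient error structure.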