[Method = Ordinary Least Square]
Y = Continuous ; X = Continuous
Objective is to minimise the sum of squares of the residuals (a residual is the difference between an observation and the fitted line)
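A minimal sketch of fitting an OLS model in Python with statsmodels, assuming a pandas DataFrame `df` with a continuous target `y` and predictors `x1`, `x2` (hypothetical column names):

```python
# Minimal OLS sketch (assumes a DataFrame `df` with columns y, x1, x2)
import statsmodels.api as sm

X = sm.add_constant(df[["x1", "x2"]])   # add the intercept term
model = sm.OLS(df["y"], X).fit()        # minimises the sum of squared residuals

print(model.summary())                  # coefficients, R-squared, diagnostics
residuals = model.resid                 # residual = observed - fitted value
```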
Assumptions
- Linear relationship between dependent & independent variables
- No significant outliers
- Independent variables are independent of each other (non collinear)
- Errors (also called residuals) :
- Should have constant variance (homoscedasticity)
- Are independent and identically distributed (iid), i.e. no autocorrelation
- Are normally distributed with a mean of 0
Tests for Assumptions :
- Linearity :
- Methods :
- Residuals vs Predicted plot / Residuals vs Actuals plot
- Corrections :
- Log transformation for strictly positive variables
- Add a regressor that is a non-linear function of an existing one, e.g. include both x and x²
- Create a new variable that is the sum/product of two existing variables A & B
- Multicollinearity
- Methods:
- Correlation Matrix
- VIF (Variance Inflation Factor)
VIF is calculated only on the independent variables. It runs one auxiliary regression per independent variable, regressing Xi against all the other IVs and taking the resulting R².
Eg : If X1 has a high R² when regressed against X2, X3, X4, it means X2, X3, X4 explain most of the variation in X1, so X1 is largely redundant. VIF = 1 / (1 − R²). Range = 1 to ∞ ; below 5 is low, 5–10 is medium, above 10 is high.
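A sketch of the VIF calculation using statsmodels, assuming `df` holds only the independent variables:

```python
# VIF sketch: one auxiliary regression per independent variable
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df)                       # keep the constant so VIFs are interpretable
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const"))                      # VIF_i = 1 / (1 - R²_i); above ~10 is high
```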
- Homoscedasticity
- Methods:
- Goldfeld-Quandt test
- Scatter plot (residuals vs predicted)
- Corrections :
- Plot the actual or predicted values of the DV against the errors; the plot should look random. If there is a trend, take the log of the DV.
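A sketch of both checks, assuming `model` is the fitted statsmodels OLS result from the earlier sketch (the residuals-vs-fitted plot doubles as the linearity check):

```python
# Homoscedasticity check sketch (assumes `model` is a fitted statsmodels OLS result)
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Residuals vs fitted values: should look like a random cloud with no fanning
plt.scatter(model.fittedvalues, model.resid, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Goldfeld-Quandt test: a low p-value suggests heteroscedasticity
f_stat, p_value, _ = het_goldfeldquandt(model.model.endog, model.model.exog)
print(f_stat, p_value)
```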
- Autocorrelation
- Durbin-Watson Test : Tests for serial correlations between errors
Range : 0–4 ; a value near 2 indicates no autocorrelation, below 2 indicates positive autocorrelation, above 2 indicates negative autocorrelation
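A one-line sketch, assuming `model` is the fitted statsmodels OLS result:

```python
# Durbin-Watson statistic on the residuals
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(model.resid)
print(dw)   # ~2 = no autocorrelation, < 2 = positive, > 2 = negative
```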
- Multivariate Normal
- Methods:
- Kolmogorov-Smirnov test / Shapiro-Wilk / Anderson-Darling / Jarque-Bera
- Q-Q Plot
- Histogram with fitted normal curve
- Corrections:
- Nonlinear / Log transformation
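A sketch of the normality checks, assuming `model` is the fitted statsmodels OLS result:

```python
# Normality-of-residuals sketch
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

stat, p_value = stats.shapiro(model.resid)   # low p-value => residuals not normal
print(stat, p_value)

sm.qqplot(model.resid, line="s")             # points should hug the reference line
plt.show()

plt.hist(model.resid, bins=30)               # histogram of residuals
plt.show()
```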
Dummy Variable Trap
- Include one fewer dummy variable than the number of categories when adding dummies to the regression.
- The excluded category serves as the base level.
- All other dummy coefficients are interpreted relative to the base category.
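A sketch with pandas, assuming `df` has a categorical column "city" (hypothetical name):

```python
# Dummy-variable sketch: drop_first avoids the dummy variable trap
import pandas as pd

dummies = pd.get_dummies(df, columns=["city"], drop_first=True)
# The dropped category becomes the base level; the remaining dummy
# coefficients are interpreted relative to it.
```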
Model Performance :
- R Square : % of variance in Y that is explained by X. It is defined as the square of correlation between Predicted and Actual values.
R² = SS_regression / (SS_regression + SS_error) = SS_regression / SS_total
- Adjusted R Square : penalizes the model for adding insignificant variables (impurity); it adjusts R² for the number of predictors
- MSE (Mean Squared Error) : mean of the squared residuals; penalizes large errors more heavily than MAE
- RMSE (Root Mean Square Error) : It measures standard deviation of the residuals.
The model with the lowest RMSE is preferred
= sqrt( Sum of Squared Errors / no of obs ) = sqrt( mean( (Actual – Predicted)² ) )
Mean Square : Sum of squares / df
- MAE (Mean Absolute Error) : sum( |Actual – Predicted| ) / n, where Error = Actual – Predicted and |Error| is the absolute error
- MAPE (Mean Absolute Percentage Error) : mean( |(Actual – Predicted) / Actual| ) × 100 ; as a rule of thumb it should not exceed ~ 8% – 10%
- AIC (Akaike Information Criterion) : trades off goodness of fit against the number of parameters ; lower is better
- BIC (Bayesian Information Criterion) : like AIC but with a stronger penalty on extra parameters ; lower is better
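These fit statistics can be read straight off a fitted statsmodels result, assuming `model` from the earlier sketch:

```python
# Model-performance sketch (assumes `model` is a fitted statsmodels OLS result)
print(model.rsquared)         # R-squared
print(model.rsquared_adj)     # Adjusted R-squared (penalises extra regressors)
print(model.aic, model.bic)   # information criteria: lower is better
```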
Loss Functions: objective is to minimise these
- MAE : Mean Absolute Error (mean of the absolute errors)
- MSE : Mean Squared Error (mean of the squared errors)
- RMSE : Root Mean Squared Error (square root of the mean of squared errors)
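A NumPy sketch of these error metrics, assuming arrays `actual` and `predicted`:

```python
# Error-metric sketch with NumPy
import numpy as np

errors = actual - predicted
mae = np.mean(np.abs(errors))                    # Mean Absolute Error
mse = np.mean(errors ** 2)                       # Mean Squared Error
rmse = np.sqrt(mse)                              # Root Mean Squared Error
mape = np.mean(np.abs(errors / actual)) * 100    # MAPE (actual must be non-zero)
print(mae, mse, rmse, mape)
```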