Regression analysis is a statistical process for estimating the relationships among variables.
Correlation measures only the strength and direction of a linear relationship; it does not describe the relationship itself. Regression is used to estimate the relationship itself.
E.g., a correlation of 0.7 between Y and X indicates a fairly strong positive linear association: Y tends to increase as X increases. It does not mean that Y increases 70% of the time. Regression tells by how many units Y changes, on average, when X increases by 1 unit.
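To illustrate the difference between correlation and regression, here is a minimal sketch in Python, assuming NumPy is available. The data and the helper names correlation and slope are made up for illustration. Both datasets have the same correlation with X, but very different slopes.

```python
import numpy as np

# Hypothetical data: two outcomes that are both perfectly linear in x,
# but with very different slopes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_small = 2.0 * x + 1.0    # Y rises by 2 units per unit of X
y_large = 20.0 * x + 1.0   # Y rises by 20 units per unit of X


def correlation(x, y):
    """Pearson correlation: strength and direction of the linear association."""
    return float(np.corrcoef(x, y)[0, 1])


def slope(x, y):
    """Least-squares slope for a single predictor: cov(x, y) / var(x)."""
    return float(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))


print(correlation(x, y_small), slope(x, y_small))
print(correlation(x, y_large), slope(x, y_large))
```

The correlation is the same for both outcomes, so it cannot say how much Y moves per unit of X; the regression slope carries that information.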
Types of Data
- Cross-sectional : data at a single point in time with multiple variables
- Time Series : data at multiple points in time with a single variable
- Longitudinal / Pooled / Panel : cross-sectional data observed at multiple points in time (cross-sectional time series)
B-coefficient : if X increases by 1 unit, then predicted Y changes by B-coeff units on average (holding any other predictors constant).
Intercept : value of predicted Y when X = 0 (all predictors equal to 0).
The intercept is the baseline value (organic growth).
Degrees of freedom = no. of obs - (no. of predictors + 1 for the intercept) = n - (k+1) [analogy : 5 hats]
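The coefficient, intercept and degrees of freedom can be seen together in a small fit. This is only a sketch, assuming NumPy and made-up data; the names X, predict and residual_df are illustrative.

```python
import numpy as np

# Hypothetical observations of one predictor x and one outcome y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.7])

# Design matrix: a column of ones (for the intercept) and the predictor.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b = float(coef[0]), float(coef[1])


def predict(x_new):
    """Predicted Y for a given X under the fitted line."""
    return intercept + b * x_new


n = len(y)                   # number of observations
k = 1                        # number of predictors
residual_df = n - (k + 1)    # one parameter per predictor plus the intercept

print(intercept, b, residual_df)
```

Here predict(0.0) returns the intercept, and predict(3.0) - predict(2.0) returns the B-coefficient, which matches the two definitions above.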
Performance / Observation Window : the observation window is the period in which the account was created.
The performance window is the period in which the account is tracked for default. E.g., if an account defaults in August and the perf window is Feb to July, then it is not counted as a default.
A | Feb (Acct created) |
The performance window is a rolling window that starts after the observation window.
E.g., the perf window for an account created in Jan would start from Feb/March;
the perf window for an account created in Feb would start from March/Apr.
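A rolling performance window can be sketched as follows. The gap of 1 month after account creation and the window length of 6 months are assumptions chosen for illustration, as are the function names add_months, performance_window and is_default; real definitions vary by portfolio.

```python
from datetime import date


def add_months(d, months):
    """First day of the month that is `months` months after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def performance_window(created, gap_months=1, length_months=6):
    """Return (start, end) of the window; end is exclusive.

    gap_months and length_months are hypothetical defaults.
    """
    start = add_months(created, gap_months)
    end = add_months(start, length_months)
    return start, end


def is_default(created, default_date, gap_months=1, length_months=6):
    """Count a default only if it falls inside the performance window."""
    if default_date is None:
        return False
    start, end = performance_window(created, gap_months, length_months)
    return start <= default_date < end


# Account created in Jan: window runs Feb to July, so an August default is not counted.
print(performance_window(date(2023, 1, 15)))
print(is_default(date(2023, 1, 15), date(2023, 8, 10)))
```

Because the window is computed from each account's own creation date, an account created in Feb gets a window that starts one month later than an account created in Jan.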
Linear regression is a minimization problem: the model is fitted to minimize the sum of squared errors. Logistic regression is a maximization problem: the model chooses the parameter values that maximize the likelihood of the observed data, so that it fits the data as well as possible.
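The two objective functions can be written out directly. This is a sketch, assuming NumPy, a single predictor and made-up data; the function names sse and log_likelihood are illustrative. It only compares candidate parameter values and does not run a full optimizer.

```python
import numpy as np


def sse(params, x, y):
    """Sum of squared errors: the quantity linear regression minimizes."""
    intercept, slope = params
    return float(np.sum((y - (intercept + slope * x)) ** 2))


def log_likelihood(params, x, y):
    """Bernoulli log-likelihood: the quantity logistic regression maximizes.

    With z = intercept + slope * x and p = 1 / (1 + exp(-z)),
    sum[y*log(p) + (1-y)*log(1-p)] simplifies to sum[y*z - log(1 + exp(z))].
    """
    intercept, slope = params
    z = intercept + slope * x
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


# Hypothetical continuous outcome: the line y = 1 + 2x fits exactly.
x_lin = np.array([0.0, 1.0, 2.0, 3.0])
y_lin = np.array([1.0, 3.0, 5.0, 7.0])
print(sse((1.0, 2.0), x_lin, y_lin), sse((0.0, 2.0), x_lin, y_lin))

# Hypothetical binary outcome: larger x tends to go with y = 1.
x_bin = np.array([-2.0, -1.0, 1.0, 2.0])
y_bin = np.array([0.0, 1.0, 0.0, 1.0])
print(log_likelihood((0.0, 1.0), x_bin, y_bin),
      log_likelihood((0.0, -1.0), x_bin, y_bin))
```

A lower SSE means a better linear fit, while a higher (less negative) log-likelihood means a better logistic fit; in practice an optimizer searches over the parameters rather than comparing two hand-picked candidates.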