Sunday, June 16, 2019

Refreshing the statistical tools for business analytics


  • Kurtosis is really a measure of the outliers (tails) of the distribution. ... Low kurtosis, rather than being a measure of “flatness,” indicates a distribution that is less outlier-prone than the normal distribution
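To see the tail sensitivity in numbers, here is a minimal sketch using the population excess kurtosis, m4 / m2^2 - 3 (zero for a normal distribution). The function name and both samples are made up for illustration; the samples differ only in one extreme value.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Hypothetical samples: identical except for the last value.
no_outlier = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
with_outlier = [-2, -1, -1, 0, 0, 0, 1, 1, 12]

print("no outlier:  ", excess_kurtosis(no_outlier))
print("with outlier:", excess_kurtosis(with_outlier))
```

The sample without the extreme value comes out below zero (less outlier-prone than the normal), while the single outlier pushes the statistic well above zero.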
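  • Standard deviation / variance measure the spread around the mean; for a normal distribution:
    • within 1 standard deviation of the mean: about 68%
    • within 2 standard deviations of the mean: about 95.5%
    • within 3 standard deviations of the mean: about 99.7%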
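The coverage figures for a normal distribution can be checked with the standard library alone, since the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)). The helper name `normal_coverage` is an illustrative choice, not something from the notes.

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k):.2%}")
```

These percentages hold only under normality; skewed or heavy-tailed data can deviate noticeably.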
  • Covariance measures the degree to which two variables move together, e.g. age and income
    • Sum((X-mean X) (Y-mean Y)) / (n-1)
  • R, the correlation coefficient, measures the strength of the linear relationship between two variables
    • Sum((X - mean X)(Y - mean Y)) / Square root of (Sum((X - mean X)^2) * Sum((Y - mean Y)^2))
    • R ranges from -1 to 1; R close to zero means little or no linear relationship
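A minimal sketch of the sample covariance and the correlation coefficient in plain Python follows. The age and income figures are invented for illustration, and the function names are arbitrary.

```python
import math

def covariance(xs, ys):
    """Sample covariance: Sum((x - mean x)(y - mean y)) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson R: cross-deviation sum over the square root of the product of squared-deviation sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age in years, income in thousands.
ages = [25, 30, 35, 40, 45]
incomes = [30, 42, 50, 61, 70]

print("covariance: ", covariance(ages, incomes))
print("correlation:", correlation(ages, incomes))
```

Covariance depends on the units of both variables, whereas R is unit-free, which is why R is the easier of the two to compare across variable pairs.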
  • Regression analysis is a set of processes to estimate the relationships amongst variables
  • Random error (residual) in linear regression: the part of each observation not explained by the intercept and slope
    • = Actual value - value estimated by the linear regression
  • Types of regression
    • linear regression, most common in business analysis
    • logistic regression, most common in business analysis
    • polynomial regression
    • stepwise regression
    • ridge regression
    • lasso regression
    • elastic net regression
  • ANOVA, analysis of variance
    • SST, total sum of squares, Sum (actual y - mean of y)^2
    • SSE, sum of squared errors, Sum (actual y - estimated y)^2
    • SSR, regression sum of squares, Sum (estimated y - mean of y)^2
    • SST = SSR + SSE
    • Standard error of estimate SEE = square root of MSE = square root of (SSE / (N - 2)) with one independent variable (N - K - 1 in general)
    • R squared = explained variation / total variation = SSR / SST
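The decomposition can be reproduced on a small example. This is a sketch for simple linear regression (one independent variable) fitted by ordinary least squares; the data and the names `fit_line` and `anova` are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def anova(xs, ys):
    """Sums of squares, standard error of estimate and R squared for a fitted line."""
    n = len(ys)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    mean_y = sum(ys) / n
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained (residuals)
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained by the line
    see = math.sqrt(sse / (n - 2))                       # one independent variable
    return {"SST": sst, "SSE": sse, "SSR": ssr, "SEE": see, "R2": ssr / sst}

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = anova(xs, ys)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The identity SST = SSR + SSE holds here because the line is fitted by least squares with an intercept; it is not guaranteed for arbitrary predictions.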
  • F test measures how well the dependent variable is explained by the independent variables collectively
    • F = MSR / MSE, where MSR = SSR / K and MSE = SSE / (N - K - 1), with N the number of observations and K the number of independent variables
    • Decision rule
      • reject the null hypothesis (all slope coefficients are zero) if the F statistic > critical value
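A short sketch of the F test arithmetic, assuming the usual definitions MSR = SSR / K and MSE = SSE / (N - K - 1). The sums of squares below are hypothetical, and the critical value is taken from a standard F table (5% level, 1 and 3 degrees of freedom) rather than computed, since the Python standard library has no F distribution.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Hypothetical regression results: 5 observations, 1 independent variable.
ssr, sse, n, k = 39.601, 0.107, 5, 1
f_value = f_statistic(ssr, sse, n, k)

critical_value = 10.13  # from an F table: 5% level, (1, 3) degrees of freedom
reject_null = f_value > critical_value

print(f"F = {f_value:.1f}, reject null hypothesis: {reject_null}")
```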
  • Weight of evidence (WOE), computed per bin of a variable
    • Natural log (distribution of good (DG) / distribution of bad (DB))
  • Information value (IV)
    • Sum over all bins of ((DG - DB) * WOE)
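As a worked example of weight of evidence and information value, the sketch below takes good and bad counts per bin and uses the natural log. The bin counts and the function name `woe_iv` are invented, and the code assumes every bin has at least one good and one bad case (otherwise the log is undefined).

```python
import math

def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin. Returns per-bin rows and total IV."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    rows = []
    iv = 0.0
    for good, bad in bins:
        dg = good / total_good   # share of all goods falling in this bin
        db = bad / total_bad     # share of all bads falling in this bin
        woe = math.log(dg / db)  # weight of evidence for the bin
        iv += (dg - db) * woe    # the bin's contribution to information value
        rows.append((dg, db, woe))
    return rows, iv

# Hypothetical bins of one variable: (goods, bads).
bins = [(100, 40), (200, 30), (300, 30)]
rows, iv = woe_iv(bins)

for dg, db, woe in rows:
    print(f"DG={dg:.3f}  DB={db:.3f}  WOE={woe:+.3f}")
print(f"IV = {iv:.3f}")
```

Each term (DG - DB) * WOE is non-negative, because the difference and the log ratio always share the same sign, so IV grows as the good and bad distributions pull apart across bins.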
  • VIF, variance inflation factor, measures multicollinearity among multiple independent variables
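For reference, the VIF of an independent variable j is commonly defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other independent variables. With exactly two independent variables, R_j^2 is simply the squared correlation between them, which keeps the sketch below short. The data and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is their squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical independent variables: age and years of tenure.
age = [25, 30, 35, 40, 45]
tenure = [1, 3, 2, 6, 5]

print(f"VIF = {vif_two_predictors(age, tenure):.2f}")
```

A VIF of 1 means the variable is uncorrelated with the others; larger values indicate that its coefficient estimate is being inflated by overlap with the other variables.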
