15 Steps towards solving a problem statement using machine learning

  • Writer: Swarup Kumar
  • Mar 28, 2021
  • 1 min read

Updated: Oct 15, 2021



1. Exploratory Data Analysis

- Check variable types, shape

- Histogram, box plot, correlation

- Use Tableau or Power BI for detailed analysis
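
As a rough illustration of the checks above, the sketch below inspects a small made-up table using only the Python standard library. The rows, column names and bin width are hypothetical and stand in for whatever the real dataset contains.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset: each row is a dict (column name -> value).
rows = [
    {"age": 23, "income": 31000.0, "segment": "A"},
    {"age": 35, "income": 52000.0, "segment": "B"},
    {"age": 41, "income": 58000.0, "segment": "B"},
    {"age": 29, "income": 40000.0, "segment": "A"},
    {"age": 52, "income": 75000.0, "segment": "C"},
]

def shape(rows):
    """Return (number of rows, number of columns)."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols

def column_types(rows):
    """Map each column to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types

def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [row["age"] for row in rows]
print(shape(rows))
print(column_types(rows))
print(five_number_summary(ages))
print(histogram(ages, bin_width=10))
```

A mixed set of type names in one column is usually the first hint that the column needs cleaning before any modelling.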

2. Correlation / Multicollinearity

- Drop or combine highly correlated variables
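
One way to find candidates to drop or combine is to compute pairwise Pearson correlations and flag the pairs above a cut-off. This is a minimal sketch: the 0.9 threshold and the example columns are assumptions chosen for illustration, not values from this checklist.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Hypothetical columns: height in cm and in inches carry the same information.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [42.0, 38.0, 44.0, 39.0, 41.0],
}
flagged = highly_correlated_pairs(columns)
print(flagged)
```

Here only the two height columns are flagged, so one of them could be dropped without losing information.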

3. Check and treat Outliers (replace outliers with NA)
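
Replacing outliers with NA could look like the sketch below, which uses the common 1.5 × IQR rule. The rule and the `k` parameter are assumptions; the checklist does not say which outlier test to use. `None` plays the role of NA.

```python
import statistics

def outliers_to_na(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None (NA)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]

# Hypothetical sensor readings with one obvious outlier (95.0).
readings = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0, 11.0, 10.0]
cleaned = outliers_to_na(readings)
print(cleaned)
```

Turning outliers into NA first means the NA treatment in the next step handles both kinds of problem value in one place.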

4. Check and treat NA or null values

Create multiple variants of the dataset:

- original (rows with NA dropped)

- treated only for NA/null values - mice, Cook's distance

- treated for outliers - capping, mean replacement

- transformed - binning, log

* Feed each variant to the algorithm and select the best one based on metrics
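
Building several treated copies of the same column might look like the sketch below. Mean imputation is used here as a simple stand-in for mice-style imputation, and the cap limits and bin width are made-up values; all function names are hypothetical.

```python
import math
import statistics

def drop_na(values):
    """Keep only the observed values."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Fill NA with the mean of the observed values."""
    mean = statistics.fmean(drop_na(values))
    return [mean if v is None else v for v in values]

def cap(values, low, high):
    """Clamp observed values into [low, high]; NA stays NA."""
    return [None if v is None else min(max(v, low), high) for v in values]

def log_transform(values):
    """Apply log(1 + v), defined for v >= 0; NA stays NA."""
    return [None if v is None else math.log1p(v) for v in values]

def bin_equal_width(values, width):
    """Replace each observed value by the lower edge of its bin."""
    return [None if v is None else math.floor(v / width) * width for v in values]

raw = [10.0, None, 12.0, 95.0, 11.0]
variants = {
    "dropped_na": drop_na(raw),
    "mean_imputed": impute_mean(raw),
    "capped_then_imputed": impute_mean(cap(raw, low=5.0, high=20.0)),
    "log_then_imputed": impute_mean(log_transform(raw)),
    "binned": bin_equal_width(raw, width=10),
}
for name, values in variants.items():
    print(name, values)
```

Each entry in `variants` can then be fed to the same algorithm and compared on the chosen metric.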

5. Encode factor (categorical) variables with appropriate encoders
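
Two common encoders are sketched below: one-hot for unordered categories and ordinal for categories with a natural order. Which encoder is "appropriate" depends on the variable; the example columns and function names are hypothetical.

```python
def one_hot(values):
    """One-hot encode a categorical column.

    Returns (categories, rows); each row is a list of 0/1 flags in the
    same order as the sorted category list.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

def ordinal(values, order):
    """Ordinal-encode a column whose categories have a natural order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

colours = ["red", "green", "blue", "green"]
cats, encoded = one_hot(colours)
sizes = ordinal(["small", "large", "medium"], order=["small", "medium", "large"])
print(cats, encoded)
print(sizes)
```

One-hot avoids implying an order between colours, while the ordinal codes keep the small < medium < large relationship.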

6. Dimensionality reduction (feature extraction) using principal component analysis or linear discriminant analysis
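
To make the PCA half of this step concrete, the sketch below estimates only the first principal component with power iteration on the covariance matrix. It is a teaching sketch under simplifying assumptions (small dense data, non-zero variance, a start vector not orthogonal to the answer); LDA is not covered, and a real project would normally use a library implementation.

```python
import math

def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration.

    rows: list of equal-length numeric lists (observations x features).
    Returns (unit direction vector, projection of each centred row onto it).
    Assumes the data has non-zero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical data: the second feature is exactly twice the first,
# so a single component captures all of the variance.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(points)
print(direction)
print(scores)
```

The two original features collapse into the single `scores` column, which is the kind of reduction this step is after.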

7. Split the dataset into train (0.7), test (0.2) and holdout (0.1)
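
A 70/20/10 split can be sketched as a seeded shuffle followed by slicing. The function name and the fixed seed are assumptions added for reproducibility.

```python
import random

def split_dataset(rows, train=0.7, test=0.2, seed=42):
    """Shuffle rows and split into train / test / holdout partitions.

    The holdout share is whatever remains after train and test
    (0.1 with the defaults).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

train_rows, test_rows, holdout_rows = split_dataset(range(100))
print(len(train_rows), len(test_rows), len(holdout_rows))
```

The holdout partition should stay untouched until the final check, so that it gives an honest estimate of performance on unseen data.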

8. Run each dataset through multiple algorithm templates
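
The idea of running every dataset variant through every algorithm template can be sketched as a small grid loop. The two models below (a mean baseline and a one-feature least-squares line) are deliberately simple stand-ins, not the caret algorithms named in this post, and the data is made up.

```python
import math

def fit_mean(xs, ys):
    """Baseline template: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Least-squares line y = a + b*x for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))

TEMPLATES = {"mean_baseline": fit_mean, "linear": fit_linear}

def run_all(datasets, test, templates=TEMPLATES):
    """Fit every template on every dataset; return {(dataset, template): test RMSE}."""
    results = {}
    for d_name, train in datasets.items():
        for t_name, fit in templates.items():
            results[(d_name, t_name)] = rmse(fit(*train), *test)
    return results

# Hypothetical training variants (xs, ys) sharing one test set.
datasets = {
    "original": ([1, 2, 3, 4, 5], [2.0, 4.0, 6.0, 8.0, 30.0]),  # contains an outlier
    "outlier_removed": ([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]),
}
test_set = ([6, 7], [12.0, 14.0])
results = run_all(datasets, test_set)
best = min(results, key=results.get)
print(best, results[best])
```

Scoring every combination against the same test set keeps the comparison fair across dataset variants.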

9. Pull insights from multiple algorithms together for business insights

For Regression (caret package)

- Linear regression

- Neural network

- C4.5

- CTREE

- Random Forest

For Classification

- Logistic regression

- Naive Bayes

- SVM (Support Vector Machine)

- C4.5

- CTREE

- Random Forest

10. Check for relevant metrics

Linear regression - RMSE, R2, adjusted R2, MAPE

Classification - Accuracy, ROC/AUC, Sensitivity, Specificity, F1, Precision
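
Most of these metrics follow directly from their textbook definitions, as the sketch below shows. ROC/AUC is left out because it needs predicted scores rather than labels, and the functions assume no division by zero (for example, no true value of zero for MAPE and at least one predicted positive for precision).

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """RMSE, R^2, adjusted R^2 and MAPE (as a percentage)."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
        # MAPE is undefined when a true value is zero.
        "mape": 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity, precision and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical predictions for illustration only.
reg = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 10.0], n_features=1)
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(reg)
print(clf)
```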

11. Create library functions for steps 2-12 (as above) and maintain them in a template file
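
One possible shape for such a template file is a set of small single-purpose functions plus a helper that chains them. This is only a sketch: `make_pipeline`, `cap_at` and the other names are hypothetical, and the steps shown are placeholders for the real library functions.

```python
from functools import reduce

def make_pipeline(*steps):
    """Compose single-argument functions left to right into one callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

def drop_na(values):
    """Remove missing (None) entries."""
    return [v for v in values if v is not None]

def cap_at(high):
    """Return a step that caps every value at `high`."""
    return lambda values: [min(v, high) for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# A reusable preparation template assembled from the library functions.
prepare = make_pipeline(drop_na, cap_at(20.0), min_max_scale)
prepared = prepare([10.0, None, 15.0, 95.0, 20.0])
print(prepared)
```

Keeping each step as its own function makes it easy to reorder or swap treatments when a new problem statement arrives.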

12. Give business insights based on data

13. Validate with real time data

14. Make the model read newer data continuously

15. Revise the model to maintain metrics
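
A simple way to decide when a revision is due is to compare recent scores on new data against the score recorded at deployment. The sketch below assumes a higher-is-better metric such as accuracy; the tolerance, window size and score history are made-up values.

```python
def needs_retraining(baseline, recent_scores, tolerance=0.05, window=3):
    """Flag the model when the mean of the last `window` scores falls
    more than `tolerance` below the baseline score.

    Assumes a higher-is-better metric such as accuracy or F1.
    """
    if len(recent_scores) < window:
        return False  # not enough new evidence yet
    recent = recent_scores[-window:]
    return sum(recent) / window < baseline - tolerance

# Hypothetical weekly accuracy measured on fresh data after deployment.
history = [0.91, 0.90, 0.89, 0.84, 0.82, 0.80]
print(needs_retraining(0.90, history[:3]))
print(needs_retraining(0.90, history))
```

Averaging over a window avoids retraining on a single noisy measurement.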



