15 Steps Towards Solving a Problem Statement Using Machine Learning
- Swarup Kumar

- Mar 28, 2021
- 1 min read
Updated: Oct 15, 2021

1. Exploratory Data Analysis
- Check variable types and dataset shape
- Plot histograms, box plots, and correlations
- Use Tableau or Power BI for detailed analysis
2. Correlation / Multicollinearity
- Drop or combine highly correlated variables
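A minimal sketch of the correlation check in step 2, in plain Python. The column names, sample values, and the 0.9 cutoff are illustrative assumptions, not part of the original checklist.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical data: height in cm and in inches carry the same information.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [41, 23, 35, 52, 29],
}
flagged = highly_correlated_pairs(columns)
```

Each flagged pair is a candidate for dropping one variable or combining the two.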
3. Check and treat Outliers (replace outliers with NA)
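One common way to flag outliers is the interquartile range (IQR) rule, sketched below. The rule, the multiplier of 1.5, and the sample readings are assumptions chosen for illustration; the checklist does not prescribe a detection method.

```python
import statistics

def outliers_to_na(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None (NA)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if low <= v <= high else None for v in values]

# Hypothetical sensor readings with one obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
cleaned = outliers_to_na(readings)
```

Turning outliers into NA lets the NA treatment in the next step handle both problems in one place.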
4. Check and treat NA or null values
Create multiple versions of the dataset:
- original (rows with NA dropped)
- NA/null values treated only (e.g. MICE imputation)
- outliers treated (e.g. flagged with Cook's distance, then capped or replaced with the mean)
- transformed (binning, log)
Feed each version to the algorithm and select the best one based on metrics
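The dataset versions from step 4 could be built along these lines. This is a sketch under stated assumptions: mean imputation stands in for MICE, the capping limits are hand-picked, and all function and key names are hypothetical.

```python
import math
import statistics

def drop_na(values):
    """Keep only the observed (non-None) values."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Fill None with the mean of the observed values (a simple stand-in for MICE)."""
    mean = statistics.mean(drop_na(values))
    return [mean if v is None else v for v in values]

def cap(values, low, high):
    """Clamp every value into the range [low, high]."""
    return [min(max(v, low), high) for v in values]

def log_transform(values):
    """Apply log(1 + v); assumes all values are greater than -1."""
    return [math.log1p(v) for v in values]

# Hypothetical column with missing values and one extreme value.
raw = [4.0, None, 6.0, 8.0, 100.0, None, 2.0]
variants = {
    "original": drop_na(raw),
    "na_treated": impute_mean(raw),
    "outlier_capped": cap(drop_na(raw), low=2.0, high=10.0),
    "log": log_transform(drop_na(raw)),
}
```

Each entry in `variants` can then be fed to the same algorithm so the metrics are directly comparable.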
5. Encode factor (categorical) variables with appropriate encoders
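One-hot encoding is one such encoder. The sketch below is a minimal plain-Python version; the `one_hot` helper and the color labels are made up for illustration, and other encoders (ordinal, target) may suit a given variable better.

```python
def one_hot(values):
    """One-hot encode a list of category labels.

    Returns (categories, rows), where each row has a 1 in the position
    of its category and 0 elsewhere.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)
```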
6. Feature extraction (dimensionality reduction) using principal component analysis (PCA) or linear discriminant analysis (LDA)
7. Split the dataset into train (0.7), test (0.2), and holdout (0.1) sets
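A minimal sketch of the 0.7 / 0.2 / 0.1 split using only the standard library. The function name and the fixed seed are assumptions; the seed is there only to make the shuffle reproducible.

```python
import random

def split_dataset(rows, train=0.7, test=0.2, seed=42):
    """Shuffle rows and split them into train, test, and holdout lists.

    The holdout set receives whatever remains after train and test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )

# Hypothetical dataset of 100 row indices.
rows = list(range(100))
train_rows, test_rows, holdout_rows = split_dataset(rows)
```

The holdout rows stay untouched until the final check, so they give an estimate that model selection on the test set has not influenced.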
8. Run each dataset through multiple algorithm templates
9. Combine the results from multiple algorithms into business insights
Algorithm templates for regression (caret package):
- Linear regression
- Neural network
- C4.5
- CTREE (conditional inference tree)
- Random Forest
Algorithm templates for classification:
- Logistic regression
- Naive Bayes
- SVM (Support Vector Machine)
- C4.5
- CTREE (conditional inference tree)
- Random Forest
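Running a dataset through several algorithm templates can be sketched as a loop over fit functions. The two toy templates below (a mean baseline and one-feature least squares) are stand-ins written in plain Python, not the caret models listed above, and every name and data value is hypothetical.

```python
import math

def fit_mean(xs, ys):
    """Baseline template: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_simple_linear(xs, ys):
    """One-feature ordinary least squares: y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(actual, predicted):
    """Root mean squared error of paired actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def compare(templates, train, test):
    """Fit every template on train and score it on test; lower RMSE is better."""
    scores = {}
    for name, fit in templates.items():
        model = fit(*train)
        scores[name] = rmse(test[1], [model(x) for x in test[0]])
    return scores

templates = {"mean_baseline": fit_mean, "linear_regression": fit_simple_linear}
train = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # (features, targets)
test = ([6, 7], [12.0, 14.1])
scores = compare(templates, train, test)
best = min(scores, key=scores.get)
```

Keeping every template behind the same fit-then-predict interface means a new algorithm can be added with one dictionary entry.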
10. Check for relevant metrics
Regression - RMSE, R², adjusted R², MAPE
Classification - accuracy, ROC (AUC), sensitivity, specificity, F1, precision
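Most of these metrics follow directly from their definitions, as the sketch below shows. It covers RMSE, R², and MAPE for regression and the confusion-matrix metrics for binary classification. Adjusted R² and ROC AUC are left out because they need the number of predictors and predicted scores respectively. Function names and sample values are illustrative assumptions.

```python
import math

def regression_metrics(actual, predicted):
    """RMSE, R-squared, and MAPE (in percent) for paired numeric lists."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        # MAPE is undefined when any actual value is 0.
        "mape": 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n,
    }

def classification_metrics(actual, predicted, positive=1):
    """Confusion-matrix metrics for binary labels.

    Assumes at least one actual and one predicted case in each class;
    otherwise some ratios divide by zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

reg = regression_metrics([100, 200, 300], [110, 190, 300])
clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```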
11. Create library functions for steps 2-12 (as above) and maintain them in a template file
12. Give business insights based on data
13. Validate with real-time data
14. Set the model up to read newer data continuously
15. Revise the model as needed to maintain its metrics





