What’s my problem?
I don’t think data science is a kind of art; essentially, it is a science. For a long time, I couldn’t find much information on practical principles for EDA (exploratory data analysis). Yes, there are plenty of helpful code snippets and statistics books for reference, but when it came to real projects, they were hard to apply (or offered too many choices).
Since no single solution worked best for me, I combined several into one that looks like a “practical” solution. Here it is:
Reading from top to bottom, the second layer shows the six steps to take before modeling. The layers below it list some possible methods for each step. I didn’t list everything, since too much information would make things complicated. For feature selection, there is a useful tool. However, doing it in your own code isn’t hard either.
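To illustrate the point about doing feature selection in your own code, here is a minimal sketch using pandas. It is not the API of the tool mentioned above. The function name `select_features`, the three checks (mostly missing, single value, highly correlated), and the threshold values are all my own assumptions for illustration, so adjust them to your data.

```python
import numpy as np
import pandas as pd


def select_features(df, missing_threshold=0.6, corr_threshold=0.95):
    """Drop obviously unhelpful columns; thresholds are illustrative assumptions.

    Returns the reduced frame and a dict mapping each reason to the dropped columns.
    """
    dropped = {}

    # 1. Columns whose share of missing values exceeds the threshold.
    missing_frac = df.isna().mean()
    dropped["missing"] = missing_frac[missing_frac > missing_threshold].index.tolist()
    df = df.drop(columns=dropped["missing"])

    # 2. Columns that hold a single unique value (NaN not counted).
    n_unique = df.nunique(dropna=True)
    dropped["single_value"] = n_unique[n_unique <= 1].index.tolist()
    df = df.drop(columns=dropped["single_value"])

    # 3. Numeric columns highly correlated with an earlier column.
    #    Only the upper triangle is inspected, so one column of each pair is kept.
    corr = df.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    dropped["collinear"] = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    df = df.drop(columns=dropped["collinear"])

    return df, dropped


# Small made-up frame, only to exercise the three checks.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],
    "constant": [7, 7, 7, 7, 7],
    "mostly_missing": [np.nan, np.nan, np.nan, np.nan, 1.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
})
reduced, dropped = select_features(demo)
print(list(reduced.columns))
print(dropped)
```

Returning the dropped columns grouped by reason keeps the step auditable: you can check what was removed and why before moving on to modeling.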
Here are all the sources I drew on for this plot, for your reference:
Comprehensive data exploration with Python: https://www.kaggle.com/pmarcelino/comprehensive-data-exploration-with-python
A Feature Selection Tool for Machine Learning in Python: https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0
Data Mining: Concepts and Techniques, 3rd ed.: http://hanj.cs.illinois.edu/bk3/