Common Issues in Training ML Models
Many machine learning developers run into the same well-known issues in their day-to-day work.
Data Quality Issues
- Getting data that fits a specific need is not an easy task for data scientists.
- Even once the data is obtained, it usually still has many problems, so it must be preprocessed and converted into a form that fits the problem.
- Libraries such as pandas and sklearn provide various methods for preprocessing the data.
- Typical problems include missing values, mismatched values, and outliers.
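As a rough illustration of the preprocessing steps above, here is a minimal sketch using pandas and sklearn. The toy table, its column names, the median imputation, and the 1.5 * IQR outlier rule are all assumptions chosen for the example, not a prescribed recipe.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one missing age, one inconsistent city label,
# and one extreme income value.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0],
    "city": ["Pune", "pune ", "Delhi", "Delhi", "Pune"],
    "income": [40_000.0, 42_000.0, 39_000.0, 1_000_000.0, 41_000.0],
})

# Missing values: fill numeric gaps with the column median.
imputer = SimpleImputer(strategy="median")
df["age"] = imputer.fit_transform(df[["age"]]).ravel()

# Mismatched values: normalize stray whitespace and inconsistent casing.
df["city"] = df["city"].str.strip().str.title()

# Outliers: flag values outside 1.5 * IQR of the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income_outlier"] = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

print(df)
```

Whether to drop, cap, or keep a flagged outlier depends on the problem; the sketch only marks it.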
Feature Selection and Engineering
- A feature is simply a column in the data.
- Having many features is not necessarily good for a machine learning model, because not all features are equally important for prediction.
- Feature engineering and selection address this: keep only the features that matter for the specific problem.
- Some of the methods for feature selection:
  - Correlation coefficient
  - Fisher’s test
  - Information gain
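Two of the methods listed above can be sketched as follows. This is only an illustration on synthetic data: the column names `signal`, `weak`, and `noise` are hypothetical, and information gain is approximated here with sklearn's `mutual_info_classif`, which estimates mutual information between each feature and the label.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical synthetic data: the label depends strongly on "signal",
# weakly on "weak", and not at all on "noise".
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "weak": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = (X["signal"] + 0.3 * X["weak"] + 0.1 * rng.normal(size=n) > 0).astype(int)

# Correlation coefficient: absolute Pearson correlation of each feature with the label.
correlation = X.corrwith(y).abs().sort_values(ascending=False)

# Information gain, estimated as mutual information between feature and label.
mutual_info = pd.Series(
    mutual_info_classif(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)

print(correlation)
print(mutual_info)
```

Both rankings should put `signal` first; features near the bottom of both lists are candidates for removal.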
Overfitting and Underfitting
- Overfitting and underfitting are well-known issues in machine learning.
- Overfitting means the model performs very well on the training data but poorly on the test data.
- Underfitting means the model performs poorly even on the training data.
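- Insufficient data is a common cause of overfitting, as is a model that is too complex for the data; underfitting usually points to a model that is too simple or to features that are not informative enough.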
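One way to spot these two failure modes is to compare training and test accuracy as model capacity changes. The sketch below is an assumption-laden illustration: it uses a synthetic dataset with deliberately noisy labels and decision trees of different depths, which are choices made for the example only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels (flip_y), so memorizing the training set hurts.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for depth in (1, 4, None):  # None lets the tree grow until it fits every training point
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    scores[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))

for depth, (train_acc, test_acc) in scores.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A depth-1 tree that scores modestly on both sets suggests underfitting, while the unrestricted tree's perfect training score paired with a much lower test score is the typical signature of overfitting.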
Model Complexity
- Selecting an appropriate model architecture
- Controlling model complexity to avoid overfitting
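A common way to choose a complexity level is to compare candidates by cross-validation rather than by training score. The following is a minimal sketch under assumed conditions: a noisy sine curve as data, polynomial regression as the model family, and polynomial degree standing in for "model complexity".

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
X = x.reshape(-1, 1)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

cv_scores = {}
for degree in (1, 3, 9):  # candidate complexity levels
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False), LinearRegression()
    )
    # Mean R^2 over 5 folds; each fold is scored on data the model did not see.
    cv_scores[degree] = cross_val_score(model, X, y, cv=5).mean()

best_degree = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> selected degree:", best_degree)
```

The same pattern applies to other complexity knobs, such as tree depth or regularization strength.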
Exploding and Vanishing Gradients
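- Gradients can shrink toward zero (vanish) or grow without bound (explode) as they are propagated backward through many layers; historically, this was a major barrier to training large networks.
- The problem is most prevalent in networks with many layers, such as deep neural networks (DNNs), and in recurrent neural networks (RNNs), where gradients pass through many time steps.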
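The effect can be reproduced with plain NumPy. This is a simplified sketch, assuming a stack of purely linear layers with random Gaussian weights; the function name `backprop_gradient_norm` and its parameters are hypothetical and exist only for this illustration.

```python
import numpy as np

def backprop_gradient_norm(weight_scale, depth=50, width=64, seed=0):
    """Norm of a unit gradient after passing backward through `depth` linear layers."""
    rng = np.random.default_rng(seed)
    grad = np.ones(width) / np.sqrt(width)  # unit-norm gradient at the output
    for _ in range(depth):
        # Weights drawn from N(0, weight_scale^2 / width).
        W = rng.normal(0.0, weight_scale / np.sqrt(width), size=(width, width))
        grad = W.T @ grad  # chain rule for a linear layer
    return float(np.linalg.norm(grad))

vanishing = backprop_gradient_norm(weight_scale=0.5)
stable = backprop_gradient_norm(weight_scale=1.0)
exploding = backprop_gradient_norm(weight_scale=2.0)
print(f"scale 0.5: {vanishing:.3e}")
print(f"scale 1.0: {stable:.3e}")
print(f"scale 2.0: {exploding:.3e}")
```

Each layer multiplies the gradient norm by roughly `weight_scale`, so over 50 layers a factor slightly below or above 1 compounds into a gradient that is vanishingly small or enormous.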
Transfer Learning Challenges
- Choosing the right pre-trained model for a given use case.
- Fine-tuning the pre-trained model is also difficult.
Data Leakage
- Unintentional inclusion of information from the test set in the training process
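A frequent source of leakage is running a preprocessing step on the full dataset before splitting it. The sketch below illustrates this with feature selection on pure noise, where no real signal exists; the dataset sizes, `k=20`, and the variable names `leaky_score` and `honest_score` are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise: features and labels are unrelated, so honest accuracy should be near 0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Leaky: features are selected using ALL labels, including those of the
# samples that later serve as held-out folds.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(LogisticRegression(), X_selected, y, cv=5).mean()

# Leak-free: selection sits inside the pipeline, so it is refit on the
# training portion of each fold only.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky_score:.2f}")
print(f"honest CV accuracy: {honest_score:.2f}")
```

The leaky estimate looks far better than chance even though there is nothing to learn, which is why any step fitted to data (scaling, imputation, selection) is best kept inside the cross-validated pipeline.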
Deployment Challenges
- Many large models are difficult to serve in production and require more storage and memory.