Common Issues in Training ML Models


Machine learning developers run into a number of well-known issues in their day-to-day work.

 

Data Quality Issues

  • Obtaining data that fits a specific need is not an easy task for data scientists.
  • Even once the data is obtained, it usually has many issues, so we need to preprocess it and convert it into a form that fits the problem.
  • Various preprocessing methods from the pandas and sklearn libraries can be applied for this.
  • Raw data often has missing values, mismatched values, outliers, and so on.
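
The preprocessing steps above can be sketched with pandas and sklearn as follows. This is a minimal illustration, not a complete recipe: the toy DataFrame, its column names, the 1.5 * IQR outlier rule, and the median fill are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with a missing value, inconsistent labels, and an outlier.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 300.0],
    "city": ["Pune", "pune ", "Delhi", "DELHI", "Pune", "Delhi"],
})

# Mismatched values: normalize whitespace and letter case.
df["city"] = df["city"].str.strip().str.title()

# Outliers: mark values outside 1.5 * IQR as missing (one common rule of thumb).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df.loc[~in_range & df["age"].notna(), "age"] = np.nan

# Missing values: fill with the column median.
imputer = SimpleImputer(strategy="median")
df["age"] = imputer.fit_transform(df[["age"]]).ravel()

print(df)
```

Whether an outlier should be dropped, capped, or kept depends on the problem, so the rule used here should be treated as a starting point only.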

 

Feature Selection and Engineering

  • A feature is simply a column in the data.
  • Too many features can hurt a machine learning model, because not all features are equally important for prediction.
  • To handle this, we apply feature selection and engineering, keeping only the features that matter for the specific problem.
  • Some methods for feature selection:
      • Correlation coefficient
      • Fisher’s Test
      • Information Gain
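
Two of the methods listed above can be sketched on synthetic data as shown below. The data and the feature names `informative` and `noise` are hypothetical, and sklearn's `mutual_info_classif` is used here as a stand-in for information gain, which is an assumption about what the list item refers to.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: one feature drives the label, the other is pure noise.
informative = rng.normal(size=n)
noise = rng.normal(size=n)
y = (informative > 0).astype(int)

X = np.column_stack([informative, noise])
names = ["informative", "noise"]

# Correlation coefficient: absolute Pearson correlation of each feature with the label.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]

# Information gain, estimated here as mutual information between feature and label.
mi = mutual_info_classif(X, y, random_state=0)

for name, c, m in zip(names, corr, mi):
    print(f"{name}: |corr|={c:.3f}, mutual_info={m:.3f}")
```

Features with scores near zero on both measures are candidates for removal, although correlation only captures linear relationships.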

 

Overfitting and Underfitting

  • Overfitting and Underfitting are well-known issues in machine learning.
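  • Overfitting means the model performs very well on the training data but poorly on the test data.
  • Underfitting means the model does not perform well even on the training data.
  • Overfitting is often caused by too little data or a model that is too complex for the data, while underfitting usually means the model is too simple or has not been trained enough.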
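
The difference between the two can be seen by fitting polynomials of increasing degree to a small noisy dataset and comparing training and test error. This is a sketch under assumed settings: the sine target, the noise level, the sample sizes, and the degrees 1, 4, and 12 are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Sample noisy points from a hypothetical target function sin(3x)."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 4, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={degree:2d}  train_mse={results[degree][0]:.4f}  "
          f"test_mse={results[degree][1]:.4f}")
```

The degree-1 fit has high error on both sets, which is underfitting. Training error keeps falling as the degree grows, and a gap between a low training error and a noticeably higher test error at high degree is the typical sign of overfitting.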

 

Model Complexity

  • Selecting an appropriate model architecture
  • Controlling model complexity to avoid overfitting
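
One common way to control complexity is regularization. The sketch below uses sklearn's `Ridge` on hypothetical data and shows how a larger `alpha` shrinks the learned coefficients; the data shape and the `alpha` values are assumptions for illustration, and other controls (tree depth, layer count, early stopping) follow the same idea.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical data: 10 features, but only the first one affects the target.
X = rng.normal(size=(30, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

coef_norms = {}
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    # A smaller coefficient norm means a more constrained, simpler model.
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>6}: coefficient norm = {coef_norms[alpha]:.3f}")
```

In practice the regularization strength is usually chosen with a validation set or cross-validation rather than fixed by hand.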

 

Exploding and Vanishing Gradients

  • Exploding and vanishing gradients were long a major barrier to training large networks.
  • The problem is most prevalent in networks with many layers, such as deep neural networks (DNNs), and in recurrent neural networks (RNNs), where gradients are multiplied repeatedly across layers or time steps.
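
The effect can be reproduced with a few lines of NumPy. The sketch below is a deliberate simplification: it backpropagates through a stack of purely linear layers (no activation function) with random weights, and the function name, depth, width, and weight scales are all assumptions for illustration.

```python
import numpy as np

def input_gradient_norm(weight_scale, depth=50, width=64, seed=0):
    """Backpropagate a gradient of ones through `depth` random linear layers
    and return the norm of the gradient that reaches the input."""
    rng = np.random.default_rng(seed)
    grad = np.ones(width)
    for _ in range(depth):
        # Entries have standard deviation weight_scale / sqrt(width), so each
        # layer multiplies the gradient norm by roughly weight_scale.
        W = rng.normal(scale=weight_scale / np.sqrt(width), size=(width, width))
        grad = W.T @ grad
    return float(np.linalg.norm(grad))

vanishing = input_gradient_norm(0.5)
stable = input_gradient_norm(1.0)
exploding = input_gradient_norm(1.5)

print(f"scale 0.5: {vanishing:.3e}")
print(f"scale 1.0: {stable:.3e}")
print(f"scale 1.5: {exploding:.3e}")
```

Because the per-layer factor is applied once per layer, even a modest deviation from 1 compounds exponentially with depth. Saturating activations such as sigmoid or tanh add further factors below 1, which pushes real networks toward vanishing gradients.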

 

Transfer Learning Challenges

  • Choosing the right pre-trained model for our use case.
  • Fine-tuning the pre-trained model can also be difficult.

 

Data Leakage

  • Data leakage is the unintentional inclusion of information from the test set in the training process, which makes evaluation scores look better than they really are.
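
A frequent source of leakage is fitting a preprocessing step, such as a scaler, on the full dataset before splitting. The sketch below shows one way to avoid it with an sklearn pipeline; the synthetic data and the choice of `StandardScaler` with `LogisticRegression` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: the label depends on the first two features.
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = (X[:, 0] + X[:, 1] > 10.0).astype(int)

# Split first, so no test rows are seen by any fitted step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Leaky pattern to avoid: StandardScaler().fit(X) would compute the mean and
# variance from test rows as well.

# The pipeline fits the scaler on the training rows only, then reuses those
# statistics when scoring the test rows.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

scaler = model.named_steps["standardscaler"]
print("scaler means (train only):", scaler.mean_)
print("test accuracy:", test_accuracy)
```

Keeping every fitted step inside the pipeline also keeps cross-validation leak-free, because each fold refits the scaler on its own training portion.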

 

Deployment Challenges

  • Large models are hard to handle in production and need more storage and memory.

