Machine Learning: Beginner to Pro Roadmap


This article lays out, step by step, how to go about learning machine learning, along with the best resources and their links.

What is machine learning?

Machine learning is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming. The key idea behind machine learning is to empower computers to learn from data and improve their performance over time.

To learn ML, you need some understanding of Maths and Statistics topics such as the following.

For Maths,

  1. Linear Algebra
  2. Calculus
  3. Optimization
  4. Differential Equations
  5. Geometry and Trigonometry

For statistics,

  1. Descriptive Statistics
  2. Inferential Statistics
  3. Probability Distributions
  4. Bayesian Statistics
  5. Regression Analysis
  6. Analysis of Variance (ANOVA)
  7. Time Series Analysis

 

Now, to get your hands dirty with real implementations, you need to know a programming language such as Python. After that, get familiar with Python libraries like Pandas, Matplotlib, and Seaborn.
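To give a feel for what working with these libraries looks like, here is a minimal sketch using Pandas only. The column names and numbers are made up for illustration; they do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 58.0, 65.0, 71.0, 78.0],
})

# Descriptive statistics: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# Pearson correlation between the two columns.
correlation = df["hours_studied"].corr(df["exam_score"])

print(summary)
print(f"correlation: {correlation:.3f}")
```

From here, the same DataFrame can be passed to Matplotlib or Seaborn for plotting.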

Okay, so everything we have covered so far is a prerequisite. Now come the really important topics: machine learning algorithms.

Before studying individual machine learning algorithms, get your basics clear on the following.

Machine Learning types:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Semi-Supervised Learning
  4. Reinforcement Learning
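
The difference between the first two types is easiest to see in code. The sketch below uses scikit-learn (an assumption, since the article does not name a library for this) on synthetic data: the supervised model is given the labels, while the unsupervised model only sees the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated synthetic blobs (made-up data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# Supervised: the model sees the labels y during training.
clf = LogisticRegression().fit(X, y)
supervised_accuracy = clf.score(X, y)

# Unsupervised: the model sees only X and has to find the groups itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

print(supervised_accuracy)
print(cluster_ids[:5], cluster_ids[-5:])
```

Note that the cluster ids are arbitrary: K-Means may call the first blob cluster 1 and the second cluster 0, because it never saw the labels.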

 

Here is the list of topics to work through when learning machine learning:

1 → Data Collection

Open Data Repositories:

Web Scraping Tools:

  • Beautiful Soup: A Python library for pulling data out of HTML and XML files.
  • Scrapy: An open-source and collaborative web crawling framework for Python.
  • Selenium: A powerful open-source framework often used for automated testing of web applications. It can also be used for web scraping when you need to interact with dynamic, JavaScript-heavy websites.
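
As a rough idea of what scraping with Beautiful Soup looks like, here is a minimal sketch. The HTML string is a hypothetical stand-in for a downloaded page, so nothing is fetched from the network, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for a downloaded page (hypothetical content).
html = """
<html><body>
  <h1>Datasets</h1>
  <ul>
    <li><a href="/data/iris.csv">Iris</a></li>
    <li><a href="/data/wine.csv">Wine</a></li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

title = soup.h1.get_text(strip=True)
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```

For a real site, check its terms of use and robots.txt before scraping.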

APIs for Data Retrieval:

  • Many websites and services offer APIs for accessing their data. Examples include Twitter, GitHub, and various financial APIs.

Image Datasets:

Text and NLP Datasets:

Healthcare Datasets:

Finance Datasets:

  • Yahoo Finance API: Provides historical stock data.
  • Quandl: A platform for financial, economic, and alternative data.

Social Media Data:

  • Twitter API: Access to Twitter’s data for various purposes.
  • Reddit API: Access to Reddit’s data for research and analysis.

Climate and Environmental Datasets:

 

2 → Data Preprocessing

  • Rescaling
  • — MinMax Scaling
  • — Absolute Maximum Scaling
  • — Normalization
  • — Standardization
  • — Robust Scaling
  • Encoding
  • — Ordinal Encoding
  • — Label Encoding
  • — One-hot Encoding
  • Imputer
  • — Next-previous value
  • — KNN (K-Nearest Neighbours)
  • — Max — Min Value
  • — Missing value prediction
  • — Most Frequent Value
  • — Mean / Median
  • — Fixed Value
  • — Linear interpolation (Pandas Interpolate method)
  • Dimension Reduction
  • — PCA
  • — Backward Elimination (Only for Linear Regression and Logistic Regression)
  • — Forward selection
  • — Score Comparison
  • — Missing value Ratio
  • — Low Variance Filter
  • — High Correlation filter
  • — Random Forest
  • — Factor Analysis
  • Outlier Reduction
  • — Two steps
  • — — Outlier Detection
  • — — Outlier Removal
  • — Outlier Detection
  • — — Box Plot
  • — — IQR Method
  • — — Z-score Method
  • — — Distance from the mean (Multivariate)
  • — Outlier Removal
  • — — Trimming
  • — — Capping (replace outliers with a boundary value)
  • — — Discretization (Binning): grouping values into bins
  • Feature Engineering
  • — Feature Creation
  • — Transformation
  • — Feature extraction
  • — Feature Selection
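
Several of the steps above (imputation, rescaling, encoding) are usually chained together. Below is a minimal sketch, assuming scikit-learn and a small made-up table; the column names and values are hypothetical, and median imputation, standardization, and one-hot encoding are just one possible pick from the lists above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with missing values in every column.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 58.0],
    "income": [30000.0, 52000.0, 48000.0, np.nan, 91000.0],
    "city": ["Pune", "Delhi", "Pune", np.nan, "Mumbai"],
})

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", categorical, ["city"]),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 3 one-hot city columns
```

Keeping the steps inside one transformer means the same imputation values and scaling parameters learned on training data get reused on test data.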

Check Normal Distribution

  • Seaborn histplot or displot (the older distplot is deprecated)
  • QQ plot
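
A QQ plot can also be summarized as a number. The sketch below assumes SciPy and synthetic samples: `scipy.stats.probplot` computes the QQ-plot points and fits a straight line through them, and the correlation `r` of that fit is close to 1 when the data looks normal.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration: one normal, one clearly right-skewed.
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.exponential(scale=1.0, size=500)

# probplot returns ((theoretical, ordered), (slope, intercept, r)).
r_normal = stats.probplot(normal_sample, dist="norm")[1][2]
r_skewed = stats.probplot(skewed_sample, dist="norm")[1][2]

print(f"r for normal sample: {r_normal:.4f}")
print(f"r for skewed sample: {r_skewed:.4f}")
```

Passing `plot=plt` (with `import matplotlib.pyplot as plt`) to `probplot` draws the actual QQ plot.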

Transformation (converting data toward a normal distribution)

  • Logarithm Transformation (FunctionTransformer) — Good in right-skewed data
  • Reciprocal Transformation (FunctionTransformer) (1/x)
  • Box-Cox Transformation (PowerTransformer)
  • Square Transformation (FunctionTransformer) (x²) — Good in left-skewed data
  • Square root Transformation (FunctionTransformer) (√x)
  • Yeo-Johnson Transformation (PowerTransformer)
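
Here is a minimal sketch of these transformers in scikit-learn, run on synthetic right-skewed data. `PowerTransformer` supports the `box-cox` and `yeo-johnson` methods; measuring skewness before and after is my own choice for showing the effect.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer, PowerTransformer

# Synthetic, strictly positive, right-skewed data (log-normal).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Logarithm transformation via FunctionTransformer (log1p = log(1 + x)).
x_log = FunctionTransformer(np.log1p).fit_transform(x)

# Box-Cox requires strictly positive input.
x_boxcox = PowerTransformer(method="box-cox").fit_transform(x)

# Yeo-Johnson also accepts zero and negative values.
x_yeo = PowerTransformer(method="yeo-johnson").fit_transform(x)

skew_before = skew(x.ravel())
skew_log = skew(x_log.ravel())
skew_boxcox = skew(x_boxcox.ravel())
skew_yeo = skew(x_yeo.ravel())

print(skew_before, skew_log, skew_boxcox, skew_yeo)
```

A skewness near zero after the transform suggests the data is now closer to symmetric, which you can confirm with a QQ plot.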

 

3 → Feature Management techniques

  • PCA
  • ICA
  • LDA
  • LLE
  • t-SNE
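
As an example of the first technique in this list, here is a minimal PCA sketch. It assumes scikit-learn and its bundled Iris dataset; reducing to two components is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto the 2 directions of highest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of total variance captured by each component.
explained = pca.explained_variance_ratio_

print(X_2d.shape)
print(explained, explained.sum())
```

The other techniques in the list have a similar `fit_transform` interface in scikit-learn (for example `FastICA`, `LinearDiscriminantAnalysis`, `LocallyLinearEmbedding`, and `TSNE`), though LDA additionally needs the class labels.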

Feature Selection techniques

  • Filter Method
  • — Information gain
  • — Chi-square Test
  • — Fisher’s score
  • — Correlation Coefficient
  • — Variance Threshold
  • — Mean Absolute Difference
  • — Dispersion Ratio
  • Wrapper Method
  • — Forward Selection
  • — Backward Elimination
  • — Bi-directional elimination (stepwise selection)
  • Embedded Method
  • — Random Forest
  • — Lasso Regularizations
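
To make the three families concrete, here is a sketch with one example of each, assuming scikit-learn and the Iris dataset. The specific choices (chi-square for the filter, forward selection around logistic regression for the wrapper, Random Forest importances for the embedded method) are picked from the list above for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: score each feature on its own with the chi-square test.
filter_selector = SelectKBest(chi2, k=2).fit(X, y)
filter_idx = filter_selector.get_support(indices=True)

# Wrapper method: forward selection, adding the feature that most improves
# the cross-validated score of the wrapped model at each step.
wrapper_selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",
).fit(X, y)
wrapper_idx = wrapper_selector.get_support(indices=True)

# Embedded method: importances come out of training the model itself.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

print("filter:", filter_idx)
print("wrapper:", wrapper_idx)
print("embedded importances:", importances)
```

Filter methods are the cheapest because they never train the final model, while wrapper methods retrain it many times.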

 

4 → Ensemble Learning

  • Bagging
  • — Bootstrapping
  • — Aggregating
  • — Max voting
  • — Averaging
  • Boosting
  • — AdaBoost
  • — Gradient Boosting
  • — Extreme gradient boosting or XGBoost
  • Stacking
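
The sketch below trains one model from each family using scikit-learn on its bundled breast cancer dataset. The base learners and hyperparameters are arbitrary assumptions for illustration, and XGBoost is left out because it lives in a separate package.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Bagging: many trees on bootstrap samples, predictions aggregated by voting.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # Boosting: models are added one after another, each focusing on earlier errors.
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a final model learns how to combine the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("nb", GaussianNB()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Exact accuracies depend on the split and library version, so treat the numbers as a rough comparison rather than a benchmark.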

 

5 → Machine Learning Algorithms

  • Two types
  • — Supervised
  • — Unsupervised
  • Supervised Machine Learning (Algorithms)
  • — Regression
  • — — Linear
  • — — Polynomial
  • — — Ridge & Lasso
  • — — Gradient Descent
  • — Decision Tree
  • — Random Forest
  • — Classification
  • — — KNN
  • — — Trees
  • — — Logistic
  • — — Naive Bayes
  • — — SVM
  • Unsupervised Machine Learning
  • — Clustering
  • — — K-Means
  • — Dimensionality Reduction
  • — — SVD
  • — — PCA
  • — Association analysis
  • — — Apriori
  • — — FP-Growth
  • — Hidden Markov Models
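
To show how two items from this list fit together, here is a minimal sketch of linear regression trained with gradient descent, written in plain NumPy. The data is synthetic (generated from y = 3x + 2 plus noise), and the learning rate and number of steps are assumptions chosen so that this small problem converges.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)

w, b = 0.0, 0.0       # start from zero slope and intercept
learning_rate = 0.1   # assumed step size

for _ in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Closed-form least-squares fit for comparison.
w_ls, b_ls = np.polyfit(x, y, deg=1)

print(f"gradient descent: w={w:.4f}, b={b:.4f}")
print(f"least squares:    w={w_ls:.4f}, b={b_ls:.4f}")
```

Both approaches should land on nearly the same line; gradient descent matters more for models that have no closed-form solution.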

 

If you want to see the full list of topics in Google Docs, here it is.

Hope you enjoy it!





