Comparison of Classification Algorithms (LR, DT, RF, SVM, KNN)

Jouneid Raza
3 min read · Feb 6, 2020


What is classification?

Classification means assigning an object to one of a set of predefined classes. We have training data, and our aim is to predict a target variable that takes class values. For example, we may want to predict an exam result as ‘Pass’ or ‘Fail’. Or, on an e-commerce project, we may need to predict a product name for some use case.

In this scenario the classes are defined in advance, and we only need to predict the class of a given object. Classification can be divided into two types.

  1. Binary classification.

A classification task with only two possible outcomes, such as Pass or Fail. Logistic regression is a common algorithm for this type of problem.
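To make the Pass/Fail case concrete, here is a minimal sketch of logistic regression trained with plain gradient descent, using only the Python standard library. The data (hours studied against exam result), the function names, the learning rate and the number of epochs are all assumptions made up for illustration; in a real project you would more likely reach for a library implementation.

```python
import math

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=10000):
    # One feature, one weight and one bias, fitted by batch gradient descent
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Probability of 0.5 or more is read as the positive class
    return "Pass" if sigmoid(w * x + b) >= 0.5 else "Fail"

# Hypothetical training data: hours studied -> 1 (Pass) or 0 (Fail)
hours = [1, 2, 3, 4, 6, 7, 8, 9]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, passed)
print(predict(w, b, 2), predict(w, b, 8))
```

The model outputs a probability, and the 0.5 threshold turns it into one of the two classes.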

  2. Multi-class classification.

A classification task with more than two classes, such as predicting the winning team of a world cup. It is a multi-class problem in which each team represents a class. Algorithms such as Decision Trees and Random Forests handle this kind of problem directly.

Each algorithm has its own properties, which may suit one problem well and be a poor choice for another. So we usually try several algorithms to find an accurate and reliable model for our project.
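Trying several candidates can be as simple as scoring each one on the same held-out data and keeping the best. The sketch below assumes each model is just a function from an input to a label. The two "models" here are hypothetical stand-ins, not real trained classifiers, and the test data is made up.

```python
def accuracy(predict, X, y):
    # Fraction of examples where the prediction matches the true label
    correct = sum(1 for xi, yi in zip(X, y) if predict(xi) == yi)
    return correct / len(y)

def compare(models, X_test, y_test):
    # Score every candidate on the same held-out data and pick the best
    scores = {name: accuracy(fn, X_test, y_test) for name, fn in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical held-out data: exam score -> result
X_test = [35, 48, 52, 61, 77, 90]
y_test = ["Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]

# Stand-in candidates; in practice these would be trained models
models = {
    "always_pass": lambda score: "Pass",
    "threshold_50": lambda score: "Pass" if score >= 50 else "Fail",
}

best, scores = compare(models, X_test, y_test)
print(best, scores)
```

Accuracy is only one possible metric. If speed or memory matters more for your case, the same loop can record those instead.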

How to choose a machine learning model for your problem.

Several factors influence which algorithm is the best choice. Some problems are specific enough that dedicated algorithms exist for them, such as recommendation systems. In other cases, the following points can help you decide.

  1. Study your data.
  2. Understand your business problem.
  3. Always keep in mind the constraints imposed by your data or business.
  4. Decide whether accuracy or speed matters more for your case.

Comparison of Classification algorithms.

  1. Speed

Speed depends on the nature of the data, its size and its dimensionality. For fast training, Logistic Regression and Naive Bayes are good choices. Random Forests are slower to train because they build many trees. KNN is comparatively slower than Logistic Regression at prediction time.

Naive Bayes is much faster than KNN. A Decision Tree is also faster than KNN, because KNN does its expensive computation at prediction time.

  2. Memory

KNN is memory intensive and costly at prediction time, as it has to keep all of the training data and search it for the nearest neighbors. Naive Bayes works well with small datasets.
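A small from-scratch sketch shows where KNN's memory and prediction cost come from. The class name, the toy points and the labels are assumptions for illustration, and it needs Python 3.8+ for `math.dist`. Note that `fit` does nothing except store the data, while `predict` measures the distance to every stored point.

```python
import math
from collections import Counter

class SimpleKNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data, so memory grows with the dataset
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, point):
        # Every prediction computes a distance to every stored point
        dists = sorted((math.dist(point, xi), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical 2-D points in three classes
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 9), (2, 8), (1, 8)]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

model = SimpleKNN(k=3).fit(X, y)
print(model.predict((2, 2)), model.predict((7, 9)), model.predict((2, 9)))
print(len(model.X))  # all training points are kept
```

Library implementations can speed up the neighbor search with tree-based indexes, but the training data still has to be kept around.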

  3. Flexibility

Logistic Regression learns a linear decision boundary, so on its own it is not flexible enough to capture more complex relationships. Decision Trees support non-linearity. SVM supports both linear and non-linear solutions, the latter through kernels.

KNN is better than linear regression when the data have a high signal-to-noise ratio (SNR). A Random Forest is generally more robust and accurate than a single Decision Tree.
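The classic XOR pattern illustrates the difference between a linear boundary and a tree. The sketch below tries a coarse grid of linear boundaries (the grid itself is an arbitrary assumption) and compares the best one against a hand-written two-level tree. The tree is written by hand here, not learned from data, purely to show the kind of rule a decision tree can represent.

```python
from itertools import product

# XOR: the label is 1 when exactly one of the two inputs is 1
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

def accuracy(predict):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def best_linear_accuracy():
    # Try every linear boundary w1*x1 + w2*x2 + b > 0 on a coarse grid
    grid = [i / 2 for i in range(-4, 5)]
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        acc = accuracy(lambda x: int(w1 * x[0] + w2 * x[1] + b > 0))
        best = max(best, acc)
    return best

def tree_predict(x):
    # Hand-written two-level tree: split on x[0], then on x[1]
    if x[0] < 0.5:
        return 1 if x[1] >= 0.5 else 0
    return 0 if x[1] >= 0.5 else 1

print(best_linear_accuracy())  # 0.75
print(accuracy(tree_predict))  # 1.0
```

No single straight line separates the XOR classes, so any linear model tops out at three of the four points, while two nested splits get all four.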

Cheers :)

Feel free to contact me at:
LinkedIn https://www.linkedin.com/in/junaidraza52/
Whatsapp +92–3225847078
Instagram https://www.instagram.com/iamjunaidrana/

Happy Learning :)

Written by Jouneid Raza

With 8 years of industry expertise, I am a seasoned data engineer specializing in data engineering with diverse domain experiences.
