Communications of the IIMA

Abstract

Email spam detection and filtering are crucial security measures in all organizations. They are applied to filter unsolicited messages, which often include a large share of harmful messages. Machine learning algorithms, specifically classification algorithms, are used to detect whether an email is spam. These algorithms train models on labelled data to predict, from an email's features, whether it is spam or not. Traditional classification algorithms have been applied for decades but have proved ineffective against fast-evolving spam emails. In this research, ensemble techniques based on a meta-learning approach are introduced to reduce the misclassification of spam email and improve the performance of the combined model. The approach combines different classification models and aggregates their outputs to reduce false positive and false negative rates and to increase the accuracy of the combined model.
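The three quantities named above can be made concrete with a short sketch. The function name `spam_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper. "Positive" is assumed to mean "spam".

```python
def spam_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, false positive rate and false negative rate from confusion counts.

    tp: spam correctly flagged          fp: legitimate email flagged as spam
    tn: legitimate email let through    fn: spam that was not flagged
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Made-up counts for illustration only (not taken from the paper).
metrics = spam_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

A false positive here is a legitimate message lost to the spam folder, and a false negative is spam that reaches the inbox, which is why both rates are tracked alongside accuracy.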

The paper proposes ensemble techniques in which various machine learning algorithms are combined to improve the accuracy and robustness of spam detection systems. By combining different algorithms, it aims to increase detection rates and reduce the number of misclassified emails. Four machine learning algorithms were selected to build the meta-learning model, chosen for their proven effectiveness in spam detection systems: Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and K-Nearest Neighbours (KNN). The selected algorithms were first applied individually to different datasets. An ensemble model was then created using the stacking method, which collects the predictions of the base models and uses them as input features for a final classifier based on the Logistic Regression algorithm.
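The stacking arrangement described above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (`make_classification` stands in for email feature vectors), the hyperparameters are library defaults, and the choices of `GaussianNB` as the Naive Bayes variant and 5-fold cross-validation are assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for email feature vectors (assumption: the paper's
# datasets and feature extraction are not reproduced here).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four base learners named in the abstract, with default settings.
base_models = [
    ("nb", GaussianNB()),
    ("svm", SVC(random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# Out-of-fold predictions from the base learners become the input
# features of the Logistic Regression meta-classifier.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"stacked accuracy on synthetic data: {accuracy:.3f}")
```

Training the meta-classifier on cross-validated predictions, rather than on predictions for data the base learners have already seen, keeps it from simply trusting whichever base model overfits most. The accuracy printed here applies to the synthetic data only and is not comparable to the figure reported in the paper.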

This study demonstrates the effectiveness of an ensemble approach to email spam detection, in which multiple weak machine learning algorithms are aggregated to produce a strong model. The purpose of the research is to enhance the accuracy and robustness of the predictive model for detecting spam emails. The proposed approach achieved better performance, with 95.8% accuracy.
