Date of Award


Document Type


Degree Name

Master of Science in Information Systems and Technology


Information and Decision Sciences

First Reader/Committee Chair

Shayo, Conrad


Heart disease is the leading cause of death worldwide. Various forms of heart disease can be diagnosed with numerous medical tests; however, predicting heart disease without such tests is very difficult. Machine learning can help process large volumes of medical data and reveal patterns that would otherwise not be apparent to the naked eye. The aim of this project is to explore how machine learning algorithms can be used to predict heart disease by building an optimized model. The research questions are: 1) Which machine learning algorithms are used in the diagnosis of heart disease? 2) How can machine learning techniques be used to minimize misdiagnosis (additional tests and wrong treatment, all resulting in greater monetary impact to the patient)? 3) How can machine learning be used to detect early abnormalities, thus benefiting both patients and the healthcare system?

We collected our dataset from the UCI repository and used the Random Forest classification algorithm to predict heart disease. Then, we modified one of the hyperparameters, ‘N_Estimator’, to improve the model further. The findings and conclusions for each question are: 1) The machine learning algorithms used in predicting heart disease are Naïve Bayes, Decision Trees, Support Vector Machine, Bagging and Boosting, and RandomForest; these algorithms can achieve high accuracy in predicting heart disease. 2) Machine learning algorithms can analyze a large amount of data to assist medical professionals in making more informed decisions cost-effectively. 3) Machine learning algorithms allowed us to analyze clinical data, draw relationships between diagnostic variables, design the predictive model, and test it against new cases. The predictive model achieved an accuracy of 89.4 percent using the RandomForest classifier’s default settings to predict heart disease. Furthermore, areas for future research that emerged from this study include training and testing our model with a larger dataset and modifying different hyperparameters for further improvement.
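The workflow described above (train a Random Forest classifier, then vary the number of trees) can be illustrated with a minimal sketch. It assumes scikit-learn, whose `RandomForestClassifier` exposes the tree count as `n_estimators`; the abstract does not name the library used, so this is an assumption. The data below is synthetic, generated with `make_classification` as a stand-in, and is not the UCI dataset; the sample size, feature count, split ratio, and candidate tree counts are arbitrary illustrative choices, so the accuracies it prints say nothing about the 89.4 percent reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the UCI heart disease dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def evaluate(n_estimators):
    """Fit a forest with the given number of trees; return test accuracy."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Hypothetical candidate values; 100 is scikit-learn's current default.
results = {n: evaluate(n) for n in (10, 50, 100, 200)}

for n, accuracy in results.items():
    print(f"n_estimators={n}: accuracy={accuracy:.3f}")
```

Comparing values against a single held-out split, as here, is the simplest option; a cross-validated search (for example scikit-learn's `GridSearchCV`) would give a less noisy comparison on a dataset of this size.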