Date of Award

5-2024

Document Type

Project

Degree Name

Master of Science in Information Systems and Technology

Department

Information and Decision Sciences

First Reader/Committee Chair

Dr. Benjamin Becerra

Abstract

Hypertension is widespread and often undiagnosed, and it carries significant risk of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged 18 to 39 years. The research questions were: (Q1) What factors affect the occurrence of hypertension in females aged 18 to 39 years? (Q2) Which machine learning algorithms are suited to predicting hypertension effectively? (Q3) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1) Feature selection with a binary logistic regression classifier identified the 30 most influential factors, at an accuracy of 90.625%. (Q2) Analysis using three tree-based machine learning algorithms, Decision Tree, Random Forest, and XGBoost, resulted in weighted recall values of 81.73%, 90.26%, and 90.14%, respectively. (Q3) A SHAP value analysis of the three tree-based models produced the top 20 most influential features for each of the Decision Tree, Random Forest, and XGBoost models. The conclusions are: (Q1) Many factors contribute to the occurrence of hypertension, including ethnicity, BMI, physical activity levels, lifestyle habits, psychological factors, chronic factors, and comorbidities. (Q2) Random Forest and XGBoost performed best in hypertension prediction, while Decision Tree was less effective when judged by recall. (Q3) The SHAP value analysis provided valuable insight into how the tree-based models predict hypertension, highlighting BMI, ethnicity, COVID-19 status, sleep patterns, physical activity, smoking, drinking, psychological factors (stress, anxiety, and depression), chronic diseases, and genetic and demographic factors as significant predictors of hypertension risk.
Areas for further study include analyzing datasets that span diverse demographics and multifaceted factors, applying more advanced machine learning models to assess whether they yield more accurate results, and further exploring Explainable AI and game theory.
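
The weighted recall used to compare the three tree-based models can be illustrated with a minimal sketch. It is not the project's code: the function name `weighted_recall` and the toy labels are assumptions made for illustration only. Recall is computed per class and then averaged, with each class weighted by its share of the true labels.

```python
from collections import Counter


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)  # number of true samples in each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


# Toy labels invented for illustration: 1 = hypertensive, 0 = not hypertensive
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

score = weighted_recall(y_true, y_pred)
print(f"weighted recall: {score:.2f}")
```

For single-label classification, the support-weighted recall equals overall accuracy, so a model can score well on it while still missing many cases in the minority class. Reading it alongside per-class recall gives a fuller picture.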
