Date of Award
5-2024
Document Type
Project
Degree Name
Master of Science in Information Systems and Technology
Department
Information and Decision Sciences
First Reader/Committee Chair
Dr. Benjamin Becerra
Abstract
Hypertension is highly prevalent and often undiagnosed, and if left untreated it poses significant risks of severe chronic and cardiovascular complications. This study investigated the causes and underlying risks of hypertension in females aged 18-39 years. The research questions were: (Q1) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2) Which machine learning algorithms are suited to predicting hypertension effectively? (Q3) How can SHAP values be leveraged to analyze the factors behind model outputs?
The findings are: (Q1) Feature selection using a binary logistic regression classifier identified the 30 most influential factors, with an accuracy of 90.625%. (Q2) Three tree-based machine learning algorithms, Decision Tree, Random Forest, and XGBoost, achieved weighted recall values of 81.73%, 90.26%, and 90.14%, respectively. (Q3) A SHAP value analysis of the three tree-based models identified the top 20 most influential features for each of the Decision Tree, Random Forest, and XGBoost models.
The conclusions are: (Q1) Many factors contribute to the occurrence of hypertension, including ethnicity, BMI, physical activity levels, lifestyle habits, psychological factors, chronic factors, and comorbidities. (Q2) Random Forest and XGBoost performed best at predicting hypertension, while Decision Tree was less effective when judged by recall. (Q3) The SHAP value analysis of the tree-based models highlighted the following as significant predictors of hypertension risk: BMI, ethnicity, COVID-19 status, sleep patterns, physical activity, smoking, drinking, psychological factors (stress, anxiety, and depression), chronic diseases, and genetic and demographic factors.
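The weighted recall reported for the three tree-based models can be illustrated with a minimal sketch. This is not the project's code: the function names `per_class_recall` and `weighted_recall` and the toy labels are assumptions made for illustration, and the numbers below are unrelated to the study's results. Weighted recall here means the per-class recall averaged with weights proportional to each class's share of the true labels.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return {c: hits[c] / support[c] for c in support}


def weighted_recall(y_true, y_pred):
    """Average the per-class recall, weighting each class by its support."""
    recalls = per_class_recall(y_true, y_pred)
    support = Counter(y_true)
    total = len(y_true)
    return sum((support[c] / total) * recalls[c] for c in support)


# Hypothetical toy labels (1 = hypertensive, 0 = not hypertensive).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

recalls = per_class_recall(y_true, y_pred)
score = weighted_recall(y_true, y_pred)
print(recalls, score)
```

In this toy case the positive class has recall 1/2 and the negative class 5/6, which shows why a support-weighted average can look strong even when the minority class is recalled poorly. For single-label classification, the support-weighted recall is algebraically equal to the overall fraction of correctly classified samples.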
Areas for further study include analyzing datasets that span diverse demographics and multifaceted factors, applying more advanced machine learning models to improve predictive accuracy, and further exploring explainable AI and game theory.
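The link between SHAP values and game theory can be sketched with an exact Shapley value computation on a tiny cooperative game. This is an illustrative sketch only, not the project's method: the function `shapley_values`, the toy scoring function `toy_risk_score`, and its feature names and weights are all hypothetical. It enumerates every coalition by brute force, which is exponential in the number of features and practical only for a handful of them.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    players: list of hashable player (feature) names.
    value:   function mapping a frozenset of players to a number.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_risk_score(coalition):
    """Hypothetical score: two main effects plus one interaction term."""
    score = 0.0
    if "bmi" in coalition:
        score += 2.0
    if "smoking" in coalition:
        score += 1.0
    if "bmi" in coalition and "smoking" in coalition:
        score += 1.0  # interaction, split evenly between the two features
    return score  # "sleep" never changes the score in this toy game


features = ["bmi", "smoking", "sleep"]
phi = shapley_values(features, toy_risk_score)
print(phi)
```

The attributions sum to the difference between the score with all features and the score with none, which is the efficiency property that makes Shapley values attractive for explaining model outputs. A feature that never changes the score receives an attribution of zero.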
Recommended Citation
Sheth, Kruti, "CODE FOR CARE: HYPERTENSION PREDICTION IN WOMEN AGED 18-39 YEARS" (2024). Electronic Theses, Projects, and Dissertations. 1940.
https://scholarworks.lib.csusb.edu/etd/1940
Included in
Artificial Intelligence and Robotics Commons, Cardiovascular Diseases Commons, Categorical Data Analysis Commons, Collection Development and Management Commons, Databases and Information Systems Commons, Data Science Commons, Disease Modeling Commons, Health Sciences and Medical Librarianship Commons, Institutional and Historical Commons, Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons, Science and Technology Studies Commons, Statistical Methodology Commons, Statistical Models Commons