Satnam Singh


This Culminating Experience Project explores the use of machine learning algorithms to predict machine failures in industrial settings. The research questions are: Q1) How does the quality of input data, including issues such as outliers and noise, affect the accuracy and reliability of machine failure prediction models in industrial settings? Q2) How does the integration of SMOTE with feature engineering techniques influence the overall performance of machine learning models in detecting and preventing machine failures? Q3) How do different machine learning algorithms perform in predicting machine failures, and which algorithm is the most effective? The research findings are: Q1) Effective outlier handling is vital for predictive maintenance: the variable distributions were initially right-skewed but became more centralized after outlier treatment, and correlations between specific sensors show potential for further exploration. Q2) Data balancing through SMOTE and feature engineering is essential because actual failure instances are rare. Predicting 'Failure' instances remains difficult: the true positive rate is 73% (recall 0.73), but precision is only 0.02, so the F1-Score for 'Failure' is just 0.03, reflecting the trade-off between precision and recall. Despite an overall accuracy of 94%, the class imbalance within the dataset (92,200 'Running' instances vs. 126 'Failure' instances) remains a contributing factor to the model's limitations. Q3) Machine learning algorithm performance varies, with CatBoost excelling in accuracy and failure detection. The choice of algorithm and continuous model refinement are critical for enhanced predictive accuracy in industrial contexts. The main conclusions are: Q1) Addressing outliers in data preprocessing significantly enhances the accuracy of machine failure prediction models.
Q2) The failure data are severely imbalanced: only 0.14% of the dataset represents actual failures, while 99.86% represents non-failure data. Such extreme class disparity can produce biased models that underperform on the underrepresented class, a common problem in machine learning. Q3) CatBoost outperforms the other algorithms in predicting machine failures, with a reported accuracy of 92% and predictions that are correct 99% of the time; further exploration of diverse data and algorithms is needed for tailored industrial applications. Future research areas include advanced outlier handling, sensor relationships, and data balancing for improved model accuracy. Addressing rare failures, enhancing model performance, and exploring diverse machine learning algorithms are critical for advancing predictive maintenance.
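
The precision, recall, F1-Score, and accuracy figures discussed above all derive from the four cells of a confusion matrix. The following minimal Python sketch shows how a classifier can reach high overall accuracy while its precision on a rare 'Failure' class stays very low. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the confusion matrix from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics for the positive ('Failure') class.

    tp: failures correctly flagged      fp: running instances flagged as failures
    fn: failures that were missed       tn: running instances correctly passed
    """
    total = tp + fp + fn + tn
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts (NOT the study's data): 100 failures among 50,100 rows.
example = classification_metrics(tp=73, fp=3000, fn=27, tn=47000)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because the 'Running' class dominates, the many true negatives keep accuracy high even though most 'Failure' alarms are false positives, which is why precision and F1-Score for the minority class are the more informative measures on imbalanced data.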