Date of Award

1-2023

Document Type

Restricted Project: Campus only access

Degree Name

Master of Science in Computer Science

Department

School of Computer Science and Engineering

First Reader/Committee Chair

Dr. Fadi Muheidat

Abstract

Phishing is the practice of impersonating a legitimate site in order to steal personal and professional data. The personal data targeted mainly includes usernames, passwords, account details, and other sensitive numbers. Today, phishing attacks occur continuously across countless domains, including online payment gateways, webmail, websites of financial institutions, cloud storage systems, and many others. The goal of this study is to create a model that can identify phishing URLs from a dataset. The models are trained and tested on a URL dataset using features such as URL-based, content-based, and external-based features. The dataset used for classification was sourced from Mendeley Data, from the article "Web page phishing detection" by Hannousse & Yahiouche [14]; it is a collection of more than 11,000 legitimate, benign, spam, phishing, malware, and defacement URLs. We use supervised learning because the data is already labeled. As this is a binary classification problem, we train classification models such as Decision Tree, Multilayer Perceptron, Feed Forward Neural Network, and Random Forest on the labeled dataset. We use accuracy as the evaluation metric in this project. Accuracy is defined as the ratio of correct predictions to the total number of predictions. This metric is also used to select the best model among the classifiers we evaluate. In this project, we develop three classification models that can predict whether a URL is phishing or legitimate. These models will be an asset for organizations and individuals to detect phishing URLs. Of all the classification models we experimented with, the random forest classifier performed best, with an accuracy of 95% on test data and 96% on training data.
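As a rough illustration of the evaluation step described in the abstract, the following minimal Python sketch computes accuracy as the ratio of correct predictions to total predictions and uses it to pick the best of several classifiers. The function names `accuracy` and `select_best_model`, the label encoding (1 = phishing, 0 = legitimate), and the toy labels and predictions are all assumptions made for illustration; they are not taken from the project and do not reproduce its reported results.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best_model(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Tuple[str, Dict[str, float]]:
    """Score each model's predictions and return the top model with all scores."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical test labels: 1 = phishing, 0 = legitimate (made-up values).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from two classifiers (made-up values).
toy_predictions = {
    "decision_tree": [1, 0, 0, 1, 0, 1, 1, 0],
    "random_forest": [1, 0, 1, 1, 0, 0, 1, 1],
}

best_name, all_scores = select_best_model(y_test, toy_predictions)
print(best_name, all_scores)
```

In practice the predictions would come from classifiers fitted on the training split; comparing accuracy on both the training and test splits, as the abstract reports, also gives a quick check for overfitting.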
