Date of Award
8-2024
Document Type
Thesis
Degree Name
Master of Science in Information Systems and Technology
Department
Information and Decision Sciences
First Reader/Committee Chair
Conrad Shayo
Abstract
This culminating experience project investigated various methods for enhancing spam detection on YouTube, a prevalent issue impacting user experience and platform integrity. The research questions addressed were: Q1) How do different spam detection methods compare regarding robustness, efficiency, and accuracy? Q2) What role do deep learning approaches like RNNs and CNNs play in improving spam comment identification? Q3) What are the unique benefits of using deep learning models for spam comment identification on YouTube? Q4) How can machine learning models be optimized for real-time spam detection on YouTube?
The study produced findings for each research question. For Q1, conventional algorithms such as Naïve Bayes and Logistic Regression identified spam emails with good precision but adapted poorly to new forms of spam and continually evolving spam techniques, whereas deep learning algorithms such as CNNs and RNNs achieved high accuracy and robustness because they extract features from the text data on their own. For Q2, the results indicate that RNNs and CNNs are critical to improving spam detection: they capture semantic meaning and temporal relationships in comments and surpass traditional methods. For Q3, deep learning models proved the most accurate, scalable, and resistant to false negatives when identifying spam comments on YouTube videos, which helps regain users' trust and strengthens the platform's security as traffic continues to grow. Q4 focused on adapting machine learning models for real-time processing, using methods such as model pruning and distribution.
The findings were as follows. Q1: although conventional approaches can produce accurate results, deep learning models are more effective at handling changes in spam strategies. Q2: RNNs and CNNs contribute substantially to detecting spam on social media platforms because of their strength in natural language processing (NLP) and pattern recognition. Q3: the accuracy, scalability, and adaptability of deep learning models, including CNNs and RNNs, make them well suited to identifying spam on YouTube, where spam tactics evolve constantly. Q4: fine-tuning machine learning models is essential for scaling real-time spam detection, which supports the demanding task of training algorithms to handle the flood of user-generated content on YouTube.
Areas of further study include analyzing other complex natural language processing methods combined with classifiers for better spam identification, reducing the computational time of multi-modal learning for spam comment detection, and considering federated learning for real-time spam identification on platforms such as YouTube. These research directions aim to strengthen existing approaches and improve the spam detection technologies used in Information Systems so that they are efficient, accurate, and able to cope with newly emerging spam techniques in flexible and transparent ways.
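As an illustration of the conventional baseline named in the abstract, the following is a minimal sketch of a multinomial Naïve Bayes comment classifier with Laplace smoothing. It is not the project's implementation: the function names (`tokenize`, `train`, `predict`), the whitespace tokenizer, and the six toy comments are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy comments, invented for illustration only.
TOY_COMMENTS = [
    ("check out my channel and subscribe", "spam"),
    ("free gift cards click this link", "spam"),
    ("subscribe to my channel for free stuff", "spam"),
    ("great video thanks for sharing", "ham"),
    ("this song brings back memories", "ham"),
    ("thanks for the clear explanation", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text, alpha=1.0):
    """Return the class with the highest log posterior under Naive Bayes."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the token given the class
            score += math.log((counts[token] + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_COMMENTS)
print(predict(model, "subscribe to my channel"))
print(predict(model, "thanks for the video"))
```

Because the model scores only words it counted during training, a spam comment built from unseen vocabulary contributes no evidence at all, which is one way to picture the adaptability limitation the abstract attributes to conventional methods.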
Recommended Citation
Pesaru, Sai Charan, "Enhancing YouTube Spam Detection" (2024). Electronic Theses, Projects, and Dissertations. 2014.
https://scholarworks.lib.csusb.edu/etd/2014