Date of Award


Document Type


Degree Name

Master of Science in Computer Science


School of Computer Science and Engineering

First Reader/Committee Chair

Haiyan Qiao


Data clustering is a data analysis technique that groups data based on a measure of similarity. When data is well clustered the similarities between the objects in the same group are high, while the similarities between objects in different groups are low. The data clustering technique is widely applied in a variety of areas such as bioinformatics, image segmentation and market research.

This project conducted an in-depth study on data clustering with focus on density-based clustering methods. The latest density-based (CFSFDP) algorithm is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively larger distance from points with higher densities. This method has been examined, experimented, and improved. These methods (KNN-based, Gaussian Kernel-based and Iterative Gaussian Kernel-based) are applied in this project to improve (CFSFDP) density-based clustering. The methods are applied to four milestone datasets and the results are analyzed and compared.