Date of Award
8-2025
Document Type
Project
Degree Name
Master of Science in Computer Science
Department
School of Computer Science and Engineering
First Reader/Committee Chair
Haiyan Qiao
Abstract
This project explores the use of a late fusion deep learning architecture to predict the geographic origin of music. Two features are used: Mel-Frequency Cepstral Coefficients (MFCCs), extracted from the audio files to capture sound characteristics, and the language of the music sample, identified with OpenAI’s Whisper model to provide additional context. A late fusion neural network architecture, combining Long Short-Term Memory (LSTM) layers for the sequential MFCC input with dense layers for the non-sequential language features, was employed to support both classification and regression tasks. The classification model achieved an accuracy of 33.03% across 56 countries or territories, well above the 1.79% expected from uniform random guessing (1/56). The regression model produced a Mean Great-Circle Error (MGCE) of 2,754.83 km. While regression offers more flexibility for geographic estimation, classification showed more promising performance on the current dataset. This work highlights the potential of multimodal learning in music-origin prediction.
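For readers unfamiliar with the two evaluation figures in the abstract, the following is a minimal sketch of how a mean great-circle error and a uniform random baseline could be computed. It is not taken from the project itself: the haversine formula, the Earth radius of 6,371 km, and the function names `great_circle_km`, `mean_great_circle_error`, and `random_baseline_accuracy` are assumptions made for illustration, and the project may define MGCE differently.

```python
import math

# Assumption: mean Earth radius in kilometres; the project may use another value.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_great_circle_error(predicted, actual):
    """Average haversine distance over paired (lat, lon) predictions and targets."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    total = sum(great_circle_km(p[0], p[1], a[0], a[1])
                for p, a in zip(predicted, actual))
    return total / len(predicted)


def random_baseline_accuracy(num_classes):
    """Expected accuracy of guessing uniformly at random among num_classes labels."""
    return 1.0 / num_classes
```

With 56 classes, `random_baseline_accuracy(56)` gives roughly 0.0179, which matches the 1.79% baseline quoted in the abstract.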
Recommended Citation
Ladanyi, Fruzsina, "Predicting Music Origin with Deep Learning" (2025). Electronic Theses, Projects, and Dissertations. 2303.
https://scholarworks.lib.csusb.edu/etd/2303
Included in
Data Science Commons, Ethnomusicology Commons, Other Computer Sciences Commons, Other Music Commons