- Goal: classify music tracks into genres and analyze which audio features are most discriminative
- Data: Free Music Archive dataset (official repository): https://github.com/mdeff/fma
Note: This project was initially started in collaboration with user Jakub-Karczewski, but is currently being developed independently.
Note: A comprehensive, in-depth academic report detailing the methodology, feature analysis, and model evaluations is available in the report.md file.
This is an academic project developed as part of the Data Mining course. The project focuses on music genre classification using the Free Music Archive (FMA) dataset. It combines audio signal processing, feature engineering, and machine learning techniques to build and evaluate predictive models, as well as to understand which audio features contribute most to genre discrimination.
- Data Preparation & Exploration
- Metadata Integration: Successfully mapped audio file paths to the FMA metadata database.
- Audio Processing: Extracted audio-based features using multiple libraries.
- Feature Engineering
- Extracted features using librosa.
- Extended feature set using Essentia.
- Dimensionality Reduction: Implemented PCA to reduce feature space while preserving variance
- Scaling: Standardized features to ensure optimal performance for distance-based and gradient-based algorithms.
- Machine Learning Modeling
- Implemented multiple classification algorithms:
- Logistic Regression
- SVM
- Random Forest
- LightGBM
- Model Comparison: Evaluated and compared different algorithms on the fma_small subset
- Performance Evaluation: Monitored training vs. test accuracy to detect and mitigate overfitting
- Implemented multiple classification algorithms:
- Deep Learning & Transfer Learning: Advanced classification approach using a pre-trained Transformer-based model from Hugging Face.
- Model Architecture: DistilHubert (ntu-spml/distillhubert), a distilled version of the HuBERT model that uses knowledge distillation to achieve ~95% of the original performance while being 3x faster and significantly lighter.
- Transfer Learning: The model was fine-tuned on the FMA Small dataset for 3 epochs.
- Preprocessing Pipeline: Resampling audio to 16kHz (required by the model), processing 10-second audio segments, AutoFeatureExtractor for robust input normalization
- Hardware Acceleration: Training was optimized using NVIDIA CUDA on a GTX 1070 Ti, reducing training time from ~7 hours (CPU) to approximately 1 hour.
The best results among classical algorithms were achieved using LightGBM trained on the full feature set extracted with Librosa.
Test Accuracy: 0.4719
Classification Report:
| Genre | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Electronic | 0.51 | 0.56 | 0.53 | 200 |
| Experimental | 0.40 | 0.38 | 0.39 | 200 |
| Folk | 0.44 | 0.48 | 0.46 | 200 |
| Hip-Hop | 0.57 | 0.69 | 0.63 | 200 |
| Instrumental | 0.39 | 0.38 | 0.38 | 200 |
| International | 0.52 | 0.46 | 0.49 | 200 |
| Pop | 0.28 | 0.21 | 0.24 | 200 |
| Rock | 0.59 | 0.60 | 0.60 | 200 |
| Overall Acc | 0.47 | 1600 |
Classical models performed most reliably on genres with distinct rhythmic and instrumental structures, such as Hip-Hop, Rock, and Electronic. Conversely, Pop was the most difficult genre to classify across all tested ML models, likely due to its stylistic diversity and overlapping characteristics with other genres.
Implementing a Transfer Learning approach with a pre-trained Transformer model (DistilHubert) led to a significant leap in performance.
Test Accuracy: 0.57
Classification Report:
| Genre | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Electronic | 0.60 | 0.64 | 0.62 | 101 |
| Experimental | 0.47 | 0.41 | 0.44 | 90 |
| Folk | 0.56 | 0.52 | 0.54 | 95 |
| Hip-Hop | 0.73 | 0.72 | 0.72 | 96 |
| Instrumental | 0.49 | 0.61 | 0.54 | 94 |
| International | 0.66 | 0.68 | 0.67 | 96 |
| Pop | 0.31 | 0.25 | 0.28 | 100 |
| Rock | 0.65 | 0.70 | 0.67 | 128 |
| Overall Acc | 0.57 | 800 |
The neural network was far more effective at capturing subtle audio nuances. While the general difficulty trends remained the same (Pop remained the hardest to classify), scores for every single genre improved by approximately 10 percentage points.
- Language: Python
- Libraries: Pandas, NumPy, Scikit-learn, Librosa, Essentia, Matplotlib, Seaborn, PyTorch
Music_Genres_Classification
┣ 📂classical_ML # ML modeling and analysis
┃ ┣ fma_small_essentia.ipynb
┃ ┣ fma_small_librosa_70.ipynb
┃ ┗ fma_small_librosa_full.ipynb
┣ 📂deep_learning # Training and evaluation ofthe DistilHubert transfer learning model
┃ ┣ distilhubert.ipynb
┃ ┗ dl_model_analysis.ipynb
┣ 📂feature_extraction # Feature extraction and dataset creation
┃ ┣ essentia_features.ipynb
┃ ┗ librosa_features.ipynb
┣ 📂fig # Visualizations
┣ .gitignore
┣ README.md
┗ report.md # Academic/project report and summary of findings