An unsupervised machine learning pipeline that discovers latent customer support topics from 1.22M tweets and automatically generates targeted FAQs.
- Extracts topics using Sentence Transformers and K-Means clustering.
- Selects the number of clusters k dynamically, using silhouette scores to compare candidate values.
- Visualizes high-dimensional text data using UMAP.
- Generates automated FAQ documents for each topic cluster using an LLM.
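
The silhouette-based k selection listed above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the code from the notebooks: the function name `select_k`, the candidate range, and the synthetic data standing in for sentence embeddings are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def select_k(embeddings, k_candidates, random_state=0):
    """Fit K-Means for each candidate k and return the k with the highest
    mean silhouette score, along with all scores (hypothetical helper)."""
    scores = {}
    for k in k_candidates:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(embeddings)
        scores[k] = silhouette_score(embeddings, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for sentence embeddings (assumption): four
# well-separated groups of 100 points each in a 16-dimensional space.
centers = np.eye(4, 16) * 10.0
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=42)

best_k, scores = select_k(X, range(2, 9))
print(best_k, round(scores[best_k], 3))
```

Computing the silhouette score compares every pair of points, so on a corpus of this size it would likely need to run on a subsample; `silhouette_score` accepts a `sample_size` argument for that purpose.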
- data_preprocessing.ipynb: Handles raw data ingestion, text cleaning, and formatting.
- clustering_pipeline_final.ipynb: Generates the embeddings, runs the clustering, produces the UMAP visualization, and creates the FAQs with an LLM.
Python, Sentence Transformers, K-Means, UMAP, LLMs