This project implements a Conditional Variational Autoencoder (CVAE) to learn a structured latent representation of the Fashion-MNIST dataset. Unlike standard VAEs, this model allows for class-conditional image generation, meaning we can control exactly which type of clothing the model generates.
The primary objective is to build a generative model capable of:
- Disentangling class labels from style variations in the latent space.
- Generating high-quality images conditioned on specific class labels (e.g., forcing the model to generate a "Sneaker" or a "Dress").
- Visualizing how the model organizes data in the latent space.
The CVAE incorporates the class label $c$ into both the encoder and the decoder:
- Encoder: maps the input $x$ and condition $c$ to a latent distribution $q(z|x,c)$.
- Decoder: reconstructs the image from the latent vector $z$ and condition $c$, approximating $p(x|z,c)$.
This conditioning forces the latent space to encode style and intra-class variation, rather than the class identity itself.
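A minimal PyTorch sketch of this architecture is shown below. The layer sizes, the 2-D latent space, and one-hot label conditioning are assumptions chosen to match a typical Fashion-MNIST setup, not necessarily the exact configuration used in this project:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    # Assumed dimensions: 784 flattened pixels, 10 classes, 2-D latent space.
    def __init__(self, input_dim=784, num_classes=10, latent_dim=2, hidden_dim=256):
        super().__init__()
        # Encoder: consumes the image concatenated with a one-hot label.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim + num_classes, hidden_dim),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: consumes the latent vector concatenated with the same label.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + num_classes, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def encode(self, x, c):
        h = self.encoder(torch.cat([x, c], dim=1))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, c):
        mu, logvar = self.encode(x, c)
        z = self.reparameterize(mu, logvar)
        return self.decoder(torch.cat([z, c], dim=1)), mu, logvar
```

Because the label is handed to both networks, the model has no incentive to spend latent capacity encoding class identity.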
The plot below visualizes the 2D latent space of the CVAE on the training data. Each point represents an encoded image, colored by its ground truth class label.
Observation: Notice how the classes are distributed. Because we explicitly condition on the class label, the latent space does not need to separate classes into distinct clusters; points from different classes overlap, and the latent dimensions are free to capture style variations shared across classes.
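A plot of this kind can be produced along the following lines. The untrained `nn.Linear` encoder and the random tensors are placeholders standing in for the trained CVAE encoder (using the posterior mean as the embedding) and an actual Fashion-MNIST batch; only the plotting recipe is the point:

```python
import torch
import torch.nn as nn
import matplotlib
matplotlib.use("Agg")  # headless backend for saving to file
import matplotlib.pyplot as plt

# Placeholder encoder and data; substitute the trained model and real images.
encoder = nn.Linear(784 + 10, 2)
images = torch.rand(512, 784)
labels = torch.randint(0, 10, (512,))
one_hot = nn.functional.one_hot(labels, num_classes=10).float()

with torch.no_grad():
    z = encoder(torch.cat([images, one_hot], dim=1))  # 2-D embeddings

plt.figure(figsize=(6, 6))
sc = plt.scatter(z[:, 0], z[:, 1], c=labels, cmap="tab10", s=5)
plt.colorbar(sc, label="class")
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.savefig("latent_space.png")
```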
Below is a visualization of the model's generative capabilities. These images were produced by sampling random noise from the latent space and conditioning the decoder on specific class labels.
The image above demonstrates the model's ability to generate distinct fashion items for specific classes while maintaining the structural integrity of the objects.
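Class-conditional sampling can be sketched as follows. The decoder here is an untrained stand-in with assumed layer sizes (2-D latent, 10 classes, 28x28 output); in practice you would load the trained CVAE decoder:

```python
import torch
import torch.nn as nn

latent_dim, num_classes = 2, 10

# Untrained stand-in for the trained CVAE decoder.
decoder = nn.Sequential(
    nn.Linear(latent_dim + num_classes, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Sigmoid(),
)

def generate(class_idx, n_samples=8):
    """Sample z ~ N(0, I) and decode it conditioned on `class_idx`."""
    z = torch.randn(n_samples, latent_dim)
    c = nn.functional.one_hot(
        torch.full((n_samples,), class_idx), num_classes
    ).float()
    with torch.no_grad():
        return decoder(torch.cat([z, c], dim=1)).view(n_samples, 28, 28)

sneakers = generate(class_idx=7)  # Fashion-MNIST class 7 is "Sneaker"
```

Varying `z` while holding the class fixed sweeps through style variations of a single clothing type.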
- Python 3.8+
- PyTorch
- Matplotlib, NumPy