
Using a generative deep model, namely a variational autoencoder (VAE), to reconstruct and generate new images for two well-known datasets.

Shahriar-0/VAE-Anime-Cartoon


VAE-Anime-Cartoon

Introduction

This notebook is a simple implementation of a Variational Autoencoder (VAE) using PyTorch. The VAE is a generative model that learns to encode data into a probabilistic latent space and to decode points from that space back into data.

Datasets

The model is trained on two well-known datasets: the Anime Face dataset and the Cartoon Face dataset.

Some samples from the datasets are shown below:

anime sample

cartoon sample

Model

A VAE consists of two main components:

  1. Encoder: Maps the input data to a latent space.
  2. Decoder: Reconstructs the data from the latent space.

Encoder

The encoder maps the input data $x$ to a latent variable $z$. Instead of mapping $x$ to a single point in the latent space, the encoder maps $x$ to a distribution over the latent space. This is typically done using a neural network that outputs the mean $\mu$ and the standard deviation $\sigma$ of a Gaussian distribution.

The latent variable $z$ is then sampled from this distribution: $$z \sim \mathcal{N}(\mu, \sigma^2) $$
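Sampling $z$ directly would block gradients from reaching the encoder, so VAEs usually rewrite the sample as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (the reparameterization trick). The sketch below is a minimal illustration, assuming the encoder outputs the log-variance $\log \sigma^2$ instead of $\sigma$ (a common choice for numerical stability, not necessarily what this notebook does); the function name `reparameterize` is hypothetical.

```python
import torch


def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, sigma^2) so that gradients flow back to mu and logvar.

    Assumption: logvar holds log(sigma^2), as produced by a hypothetical encoder.
    """
    std = torch.exp(0.5 * logvar)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)    # eps ~ N(0, I), same shape as std
    return mu + std * eps          # z = mu + sigma * eps


# Usage: a batch of 4 samples with an 8-dimensional latent space
mu = torch.zeros(4, 8, requires_grad=True)
logvar = torch.zeros(4, 8, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()  # gradients reach both mu and logvar
```

Because the randomness lives only in `eps`, the sample is a differentiable function of $\mu$ and $\sigma$.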

Decoder

The decoder maps the latent variable $z$ back to the data space to reconstruct the input data $x$. This is also done using a neural network.

Loss Function

The loss function of a VAE consists of two parts:

  1. Reconstruction Loss: Measures how well the decoder can reconstruct the input data from the latent variable.
  2. KL Divergence: Measures how close the learned latent distribution is to the prior distribution (usually a standard normal distribution).

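The two terms combine into the evidence lower bound (ELBO), which the model maximizes: $$\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - \text{KL}(q(z|x) \,\|\, p(z))$$ Equivalently, training minimizes the loss $-\mathcal{L}$, i.e. the reconstruction loss plus the KL divergence.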

Where:

  • $q(z|x)$ is the approximate posterior distribution (output of the encoder).
  • $p(x|z)$ is the likelihood of the data given the latent variable (output of the decoder).
  • $p(z)$ is the prior distribution (usually a standard normal distribution).
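For a diagonal Gaussian $q(z|x) = \mathcal{N}(\mu, \sigma^2)$ and a standard normal prior $p(z) = \mathcal{N}(0, I)$, the KL term has the closed form $\text{KL} = -\frac{1}{2}\sum_j (1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2)$, and the expectation over $q(z|x)$ is commonly estimated with a single sample of $z$. The sketch below computes the resulting training loss, assuming pixel values in $[0, 1]$ and a Bernoulli decoder (so the reconstruction term is binary cross-entropy); the notebook may use a different reconstruction loss such as MSE, and `vae_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def vae_loss(x_recon: torch.Tensor, x: torch.Tensor,
             mu: torch.Tensor, logvar: torch.Tensor):
    """Return (total, reconstruction, KL), each averaged over the batch.

    Assumptions: x and x_recon hold values in [0, 1] (x_recon are Bernoulli
    probabilities), and logvar holds log(sigma^2).
    """
    batch_size = x.size(0)
    # Reconstruction loss: -E_q[log p(x|z)], estimated with one sample of z
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / batch_size, recon / batch_size, kl / batch_size


# Usage: when q(z|x) equals the prior, the KL term vanishes
x = torch.rand(2, 3, 8, 8)
total, recon, kl = vae_loss(x, x, torch.zeros(2, 4), torch.zeros(2, 4))
```

Summing over pixels and latent dimensions before averaging over the batch keeps the two terms on the same per-example scale, so neither one is implicitly down-weighted.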

Our model consists of two fully convolutional neural networks, one for the encoder and one for the decoder.
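A fully convolutional encoder/decoder pair can be sketched as follows. This is not the notebook's actual architecture: the 64×64 RGB input size, the channel widths, and `latent_dim` are all hypothetical. The encoder halves the spatial size with strided convolutions down to 4×4, then a final 4×4 convolution produces $\mu$ and $\log\sigma^2$ as a 1×1 map; the decoder mirrors this with transposed convolutions, so no fully connected layers are needed.

```python
import torch
import torch.nn as nn


class ConvVAE(nn.Module):
    """Hypothetical fully convolutional VAE for 3x64x64 images."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),     # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
            nn.Conv2d(256, 2 * latent_dim, 4, stride=1, padding=0),  # 4 -> 1
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, stride=1, padding=0), nn.ReLU(),  # 1 -> 4
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),         # 4 -> 8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),          # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),           # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),         # 32 -> 64
        )

    def encode(self, x: torch.Tensor):
        h = self.encoder(x).flatten(1)  # (B, 2 * latent_dim)
        mu, logvar = h.chunk(2, dim=1)  # split into mean and log-variance
        return mu, logvar

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        # Treat each latent vector as a 1x1 feature map with latent_dim channels
        return self.decoder(z.reshape(-1, self.latent_dim, 1, 1))

    def forward(self, x: torch.Tensor):
        mu, logvar = self.encode(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decode(z), mu, logvar


# Usage: reconstruct a batch of two random "images"
model = ConvVAE(latent_dim=16)
x = torch.rand(2, 3, 64, 64)
x_recon, mu, logvar = model(x)
```

The final `Sigmoid` keeps outputs in $[0, 1]$, which matches a binary cross-entropy reconstruction loss on normalized pixels.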

VAE model

Results

The reconstructed and generated images are shown below:

Reconstruction

anime reconstructed

cartoon reconstructed

Noise

anime noise

cartoon noise

Conclusion

These results show that the VAE learns to reconstruct the input images from their latent representations. In addition, feeding the decoder random noise drawn from the prior produces new images that do not exist in either dataset, which can be useful for data augmentation, among other applications.
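Generating new images amounts to sampling $z \sim p(z) = \mathcal{N}(0, I)$ and passing it through the decoder, with no encoder involved. The sketch below assumes a trained decoder that accepts latent vectors shaped as 1×1 feature maps; the small `decoder` defined here is only a hypothetical, untrained stand-in so the example runs on its own, and `generate` is an illustrative name.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def generate(decoder: nn.Module, num_images: int, latent_dim: int) -> torch.Tensor:
    """Decode random latent vectors drawn from the prior p(z) = N(0, I)."""
    decoder.eval()
    z = torch.randn(num_images, latent_dim, 1, 1)  # noise input to the decoder
    return decoder(z)


# Hypothetical stand-in for a trained decoder: (N, 16, 1, 1) -> (N, 3, 8, 8)
decoder = nn.Sequential(
    nn.ConvTranspose2d(16, 32, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 4x4 -> 8x8
    nn.Sigmoid(),
)
images = generate(decoder, num_images=9, latent_dim=16)
```

With a real trained decoder, the resulting batch could be saved as an image grid using `torchvision.utils.save_image` for visual inspection.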
