CycleGAN Image Translation
This repository contains an implementation of a Wasserstein CycleGAN for unpaired image translation. CycleGAN is a generative adversarial network (GAN) that performs bidirectional translation between the horse and zebra domains without the need for paired examples, transforming horse images into zebra images and vice versa. The project was developed for the VISION AND PERCEPTION course, taught by Professors Irene Amerini and Paolo Russo, as part of my Master's in Artificial Intelligence and Robotics at Sapienza University of Rome.
CycleGAN is a deep learning model designed for unpaired image-to-image translation. It learns mappings between two different domains without paired training data, meaning no corresponding images across the two domains are required during training.
- Unpaired (unsupervised learning): CycleGAN
- Paired (supervised learning): Pix2Pix
Why use CycleGAN if a supervised technique like Pix2Pix exists?
Because for many tasks, paired training data is simply not available. CycleGAN learns to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Mapping G : X ➡ Y (together with the inverse mapping F : Y ➡ X, which the cycle consistency loss described below requires)
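As a rough illustration (not the repository's exact architecture), the two mappings can be realized as two generator networks with identical structure but independent weights. The layer sizes and block count below are assumptions chosen for brevity:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block of the kind commonly used in CycleGAN generators."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

def make_generator(n_blocks=6):
    """Toy generator: 7x7 stem, residual blocks, 7x7 head (sizes assumed)."""
    layers = [nn.ReflectionPad2d(3), nn.Conv2d(3, 64, 7),
              nn.InstanceNorm2d(64), nn.ReLU(inplace=True)]
    layers += [ResidualBlock(64) for _ in range(n_blocks)]
    layers += [nn.ReflectionPad2d(3), nn.Conv2d(64, 3, 7), nn.Tanh()]
    return nn.Sequential(*layers)

G = make_generator()  # G : X -> Y (horse -> zebra)
F = make_generator()  # F : Y -> X (zebra -> horse)

horse = torch.randn(1, 3, 256, 256)   # dummy horse image
fake_zebra = G(horse)                  # translate to the zebra domain
cycled_horse = F(fake_zebra)           # F(G(x)) should stay close to x
```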
The dataset used for this implementation consists of unpaired horse and zebra images. Unfortunately, we cannot provide the dataset directly within this repository due to size and licensing constraints. However, you can acquire it from public sources and organize it accordingly, or download it from the Kaggle repository Horse2zebra Dataset.
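Once downloaded, the standard horse2zebra release keeps each domain in its own folder (typically trainA for horses and trainB for zebras). A minimal unpaired loader could look like the sketch below; the class name and paths are illustrative, so adapt them to wherever you place the data:

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset

class UnpairedDataset(Dataset):
    """Returns one image from each domain per item; domain-B images are
    drawn at random because the two domains are not aligned."""
    def __init__(self, root, transform=None):
        self.dir_a = os.path.join(root, "trainA")  # e.g. horses
        self.dir_b = os.path.join(root, "trainB")  # e.g. zebras
        self.files_a = sorted(os.listdir(self.dir_a))
        self.files_b = sorted(os.listdir(self.dir_b))
        self.transform = transform

    def __len__(self):
        return max(len(self.files_a), len(self.files_b))

    def __getitem__(self, idx):
        img_a = Image.open(os.path.join(
            self.dir_a, self.files_a[idx % len(self.files_a)])).convert("RGB")
        img_b = Image.open(os.path.join(
            self.dir_b, random.choice(self.files_b))).convert("RGB")
        if self.transform:
            img_a, img_b = self.transform(img_a), self.transform(img_b)
        return img_a, img_b
```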
CycleGAN uses three distinct loss functions: an adversarial loss, a cycle consistency loss, and an optional identity loss (a combined sketch of all three follows this list).
- The adversarial loss trains each generator to produce realistic images that the corresponding discriminator cannot distinguish from real ones; it is computed with a Mean Squared Error (MSE) loss.
- For the cycle consistency loss, this implementation employs a Wasserstein loss instead of the traditional L1 loss. It ensures that translating an image from one domain to the other and back again yields a reconstruction that closely resembles the original.
- Lastly, the optional identity loss encourages a generator to leave an image unchanged when it already belongs to the target domain. It is not part of the core objective in the original paper and is kept optional here.
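The sketch below shows how the three terms could be combined into a single generator objective. It follows the standard CycleGAN formulation, with an MSE adversarial loss and L1 cycle/identity terms; this repository swaps the cycle term for a Wasserstein-style loss, so the exact expression, the helper name, and the weights lambda_cycle and lambda_identity are illustrative assumptions:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # adversarial criterion (least-squares style)
l1 = nn.L1Loss()    # cycle / identity criterion in the original paper

def generator_losses(G, F, D_x, D_y, real_x, real_y,
                     lambda_cycle=10.0, lambda_identity=5.0):
    """Combined generator objective; D_x and D_y are the discriminators
    for domains X and Y. The lambda weights are assumed values."""
    fake_y = G(real_x)  # X -> Y
    fake_x = F(real_y)  # Y -> X

    # Adversarial term: each generator tries to make its discriminator
    # label the fake images as real (target 1).
    pred_y, pred_x = D_y(fake_y), D_x(fake_x)
    loss_adv = (mse(pred_y, torch.ones_like(pred_y)) +
                mse(pred_x, torch.ones_like(pred_x)))

    # Cycle consistency term: F(G(x)) ~ x and G(F(y)) ~ y.
    loss_cycle = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)

    # Optional identity term: a target-domain input should pass through
    # its generator roughly unchanged.
    loss_identity = l1(G(real_y), real_y) + l1(F(real_x), real_x)

    return loss_adv + lambda_cycle * loss_cycle + lambda_identity * loss_identity
```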
Here are some samples of the translated images generated by the trained CycleGAN model:
Due to limitations of the training environment (computational resources and time constraints on the Google Colab platform), the model was trained for 128 epochs instead of the originally intended 200. This is the main reason the results are not yet of high quality.
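For completeness, a single training iteration in such a setup could look like the sketch below. It reuses G, F, mse, and generator_losses from the earlier sketches; the discriminator architecture, learning rate, and betas are assumptions, not necessarily the repository's settings:

```python
import itertools
import torch
import torch.nn as nn

def make_discriminator():
    """Toy PatchGAN-style discriminator (layer sizes assumed)."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),
        nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 1, 4, padding=1),
    )

D_x, D_y = make_discriminator(), make_discriminator()

g_opt = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(itertools.chain(D_x.parameters(), D_y.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

def train_step(real_x, real_y):
    # 1) Update both generators on the combined objective.
    g_opt.zero_grad()
    g_loss = generator_losses(G, F, D_x, D_y, real_x, real_y)
    g_loss.backward()
    g_opt.step()

    # 2) Update both discriminators: real images -> 1, detached fakes -> 0.
    d_opt.zero_grad()
    d_loss = 0
    for D, real, fake in ((D_y, real_y, G(real_x).detach()),
                          (D_x, real_x, F(real_y).detach())):
        pred_real, pred_fake = D(real), D(fake)
        d_loss = d_loss + mse(pred_real, torch.ones_like(pred_real)) \
                        + mse(pred_fake, torch.zeros_like(pred_fake))
    d_loss.backward()
    d_opt.step()
    return g_loss.item(), d_loss.item()
```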
This implementation is based on the original CycleGAN paper:
Zhu, J., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).