Image super-resolution generates plausible pixels to compensate for the information missing from a low-resolution image, yielding a higher-resolution result. One approach to this problem uses generative networks, e.g., ESRGAN (Enhanced Super-Resolution Generative Adversarial Network). This type of GAN is built explicitly for image super-resolution by combining several losses: contextual loss (which focuses on the distribution of features), perceptual loss (a pixel-wise loss), and adversarial loss. The generator loss uses all three, whereas the discriminator loss consists of the adversarial loss only. Training proceeds in two stages: (1) train only the generator on the perceptual loss, then (2) train both the generator and the discriminator on the losses mentioned above. The model is trained and evaluated on the BSDS500 dataset. Finally, the predicted high-resolution image is sharpened by subtracting its Laplacian from it.
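The Laplacian-based sharpening step can be sketched as follows. This is a minimal illustration, assuming grayscale images stored as float arrays normalized to [0, 1] and the standard 4-neighbour Laplacian kernel; the function name `sharpen` and the exact kernel/scaling are assumptions and may differ from the actual experiment code.

```python
import numpy as np

# Standard 4-neighbour Laplacian kernel (an assumption; the experiments
# may use a different kernel or a scaled Laplacian).
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def sharpen(image: np.ndarray) -> np.ndarray:
    """Sharpen a 2-D grayscale image by subtracting its Laplacian."""
    h, w = image.shape
    # Replicate-pad the border so the output keeps the input's shape.
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    lap = np.zeros((h, w), dtype=np.float64)
    # Correlate with the 3x3 kernel (symmetric, so same as convolution).
    for dy in range(3):
        for dx in range(3):
            lap += LAPLACIAN[dy, dx] * padded[dy:dy + h, dx:dx + w]
    # Subtracting the Laplacian boosts edges; clip back to [0, 1].
    return np.clip(image - lap, 0.0, 1.0)
```

Because the Laplacian of a constant region is zero, flat areas are left untouched while intensity transitions are amplified, which is why this step visually crispens the network's output.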
The code for the experiments is available at this link.
Quantitatively, the performance of ESRGAN after 420 epochs of training is as follows:
| Metric | Test Dataset |
| --- | --- |
| PSNR | 22.587 |
| SSIM | 0.586 |
| MAE | 0.049 |
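For reference, PSNR and MAE can be computed directly in NumPy. This is a minimal sketch assuming images normalized to [0, 1]; the function names `psnr` and `mae` are illustrative, not taken from the experiment code (SSIM is omitted here since it involves windowed local statistics).

```python
import numpy as np

def psnr(ref: np.ndarray, pred: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a prediction."""
    mse = float(np.mean((ref - pred) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def mae(ref: np.ndarray, pred: np.ndarray) -> float:
    """Mean absolute error between a reference and a prediction."""
    return float(np.mean(np.abs(ref - pred)))
```

Higher PSNR and lower MAE indicate a prediction closer to the reference high-resolution image.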
Generator and discriminator loss curves on the training set.
PSNR curve on the training set and the validation set.
SSIM curve on the training set and the validation set.
MAE curve on the training set and the validation set.
The qualitative results of the model are shown below.
Qualitative comparison between the reference high-resolution images (left column), high-resolution images via bicubic interpolation (middle column), and predicted high-resolution images through ESRGAN (right column).