
Project 1: I’m Something of a Painter Myself

This beginner-level open Kaggle competition opened the door for the UM - Data Team Club to take our first steps working with GAN models.

Challenges

One of the main challenges was meeting the requirements for installing TensorFlow and especially tensorflow-addons. Navigating the multiple versions of Python, TensorFlow itself, iDisplay, etc., so that our whole team could install everything was quite an odyssey. During this process, we found a very helpful compatibility matrix on GitHub that shed light on the restrictions we had to abide by.

Another challenge was wrapping our heads around the algebra that supports the model's layers and transformations. We relied on explanations found online, such as Kaggle's Convolution and ReLU, and on videos such as Transposed Convolutions Explained by Johannes Frey.

The Model

Our model is heavily inspired by the Monet CycleGAN Tutorial by Amy Jang. It served our goal of learning the topic by studying a proficient model of it.

First we dove into the structure of the model, finding that there is an established structure it should follow, as the image below shows, even though more layers can be added between the generator's downsampling and upsampling stacks depending on the modeler's preferences.

Generator and discriminator structures used in GAN models

CycleGAN Model Overview

CycleGAN is a type of generative adversarial network (GAN) designed for unsupervised learning of image-to-image translation between two domains. In this case, it translates images between photos and Monet-style paintings.

Training over 7 epochs
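
The "cycle" in CycleGAN comes from a cycle-consistency constraint: a photo translated to Monet style and then back should reproduce the original photo. Below is a minimal sketch of that idea; the generator names, the L1 formulation, and the weight of 10 follow the common CycleGAN setup and are illustrative assumptions, not our exact notebook code.

```python
import tensorflow as tf

# Cycle-consistency loss sketch: the weight of 10 and the generator
# names in the usage comment below are illustrative assumptions.
def cycle_loss(real_image, cycled_image, weight=10.0):
    # L1 distance between the original image and its round-trip translation.
    return weight * tf.reduce_mean(tf.abs(real_image - cycled_image))

# Hypothetical usage:
#   monet_like  = monet_generator(photo)        # photo -> Monet style
#   photo_again = photo_generator(monet_like)   # Monet style -> photo
#   loss = cycle_loss(photo, photo_again)
```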

Generator and Discriminator

CycleGAN uses a generator and discriminator architecture where the generator learns to translate images between two domains (e.g., photos to Monet paintings) using downsampling to extract features and upsampling to generate the final output. The discriminator distinguishes between real and generated images by downsampling the input images and producing a binary classification output. This approach enables unsupervised learning of image-to-image translation tasks, facilitating the creation of artistic transformations between different visual domains.

Generator

The generator in CycleGAN (generator_fn()), sketched in code after this list:

  • Purpose: Takes an input image from one domain (e.g., photo) and translates it to the other domain (e.g., Monet-style painting).
  • Architecture:
    • Input Layer: Takes an input image with dimensions defined by HEIGHT, WIDTH, and CHANNELS.
    • Downsampling (down_stack): Reduces the spatial dimensions of the input image while increasing the number of filters through convolutional layers. This helps capture high-level features and abstract representations.
    • Upsampling (up_stack): Increases the spatial dimensions back to the original size of the output domain image. This process involves convolutional transpose layers (upsampling) which help in generating detailed outputs.
    • Output Layer: Uses a convolutional transpose layer (Conv2DTranspose) to produce the final output image. It employs the tanh activation function to ensure pixel values are in the range [-1, 1], suitable for images.
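
Putting the bullets above together, here is a minimal sketch of such a generator. It illustrates the down_stack/up_stack idea with U-Net-style skip connections, assuming 256x256 RGB inputs; the filter counts are illustrative, and the tutorial's full generator also adds instance normalization (the reason tensorflow-addons was needed), among other details.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed image dimensions for this sketch; the competition images are 256x256 RGB.
HEIGHT, WIDTH, CHANNELS = 256, 256, 3

def generator_fn():
    """U-Net-style generator sketch: downsample to features, upsample back to an image."""
    inputs = layers.Input(shape=[HEIGHT, WIDTH, CHANNELS])

    # Downsampling stack: each stride-2 convolution halves height/width
    # while increasing the number of filters.
    down_stack = [
        layers.Conv2D(f, 4, strides=2, padding='same', activation='relu')
        for f in (64, 128, 256, 512)
    ]

    # Upsampling stack: each stride-2 transposed convolution doubles height/width.
    up_stack = [
        layers.Conv2DTranspose(f, 4, strides=2, padding='same', activation='relu')
        for f in (256, 128, 64)
    ]

    x = inputs
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)

    # Walk back up, concatenating the matching downsampled features (skip connections).
    for up, skip in zip(up_stack, reversed(skips[:-1])):
        x = up(x)
        x = layers.Concatenate()([x, skip])

    # Final transposed convolution maps back to CHANNELS; tanh keeps pixels in [-1, 1].
    outputs = layers.Conv2DTranspose(
        CHANNELS, 4, strides=2, padding='same', activation='tanh')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```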

Discriminator

The discriminator in CycleGAN (discriminator_fn()), sketched in code after this list:

  • Purpose: Distinguishes between real images from the target domain (e.g., real Monet paintings) and fake images generated by the generator.
  • Architecture:
    • Input Layer: Takes an input image with dimensions defined by HEIGHT, WIDTH, and CHANNELS.
    • Downsampling: Reduces the spatial dimensions of the input image while increasing the number of filters through convolutional layers. This helps in extracting features that distinguish real from fake images.
    • Output Layer: Uses a final convolutional layer to produce a 30x30 output map that represents the authenticity of the input image. Each value in the map indicates whether the corresponding patch of the input is real or generated.
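
A matching sketch of the discriminator, under the same 256x256 RGB assumption. The kernel sizes, strides, and zero padding are chosen so the final convolution yields the 30x30 PatchGAN-style output map described above; normalization layers are again omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

HEIGHT, WIDTH, CHANNELS = 256, 256, 3  # assumed input dimensions, as above

def discriminator_fn():
    """PatchGAN-style discriminator sketch: downsample, then score 30x30 patches."""
    inputs = layers.Input(shape=[HEIGHT, WIDTH, CHANNELS])

    # Downsampling: stride-2 convolutions shrink the image while extracting
    # the features that separate real from generated images.
    x = inputs
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)  # 256 -> 128 -> 64 -> 32
        x = layers.LeakyReLU(0.2)(x)

    x = layers.ZeroPadding2D()(x)                 # 32 -> 34
    x = layers.Conv2D(512, 4, strides=1)(x)       # 34 -> 31 (valid padding)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.ZeroPadding2D()(x)                 # 31 -> 33

    # Each value in the 30x30 map scores one patch of the input as real or fake.
    outputs = layers.Conv2D(1, 4, strides=1)(x)   # 33 -> 30
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```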

Downsampling and Upsampling

  • Downsampling: Downsampling refers to the process of reducing the spatial dimensions of the input image. This is typically achieved through convolutional layers with a larger stride (e.g., a stride of 2) or through pooling layers. Downsampling helps capture hierarchical features and reduces computational complexity.
  • Upsampling: Upsampling involves increasing the spatial dimensions of the image. In CycleGAN, this is done using convolutional transpose layers (Conv2DTranspose). Upsampling generates output images of the desired size while maintaining spatial details, improving the visual quality of the generated images. The short example after this list shows the effect of each operation on tensor shapes.
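
As a quick shape check (with illustrative filter counts, assuming a 256x256 RGB image), a stride-2 convolution halves the spatial dimensions while a stride-2 transposed convolution doubles them:

```python
import tensorflow as tf
from tensorflow.keras import layers

image = tf.random.normal([1, 256, 256, 3])  # a batch with one 256x256 RGB image

down = layers.Conv2D(64, 4, strides=2, padding='same')        # stride 2 halves H and W
up = layers.Conv2DTranspose(3, 4, strides=2, padding='same')  # stride 2 doubles H and W

smaller = down(image)     # shape (1, 128, 128, 64): half the resolution, more filters
restored = up(smaller)    # shape (1, 256, 256, 3): back to the original resolution
print(smaller.shape, restored.shape)
```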

Why Use Downsampling and Upsampling?

  • Feature Extraction: Downsampling helps in extracting high-level features from the input images, which are essential for generating images that capture the style and characteristics of the target domain.
  • Spatial Transformation: Upsampling restores the spatial dimensions of the image to generate outputs of the desired size. This is crucial for maintaining the structure and details of the transformed images.

Key lessons

Understanding GAN models can initially seem intimidating, but they are more approachable than they appear. After weeks of investigation and tutorials, we discovered that grasping the structure of neural networks is fundamental to comprehending and effectively crafting these models. Unlike traditional machine learning approaches, GAN models operate with less theoretical underpinning and often rely on iterative experimentation for success. This hands-on, trial-and-error approach contrasts sharply with the more theoretical and proven methods found in classical machine learning models.

We're excited to keep learning about GAN models, and we are ready for our next challenge!

Future steps

Finishing this simple project definitely paves the way for more complex modeling by the Team, especially when we think about creating our own adaptations of a GAN, or even building an entire model ourselves.

References

https://www.kaggle.com/code/dimitreoliveira/introduction-to-cyclegan-monet-paintings

https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial

https://towardsdatascience.com/5-kaggle-data-sets-for-training-gans-33dc2e035161

https://www.kaggle.com/code/ryanholbrook/convolution-and-relu

https://www.kaggle.com/code/ryanholbrook/the-sliding-window

https://www.youtube.com/watch?v=xoAv6D05j7g

https://github.com/junyanz/CycleGAN

https://medium.com/@marcodelpra/generative-adversarial-networks-dba10e1b4424

https://www.youtube.com/playlist?list=PLhhyoLH6IjfwIp8bZnzX8QR30TRcHO8Va

Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2020). Unpaired image-to-image translation using cycle-consistent adversarial networks. Berkeley AI Research (BAIR) Laboratory, UC Berkeley.
