Photo credit: Aleksei Vasileika on Dribbble
This project builds a pipeline that detects four types of 3D shapes (Cube, Cylinder, Spheroid and Sphere) in an image. The repository contains the code for preparing the baseline network for transfer learning. MobileNet (V1) is used as the baseline. Reason for using MobileNet: it is lightweight, and its streamlined architecture makes its parameters easy to customize and the model easy to host (if desired) on lightweight MCUs.
MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. MobileNet was chosen because this project is intended to be deployed on mobile devices at the edge, so it makes sense to build on a class of efficient pre-trained models (MobileNets) designed to suit fine-tuned DNN deployment in mobile and embedded vision applications.
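To make the "lightweight" claim concrete, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one. The layer size (3 x 3 kernel, 256 input and 256 output channels) and the helper names are illustrative assumptions, not values taken from this repository.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise (kernel x kernel per channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 256 input channels, 256 output channels.
standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
ratio = separable / standard

print(standard, separable, round(ratio, 3))  # 589824 67840 0.115
```

For this example the separable layer needs roughly 11.5% of the weights, which matches the general ratio 1/c_out + 1/kernel^2.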
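'''
Parameters such as layers, backend and models can be
tweaked accordingly.
'''
# Assumed imports (not shown in the original snippet): the backend/layers/models/utils
# keyword arguments match the keras_applications API.
import keras
from keras_applications.mobilenet import MobileNet

# Define the baseline of the network (IMAGE_SIZE and ALPHA are set below)
base_model = MobileNet(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), alpha=ALPHA,
                       depth_multiplier=1, dropout=0.001, include_top=False,
                       weights="imagenet", classes=4, backend=keras.backend,
                       layers=keras.layers, models=keras.models, utils=keras.utils)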
The model parameters are set as follows:
IMAGE_SIZE = 224 # Image input size = H x W = 224 x 224
ALPHA = 0.75 # α = width multiplier (scales the number of filters in each layer)
EPOCHS = 20
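In the Keras MobileNet constructor, `alpha` is the width multiplier: it scales the number of filters in every layer and is not a learning rate. The sketch below shows what a value of 0.75 does to the channel counts. The `BASE_FILTERS` list and the `scaled_filters` helper are illustrative, and the truncation rule `int(filters * alpha)` is assumed to match the Keras implementation.

```python
# Output-channel counts of MobileNet V1 at alpha = 1.0:
# the stem convolution followed by the 13 pointwise convolutions.
BASE_FILTERS = [32, 64, 128, 128, 256, 256,
                512, 512, 512, 512, 512, 512, 1024, 1024]


def scaled_filters(alpha, base_filters=BASE_FILTERS):
    """Channel counts after applying the width multiplier (assumed: truncate to int)."""
    return [int(f * alpha) for f in base_filters]


widths = scaled_filters(0.75)
print(widths[0], widths[-1])  # 24 768
```

With `include_top = False`, the last entry is the depth of the feature map that a new classification head for the four shapes would sit on.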
The dataset used here is a custom, refined dataset acquired from DeepMind's repo. It was originally used in Kim, Hyunjik and Mnih, Andriy. "Disentangling by Factorising." In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018, to assess the disentanglement properties of unsupervised learning methods.
The dataset contains four types of 3D shapes: Cube, Cylinder, Spheroid, and Sphere.
These 3D shapes are generated from 6 ground-truth latent factors: floor colour, wall colour, object colour, scale, shape and orientation. All of them are customizable and can be plotted according to their respective value bounds:
- floor hue: 10 values linearly spaced in [0, 1]
- wall hue: 10 values linearly spaced in [0, 1]
- object hue: 10 values linearly spaced in [0, 1]
- scale: 8 values linearly spaced in [0, 1]
- shape: 4 values in [0, 1, 2, 3]
- orientation: 15 values linearly spaced in [-30, 30]
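The factor grid above can be sketched in a few lines to check how many images it produces and how a combination of factor indices could map to a position in the dataset. The names `NUM_VALUES`, `linspace` and `flat_index` are hypothetical, and the row-major ordering (last factor varying fastest) is an assumption about how the images are laid out.

```python
from math import prod

# Number of values per factor, in the order listed above.
NUM_VALUES = {
    "floor_hue": 10, "wall_hue": 10, "object_hue": 10,
    "scale": 8, "shape": 4, "orientation": 15,
}


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, both ends included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def flat_index(factor_indices):
    """Position of one factor combination, assuming row-major ordering."""
    index = 0
    for i, n in zip(factor_indices, NUM_VALUES.values()):
        if not 0 <= i < n:
            raise ValueError(f"factor index {i} out of range [0, {n})")
        index = index * n + i
    return index


total_images = prod(NUM_VALUES.values())
orientations = linspace(-30, 30, 15)

print(total_images)                    # 480000
print(flat_index([0, 0, 0, 0, 1, 0]))  # 15
```

The product of the factor counts equals the 480000 images in the dataset, so every combination of factor values appears exactly once.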
The dataset is stored in an HDF5 file, with the following fields/parameters:
- images: (480000 x 64 x 64 x 3, uint8) RGB images.
- labels: (480000 x 6, float64) Values of the latent factors.
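A minimal sketch of how these fields could be read and turned into training inputs, assuming `h5py` and NumPy are available. The helper names are hypothetical, and the shape factor is assumed to sit in label column 4, following the factor order listed earlier.

```python
import numpy as np

SHAPE_COLUMN = 4  # assumed: label columns follow the factor order, so shape is column 4
NUM_SHAPES = 4


def load_slice(path, start, stop):
    """Read images[start:stop] and labels[start:stop] without loading the whole file."""
    import h5py  # third-party; imported here so the helpers below work without it
    with h5py.File(path, "r") as f:
        return f["images"][start:stop], f["labels"][start:stop]


def shape_targets(labels):
    """Turn float64 label rows into one-hot targets over the 4 shape classes."""
    shape_ids = labels[:, SHAPE_COLUMN].astype(int)
    return np.eye(NUM_SHAPES, dtype=np.float32)[shape_ids]


def scale_pixels(images):
    """Map uint8 pixels from [0, 255] to [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0
```

Reading slices keeps memory use low, since the full image array is about 5.9 GB as uint8. The 64 x 64 images would still need resizing to IMAGE_SIZE before being passed to the network.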
The dataset can be downloaded from here
The model tests on abstract objects can be found here.