ML aspects of curbmap
Problem: A user uploads a single photo containing multiple street signs.
- Preprocess
- Verify that the upload location the user indicated is consistent with the photo's EXIF GPS data
- Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the image in LAB color space
- Downscale the image so its largest dimension is < 2600 px, using Lanczos4 interpolation (a windowed-sinc approximation)
- Convert to grayscale (possibly a mistake, since some signs carry meaning in color, e.g. red "no" symbols)
- Bounding boxes for signs (CNN)
- RetinaNet
- We cannot use COCO-trained weights as-is, since COCO has no general sign category (only "stop sign"). We developed a dataset of ~11,000 annotations (too small) with two categories: sign (~5,000) and notsign (~6,000). The dataset combines two ImageNet synsets (n06793231 and n06794110), and the two sets of images definitely overlap to some extent. Hopefully this still gives RetinaNet enough room to learn bounding boxes for signs. Training is in progress, though only on an NVIDIA 1070 SC, so it will take days.
- WIP:
(Figure: an image with predicted bounding boxes; the model identified the sign with ~93% confidence.)
- Run the trained model on photos from a testing dataset (still to be collected) to see whether we get decent bounding boxes for signs.
- From testing, photos of signs taken in daylight or with a flash are detected with the highest confidence. Photos that are somewhat grainy and shot from strange angles produce lower confidences, for example:
(Figure: this image produces two bounding boxes with moderately lower confidence.)
- By contrast, even low-light photos taken with a flash at strange angles produce very high-confidence regions.
- Only photos whose bounding-box localization confidence is high enough will move on to the next stage (i.e. finding corner points and warping the bounded region so the text can be detected reliably).
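RetinaNet's distinguishing ingredient is the focal loss (Lin et al., 2017), which down-weights easy, well-classified anchors (mostly background) so that the very many easy negatives do not swamp training. The NumPy sketch below assumes a single binary "sign vs. background" output per anchor; `alpha = 0.25` and `gamma = 2` are the defaults reported in the RetinaNet paper, while the function name and input layout are illustrative assumptions rather than the API of any particular RetinaNet implementation.

```python
import numpy as np


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), summed over anchors.

    logits:  raw classifier scores, any shape
    targets: same shape, 1 for a positive (sign) anchor, 0 for background
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-logits))                     # predicted probability of "sign"
    p_t = np.where(targets == 1, p, 1.0 - p)              # probability given to the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)  # class-balancing weight
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) anchors.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))
    return float(loss.sum())
```

In the paper the summed loss is normalized by the number of anchors assigned to a ground-truth box. With `gamma = 0` and `alpha = 0.5` the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.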
- Crop each detected region out of the image as its own sub-image (one per sign)
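Selecting confident detections and cropping them can be sketched with plain NumPy slicing. The `(x1, y1, x2, y2, score)` detection format and the `min_score=0.8` cut-off are hypothetical; the notes only say the confidence must be "high enough", so the threshold would need tuning on the test photos.

```python
import numpy as np


def crop_confident_signs(image, detections, min_score=0.8, pad=0):
    """Return one sub-image per detection whose score is >= min_score.

    detections: iterable of (x1, y1, x2, y2, score) in pixel coordinates (assumed format).
    pad: extra pixels kept around each box, e.g. so sign corners are not clipped.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2, score in detections:
        if score < min_score:
            continue  # not confident enough to pass to the next stage
        # Round outward to whole pixels and clamp the (padded) box to the image.
        left = max(0, int(np.floor(x1)) - pad)
        top = max(0, int(np.floor(y1)) - pad)
        right = min(w, int(np.ceil(x2)) + pad)
        bottom = min(h, int(np.ceil(y2)) + pad)
        if right > left and bottom > top:
            crops.append(image[top:bottom, left:right].copy())
    return crops
```

Copying each slice keeps later in-place edits (such as straightening) from writing back into the original photo.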
- Orient/upright/straighten the sign in each box
- Extract the text in the box
- From: https://github.com/chongyangtao/Awesome-Scene-Text-Recognition
- TextSpotter: https://arxiv.org/abs/1803.03474
- The algorithm rotates text and groups the closest text into lines within a single CNN
- Since TensorFlow/Keras does not provide the rotation layers it needs, we will have to use Caffe (why does it always come down to Caffe? :-( )
- OR SegLink: https://arxiv.org/abs/1703.06520
- Runs at ~20 fps
- TensorFlow model code at: https://github.com/bgshih/seglink.git
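SegLink predicts small oriented text segments plus links between neighbouring segments, then combines segments joined by confident links into words or lines by finding connected components. The union-find sketch below illustrates only that grouping step; fitting an oriented box to each group is omitted, and the `(i, j, score)` link format and `link_threshold=0.5` are assumptions, not the format used by the bgshih/seglink code.

```python
def group_segments(num_segments, links, link_threshold=0.5):
    """Group segment indices into connected components over links scoring >= link_threshold.

    links: iterable of (i, j, score) between segment indices (assumed format).
    Returns a sorted list of index lists, one per group (word/line candidate).
    """
    parent = list(range(num_segments))

    def find(x):
        # Path halving: point nodes at their grandparent to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in links:
        if score >= link_threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components

    groups = {}
    for idx in range(num_segments):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

Each returned group is a candidate word or line whose segments would then be merged into one oriented bounding box.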
- Determine meaning from the text
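As an illustration of turning recognized text into a structured restriction, here is a hypothetical regex-based parser for common US-style parking-sign wording (time limits, time ranges, day ranges, "NO PARKING"). The patterns and output fields are assumptions made for this sketch; real sign text and OCR errors would need far more robust handling.

```python
import re

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

TIME_LIMIT = re.compile(r"(\d+)\s*(HR|HOUR|MIN|MINUTE)S?\b")
TIME_RANGE = re.compile(
    r"(\d{1,2})(?::(\d{2}))?\s*(AM|PM)\s*(?:-|TO)\s*(\d{1,2})(?::(\d{2}))?\s*(AM|PM)"
)
DAY_RANGE = re.compile(
    r"\b(MON|TUE|WED|THU|FRI|SAT|SUN)\w*\s*(?:-|THRU|TO)\s*(MON|TUE|WED|THU|FRI|SAT|SUN)\w*"
)


def to_24h(hour, minute, meridiem):
    """'8', None, 'AM' -> '08:00'; '6', None, 'PM' -> '18:00'."""
    h = int(hour) % 12 + (12 if meridiem == "PM" else 0)
    return f"{h:02d}:{int(minute or 0):02d}"


def parse_sign_text(text):
    """Extract a (hypothetical) restriction dict from recognized sign text."""
    text = text.upper()
    rule = {"no_parking": bool(re.search(r"\bNO\s+PARKING\b", text))}
    if m := TIME_LIMIT.search(text):
        qty, unit = int(m.group(1)), m.group(2)
        rule["limit_minutes"] = qty * 60 if unit.startswith("H") else qty
    if m := TIME_RANGE.search(text):
        rule["start"] = to_24h(m.group(1), m.group(2), m.group(3))
        rule["end"] = to_24h(m.group(4), m.group(5), m.group(6))
    if m := DAY_RANGE.search(text):
        i, j = DAYS.index(m.group(1)), DAYS.index(m.group(2))
        # Handle wrap-around ranges such as FRI-MON.
        rule["days"] = DAYS[i:j + 1] if i <= j else DAYS[i:] + DAYS[:j + 1]
    return rule
```

For example, `parse_sign_text("2 HR PARKING 8AM-6PM MON-FRI")` yields a 120-minute limit from 08:00 to 18:00 on weekdays.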
Problem: Given a street, can we predict a similarly restricted street?