Machine Learning Comic Book Cover Classifier

Overview

A series of scripts that use machine learning to find and extract covers from comic book files.

Most comic book files have the cover as the first page, but many include multiple covers. Sometimes these appear at the start, sometimes at the end, and sometimes they are spread throughout the book. The goal of this project is to run it against a directory full of comic book files and have it extract all of the covers, so that they can be used to generate a cool collage (see the Examples section below).

Examples

Collages were built using John's Background Switcher.

Some manually selected non-cover pages were also added to give the collages a bit more variety.

Step 1 - Feature Engineering

MLE_1_Feature_Engineering.py is the first main file. Given a folder, it recursively searches for comic files (cbr/cbz) and builds a feature set for each page/image in each file.
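The recursive search could look roughly like the sketch below. It is not taken from MLE_1_Feature_Engineering.py; the function name `find_comic_files` and the constant `COMIC_EXTENSIONS` are hypothetical, and only the two extensions named above are matched.

```python
from pathlib import Path

# The two comic archive formats named in this README.
COMIC_EXTENSIONS = {".cbr", ".cbz"}


def find_comic_files(root):
    """Return every .cbr/.cbz file under root, searched recursively.

    Hypothetical helper for illustration; the real script may differ.
    """
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare the lowercased suffix so "Issue1.CBZ" is also found.
        if path.is_file() and path.suffix.lower() in COMIC_EXTENSIONS
    )
```

Calling `find_comic_files("comics")` would return a sorted list of `Path` objects, one per comic file found anywhere below the `comics` folder.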

The features we are using are as follows:

File Name
Whether the file name contains "Variant"
Image Height
Image Width
Number of continuous horizontal black lines in the image
Number of continuous horizontal white lines in the image
Number of white pixels in the image
Number of black pixels in the image
OCR word count for the image
Whether the OCR found the word "Variant"
Whether the OCR found the word "Marvel"
Whether OpenCV thinks it saw the Marvel logo
OpenCV confidence score for the Marvel logo detection
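As an illustration of the four pixel-based features, here is a minimal sketch in plain Python. It assumes the page has already been decoded into a 2D grid of 0-255 grayscale values, and it reads "continuous horizontal line" as a row that is black (or white) from edge to edge; that reading, the thresholds `BLACK_MAX` and `WHITE_MIN`, and the function name `pixel_features` are all assumptions, not the project's actual code.

```python
BLACK_MAX = 10   # assumed threshold: values <= this count as black
WHITE_MIN = 245  # assumed threshold: values >= this count as white


def pixel_features(gray):
    """Compute pixel-count and full-row features from a 2D grayscale grid.

    `gray` is a list of rows, each row a list of ints in 0..255.
    """
    black_pixels = white_pixels = 0
    black_lines = white_lines = 0
    for row in gray:
        blacks = sum(1 for value in row if value <= BLACK_MAX)
        whites = sum(1 for value in row if value >= WHITE_MIN)
        black_pixels += blacks
        white_pixels += whites
        # A row counts as a line only if it is unbroken from edge to edge.
        if row and blacks == len(row):
            black_lines += 1
        if row and whites == len(row):
            white_lines += 1
    return {
        "black_pixels": black_pixels,
        "white_pixels": white_pixels,
        "black_lines": black_lines,
        "white_lines": white_lines,
    }
```

For real page images a vectorised version (for example with NumPy) would be much faster; the loop form is used here only to keep the logic explicit.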

Output csv looks like this:

Step 2 - Classifier Testing and Comparison

MLE_2_Classifier_Testing_And_Comparison.py is the second main file. Given a training data set, it splits the data 80:20 into training and test sets, then runs several classifiers against those two sets and measures their performance.
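The 80:20 split can be sketched in a few lines of standard-library Python. The function name `split_train_test` and the fixed seed are assumptions made for illustration; the script itself may well use a library helper such as scikit-learn's `train_test_split` instead.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    The seed is fixed only so the sketch is reproducible.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest train.
    return rows[n_test:], rows[:n_test]
```

With 10 labelled rows this yields 8 training rows and 2 test rows, and every row lands in exactly one of the two sets.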

Key metrics we are measuring are Accuracy, Precision, Recall, F1 and Logistic Loss.
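For reference, the five metrics can be computed by hand as in the sketch below, treating "cover" as the positive class (label 1). The function name `binary_metrics` and the clipping constant `eps` are assumptions for illustration; in practice a library implementation would normally be used.

```python
import math


def binary_metrics(y_true, y_pred, y_prob, eps=1e-15):
    """Return accuracy, precision, recall, F1 and log loss for binary labels.

    y_true / y_pred hold 0 or 1; y_prob holds the predicted P(label == 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # Clip probabilities away from 0 and 1 so log() is always defined.
    total = 0.0
    for t, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1 - eps)
        total += t * math.log(prob) + (1 - t) * math.log(1 - prob)
    log_loss = -total / len(pairs)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }
```

Precision answers "of the pages flagged as covers, how many really were covers", while recall answers "of the real covers, how many were found"; F1 is their harmonic mean, and log loss penalises confident wrong probabilities.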

The classifiers tested are:

The results of the tests looked like this:

Overall, GradientBoostingClassifier was found to be the best option for this use case.

Step 3 - Usage

MLE_3_Extract_Classify_Move.py is the third main file. It works as follows:

Given an input folder, recursively search through it, find and extract all comic files to separate directories, and flatten them (renaming files to avoid conflicts)
Build a feature set for each image
Load the trained classifier, load the feature set into a pandas DataFrame, then iterate over its rows and apply the classifier
Move covers to output folder and clean up temp directories.
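The extract-and-flatten step above can be sketched for .cbz files, which are ordinary zip archives. This is an assumption-laden illustration rather than the code in MLE_3_Extract_Classify_Move.py: the function name `extract_flat`, the `IMAGE_EXTENSIONS` set and the naming scheme are hypothetical, and .cbr (RAR) archives are not handled because the standard library cannot read them.

```python
import zipfile
from pathlib import Path

# Assumed set of page image types; the real script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def extract_flat(cbz_path, out_dir):
    """Extract every image in a .cbz into one flat directory.

    Returns the list of files written.
    """
    cbz_path, out_dir = Path(cbz_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(cbz_path) as archive:
        for index, info in enumerate(archive.infolist()):
            member = Path(info.filename)
            if info.is_dir() or member.suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            # Prefix with the archive name and entry index so pages with the
            # same file name in different subfolders never overwrite each other.
            target = out_dir / f"{cbz_path.stem}_{index:04d}_{member.name}"
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

Using only `member.name` drops any directory components stored in the archive, which both flattens the layout and keeps extracted files inside `out_dir`.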

Early Experiments

Additionally, there are standalone files for some of the individual features from MLE 1, left over from early testing/troubleshooting, as well as some additional benchmarking scripts. They might be of some use to someone.

Misc Other Benchmarking

Comparison of different classifiers against earlier training sets

Examining computation/time cost of different feature types
