I came across a great article on using the Deep Learning Python package
tflearn to perform inference on some classic datasets in Machine Learning like the MNIST dataset and the CIFAR-10 dataset. As it turns out, these types of models have been around for quite a while for various tasks in image recognition. The particular case of the CIFAR-10 dataset was solved by a neural network very similar to the one from the mentioned post. The general idea of using convolutional neural networks dates back to Yann LeCun’s paper from 1998 in digit recognition.
Since I have a lot of experience working with data but not a lot of experience working with deep learning algorithms, I wondered how easy it would be to adapt these methods to a new, somewhat related, problem: going through thousands of my photos to identify pictures of my cat. Turns out, it was easier than I thought.
Training a similar neural network on my own visual data just amounted to connecting up the inputs and the outputs properly. In particular, any input image has to be re-scaled down (or up) to 32×32 pixels. Similarly, your output must be binary and should represent membership of either of the two classes.
The main difficulty involves creating your dataset. This really just means going through your images and classifying a subset of them by hand. For my own run at this, all I did was create a directory like:
images/ cat/ not_cat/
I put any cat photos I found into the
cat directory while putting any non-cat photographs in the other folder. I tried to keep the same number of images in both directories to try to avoid any class imbalance problems. Then again, this wasn’t as much of a concern since roughly half my photos are cat photos anyway.
tflearn has a helper method that lets you create an HDF5 dataset from your directory of images with a simple function. The
Y values from that data structure can be used as the inputs to the deep learning model.
By using around 400 images (roughly 200 for each class), my classifier achieved about an 85% accuracy rate on a validation set of data. For my purposes, namely just automatically tagging potential photos of my cat, this was accurate enough. Any effort to increase the accuracy of this would probably involve some combination of:
- adding more training data by putting images into my class folders
- changing the shape of the network by adding more layers or more nodes per layer
- using a pre-trained model to bootstrap the learning process
That’s all it really takes. If you know a bit of Python and can sort a few of your photos into folders based on their categories, you can get started using sophisticated deep learning algorithms on your own images.