Competence centre LifeWatch/Citizen Science/Task 4.2
The objective of this task is to develop software able to identify plants from images, so that citizens can contribute to flora conservation. Image recognition will be implemented through machine learning with deep neural networks (a.k.a. deep learning), using Caffe, a deep learning framework created at UC Berkeley.
Caffe installation
Caffe installation in Altamira
For the installation of Caffe on the Altamira supercomputer at IFCA (Spain) we followed both the official installation guide and a specific guide for installing Caffe on a supercomputer cluster.
Since we do not have root access to Altamira, Caffe has been installed locally at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe. For more information on the local installation of software on a supercomputer, please check [1]. The software and libraries already available at Altamira are:
- Python 2.7.10
- CUDA 7.0.28
- OpenMPI 1.8.3
- Boost 1.54.0
- protobuf 2.5.0
- gcc 4.6.3
- HDF5 1.8.10
The remaining libraries have been installed at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib. Those libraries are:
- gflags
- glog
- leveldb
- OpenCV
- snappy
- LMDB
- ATLAS
Modules can be loaded all at once by loading /gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles/common.
- Comments
At the moment Altamira runs Tesla M2090 GPUs with CUDA compute capability 2.0. Caffe has therefore been compiled without cuDNN (the GPU-accelerated library of primitives for deep neural networks), which requires GPUs with compute capability 3.0 or higher.
Caffe installation in Yong
With root access, installing Caffe on Ubuntu is straightforward. Yong runs an Nvidia Quadro 4000 GPU, which does not support cuDNN either. This GPU has very limited memory: it allows training small networks on small, simple datasets (e.g. MNIST) but cannot hold (and therefore train) the more complex networks needed to learn more involved datasets (e.g. ImageNet).
Caffe architecture
Neural networks learn their layers' parameters using backpropagation, where the gradients of one layer are used to compute the gradients of the previous one. In Caffe the communication between layers is implemented through blobs, which store the values and gradients of each layer of the network. Deep networks are therefore very modular, and Caffe reflects this modularity by giving you (almost) complete freedom to compose your network's architecture.
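To make the chain of gradients more concrete, here is a minimal pure-Python sketch. It is not Caffe code: the Blob, Scale and ReLU classes are hypothetical toy names that only mimic the idea of a blob holding both values (data) and gradients (diff), and the loss is assumed to be the plain sum of the outputs.

```python
class Blob:
    """Toy stand-in for a Caffe blob: holds values and their gradients."""
    def __init__(self, data):
        self.data = list(data)
        self.diff = [0.0] * len(self.data)


class Scale:
    """Toy layer computing top = w * bottom with one learnable weight w."""
    def __init__(self, w):
        self.w = w
        self.w_diff = 0.0

    def forward(self, bottom, top):
        top.data = [self.w * x for x in bottom.data]

    def backward(self, top, bottom):
        # Gradient w.r.t. the weight, then w.r.t. the input (chain rule).
        self.w_diff = sum(g * x for g, x in zip(top.diff, bottom.data))
        bottom.diff = [self.w * g for g in top.diff]


class ReLU:
    """Toy activation layer computing top = max(0, bottom)."""
    def forward(self, bottom, top):
        top.data = [max(0.0, x) for x in bottom.data]

    def backward(self, top, bottom):
        bottom.diff = [g if x > 0 else 0.0
                       for g, x in zip(top.diff, bottom.data)]


def run():
    x, h, y = Blob([1.0, -2.0, 3.0]), Blob([0.0] * 3), Blob([0.0] * 3)
    scale, relu = Scale(2.0), ReLU()
    scale.forward(x, h)
    relu.forward(h, y)
    # Assumed loss: sum(y), so d(loss)/dy is 1 for every element.
    y.diff = [1.0, 1.0, 1.0]
    relu.backward(y, h)    # gradients of the last layer are computed first
    scale.backward(h, x)   # and then reused by the previous layer
    return y.data, h.diff, scale.w_diff


print(run())
```

Each layer only needs the blob above it (its top) and the blob below it (its bottom), which is what makes it possible to stack layers freely.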
There are several types of layers:
Common layers:
- Inner product
- Splitting
- Flattening
- Reshape
- Concatenation
- Slicing
- Element-wise operations
- Argmax
- Softmax
- Mean Variance Normalization
Vision layers:
- Convolution
- Pooling
- Local Response Normalization
Activation/Neuron layers:
- ReLU
- Leaky ReLU
- Sigmoid
- Tanh
- Absolute Value
- Power
- Binomial Normal LogLikelihood
Activation functions like sigmoid and tanh are not as popular as they used to be. It is usually a safe choice to use ReLUs, which tend to give faster convergence.
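One commonly cited reason for this is saturation: the derivatives of sigmoid and tanh shrink towards zero for large inputs, while the derivative of a ReLU stays at 1 for any positive input. The following pure-Python sketch (the function names are ours, not Caffe's) prints the three derivatives for a few inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


# Compare how much gradient each activation lets through.
for x in (0.5, 2.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```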
Despite the simplicity of constructing a network, its correct design often involves a considerable amount of expertise. Beginners are therefore usually advised to train on their datasets with existing networks (e.g. AlexNet) kindly provided with Caffe.
Training with Caffe
The training can be divided into several steps. The MNIST tutorial and the ImageNet tutorial are good references to follow.
Preparation of the data
First it is necessary to do some image preprocessing: resizing the images to a square (e.g. 256x256 for AlexNet) and mean centering (subtracting the mean image of the dataset from each example), which has been seen to lead to faster convergence. For image resizing in Ubuntu try the command:
for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done
Mean centering will be done directly during training. There are other kinds of preprocessing operations (like normalization, decorrelation with PCA and whitening) that are common in machine learning but that have not proven useful in image recognition with deep learning.
The second step is to create the lmdb files (for the TRAIN and TEST sets) which will be fed to Caffe. When efficiency is not critical, images can instead be fed to Caffe directly from disk, from HDF5 files or from common image formats. The lmdb files can be created by modifying create_imagenet.sh with your image_path. Then modify make_imagenet_mean.sh with your image_path as well. This will generate the mean image file (imagenet_mean.binaryproto in the stock script) that will be used later for centering the images.
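As an illustration of what mean centering does, here is a small pure-Python sketch. It does not read Caffe's mean file: each image is represented as a flat list of made-up pixel values, and the mean is assumed to be computed over the training images only.

```python
def mean_image(images):
    """Pixel-wise mean of equally sized images (flat lists of values)."""
    n = len(images)
    return [sum(pixels) / float(n) for pixels in zip(*images)]


def center(image, mean):
    """Subtract the mean image from one example."""
    return [p - m for p, m in zip(image, mean)]


# Three toy "images" of four pixels each (made-up values).
train = [[0.0, 100.0, 200.0, 50.0],
         [20.0, 120.0, 180.0, 50.0],
         [40.0, 80.0, 220.0, 50.0]]

mean = mean_image(train)
centered = [center(img, mean) for img in train]
print(mean)
print(centered[0])
```

After centering, the pixel-wise mean of the training images is zero. The same training mean would then be subtracted from the validation and test images.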
Caffe does not use a separate validation set because it only optimizes the weights, not the hyperparameters. We could in principle optimize the hyperparameters with a Python script that loops over combinations of them: Caffe trains each time with a specific combination and tests its accuracy on a validation set (which Caffe considers to be a TEST set), and the combination that gives the highest accuracy is selected. However, the final accuracy must be measured on a completely independent set of images, the real test set (see the Testing with PyCaffe section).
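Such a loop could look like the sketch below. It is only an outline under assumptions: grid_search and fake_evaluate are hypothetical names, and fake_evaluate is a placeholder for the step that would write a solver file, run Caffe and read back the accuracy on the validation set.

```python
import itertools


def grid_search(grid, evaluate):
    """Try every combination in `grid` and keep the best-scoring one.

    `grid` maps a hyperparameter name to the list of values to try.
    `evaluate` takes a dict of hyperparameters and returns an accuracy.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_evaluate(params):
    # Placeholder for an actual Caffe training run (made-up formula).
    return (1.0 - abs(params["base_lr"] - 0.001) * 100
            - params["weight_decay"] * 10)


grid = {"base_lr": [0.01, 0.001, 0.0001],
        "weight_decay": [0.0005, 0.005]}
best, score = grid_search(grid, fake_evaluate)
print(best, score)
```

The accuracy of the selected combination is optimistic by construction, which is why the independent test set is still needed afterwards.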
Usually deep networks are trained with very large datasets (e.g. the ImageNet dataset provides around 1000 images per label). In our case the datasets are not that large. Take for example the Portuguese flora dataset. It includes around 2073 species, but most of them have very few images. The strategy should therefore be to identify the genus instead of the species. As we can see in Figure 1, most genera still have very few images: of the 747 different genera, only 292 have at least 25 images. Selecting 5 images for the test set and another 5 for the validation set leaves only 15 images (in the worst case) for the training set.
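The selection and split described above can be sketched as follows. The function names and the toy file names are hypothetical; only the thresholds (at least 25 images per genus, 5 for test, 5 for validation) come from the text.

```python
import random


def split_genus(images, n_test=5, n_val=5, seed=0):
    """Shuffle the images of one genus and split them into three sets."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def split_dataset(images_by_genus, min_images=25):
    """Keep only genera with enough images and split each of them."""
    splits = {}
    for genus, images in images_by_genus.items():
        if len(images) >= min_images:
            splits[genus] = split_genus(images)
    return splits


# Toy data with made-up file names, for illustration only.
toy = {
    "Quercus": ["quercus_%02d.jpg" % i for i in range(25)],
    "Rosa": ["rosa_%02d.jpg" % i for i in range(40)],
    "Ilex": ["ilex_%02d.jpg" % i for i in range(8)],
}
splits = split_dataset(toy)
for genus in sorted(splits):
    train, val, test = splits[genus]
    print(genus, len(train), len(val), len(test))
```

A genus with exactly 25 images ends up with the worst case of 15 training images, while a genus with fewer than 25 images is dropped.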
A possible way of increasing the number of training images is to use cross-validation. Cross-validation is a technique that consists in splitting the training set into separate folds and cyclically rotating which fold is held out for validation, so that every image is eventually used for training (which proves useful in the case of small datasets). As mentioned before, Caffe has no validation process and therefore no built-in function for cross-validation. Once again, it can be implemented manually with a Python script that cyclically exchanges the train and validation sets, each time fine-tuning the weights starting from the previous weights (in the first iteration the weights should be fine-tuned from the weights trained on the Oxford 102 flower dataset).
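The fold rotation itself is easy to script. The sketch below only generates the train/validation splits; k_fold_splits is a hypothetical helper, the file names are made up, and the step that would rebuild the lmdb files and call Caffe is left as a comment.

```python
def k_fold_splits(items, k):
    """Yield (train, validation) pairs; each item is validated once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val


images = ["img_%02d.jpg" % i for i in range(20)]
for train, val in k_fold_splits(images, k=4):
    # Here one would rebuild the lmdb files for this split and
    # fine-tune starting from the weights of the previous round.
    print(len(train), len(val))
```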
Defining the network's architecture
The network's architecture is defined in the train_test.prototxt file. All networks start with a data layer and end with an output layer. Each layer can optionally be labelled with TRAIN or TEST if we want to include it only in the training or the testing phase. For example, we will create two data layers labelled TRAIN and TEST, pointing respectively to the train and test lmdb files. We will likewise create two output layers labelled TRAIN and TEST. The former will output the computed loss in order to start the backpropagation, while the latter will output the accuracy of the predictions on the batch of test images.
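As a rough illustration, the two data layers could be generated with a small Python helper like the one below. The lmdb paths and batch sizes are hypothetical, and the layer syntax follows the format used in the example models shipped with recent Caffe versions, so it should be checked against the installed version.

```python
DATA_LAYER = """layer {{
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {{ phase: {phase} }}
  data_param {{
    source: "{source}"
    batch_size: {batch_size}
    backend: LMDB
  }}
}}
"""


def data_layer(phase, source, batch_size):
    """Return the prototxt text of a data layer used in one phase only."""
    if phase not in ("TRAIN", "TEST"):
        raise ValueError("phase must be TRAIN or TEST")
    return DATA_LAYER.format(phase=phase, source=source,
                             batch_size=batch_size)


# Hypothetical lmdb paths and batch sizes.
prototxt = (data_layer("TRAIN", "data/flora_train_lmdb", 64)
            + data_layer("TEST", "data/flora_val_lmdb", 50))
print(prototxt)
```

Both layers share the same name and outputs, so the rest of the network does not need to know which phase is running.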
The hyperparameters of the network are defined in the solver.prototxt file. For the ImageNet model provided by Caffe for AlexNet these are:
net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU
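With lr_policy set to "step", the learning rate is base_lr multiplied by gamma once every stepsize iterations. A short Python sketch of that schedule, using the values of the solver above as defaults (the function name is ours):

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Learning rate of the "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 99999, 100000, 250000, 449999):
    print(it, step_lr(it))
```

With these values the learning rate is divided by 10 every 100000 iterations, so it has dropped four times by the end of the 450000 iterations.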
As mentioned before, beginners might want to train on their data with a predefined network like this one. In that case, in the train_test.prototxt file we change the data layer to adapt it to our new image size (or leave it as is if the images are also 256x256). The deploy.prototxt file must be copied but remains unchanged. In the solver.prototxt file we just have to change the net path; optionally the hyperparameters can be changed if we want to do hyperparameter optimization with a validation set, as suggested before.
One problem with using AlexNet to train on our Portuguese flora dataset (which is much smaller than ImageNet) is that, due to the high capacity of this net, we might overfit our data. There are two ways of avoiding overfitting: either reducing the network size or increasing the regularization (the weight_decay parameter). The latter is usually the best choice because optimizing a high-capacity network leads to better results.
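To see how weight_decay acts as a regularizer, here is a minimal sketch of one SGD update with momentum and L2 weight decay on a plain list of weights. It is a simplified illustration of the update rule, not Caffe's solver code, and sgd_step is a hypothetical name.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay."""
    new_w, new_v = [], []
    for wi, gi, vi in zip(w, grad, velocity):
        gi = gi + weight_decay * wi     # L2 penalty pulls weights to zero
        vi = momentum * vi - lr * gi    # momentum accumulates past updates
        new_v.append(vi)
        new_w.append(wi + vi)
    return new_w, new_v


# With a zero data gradient, only the weight decay acts on the weights.
w, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(100):
    w, v = sgd_step(w, [0.0, 0.0], v, weight_decay=0.1)
print(w)
```

The weights shrink towards zero even though the data gradient is zero. A larger weight_decay strengthens this pull, which limits how closely the network can fit a small dataset.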
Training
Once we have all the files, training is done by executing the following command:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
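Snapshots of the weights and of the solver state are saved at the interval set by the snapshot parameter of the solver file, and training can be resumed from one of them by adding the --snapshot flag to the command.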
Testing with PyCaffe
Caffe has a Python module named PyCaffe that is well suited to the testing phase. Several IPython notebooks are available as an introduction to this module [2].
For the results to be reliable, testing should be done with images that have been used neither for training nor for validation. With a test set that has the same number of images per label, the standard accuracy measures are top-1 accuracy (i.e. the predicted label is the right one) and top-5 accuracy (i.e. the right label is among the first five predictions of the network).
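Both measures can be computed from the class scores returned by the network. The sketch below is plain Python and does not call PyCaffe: the scores and labels are made-up toy values and top_k_accuracy is a hypothetical helper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k best scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return float(hits) / len(labels)


# Toy scores for 3 images and 6 classes (made-up numbers).
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 0: top-1 hit
    [0.30, 0.05, 0.40, 0.10, 0.10, 0.05],  # true class 0: top-5 hit only
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5: miss
]
labels = [0, 0, 5]

print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 5))
```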
- Comments
Once the new GTX 980 Ti GPU is installed and I am able to train on the Portuguese flora dataset with the AlexNet architecture, I will update the different sections with the results, especially concerning the effectiveness of cross-validation.
Useful links
Computer Vision with Deep Learning
- Stanford Course - Convolutional Neural Networks for Visual Recognition: This is by far the most useful link for an introduction to image recognition with neural networks. It has course notes and recorded lectures on YouTube.
Deep Learning
- Michael Nielsen's webpage: Introductory notes on deep learning.
- Deep Learning Book: Still in preprint but with open-access course notes. Written by one of the leading figures in the field.
Machine Learning
- Pattern Recognition and Machine Learning - C. M. Bishop: Classic reference in machine learning. Useful to have the broad view.