The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Competence centre LifeWatch/Citizen Science/Task 4.2

From EGIWiki

The objective of this task is to develop software able to identify plants from images, in order to allow citizens to contribute to flora conservation. Image recognition will be implemented through machine learning with deep neural networks (aka deep learning). The framework used is Caffe, a deep learning framework created at UC Berkeley.

Caffe installation

Caffe installation in Altamira

For the installation of Caffe in Altamira Supercomputer at IFCA (Spain) we have followed both the official installation guide and a specific guide for installing Caffe on a Supercomputer cluster.

Since we do not have root access to Altamira, Caffe has been installed locally at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe. For more information on installing software locally on a supercomputer, please check [1]. The software and libraries already available at Altamira are:

  • Python 2.7.10
  • CUDA 7.0.28
  • OpenMPI 1.8.3
  • Boost 1.54.0
  • protobuf 2.5.0
  • gcc 4.6.3
  • HDF5 1.8.10

The remaining libraries have been installed at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib. Those libraries are:

  • gflags
  • glog
  • leveldb
  • OpenCV
  • snappy
  • LMDB
  • ATLAS

All these modules can be loaded at once by loading the module file /gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles/common.

Comments

Altamira currently runs Tesla M2090 GPUs with CUDA compute capability 2.0. Therefore Caffe has been compiled without cuDNN (the GPU-accelerated library of primitives for deep neural networks), which requires GPUs with a CUDA compute capability of 3.0 or higher.

Caffe installation in Yong

On Yong, where we have root access, installing Caffe under Ubuntu is straightforward. Yong runs an Nvidia Quadro 4000 GPU, which does not support cuDNN either. This GPU has very limited memory: it allows training small networks on simple datasets (e.g. MNIST), but it cannot store (and therefore cannot train) the more complex networks needed to learn more involved datasets (e.g. ImageNet).

Caffe architecture

Neural networks learn their layers' parameters using backpropagation, where the gradients of one layer are used to compute the gradients of the previous one (in Caffe, layers communicate through blobs, which store the values and gradients of each layer of the network). Deep networks are therefore very modular, and Caffe reflects this modularity by giving you (almost) complete freedom to compose your network's architecture.
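The blob mechanism can be illustrated with a toy forward and backward pass. This is a plain-Python sketch, not Caffe code: the Blob, ScaleLayer and forward_backward names are hypothetical, each layer holds a single learnable weight, and the loss is a squared error. As in Caffe, every blob keeps its values (data) and its gradients (diff), and the backward pass walks the layers in reverse, turning each layer's top gradient into its bottom gradient.

```python
class Blob:
    """Stores the values (data) and gradients (diff) passed between layers."""
    def __init__(self, data):
        self.data = list(data)
        self.diff = [0.0] * len(self.data)


class ScaleLayer:
    """Toy layer y = w * x with a single learnable weight w (hypothetical, not a Caffe layer)."""
    def __init__(self, w):
        self.w = w
        self.w_grad = 0.0

    def forward(self, bottom, top):
        top.data = [self.w * x for x in bottom.data]

    def backward(self, bottom, top):
        # dL/dw = sum(dL/dy * x) and dL/dx = dL/dy * w
        self.w_grad = sum(g * x for g, x in zip(top.diff, bottom.data))
        bottom.diff = [g * self.w for g in top.diff]


def forward_backward(x, weights, target):
    """Run a chain of ScaleLayers forward, then backpropagate a squared-error loss."""
    layers = [ScaleLayer(w) for w in weights]
    blobs = [Blob(x)] + [Blob([0.0] * len(x)) for _ in layers]
    steps = list(zip(layers, blobs, blobs[1:]))
    for layer, bottom, top in steps:
        layer.forward(bottom, top)
    out = blobs[-1]
    loss = 0.5 * sum((y - t) ** 2 for y, t in zip(out.data, target))
    out.diff = [y - t for y, t in zip(out.data, target)]  # dL/dy of the loss
    for layer, bottom, top in reversed(steps):  # gradients flow from the last layer back
        layer.backward(bottom, top)
    return loss, [layer.w_grad for layer in layers]


loss, grads = forward_backward([1.0, 2.0], weights=[2.0, 3.0], target=[0.0, 0.0])
print(loss, grads)
```

Here the network computes y = 6x, so the loss is 90 and the weight gradients are 90 and 60, which matches differentiating 0.5·(w1·w2)²·Σx² by hand.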

There are several types of intermediate layers which perform the computation:
Common layers:

  • Inner product
  • Splitting
  • Flattening
  • Reshape
  • Concatenation
  • Slicing
  • Element-wise operations
  • Argmax
  • Softmax
  • Mean Variance Normalization


Vision layers:

  • Convolution
  • Pooling
  • Local Response Normalization


Activation/Neuron layers:

  • ReLU
  • Leaky ReLU
  • Sigmoid
  • Tanh
  • Absolute Value
  • Power
  • Binomial Normal LogLikelihood

Activation functions like sigmoids and tanh are not as popular as they used to be. Choosing ReLUs is usually a safe default that leads to faster convergence.
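To see why ReLUs tend to converge faster, the sketch below implements several of the activation layers listed above in plain Python (the function names are illustrative, not Caffe's API) and compares their local gradients. The sigmoid gradient never exceeds 0.25 and, like the tanh gradient, vanishes for large inputs, whereas the ReLU gradient stays 1 for any positive input, so gradients are not shrunk as they flow back through many layers. Caffe exposes the leaky variant through the ReLU layer's negative_slope parameter.

```python
import math


def relu(x, negative_slope=0.0):
    # negative_slope > 0 gives a leaky ReLU
    return x if x > 0 else negative_slope * x


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def absval(x):
    return abs(x)


def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    # Caffe's Power layer computes (shift + scale * x) ** power
    return (shift + scale * x) ** power


def bnll(x):
    # BNLL: log(1 + exp(x)), split into two cases to avoid overflow for large |x|
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


# Local gradients used during backpropagation.
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x = 0


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


for x in (0.5, 5.0, 10.0):
    print(f"x={x:4.1f}  sigmoid'={sigmoid_grad(x):.5f}  "
          f"tanh'={tanh_grad(x):.5f}  relu'={relu_grad(x):.1f}")
```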

Despite the simplicity of constructing a network, designing it correctly often requires considerable expertise. Beginners are therefore usually advised to train on their datasets using existing networks (e.g. AlexNet) provided with Caffe.

Training with Caffe

Training can be divided into several steps. The MNIST tutorial and the ImageNet tutorial are good references to follow.

Preparation of the data

Figure 1. Distribution of the number of images per genus for the Portuguese flora dataset.

First, some image preprocessing is necessary: resizing the images to a square (e.g. 256x256 for AlexNet) and mean centering (subtracting the mean image of the dataset from each example), which has been observed to lead to faster convergence. For image resizing in Ubuntu, try the command:

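# Requires ImageMagick; "!" forces exactly 256x256 (ignoring the aspect ratio)
# and each image is overwritten in place.
for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done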
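The arithmetic of mean centering can be sketched as follows, assuming all images already share the same size and, for brevity, are single-channel nested lists [row][column]; a real pipeline would use NumPy arrays with a channel axis. The mean_image and center names are illustrative.

```python
def mean_image(images):
    """Pixel-wise mean over a list of equally sized images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


def center(image, mean):
    """Subtract the dataset mean image pixel by pixel."""
    return [[p - m for p, m in zip(row, mean_row)]
            for row, mean_row in zip(image, mean)]


images = [
    [[10, 20], [30, 40]],
    [[30, 40], [50, 60]],
]
mean = mean_image(images)
centered = [center(img, mean) for img in images]
print(mean)
print(centered)
```

After centering, each pixel position has zero mean across the dataset.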

Mean centering will be done directly during training. Other preprocessing operations (like normalization, decorrelation with PCA and whitening) are common in machine learning but have not proven useful in image recognition with deep learning. You should also create a train.txt and a val.txt file. Each file is a list with the following structure:

path_to_image1.jpg 0
path_to_image2.jpg 0
...
path_to_image100.jpg 56
...

where each line holds an image path followed by its integer label, and the paths are relative to your image folder. If you intend to build a hierarchical classification as in ImageNet (e.g. the labels cobra and boa can be grouped under snakes), you might need some additional files (like a .bet.pickle file, which defines a graph for your dataset). This could prove useful for the classification of flora, since species group into genera, which in turn group into families, and so on.
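A hypothetical helper for producing such listing files is sketched below. It assumes (this is not stated in the text) that the images are organized in one sub-folder per class, e.g. one folder per genus, and it assigns labels as consecutive integers in alphabetical order of the class names. The scan_folder, listing_lines and write_listing names are illustrative.

```python
import os


def scan_folder(root):
    """Map each class sub-folder of root to its image paths (relative to root)."""
    images_by_class = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            images_by_class[name] = [
                f"{name}/{f}" for f in sorted(os.listdir(folder))
                if f.lower().endswith((".jpg", ".jpeg"))
            ]
    return images_by_class


def listing_lines(images_by_class):
    """Return lines "relative/path.jpg label" and the class-name -> label mapping."""
    labels = {name: i for i, name in enumerate(sorted(images_by_class))}
    lines = [f"{path} {labels[name]}"
             for name in sorted(images_by_class)
             for path in sorted(images_by_class[name])]
    return lines, labels


def write_listing(filename, lines):
    with open(filename, "w") as f:
        f.write("\n".join(lines) + "\n")


# Example with two hypothetical genera:
lines, labels = listing_lines({
    "Quercus": ["Quercus/q1.jpg"],
    "Cistus": ["Cistus/c2.jpg", "Cistus/c1.jpg"],
})
print("\n".join(lines))
```

Keeping the name-to-label mapping is useful later to translate the network's predicted label indices back into genus names.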

The second step is to create the LMDB files (for the TRAIN and TEST sets) that will be fed to Caffe. When efficiency is not critical, images can instead be fed to Caffe directly from disk, either from HDF5 files or from files in common image formats. The LMDB files can be created by adapting the paths in create_imagenet.sh. Then set your LMDB path in make_imagenet_mean.sh; this script generates the mean image file (a .binaryproto file, imagenet_mean.binaryproto in the ImageNet example) that will be used later for centering the images.

Caffe does not need a separate validation set because it only optimizes the weights, not the hyperparameters (the images in val.txt act as the TEST set). In principle we could optimize the hyperparameters with a Python script that loops over them: each time, Caffe trains with a specific combination of hyperparameters and reports its accuracy on a validation set (which Caffe treats as the TEST set), and we finally select the combination that gives the highest accuracy. However, the final, real accuracy must be measured on a completely independent set of images, the real test set (see the Testing with PyCaffe section).
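The hyperparameter loop described above might look like the sketch below. set_solver_fields rewrites `key: value` lines of a solver.prototxt text, and grid_search tries every combination and keeps the best one. train_and_score is a hypothetical callback you would implement yourself (e.g. write the solver to disk, run `caffe train` and parse the TEST accuracy from the log); here it is replaced by a fake scoring function so that the example runs without Caffe.

```python
import itertools
import re


def set_solver_fields(solver_text, **fields):
    """Return solver_text with each `key: value` line replaced (or appended if missing)."""
    for key, value in fields.items():
        line = f"{key}: {value}"
        solver_text, n = re.subn(rf"^{re.escape(key)}:.*$", lambda _m: line,
                                 solver_text, flags=re.M)
        if n == 0:
            solver_text = solver_text.rstrip("\n") + f"\n{line}\n"
    return solver_text


def grid_search(base_solver, grid, train_and_score):
    """Try every combination in grid; return (best_score, best_params)."""
    keys = sorted(grid)
    best = None
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_score(set_solver_fields(base_solver, **params))
        if best is None or score > best[0]:
            best = (score, params)
    return best


def fake_train_and_score(solver_text):
    # Stand-in for a real Caffe run: returns a made-up validation accuracy.
    lr = float(re.search(r"^base_lr: (.*)$", solver_text, re.M).group(1))
    wd = float(re.search(r"^weight_decay: (.*)$", solver_text, re.M).group(1))
    return 0.5 - lr + wd


base_solver = 'net: "train_val.prototxt"\nbase_lr: 0.01\nweight_decay: 0.0005\n'
grid = {"base_lr": [0.01, 0.001], "weight_decay": [0.0005, 0.005]}
best = grid_search(base_solver, grid, fake_train_and_score)
print(best)
```

The winning combination should then be evaluated once on the independent test set, never on the set used to select it.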

Deep networks are usually trained on very large datasets (e.g. the ImageNet dataset provides around 1000 images per label). Our datasets are not that large. Take for example the Portuguese flora dataset: it includes around 2073 species, but most of them have very few images. The strategy should therefore be to identify the genus instead of the species. As Figure 1 shows, most genera still have very few images: of the 747 different genera, only 292 have at least 25 images. After setting aside 5 images for the test set and another 5 for the validation set, only 15 images (in the worst case) are left for the training set.
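The genus selection and per-genus split described above can be sketched as follows. split_per_genus is a hypothetical helper; the thresholds (at least 25 images per genus, 5 test and 5 validation images) are the ones quoted in the text, and the input is assumed to be a dict mapping each genus to its image paths. A fixed seed keeps the split reproducible.

```python
import random


def split_per_genus(images_by_genus, min_images=25, n_test=5, n_val=5, seed=0):
    """Drop genera with fewer than min_images images and split the rest
    into test / validation / train subsets, genus by genus."""
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}, "test": {}}
    for genus, paths in sorted(images_by_genus.items()):
        if len(paths) < min_images:
            continue  # too few images for this genus
        paths = sorted(paths)  # copy, so the caller's list is not shuffled
        rng.shuffle(paths)
        splits["test"][genus] = paths[:n_test]
        splits["val"][genus] = paths[n_test:n_test + n_val]
        splits["train"][genus] = paths[n_test + n_val:]
    return splits


example = {
    "Quercus": [f"Quercus/{i}.jpg" for i in range(25)],
    "Cistus": [f"Cistus/{i}.jpg" for i in range(40)],
    "Erica": [f"Erica/{i}.jpg" for i in range(12)],  # dropped: fewer than 25 images
}
splits = split_per_genus(example)
print({name: {g: len(p) for g, p in part.items()} for name, part in splits.items()})
```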

One possible way of increasing the number of training images is cross-validation. Cross-validation consists of splitting the training set into separate folds and cyclically changing which fold is used for validation, so that every fold is eventually used for training (which is useful for small datasets). As mentioned before, Caffe has no validation process and therefore no built-in function for cross-validation. Again, cross-validation can be implemented manually through a Python script that cyclically exchanges the train and validation sets, each time fine-tuning the weights starting from the previous iteration's weights (in the first iteration, the weights can be fine-tuned from weights trained on the Oxford 102 flower dataset, or on any other dataset trained with AlexNet).
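A sketch of the manual cross-validation loop follows. It assumes a hypothetical finetune(train, val, weights) callback that would wrap a `caffe train --weights=...` run and return the new weights and the validation accuracy; here it is faked, and the "weights" are just a round counter. make_folds assigns images to k folds round-robin. Because each round starts from the previous round's weights, later validation folds have already been trained on, so the averaged validation accuracy is optimistic; the independent test set remains the reference.

```python
def make_folds(items, k):
    """Assign items to k folds round-robin, so fold sizes differ by at most one."""
    return [items[i::k] for i in range(k)]


def cross_validate(folds, finetune, init_weights):
    """k-fold cross-validation where each round fine-tunes from the previous weights."""
    weights = init_weights
    accuracies = []
    for k, val in enumerate(folds):
        train = [item for i, fold in enumerate(folds) if i != k for item in fold]
        weights, accuracy = finetune(train, val, weights)
        accuracies.append(accuracy)
    return weights, sum(accuracies) / len(accuracies)


def fake_finetune(train, val, weights):
    # Stand-in for a real training run: counts rounds and returns a made-up accuracy.
    assert not set(train) & set(val), "train and validation folds must not overlap"
    return weights + 1, len(val) / (len(train) + len(val))


folds = make_folds(list(range(10)), k=3)
print(folds)
print(cross_validate(folds, fake_finetune, init_weights=0))
```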

Defining the network's architecture

The network's architecture is defined in the train_test.prototxt file. Every network starts with a data layer and ends with an output layer. Each layer can optionally be labelled with TRAIN or TEST if we want to include it only in the training or in the testing phase. For example, we will create two data layers labelled TRAIN and TEST, pointing respectively to the train and test LMDB files. These data layers also contain the batch size (i.e. the number of images processed at each iteration), which should be adapted to the GPU memory. Updating the parameters on mini-batches rather than on the full dataset makes each update much cheaper and is a much more efficient way to reach fast convergence. Likewise, we will create two output layers labelled TRAIN and TEST: the former outputs the computed loss from which backpropagation starts, while the latter outputs the prediction accuracy on the batch of test images.
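The benefit of mini-batches can be illustrated outside Caffe with a toy mini-batch SGD fit of y = w·x (a sketch; all names are illustrative). Each update only looks at batch_size examples, and iterations_per_epoch gives the number of iterations needed to see the whole training set once. The 4,380 images in the example are simply the worst case from the Portuguese flora split (292 genera × 15 training images).

```python
import math
import random


def minibatches(data, batch_size, rng):
    """Shuffle once per epoch and yield mini-batches of at most batch_size items."""
    data = data[:]
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def iterations_per_epoch(n_images, batch_size):
    return math.ceil(n_images / batch_size)


def fit_slope(data, batch_size=4, lr=0.05, epochs=50, seed=0):
    """Mini-batch SGD on the model y = w * x with squared loss."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            # gradient of the batch mean of 0.5 * (w*x - y)^2
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


data = [(x / 10, 3 * x / 10) for x in range(1, 21)]  # noise-free points on y = 3x
print(round(fit_slope(data), 3))
print(iterations_per_epoch(4380, 256))
```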

The hyperparameters of the network are defined in the solver.prototxt file. For the AlexNet-style ImageNet model provided with Caffe (bvlc_reference_caffenet), the solver is:

net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000       # number of test iterations = number_of_test_images / test_batch_size
test_interval: 1000   # test every 1000 training iterations
base_lr: 0.01         # base learning rate
lr_policy: "step"     # learning rate policy: multiply the rate by gamma every stepsize iterations
gamma: 0.1            # learning rate decay factor
stepsize: 100000      # iterations between learning rate drops
display: 20           # display every 20 iterations
max_iter: 450000      # maximum number of iterations
momentum: 0.9         # momentum
weight_decay: 0.0005  # regularization parameter
snapshot: 10000       # snapshot intermediate results every 10000 iterations
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU      # solver mode: CPU or GPU

As mentioned before, beginners might want to train their data with a predefined network like this one. In the train_test.prototxt file we then only change the data layers to adapt them to our new image size (or leave them as they are if the images are also 256x256). The deploy.prototxt file must be copied but remains unchanged. In the solver.prototxt file we just have to change the net path and, optionally, the hyperparameters if we want to do hyperparameter optimization with a validation set as suggested before.

One problem of using AlexNet to train on our Portuguese flora dataset (which is much smaller than ImageNet) is that, due to the high capacity of this net, we might overfit our data. There are two ways of avoiding overfitting: either reducing the network size or increasing the regularization (the weight_decay parameter). The latter is usually the better choice because, although the loss landscape of large networks has more local minima, those minima usually reach a smaller loss. In addition, when training larger networks the variance of the final loss is much smaller than when training small networks, so we rely less on the luck of the random initialization. The intuition behind this is that it is harder to get stuck in a local minimum in a high-dimensional space than in a low-dimensional one.
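How weight_decay and momentum enter each update can be sketched with a scalar version of the SGD rule described in Caffe's solver documentation: weight decay adds weight_decay · w to the gradient (L2 regularization), and momentum accumulates past updates. The toy run below sets the data gradient to zero so that only the regularization term acts; a larger weight_decay shrinks the weight faster toward zero, which is how it limits the effective capacity of a large network like AlexNet. sgd_step and decay_only are illustrative names.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay (scalar sketch)."""
    g = grad + weight_decay * w   # L2 regularization adds weight_decay * w to the gradient
    v = momentum * v + lr * g     # momentum accumulates past updates
    return w - v, v


def decay_only(weight_decay, steps=1000, w0=1.0):
    # With a zero data gradient only the regularization term acts: w shrinks toward 0.
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = sgd_step(w, v, grad=0.0, weight_decay=weight_decay)
    return w


for wd in (0.0, 0.0005, 0.005):
    print(wd, round(decay_only(wd), 4))
```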

Training

Once we have all the files, training is launched with the following command:

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

Snapshots of the intermediate results can be saved every given number of iterations (the snapshot and snapshot_prefix fields of solver.prototxt). Training outputs a .caffemodel file in which the learned weights are stored.
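When training is driven from a Python script (for hyperparameter search or cross-validation), the command above can be assembled programmatically. The sketch below only builds the argument list; it could then be run with subprocess.run(cmd, check=True). The --weights (fine-tune from an existing .caffemodel), --snapshot (resume from a .solverstate) and --gpu options belong to the caffe tool, which refuses --weights and --snapshot together; the solver and weights paths in the example are hypothetical. snapshot_files gives the file names Caffe writes for a given snapshot_prefix and iteration.

```python
import shlex


def caffe_train_cmd(solver, caffe_bin="./build/tools/caffe",
                    weights=None, snapshot=None, gpu=None):
    """Build (but do not run) a `caffe train` command line."""
    if weights and snapshot:
        raise ValueError("use either weights (fine-tuning) or snapshot (resuming), not both")
    cmd = [caffe_bin, "train", f"--solver={solver}"]
    if weights:
        cmd.append(f"--weights={weights}")
    if snapshot:
        cmd.append(f"--snapshot={snapshot}")
    if gpu is not None:
        cmd.append(f"--gpu={gpu}")
    return cmd


def snapshot_files(prefix, iteration):
    """Weights and solver-state files written for a snapshot at `iteration`."""
    base = f"{prefix}_iter_{iteration}"
    return base + ".caffemodel", base + ".solverstate"


cmd = caffe_train_cmd("models/flora/solver.prototxt",
                      weights="models/flora/pretrained.caffemodel", gpu=0)
print(shlex.join(cmd))
print(snapshot_files("models/bvlc_reference_caffenet/caffenet_train", 10000))
```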

Testing with PyCaffe

Caffe has a Python interface, pycaffe, which is well suited to the testing phase. Several IPython notebooks are available as an introduction to this module [2][3].

For the results to be reliable, testing should be done with images that have been used neither for training nor for validation. With a test set containing the same number of images per label, the standard accuracy metrics are top-1 accuracy (the network's top prediction is the right label) and top-5 accuracy (the right label is among the network's five highest-ranked predictions).
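Top-1 and top-5 accuracy can be computed from the class scores returned by the network (e.g. the output of its final softmax layer obtained through pycaffe) with a helper like the one below. It is a plain-Python sketch with an illustrative name: scores holds one list of class scores per test image and labels holds the true label indices.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of images whose true label is among the k highest-scoring classes."""
    hits = 0
    for image_scores, label in zip(scores, labels):
        ranked = sorted(range(len(image_scores)),
                        key=lambda c: image_scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.5, 0.3, 0.2],  # predicted class 0
    [0.2, 0.3, 0.5],  # predicted class 2
]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2))
```

With the Portuguese flora genera, k=5 gives the top-5 accuracy mentioned above.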

Comments

Once the new GTX 980 Ti GPU is installed and I am able to train on the Portuguese flora dataset with the AlexNet architecture, I will update the different sections with the results, especially concerning the effectiveness of hyperparameter optimization and cross-validation. In addition, I will upload the Python code for preparing the dataset and measuring the accuracy of the model.

Update

The new GPU arrived today. Good news: it runs 20 times faster on the MNIST dataset! Stay tuned for upcoming updates.

An example: The Portuguese Flora dataset

Useful links

Computer Vision with Deep Learning

  • Stanford Course - Convolutional Neural Networks for Visual Recognition: This is by far the most useful link for introducing yourself to image recognition with neural networks. It progresses from the simplest concepts (SVM, Softmax, 2-layer network, backpropagation) to the key concepts of image recognition with deep learning (convolutional neural networks, ...). It has course notes and recorded lectures on YouTube.

Deep Learning

Machine Learning