Competence centre LifeWatch/Citizen Science/Task 4.2

The objective of this task is to develop software able to identify plants from images, so that citizens can contribute to flora conservation. Image recognition will be implemented through machine learning with deep neural networks (also known as deep learning). Caffe is a deep learning framework created at UC Berkeley.

Caffe installation

Caffe installation in Altamira

For the installation of Caffe on the Altamira supercomputer at IFCA (Spain) we followed both the official installation guide and a specific guide for installing Caffe on a supercomputer cluster.

Since we do not have root access to Altamira, Caffe has been installed locally at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe. For more information on installing software locally on a supercomputer, please check [1]. The software and libraries already available on Altamira are:

  • Python 2.7.10
  • CUDA 7.0.28
  • OpenMPI 1.8.3
  • Boost 1.54.0
  • protobuf 2.5.0
  • gcc 4.6.3
  • HDF5 1.8.10

The remaining libraries have been installed at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib. Those libraries are:

  • gflags
  • glog
  • leveldb
  • OpenCV
  • snappy
  • LMDB

Modules can be loaded all at once by loading /gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles/common.


At the moment Altamira runs Tesla M2090 GPUs, which have CUDA compute capability 2.0. Caffe has therefore been compiled without cuDNN (the GPU-accelerated library of primitives for deep neural networks), which requires GPUs with CUDA compute capability 3.0 or higher.

Caffe installation in Yong

With root access, installing Caffe on Ubuntu is straightforward. Yong runs an Nvidia Quadro 4000 GPU, which does not support cuDNN either. This GPU has very limited memory: it allows training small networks on small, simple datasets (e.g. MNIST), but it cannot hold (and therefore train) the more complex networks needed to learn more involved datasets (e.g. ImageNet).

Caffe architecture

Neural networks learn the parameters of their layers using backpropagation, where the gradients of one layer are used to compute the gradients of the previous one. In Caffe the communication between layers is implemented through blobs, which store the values and gradients of each layer of the network. Deep networks are therefore very modular, and Caffe reflects this modularity by giving you (almost) complete freedom to compose your network's architecture.
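
The role of blobs in backpropagation can be illustrated with a small, framework-free sketch. This is not Caffe's API: the Blob, Scale and ReLU classes and the run_network function below are hypothetical stand-ins written only for illustration. Each blob holds values (data) and gradients (diff), and each layer reads the gradient of the blob above it to fill in the gradient of the blob below it.

```python
class Blob:
    """Hypothetical stand-in for a Caffe blob: values plus their gradients."""
    def __init__(self, size):
        self.data = [0.0] * size   # values computed in the forward pass
        self.diff = [0.0] * size   # gradients filled in during the backward pass


class Scale:
    """Toy layer computing top = weight * bottom (one learnable parameter)."""
    def __init__(self, weight):
        self.weight = weight
        self.weight_diff = 0.0

    def forward(self, bottom, top):
        top.data = [self.weight * x for x in bottom.data]

    def backward(self, top, bottom):
        # Gradient with respect to the parameter, then to the layer input.
        self.weight_diff = sum(g * x for g, x in zip(top.diff, bottom.data))
        bottom.diff = [self.weight * g for g in top.diff]


class ReLU:
    """Toy activation layer computing top = max(0, bottom)."""
    def forward(self, bottom, top):
        top.data = [max(0.0, x) for x in bottom.data]

    def backward(self, top, bottom):
        bottom.diff = [g if x > 0 else 0.0
                       for g, x in zip(top.diff, bottom.data)]


def run_network(inputs):
    """Forward then backward pass through a fixed three-layer toy network."""
    layers = [Scale(2.0), ReLU(), Scale(-1.0)]
    blobs = [Blob(len(inputs)) for _ in range(len(layers) + 1)]
    blobs[0].data = list(inputs)
    for i, layer in enumerate(layers):
        layer.forward(blobs[i], blobs[i + 1])
    # Assume the loss is the sum of the outputs, so d(loss)/d(output) = 1.
    blobs[-1].diff = [1.0] * len(inputs)
    for i in reversed(range(len(layers))):
        layers[i].backward(blobs[i + 1], blobs[i])
    return layers, blobs
```

Because every layer only touches its own bottom and top blobs, layers can be reordered or swapped without changing the rest of the network, which is the modularity described above.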

There are several types of layers.

Common layers:

  • Inner product
  • Splitting
  • Flattening
  • Reshape
  • Concatenation
  • Slicing
  • Element-wise operations
  • Argmax
  • Softmax
  • Mean Variance Normalization

Vision layers:

  • Convolution
  • Pooling
  • Local Response Normalization

Activation/Neuron layers:

  • ReLU
  • Leaky ReLU
  • Sigmoid
  • Tanh
  • Absolute Value
  • Power
  • Binomial Normal LogLikelihood

Activation functions like sigmoid and tanh are not as popular as they used to be. ReLUs are usually a safe choice and tend to converge faster.

Despite the simplicity of constructing a network, its correct design often requires considerable expertise. Beginners are therefore usually advised to train on their own datasets using existing networks (e.g. AlexNet), which are provided with Caffe.
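
One commonly cited reason for the faster convergence of ReLUs is that sigmoid and tanh saturate: for large inputs their derivative becomes very small, so little gradient flows back through them. The sketch below simply evaluates the three derivatives; the function names are chosen for this example and are not part of Caffe.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid function at x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh at x."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU at x (taken as 0 for x <= 0)."""
    return 1.0 if x > 0 else 0.0


# Print the derivatives for increasingly large positive inputs.
for x in (0.5, 2.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```

For positive inputs the ReLU derivative stays at 1, while the other two shrink towards zero as the input grows.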

Training with Caffe

The training can be divided into several steps. The MNIST tutorial and the ImageNet tutorial are good references to follow.

Preparation of the data

Figure 1. Distribution of the number of images per genus for the Portuguese flora dataset.

First it is necessary to do some image preprocessing: resizing the images to a square (e.g. 256x256 for AlexNet) and mean centering (subtracting the mean image of the dataset from each example), which has been seen to lead to faster convergence. For image resizing in Ubuntu try the command:

for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done

Mean centering will be done directly during training. There are other kinds of preprocessing operations (like normalization, decorrelation with PCA and whitening) that are common in machine learning but that have not proven to be useful in image recognition with deep learning.

The second step is to create the lmdb files (for the TRAIN and TEST sets) which will be fed to Caffe. When efficiency is not critical, images can also be fed to Caffe directly from disk, from HDF5 files or from common image formats. The lmdb files can be created by modifying the lmdb creation script with your image_path. Then modify the mean computation script with your image_path. This will generate a mean.prototxt file that will be used later for centering the images.
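
As an illustration of what mean centering does, here is a minimal sketch in plain Python. It is not how Caffe implements it: images are represented as flat lists of pixel values, and the helper names mean_image and center are assumptions made for this example.

```python
def mean_image(images):
    """Pixel-wise mean of equally sized images given as flat lists."""
    n = float(len(images))
    return [sum(pixels) / n for pixels in zip(*images)]


def center(image, mean):
    """Subtract the mean image from one example, pixel by pixel."""
    return [p - m for p, m in zip(image, mean)]


# Two tiny 2x2 "images" flattened to 4 pixel values each (made-up values).
train_images = [[0, 50, 100, 150], [100, 150, 200, 250]]
mean = mean_image(train_images)
centered = [center(image, mean) for image in train_images]
print(mean)
print(centered)
```

The mean is computed once on the training set and the same mean image is then subtracted from every example, including those of the test set.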

Caffe doesn't need a validation set because it only optimizes the weights, not the hyperparameters. We could in principle optimize the hyperparameters with a Python script that loops over them: for each combination of hyperparameters, Caffe trains the network and tests its accuracy on a validation set (which Caffe considers to be a TEST set), and we then select the combination that gives the highest accuracy. However, the final accuracy must be measured on a completely independent set of images, the real test set (refer to the Testing with PyCaffe section).
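
The hyperparameter loop could be organised roughly as follows. This is only a sketch under stated assumptions: solver_text renders a partial solver file, grid and select_best are helper names invented here, and toy_validation_accuracy is a placeholder for the real step, which would train with Caffe using the generated solver and read back the accuracy on the validation set.

```python
import itertools


def solver_text(net, base_lr, weight_decay):
    """Render a partial solver.prototxt for one hyperparameter combination."""
    return "\n".join([
        'net: "%s"' % net,
        "base_lr: %s" % base_lr,
        "weight_decay: %s" % weight_decay,
        'lr_policy: "step"',
        "gamma: 0.1",
        "stepsize: 100000",
        "momentum: 0.9",
        "solver_mode: GPU",
    ])


def grid(search_space):
    """Yield every combination of the hyperparameter values as a dict."""
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def select_best(search_space, validation_accuracy):
    """Return the combination with the highest validation accuracy."""
    return max(grid(search_space), key=validation_accuracy)


def toy_validation_accuracy(params):
    """Placeholder: a real version would train with Caffe and parse the
    accuracy reported on the validation (TEST phase) set."""
    return (1.0 - abs(params["base_lr"] - 0.001)
            - abs(params["weight_decay"] - 0.005))


search_space = {"base_lr": [0.01, 0.001], "weight_decay": [0.0005, 0.005]}
best = select_best(search_space, toy_validation_accuracy)
print(solver_text("train_val.prototxt", **best))
```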

Usually deep networks are trained with very large datasets (e.g. the ImageNet dataset provides around 1000 images per label). In our case the datasets are not that large. Take the Portuguese flora dataset as an example. It includes around 2073 species, but most of them have very few images. The strategy should therefore be to identify the genus instead of the species. As we can see in Figure 1, most genera still have very few images: of the 747 different genera, only 292 have at least 25 images. Selecting 5 images for the test set and another 5 for the validation set, we have only 15 images left (in the worst case) for the training set.
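
The split described above can be sketched as follows. The function name split_by_genus, the genus names and the image paths are all hypothetical; only the thresholds (at least 25 images, 5 for test, 5 for validation) come from the text.

```python
import random


def split_by_genus(images_by_genus, min_images=25, n_test=5, n_val=5, seed=0):
    """Split images per genus into train/val/test lists of (path, genus)."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for genus, paths in sorted(images_by_genus.items()):
        if len(paths) < min_images:
            continue  # genus has too few images, leave it out
        paths = list(paths)
        rng.shuffle(paths)
        splits["test"] += [(p, genus) for p in paths[:n_test]]
        splits["val"] += [(p, genus) for p in paths[n_test:n_test + n_val]]
        splits["train"] += [(p, genus) for p in paths[n_test + n_val:]]
    return splits


# Made-up example: one genus with exactly 25 images, one with only 10.
images = {
    "genus_a": ["genus_a/img_%02d.jpg" % i for i in range(25)],
    "genus_b": ["genus_b/img_%02d.jpg" % i for i in range(10)],
}
splits = split_by_genus(images)
print(dict((name, len(items)) for name, items in splits.items()))
```

For a genus with exactly 25 images this leaves 25 - 5 - 5 = 15 training images, the worst case mentioned above.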

A possible way of increasing the number of training images is to use cross-validation. Cross-validation is a technique that consists of splitting the training set into separate folds and cyclically rotating which fold is used for validation, so that every image is eventually used for training, which proves useful for small datasets. As mentioned before, Caffe has no validation process and therefore no built-in function for cross-validation. Once again, cross-validation can be implemented manually through a Python script that cyclically exchanges the train and validation sets, each time fine-tuning the weights starting from the previous weights (in the first iteration the weights should be fine-tuned from the weights trained on the Oxford 102 flower dataset).
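
The fold rotation itself is independent of Caffe and could look like the sketch below. The function name k_fold_splits is an assumption, and the fine-tuning step is only indicated by a comment, since it depends on how Caffe is launched from the script.

```python
def k_fold_splits(samples, k):
    """Yield (training, validation) lists, using each fold once for validation."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


# 20 images per genus remain once the 5 test images are set aside (worst case).
samples = list(range(20))
for round_index, (training, validation) in enumerate(k_fold_splits(samples, 4)):
    # Here one would write the two file lists, rebuild the lmdb files and
    # fine-tune starting from the weights of the previous round.
    print(round_index, len(training), len(validation))
```

With 4 folds, each round trains on 15 images and validates on 5, and after the four rounds every image has been used for training.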

Defining the network's architecture

The network's architecture is defined in the train_test.prototxt file. All networks start with a data layer and end with an output layer. Each layer can optionally be labelled with TRAIN or TEST if we want to include it only in the training or the testing phase. For example, we will create two data layers labelled TRAIN and TEST, pointing respectively to the train and test lmdb files. We will also create two output layers labelled TRAIN and TEST. The former will output the computed loss in order to start the backpropagation, while the latter will output the accuracy of the predictions on the batch of test images.

The hyperparameters of the network are defined in the solver.prototxt file. For the ImageNet model provided by Caffe for AlexNet, the solver is:
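
To make the TRAIN/TEST labelling concrete, the sketch below renders the corresponding prototxt fragments as text from Python. The field names follow the layer format used in the Caffe reference models; the lmdb paths, the mean file path, the batch sizes and the bottom blob name fc8 are placeholders, and the helper names data_layer and output_layer are invented for this example.

```python
def data_layer(phase, source, batch_size, mean_file):
    """Render a Data layer that is only included in the given phase."""
    return "\n".join([
        "layer {",
        '  name: "data"',
        '  type: "Data"',
        '  top: "data"',
        '  top: "label"',
        "  include { phase: %s }" % phase,
        '  transform_param { mean_file: "%s" }' % mean_file,
        '  data_param { source: "%s" batch_size: %d backend: LMDB }'
        % (source, batch_size),
        "}",
    ])


def output_layer(phase, bottom="fc8"):
    """Render a loss layer for TRAIN or an accuracy layer for TEST."""
    if phase == "TRAIN":
        layer_type, top = "SoftmaxWithLoss", "loss"
    else:
        layer_type, top = "Accuracy", "accuracy"
    return "\n".join([
        "layer {",
        '  name: "%s"' % top,
        '  type: "%s"' % layer_type,
        '  bottom: "%s"' % bottom,
        '  bottom: "label"',
        '  top: "%s"' % top,
        "  include { phase: %s }" % phase,
        "}",
    ])


print(data_layer("TRAIN", "path/to/train_lmdb", 256, "path/to/mean_file"))
print(data_layer("TEST", "path/to/test_lmdb", 50, "path/to/mean_file"))
print(output_layer("TRAIN"))
print(output_layer("TEST"))
```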

net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU
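
With lr_policy set to "step", the learning rate is multiplied by gamma every stepsize iterations, that is lr = base_lr * gamma^floor(iter / stepsize). The short sketch below evaluates this schedule for the values listed above; the function name step_learning_rate is chosen for this example.

```python
def step_learning_rate(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Learning rate at a given iteration under the "step" policy."""
    return base_lr * gamma ** (iteration // stepsize)


# Learning rate at the start of each stage and at the last iteration.
for iteration in (0, 100000, 200000, 300000, 400000, 449999):
    print(iteration, step_learning_rate(iteration))
```

With max_iter set to 450000 the learning rate is therefore reduced four times during training, from 0.01 down to about 1e-6.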

As mentioned before, beginners might want to train on their data with a predefined network like this one. In the train_test.prototxt file we therefore change the data layer to adapt it to our image size (or leave it as it is if the images are also 256x256). In the solver.prototxt file we only have to change the net path; optionally, the hyperparameters can be changed if we want to do hyperparameter optimization with a validation set as suggested before. One problem of using AlexNet for training on our Portuguese flora dataset (which is much smaller than ImageNet) is that, due to the high capacity of this net, we might overfit our data. There are two ways of avoiding overfitting: reducing the network size or increasing the regularization (the weight_decay parameter). The latter is usually the better choice because optimizing a high-capacity network leads to better results.
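
The effect of weight_decay can be seen in a single update of stochastic gradient descent with momentum. The sketch below assumes the usual L2 formulation, in which weight_decay times the weight is added to the gradient before the momentum update; the function name sgd_step is invented here and the numbers are made up to keep the arithmetic readable.

```python
def sgd_step(weight, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD-with-momentum update of a single weight with L2 weight decay."""
    regularized_grad = grad + weight_decay * weight
    velocity = momentum * velocity - lr * regularized_grad
    return weight + velocity, velocity


# With a zero data gradient, the only force on the weight is the decay term.
weight, velocity = sgd_step(1.0, 0.0, 0.0, lr=0.1, weight_decay=0.5)
print(weight, velocity)
```

Even when the data gradient is zero the weight is pulled towards zero, and a larger weight_decay pulls harder, which is how it limits the effective capacity of a large network.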


Once we have all the files, training is started by executing the following command:

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

Additional options can be set, for example to save snapshots of the model after a given number of iterations (see the snapshot parameter in the solver.prototxt listing above).

Testing with PyCaffe

Caffe has a Python module named PyCaffe that is convenient for the testing phase. Several IPython notebooks are available as an introduction to this module [2].

For the results to be reliable, testing should be done with images that have been used neither for training nor for validation. With a test set that has the same number of images per label, the standard accuracy measures are top-1 accuracy (i.e. the predicted label is the right one) and top-5 accuracy (i.e. the right label is among the first five predictions of the network).
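
Top-k accuracy can be computed from the class scores returned by the network with a few lines of plain Python. The sketch below does not use PyCaffe: the scores are made-up numbers for three classes, and top_k_accuracy is a name chosen for this example.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k best scores."""
    hits = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return float(hits) / len(labels)


# One row of class scores per test image, plus the true label of each image.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

Top-1 accuracy is the case k = 1 and top-5 accuracy the case k = 5; the toy example uses k = 2 only because it has three classes.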


Once the new GTX 980 Ti GPU is installed and I am able to train on the Portuguese flora dataset with the AlexNet architecture, I will update the different sections with the results, especially concerning the effectiveness of cross-validation.

Useful links

Computer Vision with Deep Learning

Deep Learning

Machine Learning