Competence centre LifeWatch/Citizen Science/Task 4.2
Revision as of 16:12, 4 April 2016
The objective of this task is to develop software able to identify images of plants, allowing citizens to contribute to flora conservation. Image recognition will be implemented through machine learning with deep neural networks (also known as deep learning). Caffe is a deep learning framework created at UC Berkeley.
Caffe installation
Caffe installation in Altamira
For the installation of Caffe on the Altamira supercomputer at IFCA (Spain) we have followed both the official installation guide and a specific guide for installing Caffe on a supercomputer cluster.
Since we do not have root access to Altamira, Caffe has been installed locally at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe. For more information on the local installation of software on a supercomputer please check [1]. The software and libraries already available at Altamira are:
- Python 2.7.10
- CUDA 7.0.28
- OpenMPI 1.8.3
- Boost 1.54.0
- protobuf 2.5.0
- gcc 4.6.3
- HDF5 1.8.10
The remaining libraries have been installed at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib. Those libraries are:
- gflags
- glog
- leveldb
- OpenCV
- snappy
- LMDB
- ATLAS
Modules can be loaded all at once by loading /gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles/common.
- Comments
At this moment Altamira runs Tesla M2090 GPUs with CUDA compute capability 2.0. Therefore Caffe has been compiled without cuDNN (the GPU-accelerated library of primitives for deep neural networks), which requires GPUs with compute capability 3.0 or higher.
Caffe installation in Yong
Having root access, installing Caffe on Ubuntu is straightforward. Yong runs an Nvidia Quadro 4000 GPU, which does not support cuDNN either. This GPU has very limited memory, which allows training on small, simple datasets with small networks (e.g. MNIST) but cannot hold (and therefore train) the larger networks needed to learn more involved datasets (e.g. ImageNet).
Caffe architecture
Neural networks learn their layers' parameters using backpropagation, where the gradients of one layer are used to compute the gradients of the previous one (communication between layers in Caffe is implemented through blobs, which store the values and gradients of each layer of the network). Deep networks are therefore very modular, and Caffe reflects this modularity by giving you (almost) complete freedom to compose your network's architecture.
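As a toy illustration of this gradient chaining (plain Python, not Caffe code), the sketch below stacks two scalar layers; each layer caches its input and passes a gradient back to the previous one, the way a blob carries both "data" and "diff":

```python
# Toy illustration of layer-to-layer gradient flow (the role Caffe's blobs
# play). Each layer caches its input during forward() and, in backward(),
# computes both its own weight gradient and the gradient handed to the
# previous layer.

class ScaleLayer:
    """y = w * x for a scalar input; the simplest differentiable layer."""
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x          # cache the input (the "data" of the bottom blob)
        return self.w * x

    def backward(self, grad_top):
        # Gradient w.r.t. this layer's weight, plus the gradient passed
        # down to the previous layer (the "diff" of the bottom blob).
        self.grad_w = grad_top * self.x
        return grad_top * self.w

# Two stacked layers: y = w2 * (w1 * x)
l1, l2 = ScaleLayer(2.0), ScaleLayer(3.0)
y = l2.forward(l1.forward(5.0))          # forward pass: 3 * (2 * 5) = 30
grad_x = l1.backward(l2.backward(1.0))   # backward pass with dL/dy = 1
print(y, l2.grad_w, l1.grad_w, grad_x)   # 30.0 10.0 15.0 6.0
```

Real Caffe layers do the same with multi-dimensional blobs, but the contract is identical: each layer only needs the gradient arriving from the layer above.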
There are several types of layers:
Common layers:
- Inner product
- Splitting
- Flattening
- Reshape
- Concatenation
- Slicing
- Element-wise operations
- Argmax
- Softmax
- Mean Variance Normalization
Vision layers:
- Convolution
- Pooling
- Local Response Normalization
Activation/Neuron layers:
- ReLU
- Leaky ReLU
- Sigmoid
- Tanh
- Absolute Value
- Power
- Binomial Normal LogLikelihood
Activation functions like the sigmoid and tanh are not as popular as they used to be. Choosing ReLUs is usually a safe bet to ensure faster convergence.
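For reference, the main activations listed above are simple to state; a scalar plain-Python sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small negative slope avoids "dead" units with zero gradient.
    return x if x > 0 else alpha * x

# The sigmoid saturates for large |x| (its gradient vanishes), which is one
# reason ReLUs tend to converge faster in deep networks.
print(relu(-2.0), leaky_relu(-2.0), round(sigmoid(10.0), 4))  # 0.0 -0.02 1.0
```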
Despite the simplicity of constructing a network, its correct design often involves a considerable amount of expertise. Beginners are therefore usually recommended to train their datasets on existing networks (e.g. AlexNet) kindly provided with Caffe.
Training with Caffe
The training can be divided in several steps. The MNIST tutorial and the Imagenet tutorial are good references to follow.
Preparation of the data
First it is necessary to do some image preprocessing: resizing images to a square (e.g. 256x256 for AlexNet) and mean centering (subtracting the mean image of the dataset from each example), which has been observed to lead to faster convergence. There are other kinds of preprocessing operations (like normalization, decorrelation with PCA and whitening) that are common in machine learning but which have not proven useful in image recognition with deep learning. The second step is to create the lmdb files (for the TRAIN and TEST sets) which will be fed to Caffe. When efficiency is not critical, images can also be fed to Caffe directly from disk, from files in HDF5 or in common image formats. The lmdb files can be created by modifying create_imagenet.sh with your image_path.
Caffe doesn't need a validation set because it only optimizes the weights, not the hyperparameters. We could in principle optimize the hyperparameters by creating a Python script which loops over the hyperparameters: Caffe trains each time with a specific combination of hyperparameters, its accuracy is tested on a validation set (which Caffe considers to be a TEST set), and the combination of hyperparameters giving the highest accuracy is selected. However, the final real accuracy must be measured on a completely independent set of images, the real test set (refer to the Test section).
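The mean-centering step can be sketched in plain Python (no Caffe; images represented as nested lists of pixel values, where in practice the mean image would be the binaryproto file Caffe computes):

```python
# Minimal sketch of mean centering: compute the dataset's mean image and
# subtract it from every example.
def mean_image(images):
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[i][j] for img in images) / n for j in range(w)]
            for i in range(h)]

def center(images):
    mu = mean_image(images)
    return [[[img[i][j] - mu[i][j] for j in range(len(mu[0]))]
             for i in range(len(mu))] for img in images]

imgs = [[[10, 20]], [[30, 40]]]   # two tiny 1x2 "images"
print(center(imgs))               # [[[-10.0, -10.0]], [[10.0, 10.0]]]
```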
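The hyperparameter loop described above might look like the following sketch. Here train_and_validate() is a hypothetical helper (stubbed out below) that in practice would write a solver file, run Caffe, and parse the validation accuracy from the log; only the selection logic is real:

```python
import itertools

def train_and_validate(lr, weight_decay):
    # Placeholder standing in for a full Caffe training run followed by a
    # validation ("TEST" phase) accuracy measurement.
    return 1.0 - abs(lr - 0.01) - abs(weight_decay - 0.0005)

learning_rates = [0.1, 0.01, 0.001]
weight_decays = [0.005, 0.0005]

# Try every combination and keep the one with the highest validation accuracy.
best = max(itertools.product(learning_rates, weight_decays),
           key=lambda hp: train_and_validate(*hp))
print("best hyperparameters:", best)   # (0.01, 0.0005) with this stub
```

The winning combination would then be retrained and its final accuracy measured on the held-out real test set.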
Usually deep networks are trained with very large datasets (e.g. the ImageNet dataset provides around 1000 images per label). In our case the datasets are not that large. For example, let's take the Portuguese flora dataset. It includes around 2073 species but most of them have very few images. Therefore the strategy should be to identify the genus instead of the species. As we can see in Figure 1, most genera still have very few images. Of the 747 different genera only 292 have at least 25 images. Selecting 5 images for the test set and another 5 for the validation set, we have only 15 images left (in the worst case) for the training set.
A possible way of increasing the number of training images is to use cross-validation. Cross-validation is a technique consisting of splitting the training set into separate folds and cyclically reusing the validation set for training (so we are able to use the validation set, which proves useful in the case of small datasets). As mentioned before, Caffe has no validation process and therefore no built-in function for implementing cross-validation. Once again, cross-validation can be implemented manually through a Python script by cyclically exchanging the train and validation sets, each time finetuning the weights starting from the previous weights (in the first iteration the weights should be fine-tuned from the trained weights of the Oxford 102 flower dataset).
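The fold rotation can be sketched in plain Python. Here finetune() is a hypothetical helper standing in for a Caffe fine-tuning run; only the splitting logic is shown:

```python
# Sketch of cross-validation fold rotation, assuming `images` is a list of
# image paths. Each round one fold acts as the validation set and the
# remaining folds form the training set; a real run would resume fine-tuning
# from the previous round's weights (round 0 starting from the Oxford 102
# flower weights).
def k_fold_splits(items, k):
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

images = ["img%02d.jpg" % n for n in range(10)]
for round_idx, (train, val) in enumerate(k_fold_splits(images, 5)):
    # finetune(train, val, init_weights=previous_weights)  # hypothetical
    print(round_idx, len(train), len(val))  # each round: 8 train, 2 val
```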
Defining the network's architecture
The network's architecture is defined in the net_train_test.prototxt file. All networks start with a data layer and end with an output layer where the loss is computed in order to start the backpropagation.
The parameters of the network will be defined in the solver.prototxt file.
As mentioned before, beginners might want to train their data with a predefined network like AlexNet. In that case the only layers that must be changed are the data layer (if the images have a size different from 256x256) and the output layer. One problem of using AlexNet for training our Portuguese flora dataset (which is much smaller than ImageNet) is that, due to the high capacity of this net, we might overfit our data. There are two ways of avoiding overfitting: either reducing the network size or increasing the regularization. The latter is usually the better choice because optimizing a high-capacity network leads to better results.
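As a minimal sketch of what that regularization does (Caffe exposes it as the weight_decay solver parameter, i.e. L2 regularization), the update shrinks each weight toward zero in addition to following the data gradient:

```python
# Toy SGD update with L2 regularization (weight decay). The extra
# `weight_decay * w` term penalizes large weights, which is what combats
# overfitting in a high-capacity network.
def sgd_step(w, grad, lr=0.01, weight_decay=0.0005):
    return w - lr * (grad + weight_decay * w)

w = 1.0
for _ in range(100):
    w = sgd_step(w, grad=0.0)  # with zero data gradient the weight decays
print(w)  # slightly below 1.0
```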
Training
Testing with PyCaffe
Caffe has a Python module named PyCaffe specifically designed for the testing phase. Several ipython notebooks are available as an introduction to this module [2].
For the results to be reliable, testing should be done with images that have been used neither for training nor for validation. Having a test set with the same number of images per label, standard accuracy metrics are top-1 accuracy (i.e. the predicted label is the right one) and top-5 accuracy (i.e. the right label is among the first five predictions of the network).
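A minimal sketch of these two metrics (plain Python; with PyCaffe the per-class scores would come from the net's forward pass):

```python
# Top-k accuracy from per-image class scores: a prediction counts as correct
# if the true label is among the k highest-scoring classes.
def top_k_accuracy(scores, labels, k):
    hits = 0
    for s, y in zip(scores, labels):
        top_k = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2],   # best guess: class 1, runner-up: class 2
          [0.5, 0.3, 0.2]]   # best guess: class 0
labels = [2, 0]              # true classes
print(top_k_accuracy(scores, labels, 1))  # 0.5
print(top_k_accuracy(scores, labels, 2))  # 1.0
```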
- Comments
Once the new GTX 980 Ti GPU is installed and I am able to train the Portuguese flora dataset with the AlexNet architecture, I will update the different sections with the results, especially concerning the effectiveness of cross-validation.
Useful links
Computer Vision with Deep Learning
- Stanford Course - Convolutional Neural Networks for Visual Recognition: This is by far the most useful link to introduce yourself to the topic of image recognition with neural networks. It has course notes and recorded lectures on YouTube.
Deep Learning
- Michael Nielsen's webpage: Introductory notes on deep learning.
- Deep Learning Book: Still in preprint but with open-access course notes. Written by leading figures in the field.
Machine Learning
- Pattern Recognition and Machine Learning - C. M. Bishop: Classic reference in machine learning. Useful to have the broad view.