Competence centre LifeWatch/Citizen Science/Task 4.2

Pattern recognition will be implemented through machine learning with deep neural networks (aka deep learning). Caffe is a deep learning framework created at UC Berkeley.
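
As an illustration of what pattern recognition with Caffe looks like in practice, below is a minimal pycaffe sketch that loads a trained network and classifies one input. The file paths, input size and blob names ("data", "prob") are placeholders that depend on the actual model, not part of this installation.

  import numpy as np
  import caffe

  caffe.set_mode_cpu()                      # use caffe.set_mode_gpu() on a GPU node
  # Network definition and trained weights; both paths are placeholders.
  net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

  # Dummy input batch of one 3-channel 227x227 image.
  image = np.random.rand(1, 3, 227, 227).astype(np.float32)
  net.blobs["data"].reshape(*image.shape)
  net.blobs["data"].data[...] = image

  output = net.forward()                    # run the deep network
  print(output["prob"].argmax())            # index of the predicted class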

Caffe installation

Caffe installation in Altamira

For the installation of Caffe on the Altamira supercomputer at IFCA (Spain) we have followed both the official installation guide and a specific guide for installing Caffe on a supercomputing cluster.

As we do not have root access to Altamira, Caffe has been installed locally at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe (a quick import check of this local build is sketched after the list below). For more information on the local installation of software on a supercomputer, please check [1]. The software and libraries already available at Altamira are:

  • PYTHON 2.7.10
  • CUDA 7.0.28
  • OPENMPI 1.8.3
  • BOOST 1.54.0
  • PROTOBUF 2.5.0
  • GCC 4.6.3
  • HDF5 1.8.10

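As a quick sanity check of the local build, the locally built pycaffe can be imported by pointing Python at the source tree. The "python" subdirectory used below follows Caffe's standard source layout and is an assumption about this particular build.

  import sys

  # Local Caffe source tree (no root access, so everything lives under /gpfs).
  CAFFE_ROOT = "/gpfs/res_scratch/lifewatch/iheredia/.usr/local/src/caffe"
  sys.path.insert(0, CAFFE_ROOT + "/python")   # pycaffe lives in the python/ subdir

  import caffe
  print(caffe.__file__)   # should point inside the local installation
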
The remaining libraries have been installed at /gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib (a quick load check of these is sketched after the list below). Those libraries are:

  • gflags
  • glog
  • leveldb
  • OpenCV
  • snappy
  • LMDB
  • ATLAS

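One way to verify that these self-installed libraries resolve from the local prefix is to load them directly; the exact .so file names below are assumptions and may carry version suffixes on Altamira.

  import ctypes
  import os

  # Local library prefix used for the self-installed dependencies.
  LOCAL_LIB = "/gpfs/res_scratch/lifewatch/iheredia/.usr/local/lib"

  for lib in ["libgflags.so", "libglog.so", "libleveldb.so",
              "libsnappy.so", "liblmdb.so", "libopencv_core.so"]:
      path = os.path.join(LOCAL_LIB, lib)
      try:
          ctypes.CDLL(path)            # attempt to load the shared library
          print("loaded " + path)
      except OSError as err:
          print("failed to load " + path + ": " + str(err))
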
All of the required modules can be loaded at once through the combined modulefile /gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles/common.
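
For illustration, the sketch below loads that modulefile and checks that Caffe becomes importable. It assumes Environment Modules provides the usual "module" shell function on Altamira and that the modulefile is named "common" inside the directory above.

  import subprocess

  MODULE_DIR = "/gpfs/res_scratch/lifewatch/iheredia/.usr/local/share/modulefiles"

  # "module" is a shell function, so go through a login shell rather than
  # calling it directly from Python.
  cmd = ('module use {} && module load common && '
         'python -c "import caffe; print(caffe.__file__)"').format(MODULE_DIR)
  print(subprocess.check_output(["bash", "-lc", cmd]).decode())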

Comments

At the moment Altamira runs Tesla M2090 GPUs, which have CUDA compute capability 2.0. Therefore Caffe has been compiled without cuDNN (the GPU-accelerated library of primitives for deep neural networks), which requires GPUs with a compute capability of 3.0 or higher.
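
The compute capability can be confirmed with Caffe's device_query tool. The sketch below assumes the compiled caffe binary is on the PATH after loading the modules, and that the tool's log lines still contain the "revision number" fields.

  import subprocess

  # Caffe logs device information (including the CUDA compute capability)
  # to stderr via glog, so merge stderr into the captured output.
  info = subprocess.check_output(["caffe", "device_query", "-gpu", "0"],
                                 stderr=subprocess.STDOUT).decode()
  for line in info.splitlines():
      if "revision number" in line:   # "Major/Minor revision number: ..." lines
          print(line.strip())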

Caffe installation in Yong

With root access, installing Caffe on Ubuntu is straightforward. Yong runs an Nvidia Quadro 4000 GPU, which does not support cuDNN either. This GPU has very limited memory: it allows training on small, simple datasets that require small networks (e.g. MNIST), but it cannot hold (and therefore train) the larger networks needed to learn more involved datasets (e.g. ImageNet).
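
For reference, training a small network such as LeNet on MNIST fits in this GPU's memory. The sketch below follows the MNIST example bundled with Caffe and assumes the dataset has already been prepared as described in examples/mnist.

  import caffe

  caffe.set_mode_gpu()   # the Quadro 4000 can hold LeNet-sized networks
  caffe.set_device(0)

  # Solver definition from Caffe's bundled MNIST example.
  solver = caffe.SGDSolver("examples/mnist/lenet_solver.prototxt")
  solver.solve()         # an ImageNet-scale model would exceed this GPU's memory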