The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "HOWTO09 How to use Federated Cloud Storage"

= EGI Federated Cloud Storage Tutorial  =
{{Template: Op menubar}}
{{Template:Doc_menubar}}


<span style="color:#009000">NOTE: THE DESCRIBED STORAGE SOLUTIONS ARE IN PRE-PRODUCTION MODE AND ACCESS TO STORAGE SITES IN THE EGI FEDERATED CLOUD IS AVAILABLE ON REQUEST: UCST@EGI.EU </span>
[[Category:Operations Manuals]]
{{TOC_right}}


 
{{DeprecatedAndMovedTo|new_location=https://docs.egi.eu/users/online-storage/}}
This page aims to give a brief description of the storage services provided by the EGI Federated Cloud and a basic tutorial on how to use and integrate them into your application.
 
The guide is intended for application developers and system administrators who need to select the Federated Cloud storage solution that best fits their application and to understand how to integrate it into their own applications.
 
== Storage solutions overview ==
 
If you need more storage than is provided by the VM OS image disk, you can use the EGI Federated Cloud storage services.
 
There are two kinds of service, Block Storage and Object Storage, each with its own advantages and disadvantages.
 
Block storage is a capability of the Federated Cloud Infrastructure-as-a-Service (IaaS). It provides additional storage blocks that can be attached to a virtual machine. A storage block is a virtual disk of a given size, exposed as a virtual device in the VM, so you can use it like a normal hard drive (or USB pen drive): you can format it with any file system you want and mount it in your VM file system. Block devices are persistent: they keep all their data after the VM shuts down and must be explicitly destroyed when the data is no longer needed. Block storage can be accessed only from within a VM, only from VMs running at the same site where the block storage is located, and only by one VM at a time. As part of the IaaS service, block storage is managed via OCCI (the EGI Federated Cloud IaaS interface). Both the number of block storage devices you can attach to a VM and the maximum size of each virtual disk are limited, and these limits depend on the particular Federated Cloud site. Moreover, the whole block storage device is accounted for, regardless of how much of it is actually in use within the VM. How to create, attach and destroy block storage is described in the next section.
 
Object storage is a standalone service of the EGI Federated Cloud, also referred to as the Federated Cloud STorage-as-a-Service (STaaS). Object storage stores data as a set of individual objects, which can be of different types (e.g. files, images, documents) and are organized within containers (comparable to folders). Each object has its own URL, which can be used to access the resource, share it with other people, and set custom metadata and access control lists. Objects can be accessed programmatically (e.g. via GET and PUT operations, for download and upload) and can also be manipulated programmatically (e.g. an image can be resized on the fly). The STaaS interface of the EGI Federated Cloud is based on the CDMI standard. Unlike block storage, there is virtually no limit to the amount of data you can store, and only the space actually used is accounted. You can access the data from any location (from any VM running at any EGI site, from other cloud providers, or from your own laptop or browser), expose the data via external portals (using HTTP as the transport protocol), set access control lists per file, and even make the data publicly available. On the other hand, the data is accessed via a client, so integrating it with existing applications may require changes to the application logic.
 
A summary of the main differences between Block and Object Storage is reported in the following table.
 
{| cellspacing="5" cellpadding="5" border="0" class="wikitable"
|-
|
| '''Access'''
| '''Sharing'''
| '''Accounting'''
| '''Management'''
| '''Integration'''
|-
| '''Block Storage'''
| only from within a VM,<br />and only at the same site where the storage is located
| not possible
| for the entire block, regardless of how much of it is actively used in the VM
| via the OCCI interface
| easy with any application capable of reading/writing files on a local disk
|-
| '''Object Storage'''
| from any device connected to the internet.
| possible (data can be kept private or public)
| only for the data stored
| via the CDMI interface
| requires a client to be integrated within the application
|}
 
Depending on your application needs, you may prefer one technology over the other. In general, block storage is a good and simple solution for temporary data and for data that does not need to be shared beyond a single application running on a single VM. If you need to expose your data via portals or share it between different steps of your processing workflow, object storage is usually the better choice.
 
== Block Storage ==
 
=== How to use the EGI Block Storage ===
 
The EGI Federated Cloud block storage is managed via the OCCI interface, and can be accessed via the rOCCI command-line client, any other OCCI client, or by calling the OCCI API directly. The following examples use the rOCCI client, as it is the standard OCCI client for the EGI Federated Cloud. To install the rOCCI client, follow [[Fedcloud-tf:CLI_Environment|this guide]]. Please note that the OCCI interface is required only to reserve the storage and attach it to a VM. After these operations, the resource is exposed within the VM as a normal disk device, and the application accesses files on it as on any other local disk. A newly created block storage device is usually a blank disk, so it needs to be formatted and mounted before use. This operation depends on the operating system installed on the VM. This guide assumes a standard Linux operating system, managed via remote shell (SSH).
 
To use a block storage device, you first need to create it. You can do so by issuing a "create storage" OCCI command, which with rOCCI is:
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action create --resource storage -t occi.storage.size='num(''<storage_size_in_gb>'')',occi.core.title=''<storage_resource_name>''
http://site.occi.endpoint/storage/''<storage_resource_id>''

where:
* ''<site_occi_endpoint>'' is the OCCI endpoint of your site. It must be the same endpoint you use to start and stop your virtual machines.
* ''<proxy_certificate>'' is an X.509 proxy certificate used for authentication. See [[Fedcloud-tf:CLI_Environment|CLI Environment Setup]] for more information.
* ''<storage_size_in_gb>'' is the size of your block storage device in GB, i.e. the size of the virtual disk that will be attached to the VM. You will be accounted for the entire disk size, regardless of how much of it you use. Note also that this is the raw size of the disk: the actual available file space depends on the file system. The minimum size is 1 (1 GB), while the maximum size depends on the site but is usually no more than 2-5 TB.
* ''<storage_resource_name>'' is a mnemonic name for the resource, which you can use to tell disks apart.
* ''<storage_resource_id>'' is the ID of the newly created resource. It identifies the resource in subsequent calls.
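If you create storage from scripts, the create call can be wrapped in a small Bash function so that the endpoint and proxy are set once and the size is checked before contacting the site. This is a sketch under stated assumptions: OCCI_ENDPOINT and X509_USER_PROXY are environment variables you set yourself (X509_USER_PROXY is commonly used to point at a grid proxy file), the function name is hypothetical, and DRY_RUN=1 is a convention of this sketch, not an rOCCI option; it prints the command instead of running it.

```shell
#!/bin/bash
# Sketch: wrap the rOCCI "create storage" call.
# Assumptions: OCCI_ENDPOINT = site OCCI endpoint, X509_USER_PROXY = path to
# your VOMS proxy; DRY_RUN=1 (hypothetical convention) only prints the command.
occi_create_storage() {
  local size_gb=$1 title=$2
  # The minimum size is 1 GB, so accept only positive integers.
  if ! [[ $size_gb =~ ^[1-9][0-9]*$ ]]; then
    echo "size must be a positive integer number of GB" >&2
    return 1
  fi
  # Same arguments as the manual command; the -t value is one shell word.
  local cmd=(occi -e "$OCCI_ENDPOINT" --auth x509 --user-cred "$X509_USER_PROXY" --voms
             --action create --resource storage
             -t "occi.storage.size=num($size_gb),occi.core.title=$title")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"   # prints the location of the new resource on success
  fi
}
```

For example, `loc=$(occi_create_storage 10 mydisk)` stores the returned location for later describe, link and delete calls.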
 
On success, the OCCI command returns the location of the newly created resource, which ends with its ID. To check that everything is OK, you can issue a "describe storage" operation to verify that your resource is available:
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action describe --resource /storage/''<storage_resource_id>''
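Scripts usually need the bare /storage/ID path rather than the full URL returned by the create call. A minimal sketch using Bash parameter expansion, assuming the create command prints a single location line ending in /storage/ followed by the ID, as in the example output above; the function name is hypothetical:

```shell
# Sketch: turn the location printed by "create storage" into the
# /storage/<id> path used by describe, link, unlink and delete.
storage_path_from_location() {
  local loc=$1
  loc=${loc%/}                # drop a trailing slash, if any
  local id=${loc##*/}         # keep everything after the last "/"
  if [[ -z $id || $loc != */storage/* ]]; then
    echo "unexpected location: $1" >&2
    return 1
  fi
  echo "/storage/$id"
}

# Usage (not run here):
# path=$(storage_path_from_location "$loc") || exit 1
# occi -e "$OCCI_ENDPOINT" --auth x509 --user-cred "$X509_USER_PROXY" --voms \
#      --action describe --resource "$path"
```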
 
After the successful creation of the storage resource, you need to attach it to a VM. You can do it on an already existing VM, via the "compute link" command:
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action link --resource /compute/''<vm_id>'' --link /storage/''<storage_resource_id>''
 
If you want to create a new VM with the storage already attached (so that you can manage it at contextualization time), just add the <code>--link</code> option to the "compute create" command (as described [[Fedcloud-tf:Users:FAQ#How_can_I_start_a_VM.3F|here]]):
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action create --resource compute [...] --link /storage/''<storage_resource_id>''
 
Please note that a storage device can be attached to only one VM at a time. Attempts to attach it to more than one VM will fail.
 
If a block storage device is attached correctly to a VM, it is listed as a ''storagelink'' in the OCCI description of the compute resource, e.g.:
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action describe --resource /compute/''<vm_id>''
[...]
  Links:
    <nowiki>[[ http://schemas.ogf.org/occi/infrastructure#storagelink ]]</nowiki>
    >> location: /storage/link/f9e5b73d-71ac-4abb-96c9-ce7649734ae1
    occi.core.source = /compute/2f6d70c6-fb75-4372-9917-ac688b1391ee
    occi.core.target = /storage/7cfba655-f692-406f-a659-79b0224290cc
    occi.core.id = /storage/link/f9e5b73d-71ac-4abb-96c9-ce7649734ae1
[...]
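In scripts, the same information can be extracted from the describe output, for example to check whether a given storage resource is attached before linking or deleting it. The awk sketch below assumes the output layout shown above (a storagelink header in double square brackets followed by indented attributes); other rOCCI versions may format it differently, and the function name is hypothetical.

```shell
# Sketch: read "occi ... --action describe --resource /compute/<vm_id>" output
# on stdin and print the target of every storagelink entry.
attached_storage() {
  awk '
    /#storagelink/        { in_link = 1; next }   # start of a storagelink entry
    /^[ \t]*\[\[/         { in_link = 0 }         # any other link kind ends it
    in_link && $1 == "occi.core.target" { print $3 }
  '
}

# Usage (not run here):
# occi ... --action describe --resource /compute/<vm_id> | attached_storage \
#   | grep -qx /storage/<storage_resource_id> && echo "already attached"
```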
 
A new device is now available in your VM. It is initially empty, so you need to format and mount it. A full guide on how to do so is available [http://www.rackspace.com/knowledge_center/article/prepare-your-cloud-block-storage-volume here]; a basic set of commands for Linux VMs is listed below:
 
[user@client]# ssh root@my_vm_ip
[root@vm]# #Find the disk device via fdisk
[root@vm]# fdisk -l
[...]
Disk '''/dev/vdb''': 1073 MB, 1073741824 bytes
16 heads, 63 sectors/track, 2080 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
[root@vm]# #Create a partition table with a single primary partition on the disk
[root@vm]# echo -e "o\nn\np\n1\n\n\nw" | fdisk /dev/vdb
[root@vm]# #Format the new partition
[root@vm]# mkfs.ext4 /dev/vdb1
[root@vm]# #Mount the partition
[root@vm]# mkdir /mnt/additional_disk
[root@vm]# mount /dev/vdb1 /mnt/additional_disk

The block storage is now available in the VM, and you can use it to store your application data. Block storage is persistent, so it is not destroyed when the VM is destroyed. You can also detach it and reattach it to a new VM with the following OCCI commands:

[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action unlink --resource /compute/''<vm_id>'' --link /storage/''<storage_resource_id>''
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action link --resource /compute/''<new_vm_id>'' --link /storage/''<storage_resource_id>''

Please note that you do not need to format the storage again in the new VM, because it keeps its file system and all the data stored on it. Depending on the OS, the storage may also be mounted automatically in the new VM. If it is not, you can mount it manually (the device name may differ in the new VM; check it with <code>fdisk -l</code>):

[root@vm]# mount /dev/vdb1 /mnt/additional_disk
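To have the disk mounted again automatically after a reboot, you can add an /etc/fstab entry that refers to the file system by UUID rather than by device name, since names such as /dev/vdb may change between VMs. The sketch below only prints the line; the function name is hypothetical, the mount point is the one used in this guide, and the <code>nofail</code> option (so the VM still boots if the volume is detached) is a choice to adapt.

```shell
# Sketch: print an /etc/fstab line for an ext4 block storage file system.
# "nofail" lets the VM boot even when the volume is not attached.
fstab_entry() {
  local uuid=$1 mountpoint=$2
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$uuid" "$mountpoint"
}

# Usage on the VM as root (not run here); /dev/vdb1 is an assumption,
# use the device you actually formatted:
# uuid=$(blkid -s UUID -o value /dev/vdb1)
# fstab_entry "$uuid" /mnt/additional_disk >> /etc/fstab
# mount -a
```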
 
If you no longer need the storage and its data, you can delete it with a "storage delete" command, e.g.:
 
[user@client]# occi -e ''<site_occi_endpoint>'' --auth x509 --user-cred ''<proxy_certificate>'' --voms --action delete --resource /storage/''<storage_resource_id>''
 
=== How to integrate the EGI Block Storage into your application ===
 
The easiest way to integrate block storage into your application is at contextualization time. On Linux, you can use the following script as a starting point for your contextualization/deployment script (NOTE: it may not work on every OS). It partitions and formats any blank attached disks and mounts them as /mnt/additional_storage/0, /mnt/additional_storage/1, and so on:
 
#!/bin/bash
i=0
for d in xvdb xvdc xvdd vdb vdc vdd; do
  if <nowiki>[[ -e /dev/$d ]]</nowiki>; then
    if [[ ! -e /dev/${d}1 ]]; then
      echo -e "o\nn\np\n1\n\n\nw" | fdisk /dev/$d
      sleep 1
      mkfs.ext4 /dev/${d}1
    elif [[ -e /dev/${d}2 ]]; then
      continue
    fi
    mkdir -p /mnt/additional_storage/$i
    mount -t ext4 /dev/${d}1 /mnt/additional_storage/$i
    i=$(( $i + 1 ))
  fi
done
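The script appends 1 to the disk name to get the first partition, which is correct for names such as vdb or xvdb. If you extend the device list to disks whose names end in a digit, such as NVMe disks (e.g. nvme1n1), the kernel inserts a "p" before the partition number. A small helper for this naming rule, offered as a sketch (the function name is hypothetical):

```shell
# Sketch: return the path of the first partition of a disk.
# Disk names ending in a digit get a "p" separator (nvme1n1 -> nvme1n1p1);
# other names do not (vdb -> vdb1).
first_partition() {
  local disk=$1
  if [[ $disk =~ [0-9]$ ]]; then
    echo "${disk}p1"
  else
    echo "${disk}1"
  fi
}
```

Inside the loop you could then set <code>part=$(first_partition /dev/$d)</code> and use <code>"$part"</code> wherever the script uses <code>/dev/${d}1</code>.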
 
You can then set up your application in its deployment script to write its files to the block storage device. Applications see no difference between a block storage device and a normal hardware disk, so no major changes should be needed in the application logic.
 
Note that some operating systems, such as [https://appdb.egi.eu/store/vappliance/cernvm/ CERNVM], automatically detect all attached block storage and add it to the root virtual file system.
 
== Object Storage ==
 
=== How to use the EGI Object Storage ===
 
Object storage does not need to be reserved a priori; it can be used directly via the CDMI interface. Most CDMI operations are simple HTTP calls, so a dedicated CDMI client is not strictly required and generic HTTP clients like curl or wget can be used. However, to access all the CDMI capabilities and manage authorization, it is recommended to use a simple CDMI client, such as [https://github.com/EGI-FCTF/bCDMI bCDMI]. You can install it by following the instructions [[Fedcloud-tf:CLI_Environment#Set_up_.28b.29CDMI_tools|here]].
 
Object storage is accessed through a different set of operations. The most important ones are listed below, with usage samples:
 
* List the content of a folder:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ list /
marica-container/
 
* Create a folder:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ mkdir test
{
  "completionStatus": "Complete",
  "objectName": "test/",
  "capabilitiesURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/cdmi_capabilities/container/",
  "parentURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/",
  "objectType": "application/cdmi-container",
  "metadata": {}
}
 
* Upload a file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ put -T testfile test/test.txt
 
* Download a file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ get test/test.txt -o testfile
 
* Delete a file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete test/test.txt
 
* Delete a folder (folder must be empty):
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete test/
 
* Delete a folder and all its files:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete -r test/
 
* Make a folder publicly accessible:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ post test/ -m readacl='.r:*'
 
* Make a folder private:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ post test/ -m readacl='.'
 
* Get the public URL of a file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ publicurl test/test.txt
https://prisma-swift.ba.infn.it:8080/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/test/test.txt
 
* Download a public file:
[user@client]# curl -k https://prisma-swift.ba.infn.it:8080/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/test/test.txt
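The bcdmi commands above can also be scripted. For instance, a whole local directory can be uploaded file by file with the put command. A sketch, assuming bcdmi is in the current directory and the target container already exists (create it with mkdir first); the function name and the BCDMI and BCDMI_ENDPOINT variables are hypothetical, and subdirectories are not handled.

```shell
# Sketch: upload every regular file of a local directory into a container
# using the "bcdmi put -T <local> <remote>" form shown above.
# Set BCDMI="echo ./bcdmi" to print the commands instead of running them.
BCDMI=${BCDMI:-./bcdmi}
BCDMI_ENDPOINT=${BCDMI_ENDPOINT:-https://prisma-swift.ba.infn.it:8080/}

upload_dir() {
  local dir=$1 container=${2%/} f
  for f in "$dir"/*; do
    [[ -f $f ]] || continue          # skip subdirectories and an unmatched glob
    # $BCDMI is left unquoted on purpose so "echo ./bcdmi" splits into words.
    $BCDMI -e "$BCDMI_ENDPOINT" put -T "$f" "$container/$(basename "$f")" || return 1
  done
}
```

For example, <code>upload_dir ./results test/</code> would upload every file in ./results into the test container.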
 
=== How to integrate the EGI Object Storage into your application ===
 
Integrating the object storage within your application requires a CDMI client (such as [https://github.com/EGI-FCTF/bCDMI bCDMI]) or a set of libraries (e.g. [https://github.com/livenson/libcdmi-python libcdmi]) to be integrated within your application. Another possibility is to call the CDMI API directly from your application. This may be relatively easy if you use only the basic CDMI operations, which are standard RESTful HTTP operations. For more information, see the [http://cdmi.sniacloud.com/ CDMI specifications], in particular the [http://cdmi.sniacloud.com/cdmi_spec/6-common_operations/6-common_operations.htm#TOC_6_1_Overview common operations], and the code of the [https://github.com/EGI-FCTF/bCDMI/blob/master/bcdmi bCDMI software] (which uses simple cURL calls to perform the CDMI operations).
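As a rough illustration of how the basic operations map onto HTTP, the Bash sketch below fills an array of curl arguments for creating a container, listing it, uploading a file and deleting an object. The methods and content types follow the CDMI specification; the CDMI_ROOT variable, the function name and the specification version 1.0.1 are assumptions, authentication is omitted, and whether your site accepts the plain (non-CDMI) upload used for put should be checked.

```shell
# Sketch: map basic CDMI operations onto curl arguments (no authentication).
# Assumptions: CDMI_ROOT is your site's CDMI root URL, e.g.
# https://prisma-swift.ba.infn.it:8080/cdmi/AUTH_<project_id>,
# and the server accepts CDMI specification version 1.0.1.
CDMI_VERSION_HEADER="X-CDMI-Specification-Version: 1.0.1"

# cdmi_args OPERATION REMOTE_PATH [LOCAL_FILE]
# Fills the global array CURL_ARGS; run the request with: curl "${CURL_ARGS[@]}"
cdmi_args() {
  local op=$1 path=$2 file=${3:-}
  local url="${CDMI_ROOT%/}/${path#/}"
  case $op in
    mkdir)   # create a container; the trailing slash marks a container
      CURL_ARGS=(-sS -X PUT -H "$CDMI_VERSION_HEADER"
                 -H "Accept: application/cdmi-container"
                 -H "Content-Type: application/cdmi-container"
                 -d '{}' "${url%/}/") ;;
    list)    # read a container; its JSON "children" field lists the content
      CURL_ARGS=(-sS -X GET -H "$CDMI_VERSION_HEADER"
                 -H "Accept: application/cdmi-container" "${url%/}/") ;;
    put)     # upload raw bytes with a non-CDMI content type
      CURL_ARGS=(-sS -X PUT -H "Content-Type: application/octet-stream"
                 --data-binary "@$file" "$url") ;;
    delete)  # delete a data object (or an empty container)
      CURL_ARGS=(-sS -X DELETE -H "$CDMI_VERSION_HEADER" "$url") ;;
    *) echo "unknown operation: $op" >&2; return 1 ;;
  esac
}
```

For example, <code>cdmi_args put test/test.txt testfile && curl "${CURL_ARGS[@]}"</code> would upload testfile, once an authentication header is added.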
 
Within CDMI, a file or a container can be made public (see the section above). In that case, read operations can be performed via plain HTTP calls without authentication, so you can access the data with any web browser and seamlessly integrate it into any web portal served over HTTP.
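For public objects, the URL is the CDMI root of the project followed by the container and object path, as in the publicurl output shown earlier. A minimal sketch that composes such a URL and downloads it anonymously with curl; the function names are hypothetical and the URL layout is inferred from the example in this guide, so compare it with the publicurl output on your site. Unlike the <code>curl -k</code> example above, the sketch keeps certificate verification on.

```shell
# Sketch: compose the public URL of an object and fetch it without credentials.
# Layout inferred from the publicurl example: <endpoint>/cdmi/AUTH_<project>/<path>
public_object_url() {
  local endpoint=${1%/} project=$2 path=${3#/}
  echo "$endpoint/cdmi/AUTH_$project/$path"
}

fetch_public_object() {
  local url=$1 out=$2
  # -f makes curl fail on HTTP errors (e.g. 401/403 if the object is not public)
  curl -fsS -o "$out" "$url"
}

# Usage (not run here):
# url=$(public_object_url https://prisma-swift.ba.infn.it:8080/ <project_id> test/test.txt)
# fetch_public_object "$url" test.txt
```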
 
Other CDMI operations (such as write operations) require authentication. In the EGI Federated Cloud, this is done with an access token generated by the site's Keystone service from the user's X.509 VOMS proxy certificate. You can use the bCDMI client to perform the authentication via the ''auth'' command (see the [https://github.com/EGI-FCTF/bCDMI/blob/master/README.md bCDMI usage guide] for more information) and then use the generated token in your application with normal HTTP calls.
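Once you have a token, passing it to plain HTTP calls is a matter of adding a request header. The sketch below assumes that the site's CDMI endpoint accepts the Keystone token in the X-Auth-Token header (the usual convention for OpenStack services) and that you saved the token in a file; the CDMI_TOKEN_FILE variable and the function name are hypothetical, and how bCDMI stores or prints the token is not covered here.

```shell
# Sketch: build the authentication header for a CDMI request.
# Assumptions: the token is stored in the file named by CDMI_TOKEN_FILE
# (hypothetical) and the server reads it from the X-Auth-Token header.
auth_header() {
  local token
  # Strip whitespace such as a trailing newline from the stored token.
  token=$(tr -d '[:space:]' < "${CDMI_TOKEN_FILE:?set CDMI_TOKEN_FILE}") || return 1
  if [[ -z $token ]]; then
    echo "empty token in $CDMI_TOKEN_FILE" >&2
    return 1
  fi
  printf 'X-Auth-Token: %s\n' "$token"
}

# Usage (not run here): authenticated upload of a file.
# hdr=$(auth_header) || exit 1
# curl -fsS -X PUT -H "$hdr" --data-binary @testfile \
#      "https://prisma-swift.ba.infn.it:8080/cdmi/AUTH_<project_id>/test/test.txt"
```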

Revision as of 09:20, 16 October 2020