The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

HOWTO09 How to use Federated Cloud Storage


This page aims to give a brief description of the storage services provided by the EGI Federated Cloud and a basic tutorial on how to use and integrate them into your application.

The guide is intended to help application developers and system administrators select the best Federated Cloud storage solution for their application needs and understand how to integrate it into their own applications.

Storage solutions overview

If you need more storage than is provided within the VM OS image disk, you can use the EGI Federated Cloud storage services.

There are two kinds of services: Block Storage and Object Storage. Each has its own set of advantages and disadvantages.

Block storage is a capability of the Federated Cloud Infrastructure-as-a-Service (IaaS). It provides additional storage blocks that can be attached to a virtual machine. A storage block is a virtual disk of a given size, which may be exposed as a virtual device in the VM. You can think of this kind of device as a USB stick that can be plugged into a VM and used as a normal drive: you can format it with any file system you want and mount it in your VM. Block devices are persistent: they keep all their data after VM shutdown and need to be explicitly destroyed when the data is no longer needed. Block storage disks can be accessed only from within a VM, and only from VMs running at the same site where the block storage is located. They can also be accessed by only one VM at a time. As part of the IaaS service, block storage is managed via OCCI (or the OpenStack native interface). There is a limit on the number of block storage devices you can attach to a VM and on the maximum size of such virtual disks; these values depend on the particular Federated Cloud site. Moreover, the disk space is accounted for the entire block storage device, regardless of how much of it is actually used.

Object storage is a standalone service of the EGI Federated Cloud, also referred to as Federated Cloud STorage-as-a-Service (STaaS). Object storage stores data as a set of individual objects, which can have different types (e.g. files, images, documents) and are organized within containers (e.g. folders). Each object/file has its own URL, which can be used to access the resource, share the file with other people, and set up custom metadata and access control lists. These objects are accessed and managed via a REST API. The STaaS interface for the EGI Federated Cloud is provided via the CDMI standard and/or the OpenStack SWIFT interface. Unlike block storage, there is virtually no limit to the amount of data you can store, and only the space used is accounted. You can access the data from any location (from any VM running at any EGI site, from other cloud providers, or from your own laptop/browser), expose the data via external portals (using HTTP as the transport protocol), set access control lists per container, and even make the data publicly available. On the other hand, data is accessed via API requests, so integration with existing applications may require a change to the application logic.

A summary of the main differences between Block and Object Storage is reported in the following table.

| | Access | Sharing | Accounting | Management | Integration |
|---|---|---|---|---|---|
| Block Storage | only from within a VM; only at the same site where the VM is located | not possible | for the entire block, regardless of how much of it is actively used in the VM | via the OCCI interface (or native OpenStack for providers enabling it) | POSIX access, easy with any application capable of reading/writing files on a local disk |
| Object Storage | from any device connected to the internet | possible (data can be kept private or public) | only for the data stored | via the CDMI interface (or native OpenStack for providers enabling it) | files are accessed via requests to the server; requires a client to be integrated within the application |

According to your application needs, you may select one technology over the other. In general, block storage is a good and simple solution for temporary data and data which you do not need to share beside the single application running on a single VM. If you need to have your data exposed within portals or shared between different steps of your processing workflow, it is usually best to use the object storage.

Block Storage

Managing Block Storage

Block storage is created and managed via requests to specific APIs. Once the storage is attached to a VM, it can be managed from within the VM like any other block device.

OCCI

The EGI Federated Cloud block storage can be managed via the OCCI interface, so you can use the rOCCI command line client as shown in the examples below. The installation guide for rOCCI client can help you to get the client ready.

To use a block storage device, you first need to create it. You can do so by issuing a "create storage" OCCI command, which via rOCCI is:

occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a create -r storage \
     -t occi.storage.size='num(<storage_size_in_gb>)',occi.core.title=<storage_resource_name>

where:

  • <site_occi_endpoint> is the OCCI endpoint of your site.
  • <proxy_certificate> is an X.509 proxy certificate for authentication.
  • <storage_size_in_gb> is the size of your block storage device in GB. You will be accounted for the entire disk size, regardless of how much space you are using on it. Consider also that this is the raw size of the disk: the actual available file space will depend on the file system. The minimum size is 1 (1 GB), while the maximum size depends on the site, but is usually no more than 2-5 TB.
  • <storage_resource_name> is a mnemonic name for the resource. You can use this parameter to tell your disks apart.

That command will return the ID of the newly created storage resource (typically in the form https://<site_occi_endpoint>/storage/<some-id>). This ID will be used to identify the resource in subsequent commands.
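
Since the returned ID is needed by every later command, it can be convenient to capture it in a shell variable. The following is a minimal sketch, not part of rOCCI: the function name create_storage and the ENDPOINT and PROXY variables are placeholders introduced for this example, and it assumes the client prints only the resource ID on standard output, as described above.

```shell
# Sketch: create a block storage resource and print its ID.
# ENDPOINT and PROXY are hypothetical variables holding the values of
# <site_occi_endpoint> and <proxy_certificate>.
create_storage() {
    # $1 = size in GB, $2 = mnemonic name of the resource
    occi -e "$ENDPOINT" -n x509 -x "$PROXY" -X \
         -a create -r storage \
         -t "occi.storage.size=num($1),occi.core.title=$2"
}

# Usage (needs the rOCCI client and a valid proxy):
#   STORAGE_ID=$(create_storage 10 my-data-disk) || exit 1
#   echo "created: $STORAGE_ID"
```

Keeping the ID in a variable avoids copying it by hand into the describe, link and delete commands.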

You can list your available volumes with the list command:

occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a list -r storage

And get detailed information on any of the available volumes:

occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a describe -r <storage_resource_id>


After the successful creation of the storage resource, you can attach it to a VM. You can do it on an already existing VM, via the "link" command:

occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a link -r <vm_id> \
     -j <storage_resource_id>

You can also attach the storage directly to a VM on creation (and thus be able to use it during contextualization). Just add the --link option (or the equivalent -j) to the "compute create" command:

occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a create -r compute \
     [...other VM creation parameters...] \
     -j <storage_resource_id>


Please note that you can attach a storage resource to only one VM at a time. Any attempt to attach it to more than one VM will fail.

If a block storage is attached correctly to a VM, it will be listed as a storagelink when the VM is described, e.g.:

occi -e  <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a describe -r <vm_id>
[...]
  Links:
 
    [[ http://schemas.ogf.org/occi/infrastructure#storagelink ]]
    >> location: /storage/link/f9e5b73d-71ac-4abb-96c9-ce7649734ae1
    occi.core.source = /compute/2f6d70c6-fb75-4372-9917-ac688b1391ee
    occi.core.target = /storage/7cfba655-f692-406f-a659-79b0224290cc
    occi.core.id = /storage/link/f9e5b73d-71ac-4abb-96c9-ce7649734ae1
    occi.storagelink.deviceid = /dev/vdb
[...]

The occi.storagelink.deviceid shows the device on the VM where the disk is found. Block storage will be persistent, so it will not be destroyed if the VM is destroyed. It is also possible to detach it and reattach it to a different VM. Detaching is performed with this command (storage_link_id in the example above is /storage/link/f9e5b73d-71ac-4abb-96c9-ce7649734ae1):

occi -e  <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a unlink -r <storage_link_id>
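
If you script these steps, the storage link ID and the device name can be pulled out of the describe output instead of being copied by hand. The sketch below assumes the output has the same layout as the describe example shown earlier (a ">> location:" line followed by an "occi.storagelink.deviceid" line); other rOCCI versions or output formats may differ. The function name storage_links is made up for this example.

```shell
# Sketch: read the output of "occi ... -a describe -r <vm_id>" on stdin and
# print "<storage_link_id> <device>" for every storage link found.
storage_links() {
    awk '
        # remember the location of the storage link being described
        $1 == ">>" && $2 == "location:" && $3 ~ /\/storage\/link\// { link = $3 }
        # the device id closes the link description: print the pair
        $1 == "occi.storagelink.deviceid" && link != "" { print link, $3; link = "" }
    '
}

# Usage:
#   occi -e <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
#        -a describe -r <vm_id> | storage_links
```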

Once you no longer need the storage, you can delete it by issuing a command like this:

occi -e  <site_occi_endpoint> -n x509 -x <proxy_certificate> -X \
     -a delete -r <storage_resource_id>

OpenStack

OpenStack sites can also offer block storage features natively. For the OpenStack client we provide an extension to authenticate with the current EGI AAI. The API and SDK guide contains the details on how to install the client and use it with the appropriate credentials.

The main commands for managing volumes are the following:

| Command | Function |
|---|---|
| volume list | List available volumes |
| volume create --size <size> <name> | Create a volume of size <size> GB with name <name> |
| volume show <volume> | Show details of a given volume |
| volume delete <volume> | Delete a volume |
| server add volume <server> <volume> | Attach a volume to a server |
| server remove volume <server> <volume> | Detach a volume from a server |
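
The commands above can be chained to create a volume and attach it in one go. This is only a sketch: the function names are invented for the example, it assumes the openstack client is already configured with valid credentials, and it polls the volume status (using the client's generic -f value -c status output options) because a volume normally has to reach the "available" state before it can be attached.

```shell
# Sketch: wait until volume $1 is reported as "available" (about 60 s at most).
wait_available() {
    tries=0
    while [ "$tries" -lt 30 ]; do
        status=$(openstack volume show -f value -c status "$1") || return 1
        [ "$status" = "available" ] && return 0
        tries=$((tries + 1))
        sleep 2
    done
    return 1
}

# Sketch: create a volume and attach it to a server.
# $1 = server, $2 = volume name, $3 = size in GB
add_new_volume() {
    openstack volume create --size "$3" "$2" >/dev/null || return 1
    wait_available "$2" || return 1
    openstack server add volume "$1" "$2"
}

# Usage:
#   add_new_volume <server> my-data-disk 10
```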

How to integrate the EGI Block Storage into your application

Block Storage will appear as block devices in your VM. Usually these devices are empty upon creation. You will need to partition them and create filesystems on them the first time you attach them to a VM.

You can also simply create a filesystem directly on the block device with the following command (run this on your VM!). Only run this command the first time you use the device, as it removes all data stored on it:

# mkfs.ext4 /dev/<volume device>

The volume device (e.g. vdb) can be obtained from the description of the link in OCCI or with a volume show in OpenStack. Once you have a filesystem, you can mount it at the desired path:

# mount /dev/<volume device> /<path>

With that you can access /<path> where all your data will be available. Applications will not see any difference between a block storage device and a normal hardware disk, thus no major changes should be required in the application logic.
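
Because mkfs.ext4 wipes the device, a contextualization script that runs on every boot should format only when the device is really empty. The sketch below is one possible way to do this, not an EGI-provided tool: it relies on blkid exiting with a non-zero status when it finds no known filesystem or partition signature on the device, and the function name prepare_volume is an assumption of this example. It must run as root on the VM.

```shell
# Sketch: format a block device only if it carries no filesystem, then mount it.
# $1 = device (e.g. /dev/vdb), $2 = mount point
prepare_volume() {
    dev=$1
    mnt=$2
    # blkid fails when no filesystem signature is found: only then format
    if ! blkid "$dev" >/dev/null 2>&1; then
        mkfs.ext4 "$dev" || return 1
    fi
    mkdir -p "$mnt" && mount "$dev" "$mnt"
}

# Usage (as root, on the VM):
#   prepare_volume /dev/vdb /data
```

With this check, reattaching a volume that already holds data to a new VM mounts it without touching its content.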

Note that some operating systems, like CERNVM, will automatically detect all the attached block storage and add it to the root virtual file system.

Object Storage

EGI currently offers two APIs for accessing Object Storage: CDMI and OpenStack SWIFT.

CDMI

NOTE: CDMI STORAGE IS CURRENTLY IN PRE-PRODUCTION MODE IN THE EGI FEDERATED CLOUD. ACCESS TO STORAGE RESOURCES IS AVAILABLE ON REQUEST THROUGH THE EGI USER COMMUNITY SUPPORT TEAM: UCST@EGI.EU

CDMI operations are performed via simple HTTP calls, so a dedicated CDMI client is not strictly required: generic HTTP clients like curl or wget may be used. Dedicated clients can ease the operations and handle authentication, which may vary between implementations. EGI provides bCDMI, a simple client that wraps curl calls and performs authentication with X.509 proxies for you.

Objects in CDMI are organized into a hierarchical structure of containers where objects are stored (similar to files in a directory hierarchy). Here follow some sample queries to the server to manage and access containers and objects:

  • List the content of a container:
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ list /
marica-container/
  • Create a container:
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ mkdir test
{
  "completionStatus": "Complete",
  "objectName": "test/",
  "capabilitiesURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/cdmi_capabilities/container/",
  "parentURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/",
  "objectType": "application/cdmi-container",
  "metadata": {}
}
  • Upload a file (object):
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ put -T testfile test/test.txt
  • Download a file (object):
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ get test/test.txt -o testfile
  • Delete a file (object):
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete test/test.txt
  • Delete a container (must be empty):
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete test/
  • Delete a container recursively:
$ ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete -r test/

Uploading large objects

CDMI implementations may impose limits on the size of individual objects (in OpenStack-based CDMI this limit is set at 5 GB by default), so in order to upload larger objects you will need to split your file into pieces that are uploaded into the same location and then made available as a single download.

We are working on an update of the CDMI client to automatically handle the splitting for you. In the meantime, the steps needed are the following:

  1. Split your file into smaller chunks
  2. Upload each of the chunks into the same location adding the following headers to the request:
    • X-CDMI-uploadID with the same id of your choice for all the chunks
    • X-CDMI-partial as true for all the chunks
    • Content-Range with the following format bytes=<start>-<end>, where start and end are the bytes from the total file that each individual chunk contains
  3. Once the chunks are uploaded, make a PUT request to the location of the file with the following headers:
    • X-CDMI-uploadID with the same ID used for all the chunks
    • X-CDMI-partial as false
  4. Now you can get the complete file with a simple download (the CDMI server will merge all the chunks for you)
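
The byte ranges for step 2 are easy to get wrong by one, so it may help to compute them with a small script. The sketch below only produces the Content-Range values in the format given above; it assumes that offsets start at 0 and that both ends of a range are inclusive (as in HTTP byte ranges), and the function name chunk_ranges is invented for this example. The actual upload requests are not shown, since they depend on the endpoint and on how you authenticate.

```shell
# Sketch: print one Content-Range value per chunk for a file of $1 bytes
# split into chunks of at most $2 bytes. Offsets start at 0 and are inclusive.
chunk_ranges() {
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + chunk - 1))
        # the last chunk may be shorter than the others
        [ "$end" -ge "$total" ] && end=$((total - 1))
        echo "bytes=${start}-${end}"
        start=$((end + 1))
    done
}

# Usage: split the file, then pair each piece with its range, in order:
#   split -b 1073741824 bigfile bigfile.part.
#   chunk_ranges "$(wc -c < bigfile)" 1073741824
```

The pieces written by split sort alphabetically in the same order as the ranges printed by chunk_ranges, so the n-th piece goes with the n-th range.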

OpenStack SWIFT

Similarly to CDMI, SWIFT offers a RESTful API to manage your storage. There are two main clients for this platform: the OpenStack command line client and the swift client. For the OpenStack client we provide an extension to authenticate with the current EGI AAI. The API and SDK guide contains the details on how to install the client.

Available resources can be found by looking up the swift services in GOCDB. To access an endpoint, check the URL of the specific provider, e.g. for server4-epsh.unizar.es the URL is https://server4-epsh.unizar.es:5000/v2.0/. With your openstack client, first find out which projects you have access to:

$ keystone_tenants --os-auth-url https://server4-epsh.unizar.es:5000/v2.0/ \
                   --os-auth-type v2voms --os-x509-user-proxy $X509_USER_PROXY
Tenant id: fffd98393bae4bf0acf66237c8f292ad
Tenant name: egi
Enabled: True
Description: egi fedcloud

With the tenant id, you can use the openstack client directly:

$ openstack --os-auth-url https://server4-epsh.unizar.es:5000/v2.0/ \
            --os-auth-type v2voms --os-x509-user-proxy $X509_USER_PROXY \
            --os-project-id fffd98393bae4bf0acf66237c8f292ad \
            <commands>

For convenience you can also set the parameters as environment variables, so you don't have to include them in every command:

export OS_AUTH_URL=https://server4-epsh.unizar.es:5000/v2.0/
export OS_AUTH_TYPE=v2voms
export OS_X509_USER_PROXY=$X509_USER_PROXY
export OS_PROJECT_ID=fffd98393bae4bf0acf66237c8f292ad

Here follow some common operations:

  • List containers:
$ openstack container list
+-----------------+
| Name            |
+-----------------+
| Cloudflow       |
| my-new-bucket23 |
+-----------------+
  • Create a container:
$ openstack container create test
+---------+-----------+------------+
| account | container | x-trans-id |
+---------+-----------+------------+
| v1      | test      | None       |
+---------+-----------+------------+
  • Create an object on a container:
$ openstack object create test test.txt
+----------+-----------+----------------------------------+
| object   | container | etag                             |
+----------+-----------+----------------------------------+
| test.txt | test      | 3fc8eaba542609681ac900797e67ac98 |
+----------+-----------+----------------------------------+
  • List objects on a container:
$ openstack object list test
+----------+
| Name     |
+----------+
| test.txt |
+----------+
  • Download object:
$ openstack object save test test.txt
  • Delete object:
$ openstack object delete test test.txt
  • Delete container (must be empty):
$ openstack container delete test
  • Delete a container recursively:
$ openstack container delete -r test
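
These operations combine naturally in scripts, for example to push the results of a processing step into a container. The following sketch uploads every regular file of a directory; it assumes the OS_* environment variables are already set as shown earlier, and the function name upload_dir is made up for the example. It changes into the directory first so that object names do not include the local path.

```shell
# Sketch: upload every regular file in directory $2 into container $1.
upload_dir() {
    container=$1
    dir=$2
    # creating a container that already exists is harmless in Swift
    openstack container create "$container" >/dev/null || return 1
    (
        cd "$dir" || exit 1
        for f in *; do
            [ -f "$f" ] || continue          # skip sub-directories
            openstack object create "$container" "$f" >/dev/null || exit 1
        done
    )
}

# Usage:
#   upload_dir test ./results
```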

Swift client

Although the swift client does not integrate directly with the EGI AAI, it does include some useful features that are missing in the OpenStack command-line client that may be relevant for your needs. Nevertheless it's quite simple to use the swift client following these steps:

First get a token with the OpenStack client (adapt arguments to your endpoint):

$ openstack --os-auth-url https://fsd-cloud.zam.kfa-juelich.de:5000/v2.0 \
            --os-auth-type v2voms --os-x509-user-proxy $X509_USER_PROXY \
            --os-project-id df37f5b1ebc94604964c2854b9c0551f \
            token issue
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2016-05-27T12:21:14Z             |
| id         | 70c8706f59bb4986bef3e463b9169477 |
| project_id | df37f5b1ebc94604964c2854b9c0551f |
| user_id    | 54ab086a6c1949dab782f7addb2689da |
+------------+----------------------------------+

Get the URL of the SWIFT endpoint (again adapt arguments to your endpoint):

$ openstack --os-auth-url https://fsd-cloud.zam.kfa-juelich.de:5000/v2.0 \
            --os-auth-type v2voms --os-x509-user-proxy $X509_USER_PROXY \
            --os-project-id df37f5b1ebc94604964c2854b9c0551f \
            catalog show swift
+-----------+-----------------------------------------------------------------------------------------------+
| Field     | Value                                                                                         |
+-----------+-----------------------------------------------------------------------------------------------+
| endpoints | FSDCloud                                                                                      |
|           |   publicURL: https://swift.zam.kfa-juelich.de:8888/v1/AUTH_df37f5b1ebc94604964c2854b9c0551f   |
|           |   internalURL: https://swift.zam.kfa-juelich.de:8888/v1/AUTH_df37f5b1ebc94604964c2854b9c0551f |
|           |   adminURL: https://swift.zam.kfa-juelich.de:8888/v1                                          |
|           |                                                                                               |
| name      | swift                                                                                         |
| type      | object-store                                                                                  |
+-----------+-----------------------------------------------------------------------------------------------+

Now you can use the token and the public URL to access the resources with the swift client, for example with the stat command:

swift --os-auth-token 70c8706f59bb4986bef3e463b9169477 \
      --os-storage-url https://swift.zam.kfa-juelich.de:8888/v1/AUTH_df37f5b1ebc94604964c2854b9c0551f \
      stat
                        Account: AUTH_df37f5b1ebc94604964c2854b9c0551f
                     Containers: 19
                        Objects: 141
                          Bytes: 26977607038
Containers in policy "policy-0": 19
   Objects in policy "policy-0": 141
     Bytes in policy "policy-0": 26977607038
                            Via: 1.1 swift.zam.kfa-juelich.de:8888
    X-Account-Project-Domain-Id: default
                     Connection: close
                         Server: Apache
                    X-Timestamp: 1354717393.97315
                     X-Trans-Id: txff3a507662cf484990e61-0057482f0a
                   Content-Type: text/plain; charset=utf-8
                  Accept-Ranges: bytes
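
Copying the token and the URL by hand quickly becomes tedious, since tokens expire. The sketch below wraps the three steps in one function. It assumes the OS_* environment variables described earlier are set, uses the openstack client's generic -f value -c id output options to print only the token, and parses the publicURL line of the catalog table shown above; the function names are hypothetical and the table layout may differ between client versions.

```shell
# Sketch: print the first publicURL found in the table printed by
# "openstack catalog show swift" (read from stdin).
swift_public_url() {
    awk '!found && $3 == "publicURL:" { print $4; found = 1 }'
}

# Sketch: run the swift client with a fresh token and the public storage URL.
# All arguments are passed on to swift.
swift_with_token() {
    token=$(openstack token issue -f value -c id) || return 1
    url=$(openstack catalog show swift | swift_public_url)
    [ -n "$url" ] || return 1
    swift --os-auth-token "$token" --os-storage-url "$url" "$@"
}

# Usage:
#   swift_with_token stat
```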

Large objects

The swift client manages the upload of large objects automatically, splitting them into segments as required to meet the size limits of the server. Just use the swift upload command as usual:

swift upload <container> <file>

The upload options can be further refined as described in the swift upload documentation and the Large Objects overview of OpenStack. When you download the object, you will get the whole file regardless of how the upload was split.

Setting ACLs

One of the interesting features of Object Storage is the possibility to set ACLs on the containers. You can easily do this with the swift post command. For example to set a container as publicly readable:

swift post -r ".r:*" <container>

To remove public access from a container:

swift post -r "" <container>

Check the OpenStack documentation for more information.
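
Once a container is publicly readable, its objects can be fetched without a token by appending the container and object names to the storage URL (the publicURL of the swift endpoint). A minimal sketch follows; the function name public_object_url is invented for the example, and names containing special characters would additionally need URL encoding, which is not handled here.

```shell
# Sketch: build the download URL of an object in a public container.
# $1 = storage URL, $2 = container, $3 = object
public_object_url() {
    # ${1%/} drops a trailing slash from the storage URL, if any
    echo "${1%/}/$2/$3"
}

# Usage:
#   curl -fsS -o test.txt "$(public_object_url "$STORAGE_URL" test test.txt)"
```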

How to integrate the EGI Object Storage into your application

Integration of the object storage within your application will require a CDMI client (such as bCDMI) or a set of libraries (e.g. libcdmi, https://github.com/livenson/libcdmi-python) to be integrated within your application. Another possibility is to integrate the CDMI API calls directly inside your application. This may be relatively easy if you are using only the basic CDMI operations, which are standard HTTP RESTful operations. For more information, you can look at the CDMI specifications, in particular the common operations, and at the code of the bCDMI software (which uses simple cURL calls to perform the CDMI operations).

Within CDMI, it is possible to set a file or a container as public (see paragraph above). If so, read operations can be performed via normal HTTP calls without authentication. This means you can access the data with any web browser and seamlessly integrate it within any web portal served by HTTP.

Other CDMI operations (like write operations) require authentication. For the FedCloud, this is performed via an access token generated by the Keystone service of the site using the user's X.509 VOMS proxy certificate. It is possible to use the bCDMI client to perform the authentication process via the auth command (see the bCDMI usage guide for more information) and then use the generated token in your application with normal HTTP calls.
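
As an illustration of such a call, the sketch below lists a container with curl. Several details are assumptions rather than something stated on this page: that the token is passed in the X-Auth-Token header (the usual convention for Keystone tokens), that the server accepts CDMI specification version 1.0.1, and that the token has already been stored in a variable called CDMI_TOKEN. Check the headers against your endpoint and the bCDMI code before relying on them.

```shell
# Sketch: list a CDMI container with an already obtained Keystone token.
# $1 = full URL of the container, ending with "/"
# CDMI_TOKEN is a hypothetical variable holding the token.
cdmi_list_container() {
    curl -sS \
         -H "X-Auth-Token: $CDMI_TOKEN" \
         -H "X-CDMI-Specification-Version: 1.0.1" \
         -H "Accept: application/cdmi-container" \
         "$1"
}

# Usage:
#   CDMI_TOKEN=<token> cdmi_list_container <cdmi_endpoint>/<container>/
```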