
EGI Opendata platform

This article is deprecated and has been moved to https://docs.egi.eu/users/datahub/.







The EGI Open Data platform

The Open Data Platform (ODP) integrates the data repositories available in a distributed infrastructure, makes it possible to publish data as open data, and links the data to key open data catalogues, such as the OpenAIRE open access infrastructure, following their respective guidelines. The core enabling technology of ODP is Onedata, a data management solution that provides seamless and optimised access to data spread over a distributed infrastructure.

Onedata storage is composed of a global network of providers who provision their storage resources to users. Any data centre, or even a personal computer, can become a Onedata provider by installing the Oneprovider service near the physical storage resources. Providers have full control over which users can use their storage resources and to what extent, in terms of data size and transfer limits. Thanks to its optimized architecture, Onedata provides high-throughput access to large-scale data through standardized interfaces based on POSIX and CDMI, hides the low-level storage interfaces from users, and allows applications to run seamlessly in a multi-provider environment. Users can integrate data stored in different Oneprovider services into personal catalogues, which offer fast and transparent access regardless of the geographical location of the data.

Set up your Onedata deployment in the EGI Federated Cloud

This is a step-by-step guide to setting up a simple Onedata deployment in the EGI Federated Cloud. For further information about Onedata, please refer to its online documentation.

The deployment described in this guide consists of:

  • One OneZone instance
  • One OneProvider instance with an attached cloud block storage volume, exposed through a POSIX interface.

This setup can be easily extended to include more providers.

The guide also includes instructions for accessing the data managed by this OneData deployment from other virtual instances started in the EGI Federated Cloud.

OneZone

  • Log in to the instance that will host OneZone with your private key.
ssh -i [path-to-your-private-key] ubuntu@[ip-address-of-your-instance]
  • Install docker-compose on the machine as root.
sudo -i

curl -L https://github.com/docker/compose/releases/download/1.8.0-rc1/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose

Then verify the installation:

~# docker-compose -v
docker-compose version 1.8.0-rc1, build 9bf6bc6

Don't forget to exit the root shell afterwards.
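
The download URL used above is assembled from the docker-compose version and the output of uname. The following is a minimal sketch of that assembly as a reusable function; compose_url is a hypothetical helper name, and the optional second and third arguments exist only so the result can be checked without depending on the local machine.

```shell
# compose_url VERSION [OS] [ARCH]
# Hypothetical helper: prints the docker-compose release URL for VERSION.
# OS and ARCH default to what `uname -s` and `uname -m` report locally.
compose_url() {
    version="$1"
    os="${2:-$(uname -s)}"
    arch="${3:-$(uname -m)}"
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$version" "$os" "$arch"
}
```

As root, the download step would then read: curl -L "$(compose_url 1.8.0-rc1)" > /usr/local/bin/docker-compose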

  • In the home directory of the ubuntu user, download the OneData software:
git clone https://github.com/onedata/getting-started
  • Go to the scenario 3.0 folder.
cd getting-started/scenarios/3_0_oneprovider_onezone_multihost/
  • Install OneZone using the run_onedata.sh script:
sudo ./run_onedata.sh --zone &

Wait for the message "Congratulations! onezone has been successfully started."

Now your OneZone instance is ready. You can access it with your browser:

https://[ip-address-of-your-onezone-instance]

Log in with the following credentials:

login: admin
password: password

OneProvider

  • Create your storage in the cloud and store its ID in an environment variable. The command prints the ID of the new storage resource, shown here as [ID].
occi --endpoint $ENDPOINT --auth x509 --voms --user-cred $X509_USER_PROXY --action create --resource storage \
  --attribute occi.storage.size="num([number-of-GB])" --attribute occi.core.title="[storage_name]"
[ID]

export STORAGE_ID=[ID]
  • Link the block storage to your instance.
occi --endpoint $ENDPOINT  --auth x509 --voms --user-cred $X509_USER_PROXY --action link --resource $COMPUTE_ID --link $STORAGE_ID
  • Run an OCCI describe command on your instance to get the block storage path, which is reported in the occi.storagelink.deviceid attribute. See the example below.


occi --endpoint $ENDPOINT  --auth x509 --voms --user-cred $X509_USER_PROXY --action describe --resource $COMPUTE_ID

[[ http://schemas.ogf.org/occi/infrastructure#compute ]]
>> location: /occi/compute/136618fb-4687-4a15-b144-5381d9aa0742
occi.core.id = 136618fb-4687-4a15-b144-5381d9aa0742
occi.core.title = onedata-provider100
occi.compute.cores = 2
occi.compute.hostname = onedata-provider100
occi.compute.memory = 4096
occi.compute.state = active

Links:

    [[ http://schemas.ogf.org/occi/core#link ]]
    >> location: /occi/storagelink/136618fb-4687-4a15-b144-5381d9aa0742_fd8e7b72-0c1e-4f9c-a1a0-d4043c56b752
    occi.storagelink.deviceid = /dev/vdb
    occi.core.source = http://cloud.recas.ba.infn.it:8787/occi/compute/136618fb-4687-4a15-b144-5381d9aa0742
    occi.core.target = http://cloud.recas.ba.infn.it:8787/occi/storage/fd8e7b72-0c1e-4f9c-a1a0-d4043c56b752
    occi.core.id = 136618fb-4687-4a15-b144-5381d9aa0742_fd8e7b72-0c1e-4f9c-a1a0-d4043c56b752

    [[ http://schemas.ogf.org/occi/core#link ]]
    >> location: /occi/networklink/136618fb-4687-4a15-b144-5381d9aa0742_fe82ef7b-4bb7-4c1e-b4ec-ec5c1b0c7333_90.147.102.160
    occi.networkinterface.mac = fa:16:3e:bf:26:94
    occi.networkinterface.interface = eth0
    occi.networkinterface.state = active
    occi.networkinterface.allocation = dynamic
    occi.networkinterface.address = 90.147.102.160
    occi.core.source = http://cloud.recas.ba.infn.it:8787/occi/compute/136618fb-4687-4a15-b144-5381d9aa0742
    occi.core.target = http://cloud.recas.ba.infn.it:8787/occi/network/fe82ef7b-4bb7-4c1e-b4ec-ec5c1b0c7333
    occi.core.id = 136618fb-4687-4a15-b144-5381d9aa0742_fe82ef7b-4bb7-4c1e-b4ec-ec5c1b0c7333_90.147.102.160

Mixins:

    [[ http://schemas.openstack.org/template/os#b7765c28-6bc6-438a-8b7c-b6873103c5f5 ]]
    title:        Image for Docker Ubuntu 14.04 [Ubuntu/14.04/VirtualBox]_EGI_fedcloud
    term:         b7765c28-6bc6-438a-8b7c-b6873103c5f5
    location:     /occi/os_tpl/b7765c28-6bc6-438a-8b7c-b6873103c5f5

    [[ http://schemas.openstack.org/template/resource#8 ]]
    title:        Flavor: medium
    term:         8
    location:     /occi/resource_tpl/8

Actions:

    [[ http://schemas.ogf.org/occi/infrastructure/compute/action#start ]]

    [[ http://schemas.ogf.org/occi/infrastructure/compute/action#stop ]]

    [[ http://schemas.ogf.org/occi/infrastructure/compute/action#restart ]]

    [[ http://schemas.ogf.org/occi/infrastructure/compute/action#suspend ]]
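
The device path can also be pulled out of the describe output with a small text filter. The following is a minimal sketch, assuming the output has the layout shown in the example above; extract_device_id is a hypothetical helper name and not part of the occi client.

```shell
# extract_device_id
# Hypothetical helper: reads the output of an OCCI describe command on stdin
# and prints the value of every occi.storagelink.deviceid attribute it finds.
extract_device_id() {
    sed -n 's/^[[:space:]]*occi\.storagelink\.deviceid[[:space:]]*=[[:space:]]*//p'
}
```

For the example above, piping the describe command through the filter (occi --endpoint $ENDPOINT --auth x509 --voms --user-cred $X509_USER_PROXY --action describe --resource $COMPUTE_ID | extract_device_id) would print /dev/vdb.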

  • Log in to the new instance with your private key.
ssh -i [path-to-your-private-key] ubuntu@[ip-address-of-your-instance]
  • Format and mount the block storage attached to the instance.
sudo mkfs.ext3 [block-storage-path-in-your-instance]
sudo mount [block-storage-path-in-your-instance] [mount-point]

The following example uses block storage identified by /dev/vdb and the mount point /mnt/.

sudo mkfs.ext3 /dev/vdb
sudo mount /dev/vdb /mnt/
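
Formatting destroys any data on the device, so it can be worth checking that the path really is a block device and is not already mounted before running mkfs. The following is a minimal sketch of such a guard, assuming a Linux system that exposes /proc/mounts; is_mounted and safe_to_format are hypothetical helper names.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeeds if DEVICE is the first field of some line in
# MOUNTS_FILE (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# safe_to_format DEVICE [MOUNTS_FILE]
# Succeeds only if DEVICE is a block device and is not currently mounted.
safe_to_format() {
    [ -b "$1" ] && ! is_mounted "$1" "${2:-/proc/mounts}"
}
```

With this guard, the formatting step from the example becomes: safe_to_format /dev/vdb && sudo mkfs.ext3 /dev/vdb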
  • Install docker-compose on the machine as root.
sudo -i

curl -L https://github.com/docker/compose/releases/download/1.8.0-rc1/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose

Then verify the installation:

~# docker-compose -v
docker-compose version 1.8.0-rc1, build 9bf6bc6

Don't forget to exit the root shell afterwards.

  • In the home directory of the ubuntu user, download the OneData software:
git clone https://github.com/onedata/getting-started
  • Go to the scenario 3.0 folder.
cd getting-started/scenarios/3_0_oneprovider_onezone_multihost/
  • Edit the file docker-compose-oneprovider.yml and replace ${ONEPROVIDER_DATA_DIR} with the folder where you mounted the block storage, so that the data persistence entry reads:
# data persistence
- "[mount-point]:/volumes/storage"
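
The same replacement can be scripted instead of done in an editor. The following is a minimal sketch, assuming the file contains the literal text ${ONEPROVIDER_DATA_DIR} as described above; set_data_dir is a hypothetical helper name.

```shell
# set_data_dir MOUNT_POINT COMPOSE_FILE
# Hypothetical helper: replaces every literal ${ONEPROVIDER_DATA_DIR} in
# COMPOSE_FILE with MOUNT_POINT. MOUNT_POINT must not contain '|' or '&',
# which are special in the sed expression used here.
set_data_dir() {
    mount_point="$1"
    compose_file="$2"
    tmp_file="$(mktemp)" || return 1
    # Write the edited text to a temporary file first, then copy it back so
    # the original file keeps its owner and permissions.
    sed 's|\${ONEPROVIDER_DATA_DIR}|'"$mount_point"'|g' "$compose_file" > "$tmp_file" &&
        cat "$tmp_file" > "$compose_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For the example mount point used in this guide, the call would be: set_data_dir /mnt docker-compose-oneprovider.yml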


  • Install OneProvider using the run_onedata.sh script:
sudo ./run_onedata.sh --provider --provider-fqdn [OneProvider-IP] --zone-fqdn [OneZone-IP] &

Wait for the message "Congratulations! oneprovider has been successfully started."

An example of the output is shown below.

IMPORTANT: After each start wait for a message: Congratulations! oneprovider has been successfully started.
To ensure that the oneprovider is completely setup.
Pulling node1.oneprovider.localhost (onedata/oneprovider:3.0.0-rc5)...
3.0.0-rc5: Pulling from onedata/oneprovider
a3ed95caeb02: Pull complete
57c018ee1aa8: Pull complete
ebd817e34e50: Pull complete
c068f2d0906a: Pull complete
b3bad60afaef: Pull complete
47175703b6bc: Pull complete
5cd40b9132e8: Pull complete
25c596923cc2: Pull complete
e665a356e4f5: Pull complete
6585e20d8e6f: Pull complete
229064f88987: Pull complete
debcc3142260: Pull complete
0056c9a8edc5: Pull complete
Digest: sha256:88c0fd1fa61ebd4e548222beb14f9d29efacddcf0d8bea7f1f1185d4d453be64
Status: Downloaded newer image for onedata/oneprovider:3.0.0-rc5
Creating network "30oneprovideronezonemultihost_default" with the default driver
Creating oneprovider-1
Attaching to oneprovider-1
oneprovider-1                  | Starting op_panel: [  OK  ]
oneprovider-1                  | 
oneprovider-1                  | Configuring oneprovider:
oneprovider-1                  | * service_onepanel: set_cookie
oneprovider-1                  | * service_onepanel: purge_node
oneprovider-1                  | * service_onepanel: create_tables
oneprovider-1                  | * service_onepanel: add_default_users
oneprovider-1                  | * service_onepanel: add_nodes
oneprovider-1                  | * service: save
oneprovider-1                  | * service_couchbase: configure
oneprovider-1                  | * service_couchbase: start
oneprovider-1                  | * service_couchbase: wait_for_init
oneprovider-1                  | * service_couchbase: init_cluster
oneprovider-1                  | * service_couchbase: rebalance_cluster
oneprovider-1                  | * service_couchbase: status
oneprovider-1                  | * service: save
oneprovider-1                  | * service_cluster_manager: configure
oneprovider-1                  | * service_cluster_manager: start
oneprovider-1                  | * service_cluster_manager: status
oneprovider-1                  | * service: save
oneprovider-1                  | * service_op_worker: configure
oneprovider-1                  | * service_op_worker: setup_certs
oneprovider-1                  | * service_op_worker: start
oneprovider-1                  | * service_op_worker: wait_for_init
oneprovider-1                  | * service_op_worker: status
oneprovider-1                  | * service_op_worker: add_storages
oneprovider-1                  | * service_oneprovider: configure
oneprovider-1                  | * service_oneprovider: configure
oneprovider-1                  | * service_oneprovider: register
oneprovider-1                  | * service: save
oneprovider-1                  | * service_onepanel: add_users
oneprovider-1                  | 
oneprovider-1                  | Container details:
oneprovider-1                  | * IP Address: 172.18.0.2
oneprovider-1                  | * Ports: -
oneprovider-1                  | 
oneprovider-1                  | Congratulations! oneprovider has been successfully started.
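
Waiting for the start-up message can be automated when the script output is redirected to a file. The following is a minimal sketch under that assumption; wait_for_message is a hypothetical helper name, and onedata.log in the usage note is a hypothetical log file that the guide itself does not create.

```shell
# wait_for_message LOG_FILE MESSAGE [TIMEOUT_SECONDS]
# Hypothetical helper: polls LOG_FILE once per second until it contains
# MESSAGE as a fixed string. Fails after TIMEOUT_SECONDS (default: 600).
wait_for_message() {
    log_file="$1"
    message="$2"
    remaining="${3:-600}"
    while ! grep -qF -- "$message" "$log_file" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

If the run_onedata.sh command is started with its output redirected (for example by appending > onedata.log 2>&1 before the final &), then wait_for_message onedata.log "oneprovider has been successfully started" returns once the provider is ready. The same approach applies to the OneZone start-up message.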

Space Management

This section describes how to create a new space in your OneZone instance and support it with your OneProvider instance.

  • Access your OneZone instance with a browser
https://[ip-address-of-your-onezone-instance]
  • Select Data Space Management, then Create a new space.
  • Enter a name for the new space.
  • Click on the newly created space, then on Get support.
  • Copy the token.
  • Access the administrative panel of your OneProvider instance
https://[ip-address-of-your-oneprovider-instance]:9443
  • Select Space/Management/Support Space
  • Fill in the form:

Storage: NFS

Token: the token copied from OneZone

Support size: the amount of storage to assign to this space (at most the size of the block storage attached to the OneProvider instance)

  • Go back to OneZone and reload the page in your browser.
  • The provider is now shown on the map. Select Data Space Management and the space you created; the provider appears under the space.
  • Select the provider on the map, then click Go to your files.
  • OneZone redirects you to the OneProvider web page for managing data. Upload some sample files.
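
When choosing the Support size in the form above, the space actually available on the mounted block storage gives a conservative upper bound. The following is a minimal sketch that reads it from the POSIX-format output of df; avail_gib_from_df is a hypothetical helper name.

```shell
# avail_gib_from_df
# Hypothetical helper: reads the output of `df -Pk PATH` on stdin and prints
# the available space of the listed filesystem in whole GiB, rounded down.
avail_gib_from_df() {
    # Line 1 is the header; field 4 of line 2 is the available space in KiB.
    awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}
```

On the OneProvider instance from the example, df -Pk /mnt/ | avail_gib_from_df prints the number of GiB still free on the block storage.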

Access your data from an instance running in the EGI FedCloud

This section describes how to access the data stored in your OneData installation, through a POSIX interface, from any other instance created within the EGI FedCloud.

  • Go to your OneZone instance and get an authentication token by selecting Access Tokens and then Create new access token. Copy the token.
  • Set up your OneData client:
wget -q -O - http://get.onedata.org/oneclient.sh | bash

export PROVIDER_HOSTNAME=[IP-OneProvider]
export ONECLIENT_AUTHORIZATION_TOKEN='[TOKEN]'
  • Create a folder where you would like to mount the OneData virtual file-system. Store its path in the MOUNT_POINT environment variable.

For example:

mkdir /mnt/onedata
export MOUNT_POINT=/mnt/onedata
  • Run the OneData client
oneclient --authentication token $MOUNT_POINT --no-check-certificate
  • The OneData virtual file-system is now mounted on your VM. Go to [MOUNT_POINT] and check that you can access the files you uploaded earlier through the web interface.
  • Update one of the files in [MOUNT_POINT] or create a new one. Then go to OneZone and check that the file is accessible through the web interface.
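
The write side of this check can be scripted as a simple round trip. The following is a minimal sketch; roundtrip_check is a hypothetical helper name, and the directory passed to it is assumed to be one under $MOUNT_POINT where you have write access (which directory that is depends on your spaces and is not specified in this guide).

```shell
# roundtrip_check DIRECTORY
# Hypothetical helper: writes a small marker file into DIRECTORY, reads it
# back, compares the content, and removes the file again.
roundtrip_check() {
    marker="$1/.roundtrip-check.$$"
    expected="onedata round trip $$"
    printf '%s\n' "$expected" > "$marker" || return 1
    actual="$(cat "$marker")"
    rm -f "$marker"
    [ "$actual" = "$expected" ]
}
```

A successful call only shows that the mounted file system accepts writes and returns the same content; checking that the file also appears in the web interface remains a manual step.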