VT-CloudCaps:Questionnaire

From EGIWiki
This should evolve into a questionnaire that maps the state of user groups using FedCloud and working with our mini-project. We have to create the questionnaire, fill it with already known information and only then approach users.

Feel free to edit, just first ideas!

== Pool of topics and questions ==

=== Image preparation and management ===

How has the image been created, how is it managed, should we help with preparation?

*How many images are used by your group (one, several with different functions)?
*How did you create the image? From scratch, from a basic OS image, using a full OS installation, a copy of a desktop, a copy of an image prepared in VMware/VirtualBox on a desktop, or does the group already provide one ...
*Is it one partition with the system, or multiple partitions/a whole disk (with dedicated space for data, empty space for other data, packages, user data)?
*Is everything required for computation already installed in the image? Would it be interesting to install parts during VM start (contextualization, always the latest version of packages)? Does it install packages/software during/after boot? CVMFS?
*Is the image prepared to run with KVM/Xen, and in which format (OVF)?
*Do you rely on a specific Linux kernel version?
*How should new versions of the image be distributed and installed? No need, rarely, often, via vmcatcher, some other way? How do you intend to deal with security updates?
*Is the image signed? Endorsed by some group? Verified by some RP?
*What kind of hardware requirements (resource demands) do your image and application have? RAM, disk, processor, cores.
*What are the network requirements of your application? Do you require access to the running instance from outside? Which ports do you require to be open? Do you expect arbitrary access from within the instance to the outside world? What are your bandwidth expectations?
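
When images are distributed through vmcatcher or a similar tool, the consumer can check a downloaded image against the digest published for it in the image list. A minimal Python sketch of that check, assuming the list entry provides a SHA-512 hex digest; `verify_image` is a hypothetical helper, and verifying the signature of the image list itself is not covered here.

```python
import hashlib
import hmac


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Images are large, so never read the whole file into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: str, expected_sha512: str) -> bool:
    """Hypothetical check: does the image match the published digest?"""
    actual = sha512_of_file(path)
    # Normalise the published hex string and compare in constant time.
    return hmac.compare_digest(actual, expected_sha512.strip().lower())
```

An image that fails this check would simply not be registered at the site.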
  
=== Workload management ===

How do you submit the actual work to the running instances? Should we care, help?

*Do you use some form of pilot framework? BOINC? Another implementation of call-home?
*Is the VM started by some workload system/application which immediately submits "jobs"?
*Who does the scheduling? Do VMs run across several providers?
*Do you do automatic scaling of your framework? Do you require vertical scaling, e.g. sizing up instances, or horizontal scaling, i.e. adding more instances as needed?
*How long should a VM run (long computation, smaller jobs submitted inside the VM, ...)?
*Can the VM be preempted or migrated?
  
=== AAI and contextualization ===

How do you intend to access running VMs (should we help, explain what is possible, push contextualization)?

*Is there support for user contextualization? Already available/would be nice/not needed?
*Does your system come with pre-installed ssh access, a fixed root password, an ssh public key, a group-accessible public key, another way to log in, remote desktop, no need for root access, or a need for user contextualization (storing an ssh key in authorized_keys)?
*How are running VMs managed: all started by one representative of the VO, an image/VM shared between a group of users, or a VM just for one user?
*Does the VM contain credentials to access remote services/data? Could these be injected via contextualization?
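
User contextualization in the sense used above usually comes down to appending a user's public key to `~/.ssh/authorized_keys` when the VM boots. A minimal Python sketch of that step, assuming the key arrives as a single line of text from the contextualization data; `install_public_key` is a hypothetical helper, not the mechanism of any particular site, and it does not set file ownership, which a boot script running as root would also have to do.

```python
import os
from pathlib import Path


def install_public_key(home: Path, public_key: str) -> bool:
    """Append public_key to home/.ssh/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    key = public_key.strip()
    if not key or "\n" in key:
        raise ValueError("expected exactly one non-empty public key line")
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    content = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        return False  # idempotent: running contextualization twice is harmless
    if content and not content.endswith("\n"):
        content += "\n"  # make sure the new key starts on its own line
    auth_file.write_text(content + key + "\n")
    # sshd with StrictModes rejects key files writable by group or others.
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_file, 0o600)
    return True
```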

=== Data, big data ===

In some cases, big data are analyzed/produced by cloud applications. There is usually room for improvements, help, new services...

*Does your application work with large amounts of data? If yes, which type of access is needed (big shared network storage, a virtual disk accessed by some VMs, object storage)? Do you only read or also write this category of data?
*Is all of the data used by all VMs? Does every VM/job use a small subset? Other patterns?
*Do you require a Hadoop-like environment?
*Are you already using some object storage like S3 or CDMI? A data service from EGI (GridFTP, SE, SRM)?
*Are large data downloaded/produced during the VM lifetime?
*Is there a need for higher-level control of data access?

=== What else should we know? ===

*Is there a need for other services, like a messaging system, integration with standard EGI services (data?), or an SQL database?

== UseCase-specific questionnaires ==

=== OpenModeller ===

Candidate for '''block storage''', '''object storage''' and possibly '''auto-scaling'''.

*How did you create the image? (from scratch, basic installation, full installation etc.)

From scratch, installing the needed libraries and tools such as COMPSs, OpenModeller, the rOCCI client, etc., and then we saved it.

*Is everything required for computation already installed in the image? (software, tools etc.)

Yes, even though we could have done it at VM creation time.

*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

I uploaded the image to the EGI repository and then endorsed it at CESNET. I don't know how the other providers published it.

*What are your resource requirements? (CPU, memory, storage and network)

We never measured the minimum requirements for running openModeller, and they can also vary considerably depending on the experiment. However, since the EGI Use Case is related to the BioVeL project, which currently uses a modelling service hosted at CRIA, I think BioVeL expects the new service instance in Europe to provide at least similar performance, if not better. The whole service runs on a single machine here:

Dell PowerEdge 1800 (two Intel Xeon 3.80GHz processors, 4GB of DDR2 400MHz memory, six 146GB SCSI disks, two 300GB SCSI disks)

*How do you submit work to running instances? (pilot framework or local workload)

We use the VENUS-C/COMPSs framework. An endpoint is provided to user communities; it is accessed through Taverna by BioVeL users and through a Virtual Research Environment in EUBrazil-OpenBio.

*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

Horizontal.

*How do you access your virtual machines once they have been launched?

COMPSs accesses the VMs for execution.

*Are you using contextualization? (how and where or why not)

At VM creation time, COMPSs copies the SSH keys.

*What's the character of your data? (size, format, read-only vs. read-write)

The local set of environmental layers is ~32GB. Please note this is just a limited set of the most popular files, so this number can easily increase over time. Environmental layers are one of the inputs for the modelling procedure, together with a set of species occurrence points. Both are only read by openModeller to generate, test or project models.

*How are you accessing your data? (copied locally vs. accessed remotely)

Copied locally.

*How much space do you need for a single computation?

Results can be either small XML files generated by creating or testing models (a few KB) or a raster file generated by projecting a model (from a few MB up to a few GB, depending on spatial extent, resolution and format). The new service API has a new operation that will accept multiple model creations, tests and projections in a single request, which makes this question even harder to answer.

*Could environmental layers be stored in object storage?

Environmental layers are rasters that are usually stored as regular files, but they can also be stored in relational databases, as TerraLib or rasdaman do. Apparently Oracle Spatial can store rasters using object storage, but I have no experience with this.

*How are you gathering results and what's their character? (size, format, sensitivity)

The service is asynchronous: clients send job requests and retrieve the results later, when the job has finished. Results are stored for a period of time configured by the sysadmin; here we keep them for a couple of weeks. They are all regular files. The size is hard to estimate because it depends on the number and type of requests in that period. Right now the results on our server take only a few MB, but it would be a good idea to reserve some GB for this task. The service provides no security mechanism: results are retrieved by presenting a ticket (a random combination of numbers and characters) that is generated for the initial request.
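
The ticket-based, time-limited result retrieval described in this answer can be illustrated with a small in-memory sketch. It assumes tickets are generated with Python's `secrets` module and that retention is counted from job completion; the class `ResultStore` and its methods are hypothetical and do not reflect the actual openModeller web service API. As the answer notes, whoever holds a ticket can fetch the result, so the ticket's unguessability is the only protection.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class ResultStore:
    """Hypothetical asynchronous result store keyed by random tickets."""

    def __init__(self, retention_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.retention_seconds = retention_seconds
        self._clock = clock
        # ticket -> (timestamp, result or None while the job is still running)
        self._jobs: Dict[str, Tuple[float, Optional[bytes]]] = {}

    def submit(self) -> str:
        """Register a new job request and return its ticket."""
        ticket = secrets.token_urlsafe(16)  # random, hard-to-guess ticket
        self._jobs[ticket] = (self._clock(), None)
        return ticket

    def complete(self, ticket: str, result: bytes) -> None:
        """Store the finished result; retention counts from completion."""
        if ticket not in self._jobs:
            raise KeyError(ticket)
        self._jobs[ticket] = (self._clock(), result)

    def fetch(self, ticket: str) -> Optional[bytes]:
        """Return the result, or None if unknown, unfinished or expired."""
        self.purge_expired()
        entry = self._jobs.get(ticket)
        return entry[1] if entry else None

    def purge_expired(self) -> None:
        """Drop finished results older than the retention period."""
        now = self._clock()
        expired = [t for t, (stamp, result) in self._jobs.items()
                   if result is not None and now - stamp > self.retention_seconds]
        for ticket in expired:
            del self._jobs[ticket]
```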

*Do you support or actively use any dynamic cloud-like environment? (which, how and why)

*Are you exposing any services to the outside world? (i.e., listening on public interfaces)

Yes, an Extended Open Modeller Web Service is exposed to the users and deployed outside EGI.

*How are they protected from unauthorized use?

The endpoint is public (at the moment; Renato doesn't know whether you plan something regarding security), but access to the VENUS-C/COMPSs middleware is then protected with X.509 certificates.

=== WeNMR ===

Candidate for '''auto-scaling'''.

*How did you create the image? (from scratch, basic installation, full installation etc.)

Wouter took the image from a previous developer who no longer works at CMBI. We initially planned to re-create it from scratch following best practices, as suggested e.g. in http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-custom-images.html; however, there was no time to do it before the FedCloud demo deadline, so we simply shrank the existing one as much as possible.

*Is everything required for computation already installed in the image? (software, tools, data, etc.)

The software is there; the input data (listed in the ToPoS token) are copied by the VM before processing, using wget from http://nmr.cmbi.ru.nl/NRG-CING/data/

*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

By hand.

*What are your resource requirements? (CPU, memory, storage and network)

2GB of RAM per CPU core is enough; storage is not critical (1GB of free space on the VM is enough). Network access is needed to copy input and output data from/to web repositories and to communicate with the ToPoS server.

*How do you submit work to running instances? (pilot framework or local workload)

Using the ToPoS pilot framework.
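
A pilot framework such as ToPoS inverts the usual submission model: each VM, once booted, repeatedly pulls a work token from a shared pool, processes it and removes it. The sketch below shows only that pull loop, against a hypothetical in-memory `TokenPool` with lease timeouts; it does not use the real ToPoS REST interface, and the names `TokenPool`, `pilot_loop` and `max_failures` are illustrative assumptions.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TokenPool:
    """Hypothetical in-memory token pool with time-limited leases."""

    def __init__(self, payloads: List[str], lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lease_seconds = lease_seconds
        # token id -> payload (e.g. the names of the input files to fetch)
        self._tokens: Dict[int, str] = dict(enumerate(payloads))
        self._lease_expiry: Dict[int, float] = {}

    def acquire(self) -> Optional[Tuple[int, str]]:
        """Lease the first token that nobody currently holds."""
        now = self._clock()
        for token_id, payload in self._tokens.items():
            expiry = self._lease_expiry.get(token_id)
            if expiry is None or expiry <= now:
                self._lease_expiry[token_id] = now + self._lease_seconds
                return token_id, payload
        return None

    def delete(self, token_id: int) -> None:
        """Remove a token whose work has been completed."""
        self._tokens.pop(token_id, None)
        self._lease_expiry.pop(token_id, None)


def pilot_loop(pool: TokenPool, process: Callable[[str], bool],
               max_failures: int = 3) -> int:
    """Run inside each VM: pull tokens until none is free; count successes."""
    done = failures = 0
    while failures < max_failures and (item := pool.acquire()) is not None:
        token_id, payload = item
        if process(payload):
            pool.delete(token_id)  # success: the token leaves the pool
            done += 1
        else:
            # failure: keep the lease; once it expires another pilot can retry
            failures += 1
    return done
```

Because work is pulled rather than pushed, adding VMs (horizontal scaling) needs no change to the scheduler: each new VM simply runs the same loop.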

*Are you using contextualization? (how and where or why not)

Some contextualization was needed for the Wnodes and CESNET cloud providers, as described in https://wiki.egi.eu/wiki/Fedcloud-tf:FedCloudWeNMR . The user knows the password of the "i" account, but there is no need for the user to log in to the VM because the application starts automatically after boot. However, the ssh key of the "i" account of the NMR server has to be present in the VM so that the produced output data can be copied there through rcp.

*What's the character of your data? (size, format, read-only vs. read-write)

Read-only input data: zipped tarballs of O(5 MB), accessed through the web.

*Have you considered using object storage to access your data and store the results?

No (not yet?).

*Are you dealing with sensitive data?

The data are public, while parts of the code in the VM are something the developers do not want to make publicly available, because there is strong competition among the bio-NMR groups in designing the best algorithms for structure calculations.

*How are you accessing your data? (copied locally vs. accessed remotely)

Copied locally.

=== BNCWeb ===

Candidate for '''SQL database'''.

*How did you create the image? (from scratch, basic installation, full installation etc.)
*Is everything required for computation already installed in the image? (software, tools, data, etc.)
*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
*What are your resource requirements? (CPU, memory, storage and network)

*How are you accessing your data? (copied locally vs. accessed remotely)
*Have you considered using a centralized SQL database to share and access your corpus data across multiple instances?
*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

*Are you using contextualization? (how and where or why not)
*Are you exposing any services to the outside world? (i.e., listening on public interfaces)
*How are they protected from unauthorized use?

=== PeachNote ===

Candidate for Messaging (currently using Amazon SQS), Database (Apache HBase), Auto-Scaling

*How did you create the image? (from scratch, basic installation, full installation etc.)
*Is everything required for computation already installed in the image? (software, tools, data, etc.)
*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
*What are your resource requirements? (CPU, memory, storage and network)

*We have learned that your VM would need access to Amazon SQS for job information, to an HBase cluster to retrieve and store data, and to the PeachNote server to regularly update the workflow code. On which hosts and ports do these services run?
*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

*Are you using contextualization? (how and where or why not)
*Are you exposing any services to the outside world? (i.e., listening on public interfaces)
*How are they protected from unauthorized use?

=== WSPGRADE ===

Candidate for Auto-Scaling

*How did you create the image? (from scratch, basic installation, full installation etc.)

We create basic images from different operating systems (e.g. SL 6, Debian, Ubuntu, CentOS) and then fork the images in order to customize them (e.g. install and configure WS-PGRADE).

*Is everything required for computation already installed in the image? (software, tools, data, etc.)

Yes, it contains everything.

*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

Marketplaces, vmcaster/vmcatcher.

*What are your resource requirements? (CPU, memory, storage and network)

WS-PGRADE needs 2 CPUs, 4GB of memory and 8-16GB of storage.

*How are you accessing your data? (copied locally vs. accessed remotely)

Remotely.

*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

Not yet.

*Are you using contextualization? (how and where or why not)

We are using EC2-like contextualization.
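
In EC2-like contextualization, the data supplied when the VM is created ("user data") is read by the VM itself from a link-local metadata service; tools such as cloud-init normally perform this step. A minimal Python sketch of that read, assuming an EC2-compatible metadata service at the conventional address http://169.254.169.254 (whether a given cloud site exposes it is an assumption) and that a 404 means no user data was supplied; `fetch_user_data` is a hypothetical helper.

```python
import urllib.error
import urllib.request
from typing import Optional

# Conventional link-local address of an EC2-compatible metadata service.
DEFAULT_METADATA_URL = "http://169.254.169.254"


def fetch_user_data(base_url: str = DEFAULT_METADATA_URL,
                    timeout: float = 2.0) -> Optional[bytes]:
    """Return the user data supplied at VM creation, or None if there is none.

    Raises urllib.error.URLError if no metadata service is reachable.
    """
    # Bypass any configured HTTP proxy: the metadata service is link-local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/latest/user-data", timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # assumed to mean: the VM was started without user data
        raise
```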

*Are you exposing any services to the outside world? (i.e., listening on public interfaces)

Our users connect to the portal via HTTP.

*How are they protected from unauthorized use?

It is built on the Liferay portal framework, which uses its own authentication methods.

=== GaiaSpace ===

Candidate for Auto-Scaling, Object-Storage, Block-Storage

*How did you create the image? (from scratch, basic installation, full installation etc.)
*Is everything required for computation already installed in the image? (software, tools, data, etc.)
*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
*What are your resource requirements? (CPU, memory, storage and network)

*How are you accessing your data? (copied locally vs. accessed remotely)
*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

*Are you using contextualization? (how and where or why not)
*Are you exposing any services to the outside world? (i.e., listening on public interfaces)
*How are they protected from unauthorized use?

=== DIRAC ===

Candidate for Auto-Scaling

*How did you create the image? (from scratch, basic installation, full installation etc.)
** (see VMConfiguration.png) A DIRAC run includes a VMScheduler which contains at least one Running Pod. A Running Pod defines the relationship between a single DIRAC VM image and a list of end-points. A DIRAC VM image is a boot image with what is necessary for a particular contextualization method. Currently the following contextualization methods are implemented:
** 1) Ad-hoc image: a ready-to-go image without further contextualization. This image has to be prepared to run on a specific end-point with a particular DIRAC configuration (VMDIRAC server to connect to, DIRAC release to use, ...). We have run with CentOS and Ubuntu images supporting the platform dependencies of the Belle, Alice and Auger HEP software, both in a private CloudStack IaaS and on the Amazon EC2 commercial cloud.
** 2) HEPiX contextualized image, which allows image management with a golden image kept separate from the context specifics; these are managed automatically by VMDIRAC.
*** 2.1) OpenNebula - HEPiX contextualization: golden image CernVM-batch-node. The DIRAC configuration is provided by an ISO context image, which is generic for all the OpenNebula IaaS sites of the cloud aggregation. The end-point configuration is provided through the on-the-fly OpenNebula context section environment, which is specific to each OpenNebula IaaS end-point and selected at submission time from the DIRAC Configuration Server.
*** 2.2) OpenStack - HEPiX contextualization: golden image CernVM-batch-node. The DIRAC configuration is provided by the amiconfig tools, sending the scripts in nova 1.1 userdata. The end-point configuration is provided through nova 1.1 metadata, which is specific to each OpenStack IaaS end-point and selected at submission time from the DIRAC Configuration Server.
*** 2.3) Generic contextualization using ssh: any image with an ssh daemon listening on a port with inbound connectivity. The VM boots, VMDIRAC polls the active sshd port and then applies the DIRAC and end-point configuration using sftp and ssh connections.
*Is everything required for computation already installed in the image? (software, tools, data, etc.)
** For the software area, VMDIRAC uses the CVMFS remote software repository from CERN with the LHCb repo (software and Conditions DB), and also a CVMFS repository from USC with software and tools for Alice and Auger. VMDIRAC uses the CVMFS included in CernVM images with a particular configuration, but also Ubuntu and CentOS images which have been prepared with a CVMFS client. Of course, a user may also set up an ad-hoc image with all the software and tools prepared for a particular run.
** Regarding data, the VM transparently uses third-party storage systems.
*How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
** Any third-party automated distribution can be used. Currently VMDIRAC is manually configured with the required image catalog metadata.
*What are your resource requirements? (CPU, memory, storage and network)
** The VMDIRAC image configuration allows specifying the VM flavor to run. LHCb has work in progress to take advantage of multicore processing, with different CPU and memory requirements depending on the specific software run on the VMs.
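
The generic ssh contextualization (method 2.3 above) depends on waiting until the freshly booted VM's sshd port accepts connections before configuring it over ssh/sftp. A minimal Python sketch of such a readiness poll using only the standard `socket` module; the name `wait_for_port`, the default timeout and the polling interval are assumptions for illustration, not VMDIRAC's actual implementation or settings.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires.

    Returns True once the port accepts connections (e.g. sshd is up after
    boot), False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the VM is still booting
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Only after this returns True would the configuration step open its ssh/sftp sessions; a False result lets the scheduler give up on that VM instead of hanging.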

*How are you accessing your data? (copied locally vs. accessed remotely)
** Data access in VMDIRAC:
** accessed "remotely" at the site via SRM; accessed on Amazon using S3; copied locally; input and output sandboxes.
*Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?
** A VM submission policy is configured for each end-point:
*** endpoint vmPolicy "static" -> slot-driven, as indicated by maxEndpointInstances: all the available slots of the end-point are used to create VMs.
*** endpoint vmPolicy "elastic" -> job-driven: VMs are submitted one by one if there are jobs in the DIRAC Task Queue. The elasticity of this policy is tuned using the Running Pod configuration parameter CPUPerInstance: if the CPU currently required by the jobs in the DIRAC Task Queue, divided by the number of currently running VMs, is lower than CPUPerInstance, then no more VMs are submitted. CPUPerInstance can be set to the contextualization time of a specific setup; in this way, if the average time required to run the jobs of the DIRAC Task Queue is lower than the contextualization time, no more VMs are submitted. This is a compromise: it uses the available resources more efficiently (saving creation overheads), and at the same time it can be set up to use all the available resources to finish the production in a shorter total time, at a higher resource cost (additional overhead).
** A VM stop policy is configured for each end-point:
*** endpoint vmStopPolicy "never"
*** endpoint vmStopPolicy "elastic" -> no more jobs + VM halting margin time
** In any case, VMs can be stopped by the VMDIRAC admin or by the HEPiX stop mechanism in the CernVM images (which is the responsibility of each cloud site admin). If a running VM has to be stopped, it stops in an orderly manner: the running job is declared stopped in DIRAC (so it can be resubmitted), and then the VM is halted.
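
The submission and stop policies described in this answer can be summarised as two small decision functions. This is a sketch of the logic as stated here, not VMDIRAC code: the names `EndpointState`, `should_submit_vm` and `should_stop_vm`, the handling of zero running VMs, applying the maxEndpointInstances cap to the elastic policy as well, and measuring both the queued CPU demand and CPUPerInstance in seconds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EndpointState:
    """Hypothetical snapshot of one end-point as seen by the scheduler."""
    vm_policy: str                 # "static" or "elastic"
    running_vms: int
    max_endpoint_instances: int
    queued_cpu_seconds: float      # CPU required by jobs in the task queue
    cpu_per_instance: float        # Running Pod CPUPerInstance, in seconds


def should_submit_vm(state: EndpointState) -> bool:
    """Decide whether to start one more VM on this end-point."""
    # Assumption: the slot cap applies to both policies.
    if state.running_vms >= state.max_endpoint_instances:
        return False
    if state.vm_policy == "static":
        return True  # slot-driven: fill every available slot
    if state.vm_policy == "elastic":
        if state.queued_cpu_seconds <= 0:
            return False  # job-driven: nothing queued, nothing to start
        if state.running_vms == 0:
            return True  # assumption: always start a first VM for queued work
        per_vm = state.queued_cpu_seconds / state.running_vms
        # Stop submitting once the queued work per VM drops below the threshold.
        return per_vm >= state.cpu_per_instance
    raise ValueError(f"unknown vmPolicy: {state.vm_policy}")


def should_stop_vm(stop_policy: str, queue_empty: bool,
                   idle_seconds: float, halting_margin: float) -> bool:
    """vmStopPolicy: "never" keeps VMs; "elastic" halts idle VMs after a margin."""
    if stop_policy == "never":
        return False
    if stop_policy == "elastic":
        return queue_empty and idle_seconds >= halting_margin
    raise ValueError(f"unknown vmStopPolicy: {stop_policy}")
```

Setting `cpu_per_instance` to the contextualization time reproduces the trade-off described above: a new VM is only worth starting while each VM still has more queued work than it costs to boot one.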

*Are you using contextualization? (how and where or why not)
** Only in the generic ssh contextualization; in the rest of the cases VMDIRAC uses outbound connectivity.
*Are you exposing any services to the outside world? (i.e., listening on public interfaces) How are they protected from unauthorized use?
** In the particular case of the generic ssh contextualization, the public key of the VMDIRAC service is used, and ssh connections are disabled after the VM has been configured.

=== DCH ===

Candidate for &lt;Capability&gt;

[[Category:VT-CloudCaps]]

Latest revision as of 09:06, 13 June 2013

This should evolve to questionnaire which should map state of user-groups using FedCloud and working with our mini-project. We have to create questionnaire, fill it with already known information and only then approach users.

Feel free to edit, just first ideas!

Pool of topics and questions

Image preparation and management

How has the image been created, how is it managed, should we help with preparation?

  • How many images are used by your group (one, several with different functions)?
  • How did you create the image? From scratch, from basic OS image, using full OS installation, copy of desktop, copy of image prepared in vmware/virtualbox on desktop, group already provides one ...
  • Is it one partition with system, multiple partitions/whole disk (with dedicated place for data, empty space for other data, packages, user data)
  • Is everything required for computation already installed in the image? Would it be interesting to install parts during VM start (contextualization, always latest version of packages)? Is it installing packages/software during/after boot? CVMFS?
  • Image prepared to run with KVM/Xen, in which format (OVF)?
  • Do you rely on a specific Linux Kernel version?
  • How should new versions of the image be distributed and installed? No need, rarely, often via vmcatcher, other way. How do you intend to deal with security updates?
  • Is image signed? Endorsed by some group? Verified by some RP?
  • What kind of hardware requirements (resource demands do  your image and application have? RAM, Disk, Processor, Cores.
  • What are the network requirements of your application? Do you require access to the running instance from external? Which ports do you require to be open? Do you expect arbitrary access from within the instance to the outside world? What are your bandwidth expectations?

Workload management

How do you submit the actual work to the running instances? Should we care, help?

  • Do you use a form of pilot framework? BOINC? Other implementation of call-home?
  • Is VM started by some workload system/application, which immediately submits "jobs"?
  • Who is doing scheduling? VMs running across several providers?
  • Do you do automatic scaling of your framework? Do you require vertical scaling, e.g. sizing up instances, or horizontal scaling, i.e. adding more instances as needed?
  • How long should a VM run (long computation, smaller jobs submited inside VM,...)?
  • Can the VM be preempted or migrated?

AAI and contextualization

How do you intend access to running VMs (should we help, explain what's possible, push contextualization?)

  • Is there support for user contextualization? Already available/would be nice/not needed.
  • Does your system come with pre-installed ssh access, a fixed root password, ssh public key, group accessible public key, other way to login, remote desktop, no need for root access, need for user contextualization (storing ssh key in authorized_keys)
  • Management of running VMs - all started by one representative of VO, image/VM shared between group of users, VM just for one user
  • Does VM contain some credentials to be able to access remote services/data? Could this be injected via contextualization?

Data, big data

In some cases, big data are analyzed/produced by cloud applications. There is usually place for improvements, help, new services...

  • Does your application work with large amounts of data? If yes, which type of access is needed (big shared network storage, virtual disk accessed by some VMs, object storage)? Do you only read or also write this category of data?
  • Is all of the data used by all VMs? Every VM/job is using small subset? Other patterns?
  • Do you require a Hadoop like environment?
  • Are you already using some object storage like S3, CDMI? Data service from EGI (gridftp, SE, SRM)?
  • Large data downloaded/produced during VM lifetime?
  • Need for higher-level control of data access?

What else should we know?

  • Is there a need for other services? Like messaging system, integration with standard EGI services (data?), SQL database?

UseCase-specific questionnaires

OpenModeller

Candidate for block storage, object storage and possibly auto-scaling.

  • How did you create the image? (from scratch, basic installation, full installation etc.)

From scratch, installing the needed libraries and tools as COMPSs, OpenModeller, rOCCI client, etc. and then we saved it.

  • Is everything required for computation already installed in the image? (software, tools etc.)

Yes, even if we could have done it at VM creation time

  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

I uploaded the image in the EGI repository and then I endorsed it at CESNET. I don't know how the other providers published it.

  • What are your resource requirements? (CPU, memory, storage and network)

We never measured minimum requirements for running openModeller, and this can also be quite variable depending on the experiment. However, since the EGI Use Case is related with the BioVeL project, which is currently using a modelling service hosted at CRIA, I think BioVeL expects that the new service instance in Europe should provide at least a similar performance, if not better. The whole service is running on a single machine here:

Dell PowerEdge 1800 (2 processors Intel Xeon CPU 3.80GHz, 4GB Memory DDR2, 400MHz, 6 HDs SCSI of 146GB, 2 HDs SCSI of 300GB)



  • How do you submit work to running instances? (pilot framework or local workload)

We use the VENUS-C/COMPSs framework. An endpoint is provided to user communities and accessed through Taverna for BioVeL users, through a Virtual Research Environment in EUBrazil-OpenBio.

  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

Horizontal

  • How do you access your virtual machines once they have been launched?

COMPSs access the VMs for execution.

  • Are you using contextualization? (how and where or why not)

COMPSs at the moment of VM creation, copies the SSH keys


  • What's the character of your data? (size, format, read-only vs. read-write)

The local set of environmental layers has ~32GB. Please note this is just a limited set of the most popular files, so this number can easily increase over the time. Environmental layers are one of the inputs for the modelling procedure, together with a set of species occurrences points. Both are only read by openModeller to generate, test or project models.

  • How are you accessing your data? (copied locally vs. accessed remotely)

Copied locally

  • How much space do you need for a single computation?

Results can either be small XML files generated by creating or testing models (few KB) or a raster file generated by projecting a model (from a few MB up to a few GB, depending on spatial extent, resolution and format). In the new service API there's a new operation that will accept multiple model creations, tests and projections in a single request, which makes this question even more complicated to answer.


  • Could environmental layers be stored in object storage?

Environmental layers are rasters that are usually stored as regular files, but they can also be stored in relational databases, such as TerraLib or rasdaman do. Apparently Oracle Spatial can store rasters using object storage, but I have no experience with this.

  • How are you gathering results and what's their character? (size, format, sensitivity)

The service is asynchronous: clients send job requests and retrieve the results later, once the job has finished. Results are stored as regular files for a period of time configured by the sysadmin; here we keep them for a couple of weeks. Their total size is hard to estimate because it depends on the number and type of requests received in that period. Right now the results on our server take only a few MB, but it would be a good idea to reserve a few GB for this task. The service itself provides no security mechanism: results are retrieved by presenting the ticket generated for the initial request (a random combination of numbers and characters).
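The ticket-based retrieval pattern described above can be illustrated with a small in-memory sketch. It is not openModeller's implementation: the class `ResultStore`, the helper `new_ticket`, the 16-character ticket length and the in-memory storage are all hypothetical. The sketch only shows the three ideas from the answer: a random alphanumeric ticket per request, results kept for a sysadmin-configured retention period, and retrieval by ticket alone.

```python
import secrets
import string
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical ticket alphabet: letters and digits, as described above.
ALPHABET = string.ascii_letters + string.digits


def new_ticket(length: int = 16) -> str:
    """Return a random ticket drawn from a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


@dataclass
class ResultStore:
    """In-memory stand-in for the service's result directory (hypothetical)."""
    retention_seconds: float                    # configured by the sysadmin
    clock: Callable[[], float] = time.time      # injectable for testing
    _pending: Set[str] = field(default_factory=set, init=False)
    _results: Dict[str, Tuple[float, bytes]] = field(default_factory=dict, init=False)

    def submit(self) -> str:
        """Register a new job and hand the client its ticket."""
        ticket = new_ticket()
        while ticket in self._pending or ticket in self._results:
            ticket = new_ticket()               # avoid (unlikely) collisions
        self._pending.add(ticket)
        return ticket

    def finish(self, ticket: str, payload: bytes) -> None:
        """Store a finished job's result, timestamped for later expiry."""
        self._pending.discard(ticket)
        self._results[ticket] = (self.clock(), payload)

    def purge_expired(self) -> None:
        """Drop results older than the retention period."""
        now = self.clock()
        expired = [t for t, (ts, _) in self._results.items()
                   if now - ts > self.retention_seconds]
        for t in expired:
            del self._results[t]

    def retrieve(self, ticket: str) -> Optional[bytes]:
        """Return the result, or None if still pending, expired or unknown."""
        self.purge_expired()
        entry = self._results.get(ticket)
        return entry[1] if entry is not None else None
```

Because anyone holding the ticket can fetch the result, the ticket is effectively the only access control in this pattern; a cryptographically secure generator (`secrets`) is used for that reason rather than `random`.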

  • Do you support or actively use any dynamic cloud-like environment? (which, how and why)


  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)

Yes, there is an Extended Open Modeller Web Service exposed to the users and deployed outside EGI.

  • How are they protected from unauthorized use?

The endpoint is public at the moment (Renato, I don't know whether you are planning anything regarding security), but access to the VENUS-C/COMPSs middleware is protected with X.509 certificates.

WeNMR

Candidate for auto-scaling.

  • How did you create the image? (from scratch, basic installation, full installation etc.)

The image was inherited by Wouter from a previous developer who no longer works at CMBI. We initially planned to re-create it from scratch following best practices, as suggested e.g. in http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-custom-images.html; however, there was no time to do so before the FedCloud demo deadline, so we simply shrank the existing one as much as possible.

  • Is everything required for computation already installed in the image? (software, tools, data, etc.)

The software is there; the input data (listed in the ToPoS token) are downloaded by the VM with wget from http://nmr.cmbi.ru.nl/NRG-CING/data/ before processing.

  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

By hand.

  • What are your resource requirements? (CPU, memory, storage and network)

2 GB of RAM per CPU core is enough. Storage is not critical (1 GB of free space on the VM is enough). Network access is needed to copy input and output data from/to web repositories and to communicate with the ToPoS server.

  • How do you submit work to running instances? (pilot framework or local workload)

Using the ToPoS pilot framework.
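The pilot pattern used here (take a token, download the inputs it lists, compute, ship the output, delete the token, repeat until the pool is empty) can be sketched offline as below. This does not use the real ToPoS HTTP API: `InMemoryTokenPool`, its `lock_next`/`delete`/`release` methods, and the assumption that a token body is a newline-separated list of input URLs are all hypothetical. The `fetch` and `upload` callables stand in for the wget download and the rcp copy to the NMR server.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional, Tuple


class InMemoryTokenPool:
    """Hypothetical offline stand-in for a ToPoS-style token pool."""

    def __init__(self, tokens: List[str]) -> None:
        self._todo: Deque[Tuple[int, str]] = deque(enumerate(tokens))
        self._locked: Dict[int, str] = {}

    def lock_next(self) -> Optional[Tuple[int, str]]:
        """Lock and return the next (token_id, body), or None if empty."""
        if not self._todo:
            return None
        tid, body = self._todo.popleft()
        self._locked[tid] = body
        return tid, body

    def delete(self, tid: int) -> None:
        """Remove a token for good once its work is done."""
        self._locked.pop(tid, None)

    def release(self, tid: int) -> None:
        """Put a locked token back so another pilot can retry it."""
        self._todo.append((tid, self._locked.pop(tid)))


def run_pilot(pool: InMemoryTokenPool,
              fetch: Callable[[str], Any],
              compute: Callable[[List[Any]], Any],
              upload: Callable[[int, Any], None]) -> int:
    """Process tokens until the pool is empty; return how many were done."""
    done = 0
    while True:
        item = pool.lock_next()
        if item is None:
            return done                 # nothing left: the VM could shut down
        tid, body = item
        urls = [u.strip() for u in body.splitlines() if u.strip()]
        try:
            inputs = [fetch(u) for u in urls]   # e.g. wget in the real VM
            result = compute(inputs)
            upload(tid, result)                 # e.g. rcp to the NMR server
        except Exception:
            pool.release(tid)           # leave the token for another pilot
            raise
        pool.delete(tid)
        done += 1
```

Since each VM simply drains the shared pool, adding more VMs speeds up processing without any central scheduler, which is what makes this workload a natural candidate for auto-scaling.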

  • Are you using contextualization? (how and where or why not)

Some contextualisation was needed for the Wnodes and CESNET cloud providers, as described in https://wiki.egi.eu/wiki/Fedcloud-tf:FedCloudWeNMR. The user knows the password of the "i" account, but there is no need for the user to log in to the VM, because the application starts automatically after boot. However, the SSH key of the "i" account on the NMR server has to be present in the VM so that the produced output data can be copied there via rcp.


  • What's the character of your data? (size, format, read-only vs. read-write)

Read-only input data: zipped tarballs of O(5 MB), accessed via the web.

  • Have you considered using object storage to access your data and store the results?

no(t yet?)

  • Are you dealing with sensitive data?

The data are public, but parts of the code in the VM are something the developers do not want to make publicly available, because there is strong competition among bio-NMR groups in designing the best algorithms for structure calculation.

  • How are you accessing your data? (copied locally vs. accessed remotely)

Copied locally.

BNCWeb

Candidate for SQL database.

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools, data, etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • How are you accessing your data? (copied locally vs. accessed remotely)
  • Have you considered using a centralized SQL database to share and access your corpus data across multiple instances?
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?


  • Are you using contextualization? (how and where or why not)
  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)
  • How are they protected from unauthorized use?

PeachNote

Candidate for Messaging (currently using Amazon SQS), Database (Apache HBase), Auto-Scaling

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools, data, etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • We have learned that your VM would need access to Amazon SQS for job information, to the HBase cluster to retrieve and store data, and to the PeachNote server to regularly update the workflow code. On which hosts and ports do these services run?
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?


  • Are you using contextualization? (how and where or why not)
  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)
  • How are they protected from unauthorized use?

WSPGRADE

Candidate for Auto-Scaling

  • How did you create the image? (from scratch, basic installation, full installation etc.)

We create basic images of different operating systems (e.g. SL 6, Debian, Ubuntu, CentOS) and then fork the images to customize them (e.g. to install and configure WS-PGRADE).

  • Is everything required for computation already installed in the image? (software, tools, data, etc.)

Yes, it contains everything.

  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)

Marketplaces and vmcaster/vmcatcher.

  • What are your resource requirements? (CPU, memory, storage and network)

WS-PGRADE needs 2 CPUs, 4 GB of memory and 8-16 GB of storage.

  • How are you accessing your data? (copied locally vs. accessed remotely)

Remotely.

  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?

Not yet.

  • Are you using contextualization? (how and where or why not)

We are using EC2-like contextualization.

  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)

Our users connect to the portal via HTTP.

  • How are they protected from unauthorized use?

The portal is built on the Liferay portal framework and uses its built-in authentication methods.

GaiaSpace

Candidate for Auto-Scaling, Object-Storage, Block-Storage

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools, data, etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • How are you accessing your data? (copied locally vs. accessed remotely)
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?


  • Are you using contextualization? (how and where or why not)
  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)
  • How are they protected from unauthorized use?

DIRAC

Candidate for Auto-Scaling

  • How did you create the image? (from scratch, basic installation, full installation etc.)
    • (see VMConfiguration.png) A DIRAC run includes a VMScheduler, which contains at least one Running Pod. A Running Pod defines the relationship between a single DIRAC VM image and a list of endpoints. A DIRAC VM image is a boot image with whatever is necessary for a particular contextualization method. Currently the following contextualization methods are implemented:
    • 1) Ad-hoc image: a ready-to-go image without further contextualization. This image has to be prepared to run on a specific endpoint and with a particular DIRAC configuration (VMDIRAC server to connect to, DIRAC release to use ...). We have run CentOS and Ubuntu images supporting the platform dependencies of Belle, Alice and Auger HEP software, both on a private CloudStack IaaS and on the Amazon EC2 commercial cloud.
    • 2) HEPiX-contextualized image: image management with a golden image kept separate from the context specifics, which are managed automatically by VMDIRAC.
      • 2.1) OpenNebula - HEPiX contextualization: golden image CernVM-batch-node. The DIRAC configuration is provided by an ISO context image, which is generic for all OpenNebula IaaS sites of the cloud aggregation. The endpoint configuration is provided through the on-the-fly OpenNebula context section environment, which is specific to each OpenNebula IaaS endpoint and is selected at submission time from the DIRAC Configuration Server.
      • 2.2) OpenStack - HEPiX contextualization: golden image CernVM-batch-node. The DIRAC configuration is provided by the amiconfig tools, sending the scripts in nova 1.1 userdata. The endpoint configuration is provided through nova 1.1 metadata, which is specific to each OpenStack IaaS endpoint and is selected at submission time from the DIRAC Configuration Server.
      • 2.3) Generic contextualization using SSH: any image with an SSH daemon listening on a port with inbound connectivity. Once the VM boots, VMDIRAC polls the sshd port until it is active and then runs the DIRAC and endpoint configuration over SFTP and SSH connections.
  • Is everything required for computation already installed in the image? (software, tools, data, etc.)
    • For the software area, VMDIRAC uses the cvmfs remote software repository from CERN, with the LHCb repository containing software and the Conditions DB, and also a cvmfs repository from USC with software and tools for Alice and Auger. VMDIRAC uses the cvmfs client included in CernVM images with a particular configuration, but also Ubuntu and CentOS images that have been prepared with a cvmfs client. Of course, a user may also set up an ad-hoc image with all the software and tools prepared for a particular run.
    • As for data, the VMs transparently use third-party storage systems.
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
    • Any third-party automated distribution mechanism can be used. Currently, VMDIRAC is configured manually with the required image catalog metadata.
  • What are your resource requirements? (CPU, memory, storage and network)
    • The VMDIRAC image configuration allows the VM flavor to be specified. LHCb has work in progress to take advantage of multicore processing, with different CPU and memory requirements depending on the specific software run on the VMs.
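The polling step of the generic SSH contextualization (method 2.3 above) can be sketched with the Python standard library. This is only an illustration of the technique, not VMDIRAC code: the function name `wait_for_port` and the default deadline and polling interval are hypothetical, and the subsequent SFTP/SSH configuration steps are not shown. It treats "port active" as "a TCP connection succeeds".

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the deadline passes.

    Returns True as soon as the port accepts a connection (e.g. sshd is up),
    False if the deadline expires first. Defaults are illustrative only.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            # create_connection raises OSError (incl. timeouts) on failure
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

A stricter check could also read the SSH identification string, which an SSH server sends on connect and which starts with `SSH-`; the plain TCP check is kept here for brevity.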



  • How are you accessing your data? (copied locally vs. accessed remotely)
    • Data access in VMDIRAC:
    • Remote access at the site via SRM; access on Amazon via S3; local copies; input and output sandboxes.
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?
    • A VM submission policy is configured for each endpoint:
      • endpoint vmPolicy "static" -> slot-driven: all the available endpoint slots, as indicated by maxEndpointInstances, are used to create VMs.
      • endpoint vmPolicy "elastic" -> job-driven: VMs are submitted one by one as long as there are jobs in the DIRAC Task Queue. The elasticity of this policy is tuned with the Running Pod configuration parameter CPUPerInstance: if the CPU currently required by the jobs in the DIRAC Task Queue, divided by the number of currently running VMs, is lower than CPUPerInstance, no more VMs are submitted. CPUPerInstance can be set to the contextualization time of a specific setup; in this way, if the average time required to run the jobs in the DIRAC Task Queue is lower than the contextualization time, no more VMs are submitted. This is a compromise: it uses the available resources more efficiently (saving VM creation overhead), and at the same time it can be set up to use all the available resources to finish the production in a shorter total time, at a higher resource cost (additional overhead).
    • A VM stop policy is configured for each endpoint:
      • endpoint vmStopPolicy "never"
      • endpoint vmStopPolicy "elastic" -> no more jobs + VM halting margin time
    • In any case, VMs can be stopped by the VMDIRAC admin or by the HEPiX stop mechanism in the CernVM images (which is the responsibility of each cloud site admin). If a running VM has to be stopped, it stops in an orderly manner: the running job is declared stopped in DIRAC (so it can be resubmitted), and then the VM halts.
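The submission and stop policies above can be summarised as two small decision functions. This is a sketch of the rules as described, not VMDIRAC's code: the names `EndpointPolicy`, `vms_to_submit` and `should_halt` are hypothetical, and three details are assumptions, since the text does not specify them: the elastic policy is also capped by maxEndpointInstances, one VM is submitted when jobs are queued but no VM is running yet (avoiding a division by zero), and the elastic stop rule is read as "no queued jobs and idle for at least the halting margin".

```python
from dataclasses import dataclass


@dataclass
class EndpointPolicy:
    """Hypothetical per-endpoint settings mirroring the text above."""
    vm_policy: str                 # "static" or "elastic"
    max_endpoint_instances: int    # maxEndpointInstances
    cpu_per_instance: float        # CPUPerInstance (Running Pod parameter)


def vms_to_submit(policy: EndpointPolicy, running_vms: int,
                  queued_cpu: float) -> int:
    """How many VMs to submit in this scheduling cycle."""
    free_slots = max(policy.max_endpoint_instances - running_vms, 0)
    if policy.vm_policy == "static":
        return free_slots              # slot-driven: fill every free slot
    if policy.vm_policy == "elastic":
        if queued_cpu <= 0 or free_slots == 0:
            return 0                   # no jobs queued, or (assumed) no slots
        if running_vms > 0 and queued_cpu / running_vms < policy.cpu_per_instance:
            return 0                   # too little work per VM to justify another
        return 1                       # job-driven: one VM at a time
    raise ValueError(f"unknown vmPolicy: {policy.vm_policy!r}")


def should_halt(stop_policy: str, queued_jobs: int, idle_seconds: float,
                margin_seconds: float) -> bool:
    """Whether a VM should halt under the endpoint's vmStopPolicy."""
    if stop_policy == "never":
        return False
    if stop_policy == "elastic":
        # assumed reading of "no more jobs + VM halting margin time"
        return queued_jobs == 0 and idle_seconds >= margin_seconds
    raise ValueError(f"unknown vmStopPolicy: {stop_policy!r}")
```

In this sketch, setting `cpu_per_instance` to 0 makes the elastic policy keep adding VMs (up to the free slots) while any work is queued, which corresponds to the "use all the available resources" end of the trade-off; setting it to the contextualization time gives the more economical behaviour.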



  • Are you using contextualization? (how and where or why not)
    • Only in the generic SSH contextualization; in all other cases VMDIRAC relies on outbound connectivity.
  • Are you exposing any services to the outside world? (i.e., listening on public interfaces) How are they protected from unauthorized use?
    • In the particular case of the generic SSH contextualization, the public key of the VMDIRAC service is used, and SSH connections are disabled after the VM has been configured.

DCH

Candidate for <Capability>