
VT-CloudCaps:Questionnaire

From EGIWiki
Revision as of 18:33, 6 May 2013

This should evolve into a questionnaire that maps the state of user groups using FedCloud and working with our mini-project. We have to create the questionnaire, fill it with already known information, and only then approach users.

Feel free to edit; these are just first ideas!

Pool of topics and questions

Image preparation and management

How has the image been created, how is it managed, should we help with preparation?

  • How many images are used by your group (one, several with different functions)?
  • How did you create the image? From scratch, from a basic OS image, using a full OS installation, a copy of a desktop, a copy of an image prepared in VMware/VirtualBox on a desktop, the group already provides one ...
  • Does the image have a single system partition, or multiple partitions/a whole disk (with dedicated space for data, free space for other data, packages, user data)?
  • Is everything required for computation already installed in the image? Would it be useful to install parts during VM start (contextualization, always the latest version of packages)? Does it install packages/software during/after boot? Using CVMFS?
  • Is the image prepared to run with KVM or Xen, and in which format (e.g. OVF)?
  • Do you rely on a specific Linux kernel version?
  • How should new versions of the image be distributed and installed (not needed, rarely, often; via vmcatcher or some other way)? How do you intend to deal with security updates?
  • Is the image signed? Endorsed by some group? Verified by some resource provider (RP)?
  • What hardware requirements (resource demands) do your image and application have? RAM, disk, processor, cores?
  • What are the network requirements of your application? Do you require access to the running instance from outside? Which ports do you require to be open? Do you expect arbitrary access from within the instance to the outside world? What are your bandwidth expectations?
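The question of whether an image is signed or verified usually comes down to comparing the downloaded file against a digest published by whoever endorses it. The sketch below assumes the image list carries a SHA-512 hex digest (an assumption; check what your image list actually publishes) and streams the file so that large images never need to fit in memory. The function names are hypothetical.

```python
import hashlib
import hmac


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte images are never loaded into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: str, expected_hex: str) -> bool:
    """Compare the local image with the digest published alongside it
    (assumed here to come from an endorsed/signed image list)."""
    # lower() tolerates digests published in upper case;
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha512_of_file(path), expected_hex.strip().lower())
```

A site or user would call `image_matches(local_path, published_digest)` before registering the image; a `False` result means the file is corrupted or not the endorsed version.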

Workload management

How do you submit the actual work to the running instances? Should we care or help?

  • Do you use some form of pilot framework? BOINC? Another implementation of call-home?
  • Is the VM started by some workload system/application that immediately submits "jobs"?
  • Who does the scheduling? Do VMs run across several providers?
  • Do you do automatic scaling of your framework? Do you require vertical scaling, e.g. sizing up instances, or horizontal scaling, i.e. adding more instances as needed?
  • How long should a VM run (long computation, smaller jobs submitted inside the VM, ...)?
  • Can the VM be preempted or migrated?
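To make the horizontal-scaling question concrete, here is a minimal sketch of the kind of rule a pilot or workload framework might apply: size the VM pool to the job queue, clamped to a quota. Everything here (function names, the `jobs_per_vm` parameter, the default limits) is hypothetical and not taken from any FedCloud tool.

```python
import math


def desired_instances(queued_jobs: int, jobs_per_vm: int,
                      min_vms: int = 1, max_vms: int = 20) -> int:
    """Hypothetical horizontal-scaling rule: start enough VMs to drain the
    queue, but stay within a quota agreed with the resource providers."""
    if jobs_per_vm <= 0:
        raise ValueError("jobs_per_vm must be positive")
    needed = math.ceil(queued_jobs / jobs_per_vm)
    return max(min_vms, min(max_vms, needed))


def scaling_action(current_vms: int, target_vms: int) -> str:
    """Translate the target size into an action for whatever component
    actually starts and stops instances."""
    if target_vms > current_vms:
        return f"start {target_vms - current_vms} VM(s)"
    if target_vms < current_vms:
        return f"stop {current_vms - target_vms} idle VM(s)"
    return "no change"
```

For example, `scaling_action(2, desired_instances(queued_jobs=37, jobs_per_vm=8))` returns `"start 3 VM(s)"`. Vertical scaling (resizing an instance) would need a different mechanism, since it depends on whether the provider can change flavours of a running VM.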

AAI and contextualization

How do you intend to access running VMs? (Should we help, explain what's possible, push contextualization?)

  • Is there support for user contextualization? Already available/would be nice/not needed.
  • Does your system come with pre-installed SSH access, a fixed root password, an SSH public key, a group-accessible public key, another way to log in, remote desktop, no need for root access, or a need for user contextualization (storing an SSH key in authorized_keys)?
  • How are running VMs managed: all started by one representative of the VO, image/VM shared among a group of users, or one VM per user?
  • Does the VM contain credentials to access remote services/data? Could these be injected via contextualization?
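One common way to do user contextualization is cloud-init, which reads user data supplied when the VM is started. Whether a given site passes user data through to cloud-init inside your image is an assumption to check per provider. The sketch builds a minimal `#cloud-config` document using cloud-init's `ssh_authorized_keys` key, which installs keys for the image's default user; the helper name and the key below are placeholders.

```python
from typing import Sequence


def cloud_config_user_data(ssh_public_keys: Sequence[str]) -> str:
    """Build a minimal cloud-init "#cloud-config" document that installs
    the given SSH public keys for the image's default user."""
    if not ssh_public_keys:
        raise ValueError("at least one SSH public key is required")
    # cloud-init only treats user data as cloud-config if the
    # first line is exactly "#cloud-config".
    lines = ["#cloud-config", "ssh_authorized_keys:"]
    for key in ssh_public_keys:
        # Quote each key so YAML parses it as a plain string.
        lines.append(f'  - "{key.strip()}"')
    return "\n".join(lines) + "\n"
```

The returned string would be passed as user data when the instance is created, through whatever interface the site offers. This avoids baking a fixed root password or a shared key into the image itself.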

Data, big data

In some cases, big data are analyzed or produced by cloud applications. There is usually room for improvements, help, new services...

  • Does your application work with large amounts of data? If yes, which type of access is needed (big shared network storage, virtual disk accessed by some VMs, object storage)? Do you only read or also write this category of data?
  • Is all of the data used by all VMs? Does every VM/job use a small subset? Other patterns?
  • Do you require a Hadoop-like environment?
  • Are you already using an object storage such as S3 or CDMI? Data services from EGI (GridFTP, SE, SRM)?
  • Are large amounts of data downloaded or produced during the VM lifetime?
  • Do you need higher-level control of data access?
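The "every VM/job uses a small subset" pattern can be as simple as a deterministic split of the input list, so each VM fetches only its own files from shared or object storage. This is a sketch of one such split (round-robin); the function name is hypothetical, and how a VM learns its own index (for instance through contextualization) is an assumption.

```python
from typing import Dict, List, Sequence


def assign_shards(inputs: Sequence[str], n_vms: int) -> Dict[int, List[str]]:
    """Round-robin split of input files so that each VM reads only its subset."""
    if n_vms <= 0:
        raise ValueError("n_vms must be positive")
    shards: Dict[int, List[str]] = {vm: [] for vm in range(n_vms)}
    for index, name in enumerate(inputs):
        shards[index % n_vms].append(name)
    return shards
```

For example, `assign_shards(["a.csv", "b.csv", "c.csv"], 2)` returns `{0: ["a.csv", "c.csv"], 1: ["b.csv"]}`. Round-robin keeps shard sizes within one file of each other; if file sizes vary widely, splitting by total bytes would balance load better.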

What else should we know?

  • Is there a need for other services, such as a messaging system, integration with standard EGI services (data?), or an SQL database?

UseCase-specific questionnaires

OpenModeller

Candidate for block storage, object storage and possibly auto-scaling.

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • How do you submit work to running instances? (pilot framework or local workload)
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?
  • How do you access your virtual machines once they have been launched?
  • Are you using contextualization? (how and where or why not)


  • What's the character of your data? (size, format, read-only vs. read-write)
  • How are you accessing your data? (copied locally vs. accessed remotely)
  • How much space do you need for a single computation?
  • Could environmental layers be stored in object storage?
  • How are you gathering results and what's their character? (size, format, sensitivity)
  • Do you support or actively use any dynamic cloud-like environment? (which, how and why)


  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)
  • How are they protected from unauthorized use?
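When answering which services are exposed, a quick reachability check run from outside the instance helps confirm that only the intended ports accept connections. This sketch uses a plain TCP connect; the helper names, the host, and the port lists you pass in are hypothetical, and it only tests reachability, not how the service authenticates users.

```python
import socket
from typing import Iterable, List, Set


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all raise OSError
        # (socket.timeout is a subclass of OSError).
        return False


def unexpected_open_ports(host: str, candidates: Iterable[int],
                          allowed: Set[int]) -> List[int]:
    """List ports that accept connections although they are not in `allowed`."""
    return [p for p in candidates if p not in allowed and is_port_open(host, p)]
```

For example, `unexpected_open_ports(public_ip, [22, 80, 443, 3306], allowed={22, 443})` would report a database port left reachable from the outside world.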

WeNMR

Candidate for auto-scaling.

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • How do you submit work to running instances? (pilot framework or local workload)
  • Are you using contextualization? (how and where or why not)


  • What's the character of your data? (size, format, read-only vs. read-write)
  • Have you considered using object storage to access your data and store the results?
  • Are you dealing with sensitive data?
  • How are you accessing your data? (copied locally vs. accessed remotely)

BNCWeb

Candidate for SQL database.

  • How did you create the image? (from scratch, basic installation, full installation etc.)
  • Is everything required for computation already installed in the image? (software, tools etc.)
  • How will you distribute your image and its updates? (vmcaster/vmcatcher, automated using a different tool, by hand)
  • What are your resource requirements? (CPU, memory, storage and network)


  • How are you accessing your data? (copied locally vs. accessed remotely)
  • Have you considered using a centralized SQL database to share and access your corpus data across multiple instances?
  • Does your application support horizontal (more instances) and vertical (more resources for a single instance) auto-scaling?


  • Are you using contextualization? (how and where or why not)
  • Are you exposing any services to the outside world? (i.e., listening on public interfaces)
  • How are they protected from unauthorized use?
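The centralized-database question boils down to several instances querying one corpus store instead of each holding a private copy. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a networked SQL server, with two connections playing the role of two instances; the table layout and function names are hypothetical and not BNCWeb's actual schema.

```python
import sqlite3
from typing import Iterable, Tuple


def load_corpus(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Create a (hypothetical) tokens table and bulk-insert (doc_id, token) rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS tokens (doc_id TEXT, token TEXT)")
    conn.executemany("INSERT INTO tokens (doc_id, token) VALUES (?, ?)", rows)
    conn.commit()


def token_frequency(conn: sqlite3.Connection, token: str) -> int:
    """A query any instance can run against the shared database."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM tokens WHERE token = ?", (token,)
    ).fetchone()
    return count
```

With a real server, each instance would open its own connection to the same database endpoint; the corpus is loaded once and every new instance can query it immediately, which also makes horizontal scaling of the front end simpler.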