From EGIWiki
Revision as of 14:39, 17 April 2013 by Ruda (Data, big data)

This page should evolve into a questionnaire that maps the state of user groups using FedCloud and working with our mini-project. We first have to create the questionnaire and fill in the information we already know; only then should we approach the users.

Feel free to edit; these are just first ideas!

Image preparation

How was the image created and how is it managed? Should we help with its preparation?

  • how many images does this group use (one, or several with different functions)?
  • how was the image created? From scratch, from a basic OS image, from a full OS installation, as a copy of a desktop, as a copy of an image prepared in VMware/VirtualBox on a desktop, or does the group already provide one ...
  • is it a single partition with the system, or several partitions/a whole disk (with dedicated space for data, empty space for other data, packages, user data)?
  • is everything required for the computation installed in the image? Would it be useful to install some parts during VM start (contextualization, always the latest version of packages)? Does the VM install packages/software during or after boot? CVMFS?
  • is the image prepared to run with KVM or Xen, and in which format (e.g. OVF)?
  • which kernel version is required (does not matter ... must be exactly the same)?
  • how should new versions of the image be installed? No need, rarely, often; via vmcatcher or some other way? What about security updates?
  • is the image signed? Endorsed by some group? Verified by some RP (resource provider)?
  • hardware requirements (special hardware, small/big template)?
  • network requirements (public IP, open ports, firewall, running in a VPN, expected bandwidth)?

Workload management

How is the actual work submitted to a running VM? Should we care or help?

  • some pilot framework? BOINC? Another call-home implementation?
  • is the VM started by some workload system/application that immediately submits "jobs" to it?
  • who does the scheduling? Do VMs run across several providers?
  • is autoscaling usable/needed (different resource requirements during the VM lifetime)? Or is it easier to spawn new VMs?
  • how long should a VM run (one long computation, smaller jobs submitted inside the VM, ...)?
  • can the VM be preempted or migrated?

AAI and contextualization

How is access to a running VM implemented? (Should we help, explain what is possible, push contextualization?)

  • is there support for user contextualization? Already present / would be nice / not needed.
  • how do users log in: a system with pre-installed ssh access (known root password, ssh public key, public key accessible to the whole group), another login method, remote desktop, no need for root access, or user contextualization (storing the user's ssh key in authorized_keys)?
  • management of running VMs: are all VMs started by one representative of the VO, is an image/VM shared among a group of users, or does each VM serve just one user?
  • does the VM contain credentials that let it access remote services/data?
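The last login option above, user contextualization that stores an ssh key in authorized_keys, is small enough to illustrate. The following is a minimal Python sketch, assuming a Linux guest running OpenSSH and a contextualization script that can write to the user's home directory; the function name `add_authorized_key` is hypothetical and not part of any FedCloud tool. It does not set file ownership, which a script running as root would also have to do.

```python
import os
from pathlib import Path


def add_authorized_key(home: str, public_key: str) -> bool:
    """Append public_key to <home>/.ssh/authorized_keys unless it is already there.

    Hypothetical helper for illustration. Returns True if the key was added,
    False if an identical key line was already present.
    """
    key = public_key.strip()
    ssh_dir = Path(home) / ".ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)
    # sshd's StrictModes check rejects keys if ~/.ssh is writable by others;
    # chmod explicitly because mkdir's mode argument is masked by the umask.
    os.chmod(ssh_dir, 0o700)

    auth_file = ssh_dir / "authorized_keys"
    content = auth_file.read_text() if auth_file.exists() else ""
    # Idempotent: contextualization may run on every boot.
    if key in (line.strip() for line in content.splitlines()):
        return False

    # Make sure the new key starts on its own line.
    prefix = "\n" if content and not content.endswith("\n") else ""
    with open(auth_file, "a") as f:
        f.write(prefix + key + "\n")
    os.chmod(auth_file, 0o600)
    return True
```

Making the function idempotent matters because contextualization scripts are often re-run at every VM start; without the check, each reboot would append another copy of the same key.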

Data, big data

In some cases, cloud applications analyze or produce big data. There is usually room for improvements, help, and new services.

  • does the application work with big data? If yes, which type of access is needed (large shared network storage, a virtual disk accessed by some VMs, object storage)?
  • are all data used by all VMs? Does every VM/job use a small subset? Other patterns?
  • requirements for a Hadoop-like environment?
  • already using some object storage such as S3 or CDMI? Data services from EGI (GridFTP, SE, SRM)?
  • are large data downloaded/produced during the VM lifetime?
  • is there a need for higher-level control of data access?

What else should we know?

  • is there a need for other services, such as a messaging system, integration with standard EGI services (data?), or an SQL database?