EGI Operations Surveys (CLOSED)



(CLOSED) Federation of NGI services and central coordination

  • Release date: 19-12-2012
  • Deadline for submission: 22-01-2013

Overview

The NGI operations sustainability survey indicates little progress in securing new funding for national operations services to compensate for the end of EC funding, which is expected after April 2014 when EGI-InSPIRE ends. Through EGI-InSPIRE, the EC currently contributes 33% of the operational costs of national grid infrastructures.

The costs of running operations can be partly reduced by sharing operational services with other NGIs and by increasing their efficiency through central coordination of the deployment of core services. With this survey we aim to collect NGI expressions of interest in exploring some of these deployment scenarios in preparation for the end of EGI-InSPIRE. The results of this survey will be used to plan actions for the transition to the period after EGI-InSPIRE.

Instructions

Please participate in the survey by submitting your contribution. Click here to take the survey.

A PDF version of the survey is also available. It is a fill-in form (Acrobat Reader X allows forms to be saved) and can be used for your convenience, but please submit your official answer using the online survey.

Results

OMB presentation

(CLOSED) Sustainability of the EGI/NGIs operations service

  • Release date: 24 July 2012
  • Deadline for submission: 07 September 2012

Overview

With this survey, we aim to collect information about the sustainability of NGI operations services and EGI global operations services beyond 2014.

Information gathered through this survey will be used:

  • to provide information for D4.7 'Sustainability assessment of operational services'
  • to prepare the workshop on the sustainability of national infrastructures planned for the next EGI Technical Forum (see workshop description)
  • to evolve operations in the coming two years of EGI-InSPIRE towards a more sustainable model

Instructions

Please participate in the survey by submitting your contribution here.

You can find a PDF version of the survey here. It is a fill-in form (Acrobat Reader X allows forms to be saved) and can be used for your convenience, but please submit your official answer using the online survey.


(CLOSED) Resource Allocation Policies

  • Release date: 17 July 2012
  • Deadline for submission: 07 September 2012

Overview

Supporting new international user communities across EGI requires access to a portion of the resources at NGI and Resource Centre level. However, policies for capacity allocation differ greatly across NGIs, and in some cases across the Resource Centres of a given NGI. The purpose of this survey is to collect information about the individual policies in use, in order to improve capacity management across EGI.

ITIL defines capacity management as "The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner. Capacity Management considers all Resources required to deliver the IT Service, and plans for short-, medium- and long-term Business Requirements."

The survey is targeted at NGI Operations Managers, who are requested to collect information from individual site operations managers where policies differ across sites in the NGI.

Instructions

Please participate in the survey by submitting your contribution (1 reply per NGI) here. If resource allocation policies are site-specific, sites must be involved in providing the feedback.

Results

(CLOSED) Usage and future maintenance of deployed software

  • Release date: 06 March 2012
  • Deadline for submission: 20 March 2012

Overview

Several EC-funded projects that are sourcing code currently deployed in the EGI production infrastructure will terminate in 2013. Among these are EMI (ARC, dCache, gLite, UNICORE) and IGE (Globus). EGI needs to define the list of software products whose maintenance and support are considered to be high priority, in order to ensure service continuity for end-users. The information collected through this survey is of great importance, and will be used by EGI to define a software support plan. Please make sure that your input to the survey is accurate and correctly reflects the position of your community. The purpose of this survey is to:

  • Define the list of products that are considered to be high priority
  • Assess current usage of these high priority products:
    • which communities are using what
    • current workload
  • Assess which operations and/or user communities will directly contribute to software development and maintenance after 2013 (last question of the survey)

This survey is targeted at virtual research communities and resource infrastructure providers (EIROs and NGIs, 1 reply per NGI/EIRO), who are responsible for collecting information from their users and resource centres respectively.

Instructions

Please participate in the survey by submitting your contribution here (please read the overview and the instructions carefully).

A PDF version of the survey is available here.

Results

See slides

(CLOSED) Platform, software and VM deployment plans

  • Release date: 06 March 2012
  • Deadline for submission: 20 March 2012

Overview

Grid software is being provided for multiple OS platforms. For example, the number of platforms supported will increase with EMI 2.0, which is expected in Spring 2012: in EMI 2.0 it is expected that all products will be released for SL6, and a subset also for Debian. EGI operations consequently need to review the list of early adopter sites, to make sure that staged rollout resources are allocated for testing software on the platforms of interest. In addition, EGI SA2 is deploying a new Market Place based on StratusLab. The new Market Place will include different Virtual Machines with different middleware services ready to be used, verified and tested by the SA2 team. We need to assess your interest in this service.

All Resource Infrastructure Providers are requested to consult with their site managers to define their priorities. Please provide your feedback according to the instructions below.

Instructions

Please participate in this survey by supplying your input here.

Results

(CLOSED) NGI International tasks and EGI Global Services

  • Release date: December 20 2011
  • Deadline for submission: January 19 2012

Overview: NGI International Services

NGIs are requested to perform a self-assessment of the NGI operational services.

  • Questions 1-7: NGIs are requested to assess the operations international tasks of the NGI/EIRO. The NGI representatives are required to fill in the tables, providing an estimate of the current manpower needed to run the service and a written report of the main activities carried out in 2011. Note well: the manpower to be estimated is the TOTAL, including EGI-InSPIRE funding and local sources of funding as applicable.
  • Question 8: NGIs are requested to provide feedback about three areas of improvement of operations for 2012
  • Questions 9-11: NGIs are requested to estimate the level of internal funding for NGI operations already secured for the period after the end of EGI-InSPIRE

WORD version of survey

Instructions

Overview: EGI Global Services

NGIs are requested to assess the performance of EGI.eu Global Services (operational ones). The input provided will be used for the yearly assessment of EGI Global Services in 2012 (milestone MS115). The survey addresses all operations Global Services, which are divided into four categories:

  • PART I Infrastructure services and tools (central instances)
  • PART II Grid services: release and deployment
  • PART III Support
  • PART IV Operations management and coordination

WORD version of survey

Instructions

Results

(CLOSED) SLURM support for CREAM

  • Release date: November 25 2011
  • Deadline for submission: December 14th 2011

Overview

This survey is addressed to sites deploying gLite middleware, to assess the interest in support for the SLURM batch system in CREAM. The batch systems currently supported are LSF, PBS/Torque and SGE; this list does not currently include SLURM. SLURM was reported in our community to be a viable replacement for PBS/Torque, especially at large sites, thanks to its scalability and good functional capabilities.

This survey aims to collect feedback from site managers in order to assess the need for SLURM support in CREAM. We invite all sites that deploy CREAM, or are planning to deploy it, to participate in this survey (please coordinate participation internally and make sure that a single reply is submitted per site). The survey includes five questions, all on one page.

Instructions

Results

39 sites would consider replacing their LRMS with SLURM if it were supported by CREAM

  • Total of 64600 cores (17% of the job slots available in the infrastructure)
  • 23 sites with more than 1k cores

If you are planning a replacement of the current batch system, what are the reasons for this?

  • 35 - Improve the scalability and the stability of the LRMS
  • 20 - The set of features of the current batch system is not sufficient
  • 15 - The new batch system would be easier to deploy and manage
  • 5 - Migrate from a costly solution to a free, open-source one

Note: Multiple answers were allowed


(CLOSED) LB capabilities, service management and auditing and gLite-CLUSTER

  • Release date: 18 Jan 2011
  • Deadline for submission: 10 Feb 2011

Overview

  • PART 1, Logging and Bookkeeping Service: Logging and Bookkeeping (LB) is a monitoring service which gathers, aggregates and archives information on infrastructure behaviour from the perspective of users' tasks. The EMI project aims at extending the LB scope and integrating it further with other grid services. The first page of the survey contains a set of questions to help the LB team to better design the new features of the LB Service, and to target real user needs.
  • PART 2, Remote Grid Service Management (RGSM): Management is performed through a set of notifications issued to the relevant Grid service instance. Examples of management actions are: start, stop, drain etc. The RGSM framework can be used for remote management of a service. EMI has a dedicated task force to investigate the requirements for common service monitoring and management interfaces. This survey collects information and requirements from the EGI operations community and sites, to understand which technologies are of interest for service management.
  • PART 3, Grid service auditing (GSA): a feature that allows system administrators and users to check the status of a service in terms of load and length of internal queues, and to monitor service workload from a grid point of view over time. Service auditing is different from Nagios-based monitoring as it is not based on probes, but rather on the periodic gathering of service status information.
  • PART 4, gLite-CLUSTER: gLite-CLUSTER allows the configuration of information related to the batch system environment to be separated from the configuration of the job submission interface. With this service, sites will be able to publish their resource information consistently and without workarounds, even in the case of multiple CEs or clusters with heterogeneous hardware configurations. A few questions aim to understand to what extent the publishing of cluster information is a problem for site managers.

Resources

  • pdf version of the survey

Results

Q1: Do you need more platforms to be supported by EMI software, in addition to those already in use (SL5)?

  • 12 responses

Q2: Part one, Logging and Bookkeeping (questions from 3 to 10). LB: Which services should be watched in the grid in addition to gLite WMS and CREAM, which are already supported by LB?

  • ARC CE 7 - 58%
  • UNICORE CE 5 - 42%
  • Data Transfer 12 - 100%
  • SRM Operations 8 - 67%
  • Other 1
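
Because respondents could select several services, the percentages in this kind of multiple-answer question are shares of respondents and do not add up to 100%. A minimal Python sketch of that arithmetic follows; the figure of 12 respondents is an assumption inferred from the "Data Transfer 12 - 100%" row and is not stated explicitly in the results.

```python
# Counts taken from the Q2 results listed above.
counts = {
    "ARC CE": 7,
    "UNICORE CE": 5,
    "Data Transfer": 12,
    "SRM Operations": 8,
}

# Assumption: 12 respondents answered Q2 (inferred from the 100% row).
RESPONDENTS = 12


def share_of_respondents(count, respondents):
    """Percentage of respondents who picked an option, rounded half up."""
    return (200 * count + respondents) // (2 * respondents)


percentages = {
    name: share_of_respondents(n, RESPONDENTS) for name, n in counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} - {pct}%")
```

Under that assumption the computed values match the percentages reported for the four named services.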

Q3: LB: What aggregated information would be useful (e.g. average queue traversal time, task failure rate etc.)?

  • 12 responses

Q4: LB: And at what level of aggregation (referring to the previous question)?

  • Per User 6 - 55%
  • Per VO 10 - 91%
  • Per service instance 8 - 73%
  • Other, please specify: 5 - 45%

Q5: Would you leverage capturing dependencies among the tracked entities (e.g. to know that computational jobs are blocked by failing transfers of their inputs, and to be able to discover details immediately)?

  • Yes 5 - 50%
  • No 5 - 50%

Q6: What is the desired level of complexity of the queries on the service?

  • Simple, like: "all tasks on this CE", "this user's tasks within a given time interval": 4 - 36%
  • More sophisticated, but through current LB querying language: 2 - 18%
  • Full SQL/XQuery power on the task data: 7 - 64%
  • Intermediate, describe here: 1 - 9%

Q7: LB: What are the output formats to be supported?

  • Glue-conforming WS interface: 6 - 55%
  • Simple key=value text format: 7 - 64%
  • JSON: 5 - 45%
  • Human readable HTML: 2 - 18%
  • Other: 4 - 36%

Q8: What modes of retrieving information are foreseen?

  • Synchronous (query-response): 7 - 64%
  • Asynchronous (subscribe for notification, possibly via message bus): 4 - 36%

Q9: For how long should data about the task be kept?

  • One day 0 0%
  • One week 2 18%
  • One month 4 36%
  • One year or more 5 45%

Q10: Part two, Remote Grid Service Management (questions from 11 to 20). RGSM: Do any of your Grid services come with capabilities to react to certain conditions by adapting their behaviour?

  • Yes 4 40%
  • No 6 60%

Q11: According to your day-to-day experience, please describe typical service management scenarios. How is management performed?

  • 8 Responses


Q12: Are there any management commands that can be performed on your Grid services that go beyond specific business logic (e.g. purge persistent data)?

  • Yes 2 25%
  • No 6 75%


Q13: What are the limits of your Grid service management capabilities? Is there a gap between the capabilities offered and your Grid service management needs?

  • 9 responses


Q14: Please list the 5 management commands you would need the most in your setup (e.g. start/stop services, deploy/un-deploy service, purge service data, dynamically change access rights).

  • 10 Responses


Q15: Which of those 5 management commands apply to all of your Grid services?

  • 9 Responses


Q15: From your day-to-day experience, how many services do you really need to manage remotely?

  • 11 Responses


Q16: Are you able to (un)deploy Grid services at runtime?

  • Yes 1 9%
  • No, but I would need it. 5 45%
  • No, and I don't need it. 5 45%


Q17: If you're deploying stateful Grid services in a site: does the Grid service interface support Grid service state deletion?

  • Yes 2 25%
  • No 6 75%


Q18: What kind of setup would you prefer for remotely managing your Grid services?

  • Dedicated: service management interfaces on each Grid service 5 45%
  • Decoupled: services get their commands from a messaging solution they register to 1 9%
  • Both 5 45%

Q19: Part three, Grid Service Auditing (questions from 21 to 26). GSA: What kind of data are you already collecting about your Grid services and how are you doing it?

  • 13 Responses


Q20: For which services is auditing of service status important (service status: workload, queue status, etc.)?

  • 10 Responses


Q21: For each service above, which data is mainly useful?

  • 8 Responses


Q22: For each service above, which data is mainly useful?

  • 3 Responses

Q23: Are the current service auditing capabilities sufficient, or should they be improved?

  • Yes 4 44%
  • No 5 56%

Q24: Should status data be automatically archived?

  • Yes 9 90%
  • No 1 10%

Q25: Part four, gLite-CLUSTER (questions from 27 to 29). gC: How many sites in your NGI/EIRO have heterogeneous clusters, or multiple sub clusters (disjoint sets of worker nodes, each set having sufficiently homogeneous properties), or multiple CEs?

  • 0 4 33%
  • Up to 5 6 50%
  • More than 5 2 17%


Q26: gC: How many of those sites reported difficulties in configuring their CEs, in order to properly publish their site capacity?

  • 0 5 38%
  • Up to 5 5 38%
  • More than 5 3 23%


Q27: Given that gLite-CLUSTER is released only for lcg-CE, how many sites in your NGI/EIRO are interested in using the gLite-CLUSTER capability?

  • 0 8 62%
  • Up to 5 2 15%
  • More than 5 3 23%

(CLOSED) Batch systems current deployment scenario in the EGI infrastructure

  • Release date: May 18 2011
  • Deadline for submission: 30 June 2011

Overview

The purpose of this survey is to understand which batch systems are currently most widely deployed.

Results overview

Q1: Which batch systems are currently deployed at your site? Multiple answers are allowed.

  • PBS 12 5%
  • PBS/Torque 139 60%
  • PBS/Maui 142 61%
  • PBS/Moab 7 3%
  • LSF 18 8%
  • SGE(OGE) 20 9%
  • Slurm 5 2%
  • Condor 3 1%
  • Other, please specify 15 6%


Q3: Are you planning to replace the existing batch system in the near future? If so, please select the new one from the list below.

  • No plans to change the batch system 206 89%
  • PBS 0 0%
  • PBS/Torque 4 2%
  • PBS/Maui 3 1%
  • PBS/Moab 0 0%
  • LSF 1 0%
  • SGE(OGE) 8 3%
  • Slurm 7 3%
  • Condor 2 1%
  • Other, please specify 5 2%


Q4: If you have faced integration issues between your local batch system and the deployed middleware, please tell us about them

  • 70 Responses

(CLOSED) top-BDII deployment scenarios

  • Release date: May 24th 2011
  • Deadline for submission: June 30th 2011

Overview

Survey addressed to NGIs

Results

Q1: Do you currently deploy a top-BDII service?

  • Yes 17 94%
  • No 1 6%

Q2: If not, are you planning to deploy one?

  • Yes 2 40%
  • No 3 60%

Q3: Your NGI current top-BDII deployment

  • Without High Availability and/or Load Balancing 6 35%
  • With HA or LB, describe your solution 11 65%

Q4: Are you using failover at the client side using the top-BDII of another NGI?

  • Yes 1 6%
  • No 17 94%

Q5: Number of sites using the current top-BDII of your NGI

  • Unknown 1 6%
  • From 1 to 5 7 39%
  • From 6 to 10 2 11%
  • From 11 to 15 4 22%
  • From 16 to 20 2 11%
  • From 21 to 25 0 0%
  • From 25 to 30 0 0%
  • More than 30 2 11%

Q6: Top-BDII is a critical service that needs to be highly available. Various strategies to improve the robustness of the service are possible, such as a load balancing configuration implementing a cluster of top-BDIIs (within your infrastructure or shared with other Resource Infrastructures), or deploying a list of alternative top-BDII instances at the client side. Both scenarios are mostly beneficial to small/medium NGIs where the amount of resources may not justify the deployment of a dedicated top-BDII cluster. Are you interested in this? In case of sufficient interest, EGI can put effort into the definition of a top-BDII deployment model that addresses these needs.

  • Yes 9 56%
  • No 7 44%
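
The client-side scenario described in Q6 (a list of alternative top-BDII instances tried in order) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a description of how the middleware clients implement failover: the host names are hypothetical, port 2170 is assumed as the BDII LDAP port, and a plain TCP connection stands in for a real LDAP query.

```python
import socket

# Hypothetical endpoints: the NGI's own top-BDII first, then an alternative
# instance hosted by another NGI. Port 2170 is an assumption.
TOP_BDII_CANDIDATES = [
    ("topbdii.ngi-a.example.org", 2170),
    ("topbdii.ngi-b.example.org", 2170),
]


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def pick_top_bdii(candidates, is_alive=tcp_reachable):
    """Return the first (host, port) that answers, or None if all fail."""
    for host, port in candidates:
        if is_alive(host, port):
            return host, port
    return None
```

Calling pick_top_bdii(TOP_BDII_CANDIDATES) returns the first reachable endpoint, so the order of the list expresses the preference between the local instance and the alternatives.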

(CLOSED) UMD current deployment scenario

  • Release date: September 14th 2011
  • Deadline for submission: September 20th 2011

Overview

In this survey we are collecting information about the level of deployment of UMD and EMI released services in the production infrastructure. The results of the survey will be presented at next week's EGI Technical Forum, so we have set the deadline to next Monday, 19 September 2011, at 12h00 (France time).

Instructions

Survey closed.

Results

Q1: List here which services (if any) you deployed from the UMD repositories (http://repository.egi.eu); please include the service version.

  • 59 Responses


Q2: List here which services (if any) you deployed from the EMI repositories (http://emisoft.web.cern.ch/emisoft/); please include the service version.

  • 49 Responses

Q3: Please leave any additional comments here

  • 18 Responses