
Resource Allocation Task Force brainstorming page

Revision as of 16:53, 25 January 2013





Terminology

Service Level Agreement (SLA) - An agreement between a Service Provider and a customer/client. The SLA describes the IT Service, documents service level targets and specifies the responsibilities of the Service Provider and the customer/client. A single SLA may cover multiple IT Services or multiple customers/clients.

Operational Level Agreement (OLA) - An agreement between an IT Service Provider and another part of the same organisation. An OLA supports the IT Service Provider’s delivery of IT Services to Users. The OLA defines the goods or Services to be provided and the responsibilities of both parties.

Resource Allocation - 


Please refer to the EGI Glossary for the definitions of the terms used.

Introduction

This page is a brainstorming space for the Resource Allocation Task Force. It contains ideas and issues for discussion.

Goal

The main goals are:

  • Attract new users and allocate them some grid resources
  • Provide users with a simplified procedure for finding grid resources that match their needs, through a single central point of reference
  • Foster a virtuous cycle among ‘new’ scientific communities, EGI, NGIs, Resource Centers and funding agencies in order to attract new funding to strengthen and expand the European grid infrastructure according to the needs of those scientific communities
  • Demonstrate to the national funding agencies, EC (and everybody) that the EGI infrastructure and services are valuable, useful and used by a vast variety of different users


The high level concept

RA overview.png

EGI is "a place" where Customer and Resource Provider meet and negotiate resource allocation.


Resource allocation high level process

RA processes.png


Roles and activities

Customer

  • sends a request for resources
  • uses the resources
  • is involved in negotiation process to establish SLAs

(VOs entitled to use resources should be registered and in production status in Operations Portal)

Who:

  • National or international VO
  • New or existing VO

Resource Provider

  • provides resources to the Customer
  • is involved (directly or indirectly) in negotiation process to establish OLAs

(Resource Centres offering resources must be registered in GOCDB as production entities; this ensures that all security policies are enforced.)

Who:

  • NGIs that own resources or can directly decide on resource allocation policies for a pool of resources of their sites
  • Resource Centres (can be a RP even if NGI is not providing coordination of resource allocation nationally)
  • User communities (can be RPs for other user communities - see the WeNMR case)

Supporter

  • provides tools for RA process
  • supports Customers and RPs in the negotiation and in defining resource requirements 

Who:

  • EGI.eu for international and/or national VOs
  • NGIs for national VOs

Supervisor

  • provides periodic monitoring and reporting of usage and service quality
  • enforces OLAs and SLAs
  • capacity management

Who:

  • EGI.eu for international and/or national VOs
  • NGIs for national VOs

Broker

  • collects and manages Customers' demand
  • collects and manages RPs' offers
  • negotiates SLAs and OLAs
  • selects RPs to satisfy demand

Who (depending on the model):

  • EGI.eu for international VOs
  • NGIs that can coordinate resource allocation with their sites
  • In the Open market and Freedom of choice models - Customers and/or RPs  

Models

Three models were introduced to examine the best option for the resource allocation process in the EGI project.

Different models may be applied depending on

  • types of Customer
    • Active or New VO
    • International or national VO
  • type of request
    • "Big" or "Small"
  • type of NGIs' role in the allocation process.

Model 1 - Open market

RA Model1.png

RPs’ resources and Customers’ requests are published on an open market (a shopping website such as eBay) and are visible to everyone.

Supervisor regulates and supervises the market. It determines the rules (e.g. tools, procedures, policies) of the market and acts as a referee in case of agreement violation/negotiation.

Supervisor

  • Pros: Knows what is going on in the infrastructure

RP

  • Pros: There are clear rules for resource allocation
  • Pros: In case of agreement violation/negotiations Supervisor is a referee
  • Pros: Knows the Customers' current needs
  • Cons: Negotiation process cannot be adapted to specific needs
  • Cons: Lack of external broker to help match needs to resources
  • Cons: Need to sign multiple SLAs with different Customers.

Customer

  • Pros: There are clear rules for resource allocation
  • Pros: In case of agreement violation/negotiations Supervisor is a referee
  • Pros: Knows which resources are now available
  • Cons: Lack of external broker to help match needs to resources
  • Cons: Need to sign multiple SLAs with different Providers.

Suitable to:

  • "Small" requests
  • Active Customers
  • National Customer

Roles

  • EGI
    • for international Customers is a Supervisor
    • for national Customers may be a Supporter
  • NGI
    • for international Customers can be a Broker (Model 2) or Supporter (Model 3) for its RC
    • for national Customers can be Supervisor for its RC
  • RC
    • provides resources

Model 2 - Broker

RA Model2.png

Providers’ resources and Customers’ requests are visible to Broker.

Broker matches Providers’ resources to Customers’ requests.

Direct interaction between Customer and Provider is limited.
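
The matching step described above could be sketched as follows. This is only an illustration under stated assumptions: the page does not define a matching algorithm, so the largest-offer-first strategy, the Offer type, the match_request function and the provider names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Offer:
    provider: str    # hypothetical Resource Provider name
    free_slots: int  # capacity the RP has made visible to the Broker


def match_request(slots_needed: int, offers: list[Offer]) -> dict[str, int]:
    """Cover one request from the largest offers first (assumed strategy).

    Returns a mapping provider -> slots taken, or an empty dict when the
    request is not positive or the combined offers cannot satisfy it.
    """
    if slots_needed <= 0:
        return {}
    if sum(o.free_slots for o in offers) < slots_needed:
        return {}
    allocation: dict[str, int] = {}
    remaining = slots_needed
    for offer in sorted(offers, key=lambda o: o.free_slots, reverse=True):
        if remaining == 0:
            break
        taken = min(offer.free_slots, remaining)
        if taken > 0:
            allocation[offer.provider] = taken
            remaining -= taken
    return allocation


offers = [Offer("RC-A", 40), Offer("RC-B", 100), Offer("RC-C", 25)]
plan = match_request(120, offers)
print(plan)
```

Because the Customer only sees the resulting plan, the Broker can present it under one cumulative SLA while keeping the per-Provider split internal.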

Broker

  • Cons: most time-consuming model for the Broker
  • Cons: Broker has to obtain and maintain knowledge of what is available in the infrastructure (at each site)

Provider

  • Pros: first round of negotiation is passed to Broker
  • Pros:

Customer

  • Pros: gets single point of contact and support from one entity
  • Cons: if Broker asks Providers to express their interest to support Customer when request arrives, there is a risk of opportunistic model
  • Pros: can have cumulative SLA with Broker and not have to negotiate SLA for a single Provider

Suitable to:

  • "Big" requests
  • Generic and specialised requests
  • Active and New Customers

Activities

  • EGI
    • for international VOs is a Broker
    • for national VOs is a Supporter/Supervisor
  • NGI
    • for international VOs can be a Supporter (Model 3) for its RC
    • for national VOs can be Broker for its RC
  • RC
    • provides resources

Model 3 - Freedom of choice

RA Model3.png

Providers’ resources and Customers’ requests are transparent, i.e. visible to everyone.

The parties have freedom to decide how they want to negotiate resources and under which conditions (no regulation on Supporter side).

The Supporter's role is to support Providers and/or Customers on demand, e.g. by providing a tool for resource allocation, helping with request fulfilment, handling specialised requests, etc.

Supporter

  • Cons: We can end up with lots of different SLAs and negotiation procedures
  • Cons: Has no influence on the RA process
  • Pros: Does not require a significant commitment of Supporter in the process


Provider

  • Pros: Freedom to apply own RA procedures
  • Cons: Has to define own RA processes
  • Cons: Need to sign multiple SLAs with different Customers.

Customer

  • Cons: We can end up with lots of different SLAs and negotiation procedures
  • Cons: Has to perform Broker and Supervisor activities with each of the Providers separately
  • Cons: Need to sign multiple SLAs with different Providers.

Suitable to:

  • National Customer

Activities

  • EGI
    • for international and regional VOs is a Supporter
  • NGI
    • for regional VOs can be Supporter 
  • RC
    • provides resources

Issues for discussion

The roles of NGI, EGI, RC in the process

Requirement:

  • Provide a single broker contact point to the user to hide the complexity of a network of heterogeneous RPs and to establish a single Service Level Agreement, instead of multiple ones with multiple RPs.

Scientific request review

Requirement:

  • a scientific review may not always be needed, for example when (1) the resource requirements are small, or (2) the user community/project is officially supported by one or more NGIs
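
As a rough illustration, the two exemptions above could be expressed as a simple rule. The function name, its parameters and the default threshold of 100 slots are assumptions made for this sketch; the page does not fix what counts as a "small" request.

```python
def needs_scientific_review(requested_slots: int,
                            supporting_ngis: int,
                            small_request_threshold: int = 100) -> bool:
    """Return True when a request should go through scientific review.

    The threshold is a hypothetical value; the Task Force would have to
    agree on what "small" means in practice.
    """
    is_small = requested_slots <= small_request_threshold
    is_ngi_supported = supporting_ngis >= 1
    # Either exemption is enough to skip the scientific review.
    return not (is_small or is_ngi_supported)


print(needs_scientific_review(requested_slots=50, supporting_ngis=0))
print(needs_scientific_review(requested_slots=5000, supporting_ngis=0))
```

Keeping the threshold as a parameter makes it easy to try different definitions of "small" without changing the rule itself.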

Service Level Management

Requirement:

  • A single entity (e.g. EGI) must be responsible for ensuring the enforcement of OLAs and SLAs.
  • EGI should be responsible for all activities around service level management for the federated pool to offload VOs and RPs.
  • Simple and standardized SLA. Additional requirements should be allowed where needed by the VO. Many service levels should be made optional, as we do now in the VO ID card.


Open questions:

  • who should be the party of the contract
  • what should be included in SLA


Technical request review

  • needed to be sure that what we promise can be delivered to the Customer

Requirement:

  • A technical review of demand is needed complementing the initial scientific peer review (where applicable). RPs must be involved in this, to technically inspect VO requirements to understand the impact on the resource configurations (cluster, storage set-up, network etc.), to understand if the demand complies to the local policies, and to acknowledge that resource allocation is possible for them.

The acknowledgement for resources use

  • VT_Scientific_Publications_Repository
  • the recommendation about citing EGI in publications is defined here: https://documents.egi.eu/document/1369. The implementation requires a change in the Grid AUP; the issue has been raised with the Security Coordination Group, and David Kelsey is checking with the big customer (WLCG) whether and how they will accept it before the change is made. Work in progress.

Requirement:

  • VOs must periodically acknowledge usage of resources through scientific publications and press releases. The entities to be acknowledged are EGI, the NGIs and the individual RPs contributing resources. See the WeNMR best practice and the recommendations for a scientific publication repository.

Resources allocations

Requirement:

  • coexistence of dynamic pool (bigger requests) and small static pool for smaller VOs
  • The resource allocation process should not only allow a resource request to be answered with an offer but also, vice versa, allow a site with free resources to advertise them.
  • The allocation processes involving multiple RPs, and the liaison with other RPs, should be made transparent to each individual RP. EGI is responsible for ensuring that the federated RPs collectively deliver the service requested. An RP is neither responsible for nor involved with the services provided by other RPs. Providing resources to a federated pool should impose zero overhead on the RP.

Monitoring of usage, Reporting and evaluation

Requirement:

  • An external entity (e.g. EGI.eu) is responsible for reporting on usage of the federated resource pool (rather than the VO itself). Enforcement of SLAs and OLAs does not involve VOs (it is devolved to EGI.eu).
  • Efficient usage is a requirement for extension of the grant.

Customer support

Requirement:

  • The broker (EGI.eu) should support the customer to express technical requirements, where needed.

Quality of Service

Requirement:

Different levels of Quality of Service could be offered by the federated pool. For example:

  • Level I: best-effort allocation without a minimum number of slots allocated (opportunistic usage).
  • Level II: a minimum number of slots allocated at high priority; jobs arriving at that site start executing as soon as cluster occupancy permits.

Meeting specific needs of the VO in terms of OS, configuration, software etc.

Requirement:

  • The same processes that are being defined for resource allocation could be generalized to service requests, i.e. to allow new user communities to request services to be provided by NGIs, such as application porting.

Regional VOs

Requirement:

  • It should be up to the NGIs to help the VO and allocate resources within the NGI
  • EGI provides a single point of contact for resource allocation for Customers
  • It is desirable that the same interfaces and tools offered to support international users can also be used to support national user communities. In this case the supervisory role of EGI.eu (Model 2) and all related service level management functions should be delegated to the NGI.

International VOs

EGI fully supports international VOs

Requirements

EGI

  1. EGI is responsible for negotiating with the RPs the requirements collected from VOs. For example, it is the body that investigates which RPs could be willing to implement the needed server or software set-up, the required storage system, etc. It is also a single point of contact for new users (babysitting).
  2. EGI should be responsible for all activities around service level management for the federated pool to offload VOs and RPs.
  3. All EGI security policies are enforced by the RPs contributing resources to the federated pool (e.g. individual RCs must be registered as production sites in GOCDB).
  4. Processes for collection of demand, offer and negotiation of SLAs and OLAs should be automated as much as possible to ensure the service scales.

Resource Provider

  1. Elastic resource provisioning: the federated pool does not need to be defined statically. The pool should include a statically defined minimum set of resources, and additional resources could be contributed on demand by partners, so that resources can be contributed or withdrawn on a short time scale depending on which VOs submitted a request and on the short-term resource occupancy of a cluster. By doing so EGI has a chance of attracting more resources.
  2. Scientific review: a scientific review may not always be needed, for example when (1) the resource requirements are small and do not justify the overhead of a review (a threshold can be defined), (2) resources are requested to kick off the activities of a new VO, or (3) the user community/project is already supported by one or more NGIs.
  3. Technical review: a technical review of demand is needed to complement the initial scientific peer review (where applicable) and decide which RPs can meet the technical requirements. RPs need to interact directly with a VO representative for this. RPs must be involved in order to technically inspect VO requirements, to understand the impact on the resource configurations (cluster, storage set-up, network etc.), to understand whether the demand complies with the local policies, and to acknowledge that resource allocation is possible for them.
  4. Match demand and offer. The resource allocation process should not only allow a resource request to be answered with an offer but also, vice versa, allow a site with free resources to advertise them.
  5. Quality of Service. Different levels of Quality of Service could be offered by the federated pool (the type and number of levels should be discussed). Two opposite examples of levels are: (1) Best effort allocation without minimum number of slots allocated (opportunistic usage), (2) Minimum number of slots allocated at high priority, jobs arriving to that site will enter in execution at the earliest time the cluster occupation permits.
  6. Acknowledgement of usage. VOs must periodically acknowledge usage of resources through scientific publications and press releases. The entities to be acknowledged are EGI, the NGIs and the individual RPs contributing resources. See the WeNMR best practice and the recommendations for a scientific publication repository.
  7. Efficient usage of resources. Resources allocated must be used by VOs. If usage is insufficient or not acknowledged properly, the agreement can be terminated.
  8. A single entity (e.g. EGI) must be responsible for ensuring the enforcement of OLAs and SLAs.
  9. The allocation processes involving multiple RPs, and the liaison with other RPs, should be made transparent to each individual RP. EGI is responsible for ensuring that the federated RPs collectively deliver the service requested. An RP is neither responsible for nor involved with the services provided by other RPs. Providing resources to a federated pool should impose minimum overhead on the RP.
  10. Lightweight OLA negotiation: RPs (typically sites) should be relieved from the burden of negotiating OLAs for each VO request. The NGI could function as a proxy and sign the OLA on behalf of the individual sites.
  11. It is desirable that the same interfaces and tools offered to support international users can also be used to support national user communities. In this case the supervisory role of EGI.eu (Model 2) and all related service level management functions should be delegated to the NGI.
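
The elastic provisioning idea in item 1 of this list (a static minimum plus on-demand contributions that can be added or withdrawn at short notice) might be modelled roughly as below. The ElasticPool class, its method names and the slot figures are hypothetical and serve only to make the idea concrete.

```python
from __future__ import annotations


class ElasticPool:
    """Federated pool: a fixed minimum plus on-demand contributions."""

    def __init__(self, static_minimum: int) -> None:
        self.static_minimum = static_minimum
        self.on_demand: dict[str, int] = {}  # provider -> contributed slots

    def contribute(self, provider: str, slots: int) -> None:
        if slots <= 0:
            raise ValueError("contribution must be positive")
        self.on_demand[provider] = self.on_demand.get(provider, 0) + slots

    def withdraw(self, provider: str, slots: int) -> None:
        current = self.on_demand.get(provider, 0)
        if slots <= 0 or slots > current:
            raise ValueError("cannot withdraw more than was contributed")
        if slots == current:
            del self.on_demand[provider]
        else:
            self.on_demand[provider] = current - slots

    @property
    def capacity(self) -> int:
        # The static minimum is never affected by withdrawals.
        return self.static_minimum + sum(self.on_demand.values())


pool = ElasticPool(static_minimum=200)
pool.contribute("RC-A", 50)
pool.contribute("RC-B", 30)
pool.withdraw("RC-A", 20)
print(pool.capacity)
```

Keeping the static minimum separate from the on-demand contributions means a withdrawal can never push the pool below the level that was statically agreed.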

Customer

  1. Provide a single broker contact point to the user to hide the complexity of a network of heterogeneous RPs and to establish a single Service Level Agreement, instead of multiple ones with multiple RPs.
  2. The federated pool of resources should formally support opportunistic usage of resources (opportunistic is intended here as the minimum level of service that can be offered by an RP). This is needed to provide a formal level of engagement between VOs and RPs (which does not exist currently).
  3. Support multiple levels of services in one resource demand: users should be provided with the flexibility of requiring a minimum guaranteed amount of resources complemented by an additional amount of resources to be allocated elastically (for example opportunistically).
  4. Simple and standardized SLA. Additional requirements should be allowed where needed by the VO. Many service levels should be made optional, as we do now in the VO ID card. The broker (EGI.eu for international VOs, NGIs for national VOs) should support the customer to express technical requirements, where needed.
  5. An external entity (e.g. EGI.eu or the NGI) is responsible for reporting on usage of the federated resource pool (rather than the VO itself). Enforcement of SLAs and OLAs does not involve VOs (it is devolved to EGI.eu).
  6. Usage of a federated distributed pool comes with minimum overhead for the user community.
  7. Efficient usage and acknowledgement of usage are two requirements for extension of the grant.
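
Item 3 of this list (a guaranteed minimum complemented by an elastic, opportunistic share) can be illustrated with a small sketch. The Demand type, the grant function and the separation into reserved and opportunistic free capacity are assumptions made for illustration, not part of any agreed design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Demand:
    guaranteed_slots: int  # minimum the Customer needs with certainty
    elastic_slots: int     # extra slots wanted only if spare capacity exists


def grant(demand: Demand, reserved_free: int,
          opportunistic_free: int) -> tuple[int, int]:
    """Return (guaranteed, elastic) slots granted for one demand.

    The guaranteed part is all-or-nothing; the elastic part is best effort.
    """
    if demand.guaranteed_slots > reserved_free:
        raise ValueError("guaranteed part of the demand cannot be honoured")
    elastic = min(demand.elastic_slots, opportunistic_free)
    return demand.guaranteed_slots, elastic


granted = grant(Demand(guaranteed_slots=50, elastic_slots=100),
                reserved_free=80, opportunistic_free=30)
print(granted)
```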

Note. The same processes that are being defined for resource allocation could be generalized to service requests, i.e. to allow new user communities to request services to be provided by NGIs, such as application porting.


References

NGI proposals