The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations@egi.eu.

Difference between revisions of "PROC19"

From EGIWiki
(133 intermediate revisions by 6 users not shown)
Line 1: Line 1:
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}  
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}  
<br>


{{Ops_procedures
{{Ops_procedures
|Doc_title = Introducing new stacks and middleware in EGI Production Infrastructure
|Doc_title = Integration of new cloud management framework and grid middleware in EGI Production Infrastructure
|Doc_link = [[PROC09|https://wiki.egi.eu/wiki/PROC19]]
|Doc_link = [[PROC19|https://wiki.egi.eu/wiki/PROC19]]
|Version =  
|Version =  
|Policy_acronym = OMB
|Policy_acronym = OMB
|Policy_name = Operations Management Board
|Policy_name = Operations Management Board
|Contact_group = operations-support@mailman.egi.eu
|Contact_group = operations@egi.eu
|Doc_status =  
|Doc_status = DRAFT
|Approval_date =  
|Approval_date =  
|Procedure_statement = A procedure for the steps to introduce new stack (Cloud platform) or middleware (HTC Platform) in EGI Production Infrastructure.
|Procedure_statement = A procedure for the steps to integrate new cloud management framework (Cloud platform) or grid middleware (Grid Platform) in EGI Production Infrastructure.
|Owner = Alessandro Paolini
}}  
}}  


<br>  
<br>  
<u>'''Under construction'''</u>


= Overview  =
= Overview  =


To assure production quality of EGI Infrastructure every stack (Cloud platform) or middleware (HTC Platform) supported by Production Resource Centres needs to fulfil certain requirements. The goal of this procedure is to assure that EGI Infrastructure is fully supported by operations tools.
To assure production quality of EGI Infrastructure every cloud management framework (Cloud platform) or middleware (Grid Platform) supported by Production Resource Centres needs to fulfil certain requirements. The goal of this procedure is to assure EGI Infrastructure compliance.  


= Definitions  =
= Definitions  =


*'''cloud stack''': software for creating, managing, and deploying infrastructure cloud services.
Types of Technology Products:
*'''grid middleware''': software which allows the users to execute jobs in grid infrastructure.  
 
*'''cloud management framework''': software for creating, managing, and deploying infrastructure cloud services.  
*'''grid middleware''': software which allows the users to execute jobs in grid infrastructure.


<br> Please refer to the [[Glossary|EGI Glossary]] for the definitions of the terms used in this procedure.<br>


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", “MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


Please refer to the [[Glossary|EGI Glossary]] for the definitions of the terms used in this procedure.<br>
= Entities involved in the procedure =


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", “MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
*'''Technology Provider (TP)''':&nbsp;person representing or leading Technology Provider team
*'''EGI Operations''' '''(EGIOps)'''
*'''Operations Centre (OC)'''
*'''Resource Centre (RC)'''
*'''[[Operations Management Board|Operations Management Board]]''': EGI operations policy board


= Entities involved in the procedure =
= Prerequisites  =
 
Before sending a request:
 
*OC has to have
**the support of TP with effort to integrate with EGI Infrastructure (information system, accounting, monitoring etc), provide support via GGUS and maintain software via UMD
**one or more RC available to deploy the new platform
*TP has to have
**effort to integrate with EGI Infrastructure (information system, accounting, monitoring etc), provide support via GGUS and maintain software via UMD
**the support of one or more OC, with one or more RC available to deploy the new platform and the integration-software developed by the TP
 
= Steps  =
 
== Request submission and validation ==


Please see [[PROC09#Entities_involved_in_the_procedure]]
The request can be send by:


= Prerequisites and responsibilities  =
#Operations Centre
#EGI Operations
#Technology Provider


Please see [[PROC09#Prerequisites_and_responsibilities]]
Resource Centre can also request integration of new cloud management framework or grid middleware. Such request should be first approved by Operations Centre, it belongs to. In such case OC&nbsp;is responsible to create a ticket on behalf of RC. <br>


<br>  
<br>  


= Resource Center status Workflow  =
{| class="wikitable"
|-
! Step
! Action on
! Action
|-
| 1
| Applicant<br>
| Opens a [https://ggus.eu/ GGUS] ticket to Operations to start the process. <pre>Subject: Request for integration of XXX to EGI Production Infrastructure (PROC19)
 
Dear Operations,


Please see [[PROC09#Resource_Center_status_Workflow]]
We would like to request for starting procedure of integrating XXX to EGI Production Infrastructure
https://wiki.egi.eu/wiki/PROC19


== Resource Centre registration  ==
Prerequisite data:
* name of Technology Product:
* Technology Provider (person representing or leading the team) contact details(name, email):
* customers of the Product (eg. user community, Operations Centre):
*&nbsp;motivation:


=== Requirements  ===


Please see [[PROC09#Requirements]]
Best Regards
XXX
</pre>
|-
| 2
| EGIOps
|
Operations contacts the OMB to request the approval of the request.


=== Steps  ===
|}


The following steps are only applicable if '''the Resource Centre is not already registered in GOCDB'''. <br>
== Functional requirements  ==


*Actions tagged '''RC''' are the responsibility of the Resource Centre Operations Manager.
Functional requirements for new product to be integrated:
*Actions tagged '''RP''' are the responsibility of the Resource Infrastructure Operations Manager.  
 
*Actions tagged '''OC''' are the responsibility of the Operations Centre
*support VO concept
*support X.509 certificates
 
== Integration steps  ==
 
Integration covers following areas (where possible steps can be done in parallel):


{| class="wikitable"
{| class="wikitable"
Line 64: Line 114:
! #  
! #  
! Responsible  
! Responsible  
! Action
! Action  
! Additional temporary comments<br>
|-
| 0a
| EGIOps
| When Approved, EGIOps and TP&nbsp;should agree on [https://wiki.egi.eu/wiki/Glossary#Underpinning_Contract Underpinning Agreement (UA)]
| agree on [https://documents.egi.eu/document/2589 Corporate-level Technology Provider Underpinning Agreement] or on a customised version
|- valign="top"
|- valign="top"
| 0
| 0b
| RC
| EGIOps<br>
|  
|  
'''Contact your Resource Infrastructure Operations Manager''' (contact information is available at [http://www.egi.eu/community/resource-providers/ EGI web site]).
Set up an integration Task force for given Technology Product composed of:  


*Provide the required information according to the template available in the [[HOWTO01|Required information]] page.
*Technology Provider representative
*Operations tools representative<br>
*NGI representatives (wanting to deploy Technology Product) with Pilot Site
*EGI Operations representative<br>
*User communities representative (interested in deployment of Technology Product)
*EGI&nbsp;Security team representative
*UMD representative


|- valign="top"
| <br>
| 1
|}
| RP
|
'''Accept or reject registration request''' and communicate this result back to applicant.


*If the Resource Centre is accepted, notify the relevant Operations Centre, handle the Resource Centre information received, and put the Operations Centre in contact with the Resource Centre Operations Manager.
=== Configuration Management ===


|- valign="top"
{| class="wikitable"
| 2
|-
| OC
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
|  
|  
#'''Forward all documentation''':
1a
#*[[HOWTO02|necessary to be read and accept]]
#*documentation how to install and configure the Resource Centre services
#Clarify any doubts or questions.


Include the Operations Centre ROD, CSIRT,&nbsp; or help-desk teams in the step if necessary.
|
GOCDB&nbsp;  


|- valign="top"
| 3
| OC
|  
|  
#'''Add the Resource Centre to the [https://goc.egi.eu/ GOCDB]'''and flag it as "Candidate".
Add new service types agreed within Task Force.<br>
#*Only Regional Management level users (D') can add a site to the NGI and can update the certification status of the site, see [[GOCDB/Input System User Documentation#Roles]]
#Notify the Resource Centre Operations Manager that he/she should [[EGI Operations Start Guide#Joining_operations|Join operations]]
#*In the [https://goc.egi.eu/ GOCDB] he/she should request the 'Resource Centre Operations Manager' role. Approve it when done.
#Notify the Resource Centre Operations Manager that person responsible for security should [[EGI Operations Start Guide#Joining_operations|Join operations]]
#*In the [https://goc.egi.eu/ GOCDB] he/she should request the 'Resource Centre Security Officer' role. Approve it when done.


| <br>
|- valign="top"
|- valign="top"
| 4
| 1b
| RC
| Pilot Site
| Deploy technical service instance and register in GOCDB.
| <br>
|}
 
=== Information System ===
 
{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 2a
| Technology Provider
|  
|  
#'''Complete any missing information for the Resource Centre's entry in the GOCDB'''
Develop software for integration with BDII.  
#*It includs services that are to be integrated into the infrastructure. See [[Fedcloud-tf:WorkGroups:Scenario5#GOCDB|instruction]]
 
#Notify the Operations Centre that the Resource Centre information update is concluded.
<br>
#Note: If the external RC is considering buying host certs, please make sure they source them from an EGI recognised authority. [http://www.eugridpma.org/members/worldmap/ The local national CA] (usually for free or at flat rate) is likely the best source of their SSL certificates.


|- valign="top"
| 5
| OC
|  
|  
'''Check [http://goc.egi.eu/ GOC DB]''' that the Resource Centre's information is correct.  
Analyse the use cases for deciding if the new technology has to be published in the BDII or not, and the relevant set of information to publish.


*Resource Centre (site) roles and any other additional information.
* Must the new technology be published in the BDII?
*Check that contacts receive email (if they are mailing lists, check that outside EGI members are allowed to post there). Site administrator MUST reply to the test email.<br>
** it has to be created the information providers
*Check that the required services for a Resource Centre are properly registered.<br>
* Is it necessary any modification to the Glue Schema for properly publishing the new technology information?
*Check domain names and forward and reverse DNS.
** any modification to the Glue Schema has to be discussed with the Glue Working Group


|- valign="top"
|- valign="top"
| 6
| 2b
| OC
| Pilot Site
|  
| Deploy software for integration with BDII and documentation.  
'''Any other Operations Centre-specific requirements''' (e.g. join a certain VO and/or mailing list, etc.)
| <br>
 
|- valign="top"
|- valign="top"
| 7
| 2c
| OC
| &nbsp; EGI Operations'''<br>'''
|  
| Verify integration
If all previous actions have been completed with success, notify the Resource Centre Operations Manager that the Registration is completed, and contact the Resource Infrastructure Operations Manager to notify that a new candidate Resource Centre exists and is ready to be certified.
| '''Alessandro Paolini, Enol Fernandez, Baptiste Grenier, '''Operations checks documentation
 
|}
|}


After the successful completion of all these steps, the registration phase is completed and the Resource Centre is ready for the start of the <span class="il">certification</span> phase.
=== Monitoring ===


== Resource Centre certification  ==
{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 3a
| Technology Provider
| Develop nagios probe with support from SAM team and documentation.
| [http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/ ARGO Guidelines for monitoring probes]
|- valign="top"
| 3b
| ARGO, EGI Ops
|
Check probe, verify results, add to SAM release.


=== Requirements  ===
Add test to ARGO_MON profile.<br>


#The Resource Centre Certification procedure is only applicable for '''both Resource Centres in "Candidate" or "Suspended"''' status state in GOC DB.<br>
| [[PROC06]] and [[PROC07]]
#A Resource Centre can successfully pass certification only if the conditions required by the [https://documents.egi.eu/document/31 Resource Centre OLA] are met.
|- valign="top"
| 3c
| ARGO, EGI Ops
| Deploy probe in production nagios and documentation.
| Operations checks documentation
|- valign="top"
| 3d
| ARGO, EGI Ops
| if the new technology needs to be monitored by secmon and pakiti, add the related tests in the SEC_MONITOR profile.  
| Operations verify that the security tests are properly executed
|}


=== Steps  ===
=== Operations (ROD) Dashboard ===


The following is a detailed description of the steps required for the transition from the "Candidate"/"Suspended" to the "Certified" state of the Resource Centre.
{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 4
| EGI Ops&nbsp;
| Add test to Operations profile [https://wiki.egi.eu/wiki/PROC06 Setting a Nagios test status to OPERATIONS]
| <br>
|}


*Actions tagged '''RC''' are the responsibility of the Resource Centre Operations Manager.
=== Support ===
*Actions tagged '''RP''' are the responsibility of the Resource Infrastructure Operations Manager.
*Actions tagged '''OC''' are the responsibility of the Operations Centre
*Actions tagged '''CSIRT''' are the responsibility of the Computer Security Incident Response Team


{| class="wikitable"
{| class="wikitable"
Line 159: Line 255:
! #  
! #  
! Responsible  
! Responsible  
! Action
! Action  
|- valign="top"
! Additional temporary comments<br>
| 0
|-
| RP
| 5a
| Technology Provider
|  
|  
The Resource Infrastructure Operations Manager contacts the Resource Centre Operations Manager to request '''the subscription of the [https://documents.egi.eu/public/ShowDocument?docid=31 Resource Centre OLA]'''.
Declare [[FAQ GGUS-QoS-Levels|Quality of Support]] for 3rd level Support Unit (SU) and name of SU
 
[https://wiki.egi.eu/wiki/FAQ_GGUS-New-Support-Unit FAQ GGUS-New-Support-Unit]  


| <br>
|- valign="top"
|- valign="top"
| 1
| 5b
| RC
| GGUS&nbsp;
|  
| &nbsp;Create Support Unit under "Product Teams" category <br>
The Resource Centre Operations Manager notifies the Resource Infrastructure Operations Manager that '''the Resource Centre OLA is accepted''' (if the Resource Centre is has not already endorsed it before for example in case of a suspended Resource Centre), and the Resource Centre is ready to start certification.
| <br>
|}


|- valign="top"
=== Accounting ===
| 2
| RP
|
'''The Resource Infrastructure Operations Manager contacts the Operations Centre asking to start the certification process.'''


|- valign="top"
{| class="wikitable"
| 3
|-
| OC
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 6a
| Technology Provider
|  
|  
If the Resource Centre is in the "Candidate" or "Suspended" state, then '''flag the Resource Centre as "Uncertified".'''
Develop software for integration with APEL


*If it was in the "Suspended" state then check that the reason for suspension has been cleared.
<br>
**If the suspension cause is a security issue, then the EGI CSIRT needs to be contacted to verify that all requested repair operations were successfully applied by the Resource Centre Administrators to fix the issue that caused suspension. See [[SAM#Monitoring_uncertified_sites|instructions]] on how to monitor uncertified RCs.


|- valign="top"
| 4
| OC
|  
|  
'''Check:'''  
'''Define integration and what data should be published.'''  


#'''GOC&nbsp;DB:&nbsp;'''All services are registered in GOCDB according to the requirements of the [https://documents.egi.eu/document/31 Resource Centre OLA], these are published and ALSO that services published in the GOCDB are valid.
* if the new technology is using computing or storage services for which accounting data are already collected, there is no need of new parser/software for integration with APEL
#'''Information system''':&nbsp;Check that the eu.egi.cloud.information.bdii is working, and publishing coherent values'''* Propose to remove it *'''
#*Proposal to eliminate this step since the information system is not production level, yet ''(Peter S. 12 February 2014) '' <br>
#'''Accounting '''
#*Host Certificate DN should be send to APEL-ADMINS@stfc.ac.uk
#'''Monitoring and troubleshooting''' should be possible:
#*the [[OPS vo|OPS VO]] (monitoring) and the [[Dteam vo|DTEAM VO]] (troubleshooting) are configured and supported by the Resource Centre.
#'''OPS, Dteam and regional VOs''' are configured and supported as needed.
#'''Site is integrated in any regional tool as needed '''(for example, the regional accounting infrastructure if present).<br>


|- valign="top"
|- valign="top"
| 5
| 6b
| RC
| APEL&nbsp;
|  
| Validate integration
Fill the ''security survey'' (placeholder, survey not available yet) and forward the required information to the CSIRT.
| Ops support check documentation
|- valign="top"
| 6c
| EGI Accounting Portal&nbsp;
| Display data
| <br>
|}


*The purpose of the survey is to assess that the technology used to provide cloud services fulfills the EGI security policies and procedures.
=== UMD ===
*'''This is an additional step, not yet approved by OMB. Please, ignore until the survey is available''' (Peter S. 12 Feb 2014)


{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 7a
| Technology Provider
| Ensure software developed for the integration of the new Technology Product satisfies [https://wiki.egi.eu/wiki/UMD_Provisioning#Minimal_requirements UMD Minimal Requirements] <br>
Request the inclusion into UMD; see here the [https://wiki.egi.eu/wiki/EGI_Software_Component_Delivery#Initial_activities_-_Joining_UMD_Release_Team information to provide]
| <br>
|- valign="top"
|- valign="top"
| 6
| 7b
| CSIRT
| EGI Ops (UMD representative)
|  
| Technology Provider info is added in [https://wiki.egi.eu/wiki/Technology_Providers TechnologyProviders List] and [https://wiki.egi.eu/wiki/UMD_products_ID_cards UMD Product ID card]
Checks that '''the Resource Centre passes the basic security assessment tests'''<br>  
| <br>
|- valign="top"
| 7c
| EGI Software provisioning Team
| Applies the [https://wiki.egi.eu/wiki/EGI_Software_Provisioning UMD Software Provisioning process] to assess the quality of the new product
| <br>
|- valign="top"
| 7d
| EGI Ops (UMD representative)<br>
| &nbsp;Once confirmed a successful provisioning (step 11c) includes the new product/products into an UMD release and makes it available to the production infrastricture, in the UMD repositories
| <br>
|}


*The security assessment is performed by the the EGI CSIRT.
=== VM image Marketplace ===
*Site administrator should fill in [https://documents.egi.eu/secure/ShowDocument?docid=2114 EGI Federated Cloud Security - Questionnaire for sites deploying cloud technology]
 
This step also apples to certified Resource Centers which introduce cloud resources for the first time.


{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 8a
| Technology Provider
| Implement subscription to VM image lists from EGI MarketPlace and create documentation.
| <br>
|- valign="top"
| 8b
| Pilot Site
| Add service endpoint to GOCDB (type:&nbsp;eu.egi.cloud.vm-metadata.vmcatcher)
| <br>
|- valign="top"
| 8c
| NGI/EGI Ops
| Check eu.egi.cloud.vm-metadata.vmcatcher is passing&nbsp; https://cloudmon.egi.eu/nagios/&nbsp;
| <br>
|- valign="top"
|- valign="top"
| 7
| 8d
| OC
| EGI Cloud VM Image Management SU
|  
| Validate integration
'''If all preliminary tests are passed for 3 consecutive calendar days''', declare an initial maintenance downtime and switch the Resource Centre status to 'Certified'.
| Ops support check documentation
|}


*This ensures that Resource Centre will appear in NAGIOS and GSTAT. '''* Propose to remove it *'''
=== Documentation ===


#*Proposal to eliminate this step since the information system is not production level, yet ''(Peter S. 12 February 2014) '' <br>
{| class="wikitable"
|-
! #  
! Responsible
! Action
! Additional temporary comments<br>
|-
| 9a
| EGI Ops
| Update relevant documentation<br>
| <br>
|- valign="top"
| 9b
| Technology Provider
| Develop documentation for users and admins where missing<br>
| <br>
|- valign="top"
| 9c
| EGI Ops
| Validate Documentation
| <br>
|}


*The target 'Infrastructure' value should be set to 'Production'.
=== Resource Allocation ===


|- valign="top"
{| class="wikitable"
| 8
|-
| OC
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 10<br>
| Resource Allocation
|  
|  
'''The downtime '''should not be closed until the Resource Centre
Add new access method in e-GRANT(if needed)


*appears in all operational tools<br>
Define if the middleware is a new way of accessing resources
**[https://cloudmon.egi.eu/nagios/ Cloud NAGIOS ](NAGIOS)
***And all Nagios tests are passed
**GGUS - the Resource Centre appears in the "Notified Site" field - [https://ggus.eu/ws/ticket_search.php GGUS search]
**[https://grid-monitoring.cern.ch/myegi/ MyEGI]
*accounting data is properly published.<br>


<br> If there are problems with a specific tool, open GGUS tickets to the relevant Support Units.
| '''e-GRANT was dismissed. To evaluate if similar steps are necessary for the AoD service or EGI Marketplace'''
|}


Wait at least two days after the switch to the 'Certified' status to open the ticket, the propagation of the new status to the operational tools or the publication of accounting data may take one or two days.<br>
=== Security ===


{| class="wikitable"
|-
! #
! Responsible
! Action
! Additional temporary comments<br>
|-
| 11a<br>
| Technology Provider
| complete the [https://wiki.egi.eu/wiki/SVG:Software_Security_Checklist EGI SVG Software Security Checklist]
| A brief written response to Chair of SVG (Linda.Cornwall <AT> stfc.ac.uk) is requested
|- valign="top"
|- valign="top"
| 9
| 11b<br>
| OC
| Security team<br>
| '''Notify the Resource Centre Operations Manager that the Resource Centre is certified'''<br> <br>
| Provide recommendations based on provided input<br>  
| <br>
|- valign="top"
|- valign="top"
| 10
| 11c
| OC
| Technology Provider
| Implement recommendations
|  
|  
'''The Operation Center can broadcast '''that a new Resource Centre is now part of the EGI infrastructure.
|- valign="top"
| 11d<br>
| Security Team
| Validate implementation of recommendations
| <br>
|}


This step is OPTIONAL.
=== The Announcement ===


|}
'''EGI Ops''' announces the availability of new product to OMB and includes the announcement in the monthly EGI Broadcast to communicate the availability of the new product to NGIs, VOs, RCs managers
 
<u>After the successful completion of these steps, the Resource Centre is considered as "Certified".</u>


= Revision History  =
= Revision History  =
Line 278: Line 455:
|-
|-
| <br>  
| <br>  
| <br>
| <br>
| <br>
|-
| <br>
| A. Paolini
| 2016-06-03
| Trying to define some rules for integrating the new technology with the information and the accounting system
|-
|
| Alessandro Paolini
| 2016-06-08
| "EGI Operations Support" was decommissioned, changed all the references to "Operations"
|-
|
| Alessandro Paolini
| 2019-01-09
| some minor updates; to decide if keeping the step 9 about "Resource Allocation" or discard it.
|-
|  
|  
| Alessandro Paolini
| 2019-02-04
| step 10a: added the link to the Software Security Checklist
|-
|  
|  
| <br>
| Alessandro Paolini
| 2019-02-19
| moved UMD to step 7; added the link to the page with [https://wiki.egi.eu/wiki/EGI_Software_Component_Delivery#Initial_activities_-_Joining_UMD_Release_Team detailed information] to provide to UMD team
|-
|
| Alessandro Paolini
| 2021-01-08
| added step 3d about security monitoring; updated the link to guidelines for monitoring probes
|}
|}


[[Category:Operations_Procedures]]
[[Category:Operations_Procedures]]

Revision as of 15:57, 27 January 2021





Title: Integration of new cloud management framework and grid middleware in EGI Production Infrastructure
Document link: https://wiki.egi.eu/wiki/PROC19
Last modified:
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: operations@egi.eu
Document Status: DRAFT
Approved Date:
Procedure Statement: A procedure for the steps to integrate a new cloud management framework (Cloud platform) or grid middleware (Grid Platform) in EGI Production Infrastructure.
Owner: Alessandro Paolini



Overview

To ensure the production quality of the EGI Infrastructure, every cloud management framework (Cloud platform) or grid middleware (Grid Platform) supported by production Resource Centres needs to fulfil certain requirements. The goal of this procedure is to ensure EGI Infrastructure compliance.

Definitions

Types of Technology Products:

  • cloud management framework: software for creating, managing, and deploying infrastructure cloud services.
  • grid middleware: software which allows the users to execute jobs in grid infrastructure.


Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Technology Provider (TP): the person representing or leading the Technology Provider team
  • EGI Operations (EGIOps)
  • Operations Centre (OC)
  • Resource Centre (RC)
  • Operations Management Board: EGI operations policy board

Prerequisites

Before sending a request:

  • The OC needs
    • the support of a TP with effort available to integrate with the EGI Infrastructure (information system, accounting, monitoring, etc.), provide support via GGUS and maintain the software via UMD
    • one or more RCs available to deploy the new platform
  • The TP needs
    • effort available to integrate with the EGI Infrastructure (information system, accounting, monitoring, etc.), provide support via GGUS and maintain the software via UMD
    • the support of one or more OCs, with one or more RCs available to deploy the new platform and the integration software developed by the TP

Steps

Request submission and validation

The request can be sent by:

  1. Operations Centre
  2. EGI Operations
  3. Technology Provider

A Resource Centre can also request the integration of a new cloud management framework or grid middleware. Such a request should first be approved by the Operations Centre it belongs to; in that case, the OC is responsible for creating the ticket on behalf of the RC.


Step | Action on | Action
1 | Applicant | Opens a GGUS ticket to Operations to start the process, using the following template:

Subject: Request for integration of XXX to EGI Production Infrastructure (PROC19)

Dear Operations,

We would like to request the start of the procedure for integrating XXX into the EGI Production Infrastructure
https://wiki.egi.eu/wiki/PROC19

Prerequisite data:
* name of Technology Product:
* Technology Provider (person representing or leading the team) contact details (name, email):
* customers of the Product (e.g. user community, Operations Centre):
* motivation:

Best Regards
XXX

2 | EGIOps | Operations contacts the OMB to request approval of the request.
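
As an illustration of the request step, the ticket text can be assembled from the prerequisite data before it is pasted into GGUS. The following is a minimal sketch, not part of the procedure: the class and function names are hypothetical, the example values are invented, and nothing here talks to GGUS itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IntegrationRequest:
    """Prerequisite data asked for in the PROC19 ticket template (hypothetical container)."""
    product_name: str
    tp_contact_name: str
    tp_contact_email: str
    customers: str
    motivation: str
    sender: str


def build_ticket(req: IntegrationRequest) -> Tuple[str, str]:
    """Return (subject, body) with the template fields filled in."""
    subject = (
        f"Request for integration of {req.product_name} "
        "to EGI Production Infrastructure (PROC19)"
    )
    body = "\n".join([
        "Dear Operations,",
        "",
        "We would like to request the start of the procedure for integrating "
        f"{req.product_name} into the EGI Production Infrastructure",
        "https://wiki.egi.eu/wiki/PROC19",
        "",
        "Prerequisite data:",
        f"* name of Technology Product: {req.product_name}",
        "* Technology Provider contact details (name, email): "
        f"{req.tp_contact_name}, {req.tp_contact_email}",
        f"* customers of the Product: {req.customers}",
        f"* motivation: {req.motivation}",
        "",
        "Best Regards",
        req.sender,
    ])
    return subject, body
```

Keeping the prerequisite data in one structure makes it easy to check that no field is left empty before the ticket is opened.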

Functional requirements

Functional requirements for a new product to be integrated:

  • support the VO concept
  • support X.509 certificates

Integration steps

Integration covers the following areas (where possible, steps can be done in parallel):

# | Responsible | Action | Additional temporary comments
0a | EGIOps | Once the request is approved, EGIOps and the TP should agree on an Underpinning Agreement (UA). | Agree on the Corporate-level Technology Provider Underpinning Agreement or on a customised version.
0b | EGIOps | Set up an integration Task Force for the given Technology Product, composed of the members listed below. |

Task Force members:

  • Technology Provider representative
  • Operations tools representative
  • NGI representatives (wanting to deploy the Technology Product) with a Pilot Site
  • EGI Operations representative
  • User communities representative (interested in the deployment of the Technology Product)
  • EGI Security team representative
  • UMD representative

Configuration Management

# | Responsible | Action | Additional temporary comments
1a | GOCDB | Add the new service types agreed within the Task Force. |
1b | Pilot Site | Deploy a technical service instance and register it in GOCDB. |
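
Once the Pilot Site has registered its instance (step 1b), the registration can be checked by filtering a list of service endpoints by the new service type. The sketch below is only an illustration: the XML shape and element names are a simplified assumption modelled loosely on the output of the GOCDB programmatic interface, the host and site names are invented, and the function name is hypothetical. It parses a local sample string and makes no network call.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed shape of a service endpoint listing (not an official schema).
SAMPLE_XML = """<results>
  <SERVICE_ENDPOINT>
    <HOSTNAME>cloud.pilot.example.org</HOSTNAME>
    <SERVICE_TYPE>eu.egi.cloud.vm-metadata.vmcatcher</SERVICE_TYPE>
    <SITENAME>PILOT-SITE</SITENAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>bdii.pilot.example.org</HOSTNAME>
    <SERVICE_TYPE>Site-BDII</SERVICE_TYPE>
    <SITENAME>PILOT-SITE</SITENAME>
  </SERVICE_ENDPOINT>
</results>"""


def endpoints_of_type(xml_text, service_type):
    """Return (site name, hostname) pairs for endpoints of the given service type."""
    root = ET.fromstring(xml_text)
    found = []
    for endpoint in root.findall("SERVICE_ENDPOINT"):
        if endpoint.findtext("SERVICE_TYPE") == service_type:
            found.append((endpoint.findtext("SITENAME"), endpoint.findtext("HOSTNAME")))
    return found
```

An empty result for the newly agreed service type would indicate that step 1b has not been completed yet.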

Information System

# | Responsible | Action | Additional temporary comments
2a | Technology Provider | Develop software for integration with BDII. | Analyse the use cases to decide whether the new technology has to be published in the BDII and, if so, which set of information to publish. If it has to be published, information providers have to be created. If publishing the new technology's information properly requires any modification to the Glue Schema, that modification has to be discussed with the Glue Working Group.
2b | Pilot Site | Deploy the software for integration with BDII and its documentation. |
2c | EGI Operations | Verify integration. | Alessandro Paolini, Enol Fernandez, Baptiste Grenier. Operations checks documentation.
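
To make the idea of an information provider more concrete: BDII information providers are commonly scripts that print LDIF, which the BDII then loads. The following is a rough sketch of producing one GLUE 2.0 Endpoint entry. The attribute names follow the GLUE 2.0 LDAP rendering as far as we are aware, but the entry is deliberately incomplete (several mandatory attributes are omitted), and the function name, identifiers, URL and interface name are all hypothetical. What actually has to be published is for the Task Force and the Glue Working Group to decide.

```python
def glue2_endpoint_ldif(service_id, endpoint_id, url, interface_name):
    """Build a partial GLUE 2.0 Endpoint entry as LDIF text (illustrative only)."""
    dn = (
        f"GLUE2EndpointID={endpoint_id},GLUE2ServiceID={service_id},"
        "GLUE2GroupID=resource,o=glue"
    )
    lines = [
        f"dn: {dn}",
        "objectClass: GLUE2Endpoint",
        f"GLUE2EndpointID: {endpoint_id}",
        f"GLUE2EndpointURL: {url}",
        f"GLUE2EndpointInterfaceName: {interface_name}",
        f"GLUE2EndpointServiceForeignKey: {service_id}",
    ]
    # LDIF entries are separated by a blank line, so end with a trailing newline.
    return "\n".join(lines) + "\n"
```

A real provider would print such entries to standard output and would be validated by EGI Operations in step 2c.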

Monitoring

# | Responsible | Action | Additional temporary comments
3a | Technology Provider | Develop a Nagios probe, with support from the SAM team, and its documentation. | ARGO Guidelines for monitoring probes (http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/)
3b | ARGO, EGI Ops | Check the probe, verify the results and add it to the SAM release. Add the test to the ARGO_MON profile. | PROC06 and PROC07
3c | ARGO, EGI Ops | Deploy the probe in the production Nagios, together with its documentation. | Operations checks documentation.
3d | ARGO, EGI Ops | If the new technology needs to be monitored by secmon and pakiti, add the related tests to the SEC_MONITOR profile. | Operations verifies that the security tests are properly executed.
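
For step 3a, it may help to recall the convention a Nagios-style probe follows: the result is reported through the process exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) together with a one-line status message, optionally followed by performance data after a `|`. The sketch below shows only that mapping. The measured quantity (a response time), the thresholds and the function name are hypothetical, and a real probe should follow the ARGO guidelines for monitoring probes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(response_seconds, warn=5.0, crit=15.0):
    """Map a measured response time to (exit code, status line).

    Pass None when the measurement could not be taken at all.
    """
    if response_seconds is None:
        return UNKNOWN, "UNKNOWN - no measurement available"
    perfdata = f"time={response_seconds:.3f}s;{warn:.3f};{crit:.3f}"
    if response_seconds >= crit:
        code = CRITICAL
    elif response_seconds >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - response in {response_seconds:.3f}s | {perfdata}"
```

A probe script would print the status line and then call `sys.exit(code)` so that Nagios picks up the result.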

Operations (ROD) Dashboard

# | Responsible | Action | Additional temporary comments
4 | EGI Ops | Add the test to the Operations profile (see "Setting a Nagios test status to OPERATIONS", https://wiki.egi.eu/wiki/PROC06). |

Support

# | Responsible | Action | Additional temporary comments
5a | Technology Provider | Declare the Quality of Support for the 3rd level Support Unit (SU) and the name of the SU. See FAQ GGUS-New-Support-Unit. |
5b | GGUS | Create the Support Unit under the "Product Teams" category. |

Accounting

# | Responsible | Action | Additional temporary comments
6a | Technology Provider | Develop software for integration with APEL. | Define the integration and what data should be published. If the new technology uses computing or storage services for which accounting data are already collected, no new parser or software for integration with APEL is needed.
6b | APEL | Validate integration. | Ops support checks documentation.
6c | EGI Accounting Portal | Display data. |
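
The rule in step 6a can be read as a simple set difference: accounting work is needed only for the kinds of service the new technology uses that are not yet covered by existing accounting. A minimal sketch follows; the function name and the service labels are hypothetical and are not taken from APEL.

```python
def services_needing_accounting_work(used_services, already_accounted):
    """Return the service kinds used by the new technology that lack accounting data."""
    return sorted(set(used_services) - set(already_accounted))
```

An empty list corresponds to the case where no new parser or integration software is required.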

UMD

# | Responsible | Action | Additional temporary comments
7a | Technology Provider | Ensure that the software developed for the integration of the new Technology Product satisfies the UMD Minimal Requirements. Request inclusion into UMD; the information to provide is listed at https://wiki.egi.eu/wiki/EGI_Software_Component_Delivery#Initial_activities_-_Joining_UMD_Release_Team |
7b | EGI Ops (UMD representative) | Adds the Technology Provider information to the Technology Providers list and the UMD Product ID card. |
7c | EGI Software Provisioning Team | Applies the UMD Software Provisioning process to assess the quality of the new product. |
7d | EGI Ops (UMD representative) | Once successful provisioning is confirmed (step 7c), includes the new product or products in a UMD release and makes them available to the production infrastructure in the UMD repositories. |

VM image Marketplace

# | Responsible | Action | Additional temporary comments
8a | Technology Provider | Implement subscription to VM image lists from the EGI MarketPlace and create documentation. |
8b | Pilot Site | Add the service endpoint to GOCDB (type: eu.egi.cloud.vm-metadata.vmcatcher). |
8c | NGI/EGI Ops | Check that eu.egi.cloud.vm-metadata.vmcatcher is passing at https://cloudmon.egi.eu/nagios/ |
8d | EGI Cloud VM Image Management SU | Validate integration. | Ops support checks documentation.

Documentation

# | Responsible | Action | Additional temporary comments
9a | EGI Ops | Update relevant documentation. |
9b | Technology Provider | Develop documentation for users and admins where missing. |
9c | EGI Ops | Validate documentation. |

Resource Allocation

# | Responsible | Action | Additional temporary comments
10 | Resource Allocation | Add a new access method in e-GRANT (if needed). Define whether the middleware is a new way of accessing resources. | e-GRANT was dismissed. It remains to be evaluated whether similar steps are necessary for the AoD service or the EGI Marketplace.

Security

# | Responsible | Action | Additional temporary comments
11a | Technology Provider | Complete the EGI SVG Software Security Checklist. | A brief written response to the Chair of SVG (Linda.Cornwall <AT> stfc.ac.uk) is requested.
11b | Security Team | Provide recommendations based on the provided input. |
11c | Technology Provider | Implement recommendations. |
11d | Security Team | Validate the implementation of the recommendations. |

The Announcement

EGI Ops announces the availability of the new product to the OMB and includes the announcement in the monthly EGI Broadcast to communicate the availability of the new product to NGIs, VOs and RC managers.

Revision History

Version | Authors | Date | Comments
 | A. Paolini | 2016-06-03 | Trying to define some rules for integrating the new technology with the information and the accounting system
 | Alessandro Paolini | 2016-06-08 | "EGI Operations Support" was decommissioned, changed all the references to "Operations"
 | Alessandro Paolini | 2019-01-09 | some minor updates; to decide if keeping the step 9 about "Resource Allocation" or discard it.
 | Alessandro Paolini | 2019-02-04 | step 10a: added the link to the Software Security Checklist
 | Alessandro Paolini | 2019-02-19 | moved UMD to step 7; added the link to the page with detailed information to provide to UMD team
 | Alessandro Paolini | 2021-01-08 | added step 3d about security monitoring; updated the link to guidelines for monitoring probes