
GOCDB/Release4/Development/MultipleEndpointsPerService

Revision as of 18:07, 27 November 2013 by Davidm (talk | contribs)





Multiple Endpoints

Introduction

Define multiple Endpoints per Service. Original requirements:

GOCDB could be extended so that a single service could define multiple Endpoint objects to allow:

  • Multiple GRIS endpoint URLs per service. This will allow the Top-BDII to directly retrieve information about the endpoint.
  • Separating tape from disk endpoints in a single SRM service. This is required to put selected endpoints of the service into downtime.


To address these requirements, we propose adding support for multiple <ENDPOINT> objects/elements per Service. Each Endpoint can define a different URL: for example, one Endpoint could define the actual service URL, a second the GRIS LDAP URL, and a third an admin portal URL. The Endpoint.InterfaceName attribute (taken from GLUE2) would be used to distinguish between endpoint types. Some provision for this has already been made in the GOCDBv5 database schema, so that a service can define multiple endpoints as described at https://wiki.egi.eu/wiki/GOCDB/v5. However, enabling this feature requires further work in the web portal and the PI, which currently restrict a service to a single endpoint.

Implementation

  • Add multiple <ENDPOINT> objects per service (based on the GLUE2 Endpoint entity).
  • When creating a downtime, an extra step would be required to select which Endpoints from the selected services should be put into downtime.
  • The GOCDB-PI output would be extended for nesting multiple child <ENDPOINT> elements (see below).
  • Each <ENDPOINT> nests child <URL> and <InterfaceName> elements (and other GLUE2 attributes).
  • GOCDB would centrally manage an enumeration of <InterfaceName>s. Each type of InterfaceName would be documented in the same way as service types.
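
The data model implied by the list above can be sketched as follows. This is a minimal illustration in Python, not GOCDB's actual schema or code: the class names `Service` and `Endpoint` and the helper `select_for_downtime` are hypothetical, and only the URL and InterfaceName attributes come from the proposal. The sample values are taken from the PI examples on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """One endpoint of a service (hypothetical model, loosely after GLUE2)."""
    url: str
    interface_name: str  # e.g. "SRM.nearline"; assumed to come from the central enumeration


@dataclass
class Service:
    """A service that may define zero or more endpoints."""
    hostname: str
    service_type: str
    endpoints: List[Endpoint] = field(default_factory=list)


def select_for_downtime(service: Service, interface_names: List[str]) -> List[Endpoint]:
    """Return the endpoints of `service` whose InterfaceName was selected.

    Models the extra downtime step: only the chosen endpoints are affected.
    """
    wanted = set(interface_names)
    return [e for e in service.endpoints if e.interface_name in wanted]


srm = Service(
    hostname="dgiref-globus.fzk.de",
    service_type="SRM",
    endpoints=[
        Endpoint("some.srm.nearline.url", "SRM.nearline"),
        Endpoint("some.srm.online.url", "SRM.online"),
    ],
)

# Put only the nearline endpoint into downtime; the online endpoint stays up.
affected = select_for_downtime(srm, ["SRM.nearline"])
print([e.url for e in affected])
```

Selecting by InterfaceName is only one possible design; a portal implementation might equally select endpoints by an internal identifier.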

PI Changes

  • Changes to PI methods and output are detailed below.

get_service_endpoint

 <?xml version="1.0" encoding="UTF-8"?>
 <results>
   <SERVICE PRIMARY_KEY="50257G0">  <!-- RENAME. <SERVICE_ENDPOINT> element to <SERVICE> -->
      <PRIMARY_KEY>50257G0</PRIMARY_KEY>
      <HOSTNAME>dgiref-globus.fzk.de</HOSTNAME>
      <GOCDB_PORTAL_URL>https://goc.egi.eu/portal/....</GOCDB_PORTAL_URL>
      <HOST_OS>SL5</HOST_OS>
      <BETA>N</BETA>
      <SERVICE_TYPE>SRM</SERVICE_TYPE>
      <CORE></CORE>
      <IN_PRODUCTION>Y</IN_PRODUCTION>
      <NODE_MONITORED>Y</NODE_MONITORED>
      <SITENAME>SomeSITE</SITENAME>
      <COUNTRY_NAME>SomeLand</COUNTRY_NAME>
      <COUNTRY_CODE>XX</COUNTRY_CODE>
      <ROC_NAME>NGI_XX</ROC_NAME>
      <URL>https://some.serviceurl.eu:8443/services/se</URL> <!-- Maintain existing SE URL element for SAM/OpsPortal notifications -->
      <ENDPOINTS>   <!-- NEW. <ENDPOINTS> element wraps zero or more new <ENDPOINT> elements per Service -->
        <ENDPOINT>
           <URL>ldap://sBDII.grid.openu.ac.il:2170/mds-vo-name=LCG-IL-OU,o=grid</URL>
           <InterfaceName>RIS</InterfaceName>
           <!-- Add new static/GLUE2 attributes here -->  
        </ENDPOINT>
        <ENDPOINT>
           <URL>some.srm.nearline.url</URL>
           <InterfaceName>SRM.nearline</InterfaceName>
        </ENDPOINT>
        <ENDPOINT>
           <URL>some.srm.online.url</URL>
           <InterfaceName>SRM.online</InterfaceName>
        </ENDPOINT>      
      </ENDPOINTS>
   </SERVICE>

 </results>
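
A client consuming the proposed output could read the nested endpoints roughly as sketched below. This is an illustration using Python's standard `xml.etree.ElementTree`, not an existing GOCDB client: the function name `endpoints_by_interface` is hypothetical, the sample is a trimmed copy of the output above, and keying endpoints by InterfaceName assumes a service defines at most one endpoint per interface name, which the proposal does not state.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample in the proposed format (values from the example above).
SAMPLE = """<results>
  <SERVICE PRIMARY_KEY="50257G0">
    <HOSTNAME>dgiref-globus.fzk.de</HOSTNAME>
    <SERVICE_TYPE>SRM</SERVICE_TYPE>
    <ENDPOINTS>
      <ENDPOINT>
        <URL>some.srm.nearline.url</URL>
        <InterfaceName>SRM.nearline</InterfaceName>
      </ENDPOINT>
      <ENDPOINT>
        <URL>some.srm.online.url</URL>
        <InterfaceName>SRM.online</InterfaceName>
      </ENDPOINT>
    </ENDPOINTS>
  </SERVICE>
</results>"""


def endpoints_by_interface(xml_text):
    """Map each service PRIMARY_KEY to a {InterfaceName: URL} dict."""
    result = {}
    root = ET.fromstring(xml_text)
    for service in root.findall("SERVICE"):
        key = service.get("PRIMARY_KEY")
        # A service with no <ENDPOINTS> wrapper, or an empty one, yields {}.
        result[key] = {
            ep.findtext("InterfaceName"): ep.findtext("URL")
            for ep in service.findall("ENDPOINTS/ENDPOINT")
        }
    return result


print(endpoints_by_interface(SAMPLE))
```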

get_downtime

<?xml version="1.0" encoding="UTF-8"?>
<results>
<DOWNTIME ID="1" PRIMARY_KEY="10G0" CLASSIFICATION="UNSCHEDULED">
  <PRIMARY_KEY>10G0</PRIMARY_KEY>
  <HOSTNAME>somewhere.dl.ac.uk</HOSTNAME>
  <SERVICE_TYPE>SE2</SERVICE_TYPE>
  <SERVICE>service.dl.ac.ukSE2</SERVICE>  <!-- RENAME. <ENDPOINT> to <SERVICE> -->
  <HOSTED_BY>TestSite</HOSTED_BY>
  <GOCDB_PORTAL_URL>https://goc.egi.eu/portal/index.php?Page_Type=Downtime&amp;id=1</GOCDB_PORTAL_URL>

  <ENDPOINTS>   <!-- NEW. All SEs affected by DT (*may not* inc. all endpoints of a service) -->
    <ENDPOINT>
      <URL>some.srm.nearline.url</URL>
      <InterfaceName>SRM.nearline</InterfaceName>
    </ENDPOINT>
    <ENDPOINT>
      <URL>some.srm.online.url</URL>
      <InterfaceName>SRM.online</InterfaceName>
    </ENDPOINT>
  </ENDPOINTS>

  <SEVERITY>OUTAGE</SEVERITY>
  <DESCRIPTION>sample</DESCRIPTION>
  <INSERT_DATE>1384526939</INSERT_DATE>
  <START_DATE>1384531200</START_DATE>
  <END_DATE>1384621200</END_DATE>
  <FORMATED_START_DATE>2013-11-15 16:00</FORMATED_START_DATE>
  <FORMATED_END_DATE>2013-11-16 17:00</FORMATED_END_DATE>
</DOWNTIME>
</results>

get_downtime_to_broadcast

<?xml version="1.0"?>
<results>
 <DOWNTIME ID="57205437" PRIMARY_KEY="14101G0" CLASSIFICATION="SCHEDULED">
  <PRIMARY_KEY>14101G0</PRIMARY_KEY>
  <SITENAME>wuppertalprod</SITENAME>
  <HOSTNAME/>
  <SERVICE_TYPE/>
  <HOSTED_BY>TestSite</HOSTED_BY>
  <SEVERITY>OUTAGE</SEVERITY>
  <DESCRIPTION>dCache upgrade</DESCRIPTION>
  <GOCDB_PORTAL_URL>https://goc.egi.eu/portal/index.php?Page_Type=Downtime&amp;id=1</GOCDB_PORTAL_URL>

  <ENDPOINTS>   <!-- NEW. All SEs affected by DT (*may not* inc. all endpoints of a service) -->
    <ENDPOINT>
      <URL>some.srm.nearline.url</URL>
      <InterfaceName>SRM.nearline</InterfaceName>
    </ENDPOINT>
    <ENDPOINT>
      <URL>some.srm.online.url</URL>
      <InterfaceName>SRM.online</InterfaceName>
    </ENDPOINT>
  </ENDPOINTS>

  <INSERT_DATE>1263908942</INSERT_DATE>
  <START_DATE>1264154400</START_DATE>
  <END_DATE>1264158000</END_DATE>
  <REMINDER_START_DOWNTIME>3155760000</REMINDER_START_DOWNTIME>
  <BROADCASTING_START_DOWNTIME/>
 </DOWNTIME>
</results>

get_downtime_nested_se (coming soon)

<?xml version="1.0" encoding="UTF-8"?>
<results>
    <DOWNTIME ID="1" PRIMARY_KEY="10G0" CLASSIFICATION="UNSCHEDULED">
        <SEVERITY>OUTAGE</SEVERITY>
        <DESCRIPTION>sample</DESCRIPTION>
        <INSERT_DATE>1384526939</INSERT_DATE>
        <START_DATE>1384531200</START_DATE>
        <END_DATE>1384621200</END_DATE>
        <FORMATED_START_DATE>2013-11-15 16:00</FORMATED_START_DATE>
        <FORMATED_END_DATE>2013-11-16 17:00</FORMATED_END_DATE>
        <GOCDB_PORTAL_URL>https://goc.egi.eu/portal/index.php?Page_Type=Downtime&amp;id=1</GOCDB_PORTAL_URL>
        <SERVICES>
            <SERVICE>
                <PRIMARY_KEY>1</PRIMARY_KEY>
                <HOSTNAME>somehost.dl.ac.uk</HOSTNAME>
                <SERVICE_TYPE>SE2</SERVICE_TYPE>
                <HOSTED_BY>TestSite</HOSTED_BY>
                <!-- 2 SEs affected by DT (*may not* be all endpoints of service) -->
                <AFFECTED_ENDPOINTS> 
                     <ENDPOINT>
                       <URL>some.srm.nearline.url</URL>
                       <InterfaceName>SRM.nearline</InterfaceName>
                     </ENDPOINT>
                     <ENDPOINT>
                       <URL>some.srm.online.url</URL>
                       <InterfaceName>SRM.online</InterfaceName>
                     </ENDPOINT>
                </AFFECTED_ENDPOINTS>
            </SERVICE>
            <SERVICE>
                <PRIMARY_KEY>2</PRIMARY_KEY>
                <HOSTNAME>somehost2.dl.ac.uk</HOSTNAME>
                <SERVICE_TYPE>SE2</SERVICE_TYPE>
                <HOSTED_BY>TestSite2</HOSTED_BY>
                <!-- 1 SE affected by DT (*may not* be all endpoints of a service) -->
                <AFFECTED_ENDPOINTS>   
                     <ENDPOINT>
                       <URL>some.srm.nearline.url</URL>
                       <InterfaceName>SRM.nearline</InterfaceName>
                     </ENDPOINT>
                </AFFECTED_ENDPOINTS>
            </SERVICE>
        </SERVICES>
    </DOWNTIME>
</results>
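
The nested form lends itself to being flattened into one row per affected endpoint. The sketch below shows one way a consumer might do this with Python's standard `xml.etree.ElementTree`. The function name `affected_endpoints` is hypothetical, the sample is a trimmed copy of the output above, and it assumes each `<AFFECTED_ENDPOINTS>` wrapper is closed by a matching `</AFFECTED_ENDPOINTS>` tag.

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the proposed nested output (values from the example above).
SAMPLE = """<results>
  <DOWNTIME ID="1" PRIMARY_KEY="10G0" CLASSIFICATION="UNSCHEDULED">
    <SEVERITY>OUTAGE</SEVERITY>
    <SERVICES>
      <SERVICE>
        <HOSTNAME>somehost.dl.ac.uk</HOSTNAME>
        <AFFECTED_ENDPOINTS>
          <ENDPOINT>
            <URL>some.srm.nearline.url</URL>
            <InterfaceName>SRM.nearline</InterfaceName>
          </ENDPOINT>
          <ENDPOINT>
            <URL>some.srm.online.url</URL>
            <InterfaceName>SRM.online</InterfaceName>
          </ENDPOINT>
        </AFFECTED_ENDPOINTS>
      </SERVICE>
      <SERVICE>
        <HOSTNAME>somehost2.dl.ac.uk</HOSTNAME>
        <AFFECTED_ENDPOINTS>
          <ENDPOINT>
            <URL>some.srm.nearline.url</URL>
            <InterfaceName>SRM.nearline</InterfaceName>
          </ENDPOINT>
        </AFFECTED_ENDPOINTS>
      </SERVICE>
    </SERVICES>
  </DOWNTIME>
</results>"""


def affected_endpoints(xml_text):
    """Return (downtime key, hostname, InterfaceName) for each affected endpoint."""
    rows = []
    for downtime in ET.fromstring(xml_text).findall("DOWNTIME"):
        for service in downtime.findall("SERVICES/SERVICE"):
            for ep in service.findall("AFFECTED_ENDPOINTS/ENDPOINT"):
                rows.append((
                    downtime.get("PRIMARY_KEY"),
                    service.findtext("HOSTNAME"),
                    ep.findtext("InterfaceName"),
                ))
    return rows


for row in affected_endpoints(SAMPLE):
    print(row)
```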

get_service_group

<?xml version="1.0" encoding="UTF-8"?>
<results>
<SERVICE_GROUP PRIMARY_KEY="57654G0">  
   <NAME>OPSTOOLS</NAME>
   <DESCRIPTION>All EGI Operational Tools</DESCRIPTION>
   <MONITORED>Y</MONITORED>
   <CONTACT_EMAIL>gocdb-admins@mailtalk.ac.uk</CONTACT_EMAIL>
   <GOCDB_PORTAL_URL> https://elided </GOCDB_PORTAL_URL>
   <SERVICE>                                 <!-- RENAME. <SERVICE_ENDPOINT> to <SERVICE> -->
      <HOSTNAME>goc.egi.eu</HOSTNAME>
      <GOCDB_PORTAL_URL>https://elided</GOCDB_PORTAL_URL>
      <SERVICE_TYPE>egi.GOCDB</SERVICE_TYPE>
      <HOST_IP/>
      <HOSTDN>/C=UK/O=eScience/OU=CLRC/L=RAL/CN=goc.egi.eu</HOSTDN>
      <IN_PRODUCTION>Y</IN_PRODUCTION>
      <NODE_MONITORED>Y</NODE_MONITORED>
      <ENDPOINTS>                           <!-- NEW. <ENDPOINTS> wraps service's <ENDPOINT>s-->
        <ENDPOINT>
           <URL>some url </URL>
           <InterfaceName>RIS</InterfaceName>
        </ENDPOINT>
        <ENDPOINT>
           <URL>some endpoint url</URL>
           <InterfaceName>eg.SRM.nearline</InterfaceName>
        </ENDPOINT>
        <ENDPOINT>
           <URL>some endpoint url</URL>
           <InterfaceName>eg.SRM.online</InterfaceName>
        </ENDPOINT>      
      </ENDPOINTS>
   </SERVICE>   

   <SERVICE>   
    ...elided...
   </SERVICE>   

</SERVICE_GROUP>
</results>

Impact on other tools

  • Tools such as SAM and Ops portal would also need to support multiple endpoints, i.e. for monitoring and downtime notification (an implementation in GOCDB alone wouldn’t be much use).
  • We have received a mail from Marian Babik (Wed 09/05/2012 17:09) explaining that SAM needs time to develop support for multiple endpoints before this can be introduced in GOCDB. SAM have asked us to hold off on implementing multiple endpoints until they have a development plan and proposal for how to resolve their issues.

Monitoring

SAM has to be updated to run tests against endpoints rather than services. For each service, SAM has to read all the endpoints registered in GOCDB and run the tests appropriate to each endpoint type (InterfaceName). It is not clear how service availability should then be computed. How should a service be treated when, for example, one endpoint is running and another is in downtime? Should each endpoint be treated as a service in its own right for availability purposes? A discussion within EGI.eu is therefore needed to agree how multiple endpoints per service are handled in the availability computation.
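
The open question can be made concrete with a small sketch. The two policies below are purely illustrative assumptions, not SAM behaviour and not a proposal from this page: the function names are hypothetical, and the status map stands for whatever per-endpoint test results SAM would produce.

```python
from typing import Dict


def available_if_all_up(endpoint_up: Dict[str, bool]) -> bool:
    """Strict policy: the service counts as available only if every endpoint is up."""
    return all(endpoint_up.values())


def available_if_any_up(endpoint_up: Dict[str, bool]) -> bool:
    """Lenient policy: the service counts as available if at least one endpoint is up."""
    return any(endpoint_up.values())


# One endpoint running, the other in downtime (the case raised above).
status = {"SRM.online": True, "SRM.nearline": False}

print(available_if_all_up(status))  # strict policy result
print(available_if_any_up(status))  # lenient policy result
```

The same service is unavailable under the strict policy and available under the lenient one, which is why the aggregation rule needs to be agreed. The third option mentioned above, treating each endpoint as a service, would report the per-endpoint map itself with no aggregation.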

Ops portal

The new feature has an impact on the downtime notification module. The Ops portal product team stated that this is a legacy module that would have to be rewritten to support the change. Adding these activities to the product roadmaps is not easy at this stage (mainly for SAM, which is involved in the migration of central services), so we would like to know the EGI.eu position on this and the priority we should give to this requirement.