Latest revision as of 09:36, 8 July 2011

Join as Resource Centre


NGI-DE Site Registration and Certification Procedure

The National GRID Initiative Germany (NGI-DE) supports three types of middleware resources: gLite, UNICORE and Globus. To become an NGI-DE site, a site has to support at least one of the three middlewares and register in GOCDB. A UNICORE site must support UNICOREX and unicore-gateway of UNICORE6. A Globus site must support globus-GRIDFTP and GRAM5 of Globus version 5. A gLite site should have a site-BDII and a Storage Element (SE) and/or a Computing Element (CE) with at least eight Worker Node cores.

The procedure below describes how to contribute resources to NGI-DE. If in doubt, please contact us by e-mail at ngi-de-admin@lists.kit.edu


STEP 1: Registration

Requirements:

  • willingness to set up the site
  • Primary Site Manager: the person entitled to represent the Resource Centre in NGI-DE, who owns a personal certificate

Actions by Primary Site Managers:

  • send a digitally signed e-mail to ngi-de-admin@lists.kit.edu that includes the following data and statements:
   Personal Data of Primary Site Manager:
   Name:

   Email:

   Telephone:

   Hours:

   Certificate DN:

   I'm the Primary Site Manager of the site described below.

   Site (GIIS) Name:

   Official Name of Hosting Institution:

   Domain: 
   Site Email Address:

   Site Telephone Number:

   Site Emergency Number:

   Country:

   All administrators and other necessary personnel at the site will be informed of and agree to abide by all Grid operating policies described in the Grid Site Operations Security Policy.

   The Site Security Contact and the team members will be informed of and agree to the Security Incident Response Policy.
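
As an illustration only, the data requested above could be collected and checked for completeness before the signed e-mail is written. The sketch below is a hypothetical helper and not part of any NGI-DE tooling: the field names are copied from the template, while the function name `render_registration` and the data layout are assumptions. It neither signs nor sends the message.

```python
# Hypothetical helper: field names come from the template above,
# the function name and data layout are illustrative assumptions.
MANAGER_FIELDS = ["Name", "Email", "Telephone", "Hours", "Certificate DN"]
SITE_FIELDS = [
    "Site (GIIS) Name",
    "Official Name of Hosting Institution",
    "Domain",
    "Site Email Address",
    "Site Telephone Number",
    "Site Emergency Number",
    "Country",
]


def render_registration(manager: dict, site: dict) -> str:
    """Return the e-mail body, or raise ValueError if a field is empty."""
    missing = [f for f in MANAGER_FIELDS if not manager.get(f, "").strip()]
    missing += [f for f in SITE_FIELDS if not site.get(f, "").strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    lines = ["Personal Data of Primary Site Manager:"]
    lines += [f"{f}: {manager[f]}" for f in MANAGER_FIELDS]
    lines.append("")
    lines.append("I'm the Primary Site Manager of the site described below.")
    lines += [f"{f}: {site[f]}" for f in SITE_FIELDS]
    return "\n".join(lines)
```

The two policy statements still have to be added to the body by hand, and the message has to be signed with the personal certificate of the Primary Site Manager.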

ROC Staff Actions:

  • open a GGUS ticket to follow up the procedure;
  • register the site in GOC DB as a candidate site;
  • confirm the registration to the Primary Site Manager;
  • optionally, create the dteam group /dteam/NGI_DE/..your_resource_center.


STEP 2: Preparation

Actions by Primary Site Manager:

  • register as a user in the GOCDB
  • apply for the 'Site Manager' role of your newly created site
  • after approval of your role by a regional manager:
      • fill in all missing information about the site in GOC DB, including the names of machines. The most critical entries are GIISURL and the section "Nodes".
      • add other site administrators and security officers to GOCDB and assign the appropriate roles to them. There should be at least one 'Security Officer'.
      • create a site admin contact list and a security incident response (CSIRT) list and add both to your GOCDB entry. Each mailing list should reach at least two people, who should be willing to react quickly to requests, in particular to security incidents.

Actions by All Admins:

ROC staff Actions:

  • test site and especially security contacts
  • switch site to non-certified status
  • enable monitoring of the site
  • add the new site to the configuration file of the NGI_DE Nagios monitoring test instance (rocmon-fzk.gridka.de). After reconfiguration, UNCERTIFIED sites appear at https://rocmon-fzk.gridka.de/nagios/. To access the page you need to be registered as a member of the dteam VO.

STEP 3: Installation

Site Admins Actions:

  • check which version of middleware is obligatory for production installations:
      • gLite middleware page: http://glite.web.cern.ch/glite/packages/latestRelease.asp
      • UNICORE middleware page: http://www.unicore.eu/index.php
  • install the grid middleware according to the documentation on the release pages.
  • a minimum set of services is:
      • for gLite: one computing element (CE) and/or one storage element (SE), eight worker nodes (WNs), as well as a SiteBDII and a monitoring box (MON)
      • for UNICORE: UNICOREX and gateway
      • for Globus: globus-GRIDFTP and GRAM
  • use this topBDII during the certification process: BDII_HOST=bdii-fzk.gridka.de
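
The minimum service sets listed above can be expressed as a small self-check that a site admin might run over an inventory before asking for certification. This is a sketch under stated assumptions: the service labels are taken from the list, but the function name `missing_services`, its arguments, and the literal reading that a gLite site always needs eight worker nodes are assumptions made for illustration.

```python
# Illustrative only: service labels are taken from the list above;
# the function name and arguments are assumptions.
def missing_services(middleware: str, services: set, worker_nodes: int = 0) -> list:
    """Return the items of the minimum service set that a site still lacks."""
    missing = []
    if middleware == "gLite":
        # At least one of CE / SE is needed ("and/or" in the text).
        if not ({"CE", "SE"} & services):
            missing.append("CE and/or SE")
        for required in ("SiteBDII", "MON"):
            if required not in services:
                missing.append(required)
        # Assumption: the list is read literally, so eight WNs are always required.
        if worker_nodes < 8:
            missing.append("eight worker nodes")
    elif middleware == "UNICORE":
        missing = [s for s in ("UNICOREX", "gateway") if s not in services]
    elif middleware == "Globus":
        missing = [s for s in ("globus-GRIDFTP", "GRAM") if s not in services]
    else:
        raise ValueError(f"unknown middleware: {middleware}")
    return missing
```

An empty result means the inventory covers the minimum set for that middleware; it says nothing about whether the services are configured correctly.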

Important notes:

STEP 4: Certification

Actions by Site Admins:

  • inform the ROC that the site is fully installed and configured properly;
  • fix issues raised by ROC staff;
  • register your site in the NGI-DE helpdesk as support staff at https://helpdesk.ngi-de.eu/index.php?mode=register
  • subscribe all site admins to your site admins mailing list, and register that list with the NGI-DE operations mailing list (NGI-DE-OPERATIONS@LISTSERV.DFN.DE). There you can ask questions, share your expertise and learn about recent news relevant to NGI-DE.

ROC staff actions:

  • check if the site is fully functional and inform the site managers about detected issues;
  • if everything is OK, switch the site to certified status and schedule an "Initial maintenance" of five working days to check that the site works properly in the production environment (some features can be verified only in production mode). At most three hours after the site is switched to Certified, it should be visible in the NGI-DE production instance, https://ngi-de-nagios.gridka.de;
  • if everything is OK and the initial scheduled downtime is over, the site is fully certified!

FINAL REMARKS

Some important information for certified sites concerning operation in NGI-DE:

  1. If your site fails Nagios tests, you may receive a ticket 24 hours after the alarm notification. Please be proactive: monitor your site and fix problems before tickets are raised.
  2. Site availability and reliability are checked statistically once per month. Sites with an availability of less than 70% and a reliability of less than 75% are asked through a GGUS ticket to explain the poor performance. Sites with an availability of less than 70% for three consecutive months will be suspended.
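
The monthly check in remark 2 can be written out as a small rule. The sketch below is not an official NGI-DE or EGI tool: the thresholds are those stated above, read literally (a justification is requested when both values fall below their thresholds), while the function name `monthly_review` and its return labels are hypothetical.

```python
# Thresholds are those stated in the remarks above; the function name
# and its return labels are hypothetical.
def monthly_review(availability: list, reliability: list) -> str:
    """Classify a site from its monthly percentages, oldest month first."""
    if len(availability) != len(reliability) or not availability:
        raise ValueError("need equally long, non-empty monthly series")
    # Suspension: availability below 70% in each of the last three months.
    if len(availability) >= 3 and all(a < 70 for a in availability[-3:]):
        return "suspend"
    # Justification: latest month below both thresholds (literal reading).
    if availability[-1] < 70 and reliability[-1] < 75:
        return "justify"
    return "ok"
```

For example, `monthly_review([90, 65], [95, 70])` yields `"justify"`, and three months at 60, 65 and 69 percent availability yield `"suspend"` regardless of reliability.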