The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

NGI DE CH Operations Center:Monitoring



NGI-DE NGI-CH Monitoring

Mailinglist

ngi-de-monitoring@lists.kit.edu

Participants

Dimitri Nilsen (KIT), Foued Jrad (KIT), Alessandro Usai (SWITCH), Andres Aeschlimann (SWITCH)

Plan for ARC Testing Setup in Nagios 15.9.2011

  1. Customize the file /etc/grid-monitoring/org.ndgf.conf with the NGI services.
  2. NorduGrid Logging Improvement:
    1. Edit the xrsl template files (in /usr/share/grid-monitoring/org.ndgf/<SERVICE>/xrsl) for all the services and add (gmlog = "gmlog") to them, e.g.
more /usr/share/grid-monitoring/org.ndgf/lfc/xrsl

(executable = "testjob.sh")
(jobname = "lfc")
(stdout = "testjob.out")
(gmlog = "gmlog")
(stderr = "testjob.err")
(inputfiles = ("testjob.sh" "/usr/share/grid-monitoring/org.ndgf/lfc/testjob.sh")
              ("file" "%LFC_TESTFILE%"))
(outputfiles = ("testjob.out" "")("testjob.err" "")
               ("outfile" "%LFC_STORAGE_W%/%HOST%-lfc-%TIME%"))
(walltime = "15 min")
(memory = "256")

This ensures that, in case of an error, the gmlog (useful for debugging) is sent back as part of the output sandbox. The files to edit in /usr/share/grid-monitoring/org.ndgf are: gridftp/xrsl, jobsubmit/xrsl, lfc/xrsl, rls/xrsl, srm/xrsl
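The edit in step 2 has to be repeated for each of the five template files, so it can be scripted. The following is a minimal sketch, not part of the official probes: the helper names add_gmlog and patch_templates are hypothetical, and placing the new line right after the stdout attribute simply mirrors the lfc example above (the sketch assumes the position of the attribute within the template does not matter). It skips files that already contain a gmlog attribute, so it can be run more than once.

```python
from pathlib import Path

# The attribute line to add, exactly as shown in the lfc example.
GMLOG_LINE = '(gmlog = "gmlog")'


def add_gmlog(xrsl_text):
    """Return xrsl_text with the gmlog attribute added, unless already present."""
    lines = xrsl_text.splitlines()
    # Leave the template untouched if any line already sets gmlog.
    for line in lines:
        if line.replace(" ", "").lower().startswith("(gmlog="):
            return xrsl_text
    # Insert right after the stdout attribute, as in the lfc example;
    # if there is no stdout line, append at the end instead.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("(stdout"):
            lines.insert(i + 1, GMLOG_LINE)
            break
    else:
        lines.append(GMLOG_LINE)
    return "\n".join(lines) + "\n"


def patch_templates(base_dir, services):
    """Patch <base_dir>/<service>/xrsl for each service; return the changed files."""
    changed = []
    for service in services:
        path = Path(base_dir) / service / "xrsl"
        original = path.read_text()
        patched = add_gmlog(original)
        if patched != original:
            path.write_text(patched)
            changed.append(path)
    return changed
```

On the monitoring node this could be called as patch_templates("/usr/share/grid-monitoring/org.ndgf", ["gridftp", "jobsubmit", "lfc", "rls", "srm"]), preferably after backing up the original files.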

  3. Data management requirements for NorduGrid: the test file used for the LFC/SRM/GridFTP tests must be created and managed manually.

Notice (19.10.2011): It is important that the file/LFC entry is created with the same credentials used by the Nagios monitoring node! A robot certificate will be used in the near future; checks still have to be carried out with it against the dCache and LFC nodes used by NGI_DE. For the time being, for the ARC tests in the test system (this only affects NGI_CH!), feronia.switch.ch (DPM) and lodur.switch.ch (LFC) are used instead.

Current Status 19.10.2011

On rocmon-fzk.gridka.de, the file /etc/grid-monitoring/org.ndgf.conf has been customized to enable the ARC tests. For the time being, feronia.switch.ch (DPM) and lodur.switch.ch (LFC) are used.

Dmitry will provide the production probes with access to the official LFC and SRM services.

Migration to ARC 1: Because of https://ggus.eu/tech/ticket_show.php?ticket=72260, further changes had to be made by hand in the files

/etc/grid-monitoring/org.ndgf.conf

/usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-lfc

/usr/share/grid-monitoring/org.ndgf/lfc/xrsl

to ensure LFC tests run successfully (this is backward compatible with ARC 0.8).

To be done:

1) access to the production system ngi-de-nagios.gridka.de to be granted to Alessandro.

2) change of the site-info.def file in both the test and production systems, to grant admin rights to Alessandro (to be done by Alessandro)

3) update 14? to be discussed

4) ARC enabling on the production system (to be done by Alessandro)

5) dCache access: once the robot certificate is in use, this should no longer be a problem