The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations@egi.eu.

SAMUpdate23
Latest revision as of 10:18, 17 December 2014


Major changes

Major changes in SAM Update-23:

  • Probes have been moved to the UMD-3 repository. This decision was approved by the OMB so that probe developers can update probes more frequently and independently of SAM releases.
  • SAM GridMon (sam-gridmon) and its dependencies have been removed. SAM Update-23 supports only SAM Nagios (sam-nagios). In a future version, SAM GridMon will be replaced by the ARGO engine.


A detailed list of all new features and bug fixes can be found in the ARGO/SAM GitHub milestone Update-23: https://github.com/ARGOeu/sam-probes/issues?q=is%3Aissue+milestone%3AUpdate-23

Installation

This guide is based on the previous SAM Administration guide: [1].

Prerequisites

Install your host certificate to secure the Nagios portal:

$ ls -l /etc/grid-security/host*
-rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 28 19:25 /etc/grid-security/hostkey.pem
 
$ openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
SSL client : Yes
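
The listing above shows the host key readable by root only. A minimal sketch of an automated check for that follows; the helper name `key_perms_ok` is hypothetical, and accepting both modes 400 and 600 is an assumption (the listing itself only shows 400). It relies on GNU `stat`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a private key file is accessible by its
# owner only. Accepting 600 in addition to 400 is an assumption.
key_perms_ok() {
    mode=$(stat -c '%a' "$1") || return 2   # GNU stat: octal permission bits
    case "$mode" in
        400|600) return 0 ;;
        *) echo "$1: mode $mode, expected 400 or 600" >&2; return 1 ;;
    esac
}

# Example (on the SAM box):
#   key_perms_ok /etc/grid-security/hostkey.pem
```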

SELinux must be disabled before proceeding with the installation. If it is enabled, run the commands below and reboot the machine:

$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
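
To confirm that the change took effect, the configured mode can be read back from the config file. The sketch below is only illustrative; the function name `selinux_config_mode` is hypothetical and not part of SAM or SELinux.

```shell
#!/bin/sh
# Hypothetical helper: print the mode set by the last uncommented SELINUX=
# line of an SELinux config file (defaults to /etc/selinux/config).
selinux_config_mode() {
    # "^SELINUX=" does not match SELINUXTYPE= or commented-out lines.
    sed -n 's/^SELINUX=\([A-Za-z]*\).*$/\1/p' "${1:-/etc/selinux/config}" | tail -n 1
}

# Example (on the SAM box, after running the commands above):
#   selinux_config_mode   # should report "disabled"
#   getenforce            # Permissive before the reboot, Disabled after it
```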

Generate a MyProxy credential (only needed if you are not using robot certificates). Perform these steps on a UI box:

$ ls -l .globus/
total 16
-rw-r--r-- 1 root root 4908 Sep 18 14:44 usercert.pem
-rw------- 1 root root 4836 Sep 18 14:44 userkey.pem

$ myproxy-init -c 4320 -k NagiosRetrieve-<hostname>-<VO name> -s <MYPROXY-name> -l nagios -x -Z <host DN>

YUM repositories

OS/EPEL repos

  • Add the following config to all CentOS/SL base repositories:
exclude=mysql51*
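
Since the exclude line has to be present in every base repository section, it is easy to miss one. The following sketch lists the sections of a repo file that do not yet carry it. The function name is hypothetical, and the repo file name in the usage line is an assumption; use whichever files define your CentOS/SL base repositories.

```shell
#!/bin/sh
# Hypothetical helper: print each [section] of a yum .repo file that has no
# "exclude=" line mentioning mysql51*.
repo_sections_missing_exclude() {
    awk '
        /^\[.*\]$/ { if (sec != "" && !ok) print sec; sec = $0; ok = 0; next }
        /^exclude[ \t]*=/ && /mysql51\*/ { ok = 1 }
        END { if (sec != "" && !ok) print sec }
    ' "$1"
}

# Example (file name is an assumption):
#   repo_sections_missing_exclude /etc/yum.repos.d/CentOS-Base.repo
```

An empty result means every section in the file already has the exclude line.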

Production

Follow the instructions for installing the UMD-3 and EPEL repositories: http://repository.egi.eu/category/umd_releases/distribution/umd-3/. This manual assumes that the priority of the UMD-3 repository is 1, as defined in the umd-release package.

Add SAM repository from here: http://repository.egi.eu/sw/production/sam/1/repofiles/sam.repo

Package installation

Perform the following installation steps:

$ yum -y install ca-policy-egi-core httpd mysql51
$ yum -y install nagios.x86_64
$ yum install sam-nagios

Configuration

SAM uses Yaim for configuration. A detailed specification of all SAM configuration parameters is available in the SAM documentation:

Check the Yaim variable changes below: SAMUpdate23#Yaim_variable_changes.

In addition, check the FAQs for common configurations and problems: [2]

Once the site-info.def is ready, run Yaim:

$ /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS

Validation

Check that the Nagios web interface and the SAM portal are up:

  • https://<hostname>/nagios
  • http://<hostname>/myegi

Check MyProxy credentials

$ nagios-run-check <hostname> hr.srce.GridProxy-Get-<VO-name>

Upgrade

Upgrading from Update-22 is fully supported and does not require reinstalling the SAM box. The procedure is as follows:

  • remove UMD-2 repo
 yum remove umd-release
 rm -rf /etc/yum.repos.d/UMD-2-*
  • add UMD repositories:
    • install UMD-3 repo. For the StagedRollout sites please use the repos recommended above
 wget http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/umd-release-3.0.1-1.el5.noarch.rpm
 yum --nogpgcheck localinstall umd-release-3.0.1-1.el5.noarch.rpm
  • add the following config to all EPEL base repositories:
 exclude=perl-DateTime
  • update everything
 yum update
  • configuration
 /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS
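
The upgrade steps above can be collected into a script. The sketch below is one possible arrangement, not an official SAM tool: the `run` wrapper, the `DRY_RUN` variable and the function names are all hypothetical. By default it only prints the commands; setting `DRY_RUN=0` executes them. The procedure is split in two so that the EPEL repo files can be edited by hand in between.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

UMD_RPM=umd-release-3.0.1-1.el5.noarch.rpm
UMD_URL=http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/$UMD_RPM

# Step 1: replace the UMD-2 repository with UMD-3.
upgrade_repos() {
    run yum remove umd-release || return 1
    run rm -rf /etc/yum.repos.d/UMD-2-* || return 1
    run wget "$UMD_URL" || return 1
    run yum --nogpgcheck localinstall "$UMD_RPM" || return 1
}

# Step 2: run only after exclude=perl-DateTime has been added to the EPEL
# repositories, as described in the list above.
upgrade_packages() {
    run yum update || return 1
    run /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS || return 1
}

# Example: review the commands first, then execute them.
#   upgrade_repos; upgrade_packages
#   DRY_RUN=0 upgrade_repos
```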

Upgrading from a release older than Update-22 is not supported; a clean installation is required.

Package changes

Updated packages from the SAM repo:

  • atp-1.27.19-1.el5.noarch.rpm
  • grid-monitoring-config-gen-0.95.0-1.el5
  • grid-monitoring-probes-eu.egi.sec-1.0.11-24.el5
  • glite-yaim-nagios-1.11.3-1.el5
  • msg-nagios-bridge-1.1.0-1.el5
  • mrs-1.8.0-1.el5
  • mywlcg-1.5.6-3.el5
  • nagios-gocdb-downtime-0.25.0-1.el5
  • ncg-metric-config-1.5.1-1.el5
  • poem-0.9.91-1.el5
  • poem-sync-0.9.91-1.el5
  • sam-nagios-1.23.0-2.el5
  • sam-release-1.23.0-1.el5

Packages moved/added to the UMD-3 repo:

  • emi-cream-nagios-1.0.1-6.el5.sam
  • emi.dcache.srm-probes-1.0.1-1
  • egi-mpi-nagios-0.0.7-4.1
  • emi-wms-nagios-3.5.0-3.sl5
  • glue-validator-2.0.25-0
  • grid-monitoring-org.activemq-probes-0.15-1.el5
  • grid-monitoring-org.nagiosexchange-probes-0.19-1.el5
  • grid-monitoring-probes-cadist-0.6.0-1.el5
  • grid-monitoring-probes-ch.cern.sam-1.6.15-1.el5
  • grid-monitoring-probes-hr.srce-0.38.1-1.el5
  • nagios-plugins-argus-1.1.0-2.el5
  • nagios-plugins-emi.glexec-0.3.0-1.sl5
  • nagios-plugins-dg-1.0.1-1.el5
  • nagios-plugins-emi.glexec-config-1.0.0-2.el5
  • nagios-plugins-fts-3.2.30-1.el5
  • nagios-plugins-lfc-0.9.5-2.el5.sam
  • nordugrid-arc-nagios-plugins-1.8.1-1
  • nordugrid-arc-nagios-plugins-egi-1.8.1-1
  • perl-GridMon-1.0.73-1.el5
  • qcg-broker-nagios-probe-3.4.0-3
  • qcg-comp-nagios-probe-3.4.0-9
  • qcg-ntf-nagios-probe-3.4.0-2
  • unicore-nagios-plugins-2.3.2-0.sl5

Obsoleted packages:

  • nagios-plugins-wn-rep
  • gstat-validation

NCG config changes

  • Because the org.sam.WN-Rep* tests have been removed, running Yaim will delete the config file /etc/ncg/ncg-localdb.d/jobsubmit. On existing SAM installations, remove any custom configuration of the following emi.cream.*-JobState test parameters:
--wn-lfc
--wn-se-rep
--wn-se-rep-file
--wn-bdii

Modifications can be found in directories /etc/ncg/ncg-localdb.d/jobsubmit and /etc/ncg-metric-config.d/.
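
To locate leftover custom settings, the configuration can be searched for the four removed parameters. This is a minimal sketch; the function name is hypothetical, and searching the two locations named above is an assumption about where local overrides live on a given installation.

```shell
#!/bin/sh
# Hypothetical helper: list lines (file:line:text) under the given paths that
# still pass --wn-lfc, --wn-se-rep, --wn-se-rep-file or --wn-bdii.
# Returns non-zero when nothing is found, as grep does.
find_obsolete_wn_params() {
    grep -rnE -e '--wn-(lfc|se-rep(-file)?|bdii)([^A-Za-z0-9_-]|$)' "$@"
}

# Example:
#   find_obsolete_wn_params /etc/ncg/ncg-localdb.d /etc/ncg-metric-config.d
```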

Yaim variable changes

Default values changed:

Variables obsoleted:

  • JOBSUBMIT_WN_LFC
  • JOBSUBMIT_WN_SE_REP
  • JOBSUBMIT_WN_SE_REP_FILE
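
Before rerunning Yaim, it may help to check whether site-info.def still sets any of the obsoleted variables. A minimal sketch, with a hypothetical function name; it ignores commented-out assignments and defaults to the site-info.def path used elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) active assignments of the
# obsoleted JOBSUBMIT_WN_* variables in a Yaim site-info.def file.
find_obsolete_yaim_vars() {
    grep -nE '^[[:space:]]*(export[[:space:]]+)?JOBSUBMIT_WN_(LFC|SE_REP|SE_REP_FILE)=' \
        "${1:-/etc/yaim/site-info.def}"
}

# Example:
#   find_obsolete_yaim_vars /etc/yaim/site-info.def
```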

Test changes

Tests added:

  • ch.cern.FTS3-Service
  • ch.cern.FTS3-StalledTransfers
  • org.bdii.GLUE2-Validate

Tests removed:

  • org.nordugrid.ARC-CE-LFC-result
  • org.nordugrid.ARC-CE-lfc
  • org.nordugrid.ARC-CE-LFC-submit
  • org.sam.WN-RepDel
  • org.sam.WN-RepISenv
  • org.sam.WN-RepFree
  • org.sam.WN-RepCr
  • org.sam.WN-RepGet
  • org.sam.WN-RepRep
  • org.sam.WN-Rep