The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

SAMUpdate23

From EGIWiki

Revision as of 13:39, 9 December 2014

Major changes

Major changes in SAM Update-23:

  • Probes have moved to the UMD-3 repository. The OMB approved this change so that probe developers can update probes more frequently and independently of SAM releases.
  • SAM GridMon (sam-gridmon) and its dependencies have been removed; SAM Update-23 supports only SAM Nagios (sam-nagios). In a future version, SAM GridMon will be replaced by the ARGO engine.


A detailed list of all new features and bug fixes is available on GitHub: ARGO/SAM Milestone Update23.

Installation

This guide is based on the previous SAM Administration guide: [1].

Prerequisites

Install your host certificate to secure the Nagios portal:

$ ls -l /etc/grid-security/host*
-rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 28 19:25 /etc/grid-security/hostkey.pem
 
$ openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
SSL client : Yes
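The checks above can be bundled into a small pre-flight function. This is a sketch, not something shipped with SAM: the function names are hypothetical, it assumes GNU `stat` and an RSA host key (it compares the moduli reported by `openssl x509 -modulus` and `openssl rsa -modulus`), and it only warns when hostkey.pem is not mode 400.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the host credentials (not part of SAM).
# Assumes GNU stat and an RSA host key.

# Print the octal permission bits of a file, e.g. 400.
file_mode() {
    stat -c '%a' "$1"
}

# Succeed if the certificate and the RSA private key have the same modulus.
cert_matches_key() {
    local cert_mod key_mod
    cert_mod=$(openssl x509 -in "$1" -noout -modulus) || return 1
    key_mod=$(openssl rsa -in "$2" -noout -modulus 2>/dev/null) || return 1
    [ "$cert_mod" = "$key_mod" ]
}

check_host_credentials() {
    local cert="${1:-/etc/grid-security/hostcert.pem}"
    local key="${2:-/etc/grid-security/hostkey.pem}"
    if [ "$(file_mode "$key")" != "400" ]; then
        echo "WARNING: $key is not mode 400" >&2
    fi
    if ! cert_matches_key "$cert" "$key"; then
        echo "ERROR: $cert does not match $key" >&2
        return 1
    fi
    # Same purpose check as the openssl command above.
    if ! openssl x509 -in "$cert" -noout -purpose | grep -q "SSL client : Yes"; then
        echo "ERROR: $cert is not usable as an SSL client certificate" >&2
        return 1
    fi
    echo "Host credentials OK"
}
```

Called without arguments, `check_host_credentials` uses the default paths shown above.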

SELinux must be disabled before the installation. If it is enabled, run the commands below and reboot the machine:

$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
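`setenforce 0` only switches the running system to permissive mode; SELinux is fully disabled once the machine reboots with the edited config. A minimal sketch of the persistent part, taking the config path as a parameter so it can be tried on a copy first (the function names are hypothetical; GNU `sed -i` is assumed):

```shell
#!/bin/bash
# Hypothetical helpers for the SELinux step above (GNU sed assumed).

# Print the value of the SELINUX= line; SELINUXTYPE= is not matched.
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Set SELINUX=disabled in the given config file and confirm the change.
disable_selinux_config() {
    local cfg="${1:-/etc/selinux/config}"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
    [ "$(selinux_config_mode "$cfg")" = "disabled" ]
}

# Succeed if the running kernel still has SELinux active (reboot pending).
selinux_reboot_pending() {
    command -v getenforce >/dev/null 2>&1 || return 1
    [ "$(getenforce)" != "Disabled" ]
}
```

For example: `disable_selinux_config && selinux_reboot_pending && echo "reboot required"`.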

Generate a MyProxy credential (needed only if you are not using robot certificates). Perform these steps on a UI box:

$ ls -l .globus/
total 16
-rw-r--r-- 1 root root 4908 Sep 18 14:44 usercert.pem
-rw------- 1 root root 4836 Sep 18 14:44 userkey.pem

$ myproxy-init -c 4320 -k NagiosRetrieve-<hostname>-<VO name> -s <MYPROXY-name> -l nagios -x -Z <host DN>
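Because the credential name embeds both the Nagios host name and the VO name, a SAM instance that monitors several VOs presumably needs one stored credential per VO. The sketch below wraps the command above in a loop. The function name and the `DRY_RUN` switch are hypothetical, and the flags are passed through exactly as given above (`-c 4320` is the credential lifetime on the MyProxy server in hours, i.e. 180 days). `myproxy-init` may still prompt for pass phrases.

```shell
#!/bin/bash
# Hypothetical wrapper around the myproxy-init command shown above.
# DRY_RUN=1 prints the commands instead of running them.

store_nagios_proxies() {
    local host="$1" server="$2" host_dn="$3"
    shift 3
    local vo
    for vo in "$@"; do
        # Same flags as the documented command, one credential per VO.
        local cmd=(myproxy-init -c 4320 -k "NagiosRetrieve-${host}-${vo}"
                   -s "$server" -l nagios -x -Z "$host_dn")
        if [ -n "${DRY_RUN:-}" ]; then
            echo "+ ${cmd[*]}"
        else
            "${cmd[@]}" || return 1
        fi
    done
}
```

Usage: `store_nagios_proxies <hostname> <MYPROXY-name> "<host DN>" <VO name> [<VO name>...]`.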

YUM repositories

OS/EPEL repos

  • Add the following config to all CentOS/SL base repositories:
exclude=mysql51*
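Editing every repository section by hand is error-prone, so here is a sketch of a helper (hypothetical, not part of SAM) that adds a pattern to each `[section]` of a .repo file. yum reads `exclude` as one space-separated list per section, so the helper appends to an existing `exclude=` line instead of adding a duplicate key. It is idempotent and keeps the file's permissions.

```shell
#!/bin/bash
# Hypothetical helper: add a package pattern to the exclude list of every
# [section] in a yum .repo file. Idempotent; keeps the file's permissions.

add_yum_exclude() {
    local file="$1" pat="$2" tmp
    tmp=$(mktemp) || return 1
    awk -v pat="$pat" '
        # Print the buffered section, inserting exclude= if it had none.
        function flush(   i) {
            if (header != "") {
                print header
                if (!has_exclude) print "exclude=" pat
            }
            for (i = 1; i <= n; i++) print buf[i]
            n = 0
        }
        /^\[/ { flush(); header = $0; has_exclude = 0; next }
        /^exclude[ \t]*=/ {
            # Append to the existing list unless the pattern is already there.
            if (index($0, pat) == 0) $0 = $0 " " pat
            has_exclude = 1
        }
        { buf[++n] = $0 }
        END { flush() }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example `add_yum_exclude /etc/yum.repos.d/CentOS-Base.repo 'mysql51*'`; the file name is an assumption, so check which files define your base repositories. The same helper could serve the `exclude=perl-DateTime` step of the upgrade procedure.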

Staged Rollout

IMPORTANT: do NOT perform the staged rollout upgrade on your production SAM instance. SAM Update-23 removes tests that are currently part of ROC_CRITICAL, and their removal will cause all CEs to become UNKNOWN.

Sites participating in the staged rollout should use the following repository configuration (note: this is not the production UMD repository):

[sam]
name=SAM repo 
baseurl=http://rpm.hellasgrid.gr/mash/centos5-sam-23/$basearch
enabled=1
priority=10
gpgcheck=0
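A sketch of creating that file non-interactively. The target name sam.repo is an assumption, and the here-document delimiter is quoted so that the shell leaves `$basearch` for yum to expand. The `priority` line only takes effect if the yum priorities plugin is installed.

```shell
#!/bin/bash
# Hypothetical helper: write the staged-rollout SAM repo definition shown above.
# The quoted 'EOF' keeps $basearch literal so yum, not the shell, expands it.

write_sam_staged_repo() {
    local target="${1:-/etc/yum.repos.d/sam.repo}"
    cat > "$target" <<'EOF'
[sam]
name=SAM repo
baseurl=http://rpm.hellasgrid.gr/mash/centos5-sam-23/$basearch
enabled=1
priority=10
gpgcheck=0
EOF
}
```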

After the #Installation or #Upgrade, enable the new tests by creating the file /etc/ncg/ncg-localdb.d/newtests with the following content:

ADD_SERVICE_METRIC!Site-BDII!org.bdii.GLUE2-Validate
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-Service
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-StalledTransfers

and rerun:

ncg.reload.sh

These tests will be visible only in the Nagios interface, not in MyEGI.
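The two steps above, as a sketch. The function names are hypothetical, and the directory is a parameter so the file can be generated elsewhere first. The file is overwritten on each run, so rerunning does not duplicate entries.

```shell
#!/bin/bash
# Hypothetical helpers for enabling the new Update-23 tests through NCG.

write_ncg_newtests() {
    local dir="${1:-/etc/ncg/ncg-localdb.d}"
    cat > "$dir/newtests" <<'EOF'
ADD_SERVICE_METRIC!Site-BDII!org.bdii.GLUE2-Validate
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-Service
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-StalledTransfers
EOF
}

# Write the file to the default location, then run ncg.reload.sh as instructed.
enable_new_tests() {
    write_ncg_newtests && ncg.reload.sh
}
```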

Production

Follow the instructions for installing the UMD-3 and EPEL repositories: http://repository.egi.eu/category/umd_releases/distribution/umd-3/. This manual assumes that the UMD-3 repositories have priority 1, as defined in the umd-release package.

Add the SAM repository from: http://repository.egi.eu/sw/production/sam/1/repofiles/sam.repo
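To confirm the priority assumption above, a small helper (hypothetical) can list the priority of every repository section in one or more .repo files. Sections without a `priority` line are shown with 99, the default used by the yum priorities plugin.

```shell
#!/bin/bash
# Hypothetical helper: print "<repo id> <priority>" for each section of the
# given .repo files; sections without a priority line are reported as 99.

repo_priorities() {
    awk '
        function report() { if (id != "") print id, (prio == "" ? 99 : prio) }
        /^\[/ {
            report()
            id = $0
            sub(/^\[/, "", id)
            sub(/].*/, "", id)
            prio = ""
            next
        }
        /^priority[ \t]*=/ { sub(/^priority[ \t]*=[ \t]*/, ""); prio = $0 }
        END { report() }
    ' "$@"
}
```

For example `repo_priorities /etc/yum.repos.d/UMD-3-*.repo /etc/yum.repos.d/sam.repo`; the UMD-3-*.repo glob is an assumption modelled on the UMD-2-* files removed during the upgrade.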

Package installation

Perform the following installation steps:

$ yum -y install ca-policy-egi-core httpd mysql51
$ yum -y install nagios.x86_64
$ yum install sam-nagios
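For unattended installs the same steps can be scripted. A sketch: the `run` helper and `DRY_RUN` switch are hypothetical, and unlike the last command above it passes `-y` to every yum call; drop it if you want to review the sam-nagios transaction first.

```shell
#!/bin/bash
# Sketch of the package installation steps above; DRY_RUN=1 only prints them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stops at the first failing step.
install_sam_nagios() {
    run yum -y install ca-policy-egi-core httpd mysql51 &&
    run yum -y install nagios.x86_64 &&
    run yum -y install sam-nagios
}
```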

Configuration

SAM uses Yaim for configuration. A detailed specification of all SAM configuration parameters is available in the SAM documentation:

Check the Yaim variable changes below: SAMUpdate23#Yaim_variable_changes.

In addition, check the FAQs for common configurations and problems: [2]

Once the site-info.def is ready, run Yaim:

$ /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS

Validation

Check that the Nagios web interface and the SAM portal are up:

  • https://<hostname>/nagios
  • http://<hostname>/myegi
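A quick reachability check for the two URLs above can be scripted with curl. This is a sketch with hypothetical function names: `-k` skips certificate verification because only reachability is tested, and any HTTP status (even 401 or 403 from the certificate-protected Nagios page) is taken to mean the server answered, while curl's 000 means no response.

```shell
#!/bin/bash
# Hypothetical reachability check for the Nagios and MyEGI URLs above.

# Print the Nagios and MyEGI URLs for a SAM host.
sam_portal_urls() {
    printf '%s\n' "https://$1/nagios" "http://$1/myegi"
}

# Print "<HTTP status> <URL>" for each portal; fail if any gave no response.
check_sam_portals() {
    local url code rc=0
    for url in $(sam_portal_urls "$1"); do
        code=$(curl -s -k -m 10 -o /dev/null -w '%{http_code}' "$url")
        code=${code:-000}
        echo "$code $url"
        [ "$code" != "000" ] || rc=1
    done
    return $rc
}
```

Usage: `check_sam_portals <hostname>`.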

Check the MyProxy credentials:

$ nagios-run-check <hostname> hr.srce.GridProxy-Get-<VO-name>
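With several VOs configured, the check has to be repeated for each of them. A sketch with a hypothetical function name and `DRY_RUN` switch; it assumes that `nagios-run-check` returns a non-zero exit status when the check fails.

```shell
#!/bin/bash
# Hypothetical loop over the proxy check above, one VO at a time.
# DRY_RUN=1 prints the commands; otherwise a non-zero exit status from
# nagios-run-check is assumed to mean the check failed.

check_vo_proxies() {
    local host="$1" vo rc=0
    shift
    for vo in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "+ nagios-run-check $host hr.srce.GridProxy-Get-$vo"
        else
            nagios-run-check "$host" "hr.srce.GridProxy-Get-$vo" || rc=1
        fi
    done
    return $rc
}
```

Usage: `check_vo_proxies <hostname> <VO-name> [<VO-name>...]`.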

Upgrade

Upgrading from Update-22 is fully supported and does not require reinstalling the SAM box. The procedure is as follows:

  • remove UMD-2 repo
 yum remove umd-release
 rm -rf /etc/yum.repos.d/UMD-2-*
  • add UMD repositories:
    • install the UMD-3 repository (staged rollout sites should use the repositories recommended above instead):
 wget http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/umd-release-3.0.1-1.el5.noarch.rpm
 yum --nogpgcheck localinstall umd-release-3.0.1-1.el5.noarch.rpm
  • add the following config to all EPEL base repositories:
 exclude=perl-DateTime
  • update everything
 yum update
  • configuration
 /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS
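The production upgrade path can be scripted as a sketch. The function names, the `DRY_RUN` switch and the EPEL file name (epel.repo) are assumptions, every yum call gets `-y`, and, unlike the order listed, it refuses to start until the EPEL `exclude=perl-DateTime` line is in place, so the box is not left half-upgraded. Staged rollout sites would substitute their own repositories.

```shell
#!/bin/bash
# Sketch of the Update-22 -> Update-23 upgrade steps above (production repos).
# DRY_RUN=1 prints the commands instead of running them.

UMD3_RPM_URL=http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/umd-release-3.0.1-1.el5.noarch.rpm
EPEL_REPO_FILE=${EPEL_REPO_FILE:-/etc/yum.repos.d/epel.repo}   # assumption

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Succeed if an exclude= line in the EPEL repo file mentions perl-DateTime.
epel_excludes_datetime() {
    grep -Eqs '^exclude[[:space:]]*=.*perl-DateTime' "$EPEL_REPO_FILE"
}

upgrade_sam_to_update23() {
    local rpm="/tmp/${UMD3_RPM_URL##*/}"
    # Check the manual EPEL step before touching anything.
    if ! epel_excludes_datetime; then
        echo "ERROR: add exclude=perl-DateTime to the EPEL repositories first" >&2
        return 1
    fi
    run yum -y remove umd-release || return 1
    run rm -rf /etc/yum.repos.d/UMD-2-* || return 1
    run wget -O "$rpm" "$UMD3_RPM_URL" || return 1
    run yum -y --nogpgcheck localinstall "$rpm" || return 1
    run yum -y update || return 1
    run /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS
}
```

Running `DRY_RUN=1 upgrade_sam_to_update23` first shows the exact command sequence without changing the box.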

Upgrading from releases older than Update-22 is not supported and requires a clean installation.

Package changes

Updated packages from the SAM repo:

  • atp-1.27.19-1.el5.noarch.rpm
  • grid-monitoring-config-gen-0.95.0-1.el5
  • grid-monitoring-probes-eu.egi.sec-1.0.11-24.el5
  • glite-yaim-nagios-1.11.3-1.el5
  • msg-nagios-bridge-1.1.0-1.el5
  • mrs-1.8.0-1.el5
  • mywlcg-1.5.6-3.el5
  • nagios-gocdb-downtime-0.25.0-1.el5
  • ncg-metric-config-1.5.1-1.el5
  • poem-0.9.91-1.el5
  • poem-sync-0.9.91-1.el5
  • sam-nagios-1.23.0-2.el5
  • sam-release-1.23.0-1.el5
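After upgrading, the SAM-repo packages above can be compared with what is installed. A sketch with hypothetical names: the expected list copies the packages above, with the `.noarch.rpm` suffix dropped from atp so that every entry is in rpm's NAME-VERSION-RELEASE form. A newer version than listed would also be reported as missing.

```shell
#!/bin/bash
# Hypothetical post-upgrade check: list SAM-repo packages from the list above
# that are not installed at exactly that version-release.

SAM23_PACKAGES=(
    atp-1.27.19-1.el5
    grid-monitoring-config-gen-0.95.0-1.el5
    grid-monitoring-probes-eu.egi.sec-1.0.11-24.el5
    glite-yaim-nagios-1.11.3-1.el5
    msg-nagios-bridge-1.1.0-1.el5
    mrs-1.8.0-1.el5
    mywlcg-1.5.6-3.el5
    nagios-gocdb-downtime-0.25.0-1.el5
    ncg-metric-config-1.5.1-1.el5
    poem-0.9.91-1.el5
    poem-sync-0.9.91-1.el5
    sam-nagios-1.23.0-2.el5
    sam-release-1.23.0-1.el5
)

# Read NAME-VERSION-RELEASE lines on stdin; print each expected package missing.
missing_packages() {
    local installed pkg
    installed=$(cat)
    for pkg in "$@"; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || echo "$pkg"
    done
}
```

Usage: `rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' | missing_packages "${SAM23_PACKAGES[@]}"`.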

Packages moved/added to the UMD-3 repo:

  • emi-cream-nagios-1.0.1-6.el5.sam
  • emi.dcache.srm-probes-1.0.1-1
  • egi-mpi-nagios-0.0.7-4.1
  • emi-wms-nagios-3.5.0-3.sl5
  • glue-validator-2.0.25-0
  • grid-monitoring-org.activemq-probes-0.15-1.el5
  • grid-monitoring-org.nagiosexchange-probes-0.19-1.el5
  • grid-monitoring-probes-cadist-0.6.0-1.el5
  • grid-monitoring-probes-ch.cern.sam-1.6.15-1.el5
  • grid-monitoring-probes-hr.srce-0.38.1-1.el5
  • nagios-plugins-argus-1.1.0-2.el5
  • nagios-plugins-emi.glexec-0.3.0-1.sl5
  • nagios-plugins-dg-1.0.1-1.el5
  • nagios-plugins-emi.glexec-config-1.0.0-2.el5
  • nagios-plugins-fts-3.2.30-1.el5
  • nagios-plugins-lfc-0.9.5-2.el5.sam
  • nordugrid-arc-nagios-plugins-1.8.1-1
  • nordugrid-arc-nagios-plugins-egi-1.8.1-1
  • perl-GridMon-1.0.73-1.el5
  • qcg-broker-nagios-probe-3.4.0-3
  • qcg-comp-nagios-probe-3.4.0-9
  • qcg-ntf-nagios-probe-3.4.0-2
  • unicore-nagios-plugins-2.3.2-0.sl5

Obsoleted packages:

  • nagios-plugins-wn-rep
  • gstat-validation

NCG config changes

  • Because the org.sam.WN-Rep* tests have been removed, running Yaim will delete the config file /etc/ncg/ncg-localdb.d/jobsubmit. On existing SAM installations, remove any custom configuration of the following emi.cream.*-JobState test parameters:
--wn-lfc
--wn-se-rep
--wn-se-rep-file
--wn-bdii
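To locate such custom settings, a hypothetical helper can list every line under the NCG configuration directory that still mentions one of these options. It only reports matches (it makes no assumption about the syntax of the lines), exits non-zero when nothing is found, and searching the whole /etc/ncg tree is an assumption about where custom settings live.

```shell
#!/bin/bash
# Hypothetical helper: list every line under the NCG config directory that
# still passes one of the obsolete JobState options. It only reports; the
# matching lines must be reviewed and removed by hand.

find_obsolete_wn_params() {
    # --wn-se-rep also matches --wn-se-rep-file.
    grep -rnE -- '--wn-(lfc|se-rep|bdii)' "${1:-/etc/ncg}"
}
```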

Yaim variable changes

Default values changed:

Variables obsoleted:

  • JOBSUBMIT_WN_LFC
  • JOBSUBMIT_WN_SE_REP
  • JOBSUBMIT_WN_SE_REP_FILE
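Before rerunning Yaim, it can help to find any of these variables still set in site-info.def. A minimal sketch with a hypothetical function name; it prints matching lines with their line numbers and exits non-zero when none remain.

```shell
#!/bin/bash
# Hypothetical helper: report obsolete Update-23 Yaim variables that are
# still set in site-info.def (optionally prefixed with "export").

find_obsolete_yaim_vars() {
    grep -nE '^[[:space:]]*(export[[:space:]]+)?JOBSUBMIT_WN_(LFC|SE_REP|SE_REP_FILE)=' \
        "${1:-/etc/yaim/site-info.def}"
}
```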

Test changes

Tests added:

  • ch.cern.FTS3-Service
  • ch.cern.FTS3-StalledTransfers
  • org.bdii.GLUE2-Validate

Tests removed:

  • org.nordugrid.ARC-CE-LFC-result
  • org.nordugrid.ARC-CE-lfc
  • org.nordugrid.ARC-CE-LFC-submit
  • org.sam.WN-RepDel
  • org.sam.WN-RepISenv
  • org.sam.WN-RepFree
  • org.sam.WN-RepCr
  • org.sam.WN-RepGet
  • org.sam.WN-RepRep
  • org.sam.WN-Rep