Difference between revisions of "SAMUpdate23"
Revision as of 00:35, 19 November 2014
Major changes
Major changes in SAM Update-23:
- Probes are moved to the UMD-3 repository. This decision was approved by the OMB in order to enable probe developers to update probes more frequently and independently from SAM releases.
- Removal of the SAM GridMon (sam-gridmon) and its dependencies. SAM Update-23 supports only SAM Nagios (sam-nagios). In a future version, SAM GridMon will be replaced with the ARGO engine.
A detailed list of all new features and bug fixes can be found here: ARGO/SAM GitHub - Milestone Update23.
Installation
This guide is based on the previous SAM Administration guide: [1].
Prerequisites
Install your host certificate to secure the Nagios portal:
$ ls -l /etc/grid-security/host*
-rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
-r-------- 1 root root 887 Oct 28 19:25 /etc/grid-security/hostkey.pem
$ openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
SSL client : Yes
SELinux must be disabled before proceeding with the installation. If it is enabled, run the commands below and reboot the machine:
$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
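The persistent part of this step can be wrapped in a small helper that only rewrites the config file when needed. This is a minimal sketch: the function name selinux_disable_config and its optional path argument are illustrative assumptions, not part of SAM.

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file
# (defaults to /etc/selinux/config). Leaves the file alone if already disabled.
selinux_disable_config() {
    conf="${1:-/etc/selinux/config}"
    [ -f "$conf" ] || { echo "missing $conf" >&2; return 1; }
    if grep -q '^SELINUX=disabled$' "$conf"; then
        echo "already disabled in $conf"
        return 0
    fi
    tmpfile=$(mktemp) || return 1
    # Same substitution as the sed command above, written via a temp file.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmpfile" &&
        cat "$tmpfile" > "$conf"
    rc=$?
    rm -f "$tmpfile"
    [ "$rc" -eq 0 ] && echo "set SELINUX=disabled in $conf; reboot required"
    return "$rc"
}
# Usage (as root): setenforce 0; selinux_disable_config
```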
Generate a MyProxy credential (only needed if you are not using robot certificates). These steps should be performed on a UI box:
$ ls -l .globus/
total 16
-rw-r--r-- 1 root root 4908 Sep 18 14:44 usercert.pem
-rw------- 1 root root 4836 Sep 18 14:44 userkey.pem
$ /opt/globus/bin/myproxy-init -c 4320 -k NagiosRetrieve-<hostname>-<VO name> -s MYPROXY -l nagios -x -Z <host DN>
YUM repositories
Staged Rollout
IMPORTANT: do NOT perform a staged rollout upgrade on your production SAM instance. SAM Update-23 removes tests that are currently part of ROC_CRITICAL, and their removal will cause all CEs to become UNKNOWN.
Sites participating in staged rollout should use the following repo config files:
- UMD-3-base.repo
- UMD-3-updates.repo
- sam.repo:
[sam]
name=SAM repo
baseurl=http://rpm.hellasgrid.gr/mash/centos5-sam-23/$basearch
enabled=1
priority=10
gpgcheck=0
After installation or upgrade (see the Installation and Upgrade sections), enable the new tests by creating the file /etc/ncg/ncg-localdb.d/newtests with the following content:
ADD_SERVICE_METRIC!Site-BDII!org.bdii.GLUE2-Validate
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-Service
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-StalledTransfers
and rerun:
ncg.reload.sh
These tests will be visible only in the Nagios interface, not in MyEGI.
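The file creation step can be scripted so that it is repeatable across staged rollout hosts. The sketch below is one possible form; the function name write_newtests and the optional directory argument are illustrative assumptions.

```shell
# Hypothetical helper: write the newtests file into an NCG localdb directory
# (defaults to /etc/ncg/ncg-localdb.d). Overwrites any existing newtests file.
write_newtests() {
    dir="${1:-/etc/ncg/ncg-localdb.d}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    cat > "$dir/newtests" <<'EOF'
ADD_SERVICE_METRIC!Site-BDII!org.bdii.GLUE2-Validate
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-Service
ADD_SERVICE_METRIC!FTS!ch.cern.FTS3-StalledTransfers
EOF
}
# Usage on the SAM box: write_newtests && ncg.reload.sh
```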
Production
Follow the instructions for installing the UMD-3 and EPEL repositories: http://repository.egi.eu/category/umd_releases/distribution/umd-3/. This manual assumes that the priority of the UMD-3 repository is 1, as defined in the umd-release package.
Add SAM repository from here: http://repository.egi.eu/sw/production/sam/1/repofiles/sam.repo
OS/EPEL repos
Add the following config to all CentOS/SL base repositories:
exclude=mysql51*
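Editing every repository section by hand is error-prone, so the change can be sketched as a helper that adds the pattern to each section of a .repo file. This is only an illustration: the function name repo_add_exclude is an assumption, and it relies on yum accepting a space-separated list in the exclude option.

```shell
# Hypothetical helper: ensure every [section] of a yum .repo file excludes
# the given pattern. Existing exclude= lines are extended, not replaced.
repo_add_exclude() {
    file="$1"; pat="$2"
    tmpfile=$(mktemp) || return 1
    awk -v pat="$pat" '
        # Pass 1: remember which sections already have an exclude line.
        NR == FNR {
            if ($0 ~ /^\[/) sec = $0
            else if ($0 ~ /^exclude[ \t]*=/) has[sec] = 1
            next
        }
        # Pass 2: add a new exclude line or extend the existing one.
        /^\[/ {
            print
            if (!($0 in has)) print "exclude=" pat
            next
        }
        /^exclude[ \t]*=/ {
            if (index($0, pat) == 0) $0 = $0 " " pat
            print
            next
        }
        { print }
    ' "$file" "$file" > "$tmpfile" && cat "$tmpfile" > "$file"
    rc=$?
    rm -f "$tmpfile"
    return "$rc"
}
# Usage (the file name is an example): repo_add_exclude /etc/yum.repos.d/CentOS-Base.repo 'mysql51*'
```

Running the helper twice leaves the file unchanged after the first run. The same approach would cover the perl-DateTime exclusion for EPEL described in the Upgrade section.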
If you have a priority set on the EPEL repository, make sure it is lower than that of the SAM repository.
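To compare the configured priorities, it helps to list them per repository. The sketch below assumes the yum priorities plugin conventions, where a smaller number takes precedence and repositories without a priority setting default to 99; the function name repo_priorities is illustrative.

```shell
# Hypothetical helper: print "<repo id> <priority>" for every section of the
# given .repo files. Sections without a priority= line are reported as 99
# (assumed default of the yum priorities plugin).
repo_priorities() {
    awk '
        function emit() { if (sec != "") print sec, (prio == "" ? 99 : prio) }
        /^\[/ {
            emit()
            sec = $0
            sub(/^\[/, "", sec)
            sub(/\].*$/, "", sec)
            prio = ""
            next
        }
        /^priority[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); prio = $0 }
        END { emit() }
    ' "$@"
}
# Usage: repo_priorities /etc/yum.repos.d/*.repo
```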
Package installation
Perform the following installation steps:
$ yum -y install ca-policy-egi-core httpd mysql51
$ yum -y install nagios.x86_64
$ yum install sam-nagios
Configuration
SAM uses Yaim for configuration. A detailed specification of all SAM configuration parameters is available in the SAM documentation:
- Common configuration options: SAM Configuration via YAIM-Common
- SAM-Nagios specific options: SAM Configuration via YAIM-SAMNagios
Check the Yaim variable changes listed below: SAMUpdate23#Yaim_variable_changes.
In addition, check the FAQs for common configurations and problems: [2]
Once the site-info.def is ready, run Yaim:
$ /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS
Validation
Check that the Nagios web interface and the SAM portal are up:
- https://<hostname>/nagios
- http://<hostname>/myegi
Check the MyProxy credentials:
$ nagios-run-check <hostname> hr.srce.GridProxy-Get-VO
Upgrade
Upgrading from Update-22 is fully supported and does not require reinstalling the SAM box. The procedure is as follows:
- remove UMD-2 repo
yum remove umd-release
rm -rf /etc/yum.repos.d/UMD-2-*
- add UMD repositories:
- install UMD-3 repo. Staged rollout sites should use the repos recommended above
wget http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/umd-release-3.0.1-1.el5.noarch.rpm
yum --nogpgcheck localinstall umd-release-3.0.1-1.el5.noarch.rpm
- add the following config to all EPEL base repositories:
exclude=perl-DateTime
- update everything
yum update
- configuration
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS
Upgrading from a release older than Update-22 is not supported; a clean installation is required.
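Before starting, the installed release can be checked against this rule. The sketch below assumes that the sam-release package version follows the pattern 1.NN.x for Update-NN, as sam-release-1.23.0 does for Update-23; that mapping for older releases is an assumption, and the function name can_upgrade_from is illustrative.

```shell
# Hypothetical helper: succeed if the given sam-release version is Update-22
# or newer, assuming versions look like 1.<update number>.<patch>.
can_upgrade_from() {
    ver="$1"
    minor=${ver#*.}        # drop the leading "1."
    minor=${minor%%.*}     # keep only the update number
    case "$minor" in
        ''|*[!0-9]*) echo "cannot parse version: $ver" >&2; return 2 ;;
    esac
    [ "$minor" -ge 22 ]
}
# Usage: can_upgrade_from "$(rpm -q --qf '%{VERSION}' sam-release)" || echo "clean installation required"
```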
Package changes
Updated packages from the SAM repo:
- atp-1.27.19-1.el5.noarch.rpm
- grid-monitoring-config-gen-0.95.0-1.el5
- grid-monitoring-probes-eu.egi.sec-1.0.11-24.el5
- glite-yaim-nagios-1.11.2-1.el5
- msg-nagios-bridge-1.1.0-1.el5
- mrs-1.8.0-1.el5
- mywlcg-1.5.6-3.el5
- nagios-gocdb-downtime-0.25.0-1.el5
- ncg-metric-config-1.5.0-1.el5
- poem-0.9.91-1.el5
- poem-sync-0.9.91-1.el5
- sam-nagios-1.23.0-2.el5
- sam-release-1.23.0-1.el5
Packages moved/added to the UMD-3 repo:
- emi-cream-nagios-1.0.1-5.el5.sam
- emi.dcache.srm-probes-1.0.1-1
- egi-mpi-nagios-0.0.7-4.1
- emi-wms-nagios-3.5.0-3.sl5
- glue-validator-2.0.25-0
- grid-monitoring-org.activemq-probes-0.15-1.el5
- grid-monitoring-org.nagiosexchange-probes-0.19-1.el5
- grid-monitoring-probes-cadist-0.6.0-1.el5
- grid-monitoring-probes-ch.cern.sam-1.6.15-1.el5
- grid-monitoring-probes-hr.srce-0.38.0-1.el5
- nagios-plugins-argus-1.1.0-2.el5
- nagios-plugins-emi.glexec-0.3.0-1.sl5
- nagios-plugins-dg-1.0.1-1.el5
- nagios-plugins-emi.glexec-config-1.0.0-2.el5
- nagios-plugins-fts-3.2.30-1.el5
- nagios-plugins-lfc-0.9.5-2.el5.sam
- nordugrid-arc-nagios-plugins-1.8.1-1
- nordugrid-arc-nagios-plugins-egi-1.8.1-1
- perl-GridMon-1.0.73-1.el5
- qcg-broker-nagios-probe-3.4.0-3
- qcg-comp-nagios-probe-3.4.0-9
- qcg-ntf-nagios-probe-3.4.0-2
- unicore-nagios-plugins-2.3.2-0.sl5
Obsoleted packages:
- nagios-plugins-wn-rep
- gstat-validation
NCG config changes
- Because the org.sam.WN-Rep* tests have been removed, running Yaim will delete the config file /etc/ncg/ncg-localdb.d/jobsubmit. On existing SAM installations, remove any custom configuration of the following parameters of the emi.cream.*-JobState tests:
--wn-lfc
--wn-se-rep
--wn-se-rep-file
--wn-bdii
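Leftover uses of these parameters can be located with a recursive search before rerunning Yaim. This is a minimal sketch: the function name find_obsolete_wn_params is illustrative, and the default search directory is an assumption based on the path mentioned above.

```shell
# Hypothetical helper: list file, line number and text of every line that
# still uses one of the removed --wn-* parameters under the given directory.
# Returns non-zero when nothing is found.
find_obsolete_wn_params() {
    dir="${1:-/etc/ncg/ncg-localdb.d}"
    # -e is required because the pattern itself starts with "--".
    grep -rnE -e '--wn-(lfc|se-rep|se-rep-file|bdii)' "$dir"
}
# Usage: find_obsolete_wn_params || echo "no obsolete parameters found"
```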
Yaim variable changes
Default values changed:
- ATP_ROOT_URL: https://mon.egi.eu/atp
- N2MS_ROLES_URL: http://mon.egi.eu/nagios-roles.conf
- OPS_MONITOR_DN: /C=HR/O=edu/OU=srce/CN=opsmon.egi.eu
- POEM_SYNC_URLS: http://mon.egi.eu/poem/api/0.1/json/
Variables obsoleted:
- JOBSUBMIT_WN_LFC
- JOBSUBMIT_WN_SE_REP
- JOBSUBMIT_WN_SE_REP_FILE
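An existing site-info.def can be checked for the obsoleted variables before running Yaim. The sketch below only reports active (uncommented) assignments; the function name find_obsolete_yaim_vars is an illustrative assumption.

```shell
# Hypothetical helper: print line number and text of every active assignment
# to an obsoleted variable. Returns non-zero when none is found.
find_obsolete_yaim_vars() {
    conf="${1:-/etc/yaim/site-info.def}"
    grep -nE '^[[:space:]]*(JOBSUBMIT_WN_LFC|JOBSUBMIT_WN_SE_REP|JOBSUBMIT_WN_SE_REP_FILE)=' "$conf"
}
# Usage: find_obsolete_yaim_vars /etc/yaim/site-info.def
```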
Test changes
Tests added:
- ch.cern.FTS3-Service
- ch.cern.FTS3-StalledTransfers
- org.bdii.GLUE2-Validate
Tests removed:
- org.nordugrid.ARC-CE-LFC-result
- org.nordugrid.ARC-CE-lfc
- org.nordugrid.ARC-CE-LFC-submit
- org.sam.WN-RepDel
- org.sam.WN-RepISenv
- org.sam.WN-RepFree
- org.sam.WN-RepCr
- org.sam.WN-RepGet
- org.sam.WN-RepRep
- org.sam.WN-Rep