ROD MW alarm template

Revision as of 13:48, 23 October 2014 (imported>Krakow)

This page provides templates for the tickets that the ROD team should send to sites during a software version or service type decommissioning campaign.

EMI 2 retirement campaign

Dear site administrators,

According to our monitoring, your site is running EMI 2 software, 
which will reach end of security updates and support on 30-04-2014 
(http://www.eu-emi.eu/releases#MajRel). 
Please check the list of affected service end-points on the Operations 
Dashboard of your site: https://operations-portal.egi.eu/dashboard

EMI 2 software versions are detected through queries to the information 
discovery system (BDII). Information published should accurately reflect 
the version of the running software; however, if you think the software 
run by the service end-points indicated in the dashboard is not EMI 2, 
please report a false positive in this ticket.

Unsupported software MUST be retired no later than 1 month after its 
End of Security Updates and support (https://wiki.egi.eu/wiki/PROC16#Policy) 
- see the details below.

With this ticket we request two actions from you:

1. PROVIDE FEEDBACK ABOUT YOUR UPGRADE PLANS
Please provide information about your upgrade plans according to the 
following template.

Information about your upgrade plans must be supplied within 10 working 
days from the date this ticket is received.

*****************************************************
TEMPLATE

Your site name:

Affected service end-points (please list them):

Expected date for the completion of the upgrade of the end-points 
above (specify target date):

Technical issues preventing you from upgrading (provide input only 
if applicable):
*****************************************************

2. UPGRADE of EMI 2 SERVICES
Site administrators are requested to upgrade or retire EMI 2 software 
before it becomes unsupported (by 30-04-2014).

The decommissioning deadline is one month after the end of support of the 
software; for EMI 2 products it is: 

31-05-2014
 
* https://wiki.egi.eu/wiki/Software_Retirement_Calendar#Decommissioning_Calendar_EMI2
* https://wiki.egi.eu/wiki/PROC16#Decommissioning_deadline

Resource Centres that do not respond to this ticket, do not provide 
acceptable upgrade plans and timelines, or show no progress, risk suspension 
after the decommissioning deadline (https://wiki.egi.eu/wiki/PROC16#Policy). 

If you decide to decommission services, please follow the appropriate 
procedure PROC12 (http://wiki.egi.eu/wiki/PROC12), as the community must 
be informed when services are decommissioned.

HOW TO HANDLE THIS TICKET

Please provide feedback about your upgrade plans within 10 working days 
from the date this ticket is received.
Site administrators who provide information about their upgrade plans 
MUST NOT close this ticket until the alarm disappears from the site Operations 
Dashboard (see link above). Please keep the ticket in status "in progress"/"On hold" 
until all unsupported products are either decommissioned or upgraded.

Thanks for your cooperation.

ROD team 
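The two deadlines in the template above can be computed mechanically when filling it in. The sketch below is a hypothetical helper, not an EGI tool. It assumes that "one month after the end of support" means the last day of the following month, which matches both date pairs used on this page (30-04-2014 → 31-05-2014 and 30-04-2013 → 31-05-2013), and that "working days" are Monday to Friday counted from the day after the ticket is received, with public holidays ignored.

```python
import calendar
from datetime import date, timedelta


def decommissioning_deadline(end_of_support: date) -> date:
    """Last day of the month that follows the end of security updates and support."""
    year, month = end_of_support.year, end_of_support.month + 1
    if month == 13:  # a December end of support rolls over into January
        year, month = year + 1, 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, last_day)


def feedback_due_date(received: date, working_days: int = 10) -> date:
    """Date on which the feedback period ends, counting Monday to Friday only.

    Counting starts on the day after the ticket is received; public holidays
    are NOT taken into account (assumption).
    """
    current, remaining = received, working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    print(decommissioning_deadline(date(2014, 4, 30)))  # 2014-05-31, as in the EMI 2 template
    print(feedback_due_date(date(2014, 3, 3)))  # received Monday 2014-03-03: 2014-03-17
```

If holidays matter for a given NGI, a set of holiday dates could be skipped in the same loop.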


Sites not publishing user DN data to APEL

Dear site administrators,

You are being contacted because, according to the EGI central monitoring 
Nagios service (https://midmon.egi.eu/nagios/), your site is not publishing 
user DN data in the accounting usage records that are centrally gathered
by the APEL accounting repository of EGI.

The test that detects sites not publishing user DN information is 
named eu.egi.APEL-UserDN.
This test raises alarms in the operations dashboard 
(https://operations-portal.in2p3.fr/dashboard) as of 02-08-2013, 
as announced at: https://operations-portal.in2p3.fr/broadcast/archive/id/990
For more information about the test see:
https://wiki.egi.eu/wiki/MW_Nagios_tests#eu.egi.APEL-UserDN

Publishing of user DN information is important for the accuracy of the 
Inter-NGI accounting reports that are available on the APEL accounting
portal: 
- https://accounting.egi.eu/repinterngi.php
- https://accounting.egi.eu/interngi_charts_country.php
- https://accounting.egi.eu/interngi_charts_ngi.php

SOLUTION
To fix this issue, sites must reconfigure the emi-apel service. 
Instructions for enabling user DN publishing are available at:
https://wiki.egi.eu/wiki/Agenda-01-06-2012#2.1_Sites_not_publishing_user_DN_in_the_usage_record
(please see "APEL publisher is not properly configured").

The easiest solution for EGI sites is to set the following YAIM variable and rerun YAIM:
APEL_PUBLISH_USER_DN=yes

Thanks for your cooperation.

ROD team 
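Before rerunning YAIM, an administrator may want to confirm that the variable is actually set. The following Python sketch is a hypothetical helper (not part of YAIM or APEL) that reads the bash-style assignments of a `site-info.def` file. It only understands plain `VAR=value` lines, so a value set in another configuration file or computed by shell logic will not be seen, and it cannot tell whether YAIM has been rerun since the change.

```python
import shlex


def yaim_variables(site_info_def: str) -> dict[str, str]:
    """Collect plain VAR=value assignments from a YAIM configuration file.

    Later assignments override earlier ones, as when bash sources the file.
    Anything that is not a simple assignment (export, if, functions) is skipped.
    """
    variables: dict[str, str] = {}
    for line in site_info_def.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if not name.isidentifier():
            continue  # e.g. "export FOO=bar" or a test expression
        try:
            words = shlex.split(value, comments=True)  # strips quotes and trailing comments
        except ValueError:
            continue  # unbalanced quotes: leave it to bash
        variables[name] = words[0] if words else ""
    return variables


def publishes_user_dn(site_info_def: str) -> bool:
    """True if APEL_PUBLISH_USER_DN is set to yes (case-insensitive)."""
    return yaim_variables(site_info_def).get("APEL_PUBLISH_USER_DN", "").lower() == "yes"
```

For example, `publishes_user_dn(open(path).read())` on the site's `site-info.def` should return `True` once the variable shown above is in place.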

Decommissioning of non SHA-2 compliant services

Dear site administrators,

You are being contacted because your site was detected to be hosting one or more service
instances (CREAM, VOMS, WMS, StoRM and/or dCache) that are not capable of handling SHA-2 certificates.

The latest IGTF timeline statement on SHA-2 Secure Digest Mechanisms 
requires that as of 01-12-2013:
"- CAs should begin to phase out issuance of SHA-1 end entity certificates
- CAs should issue SHA-2 (SHA-256 or SHA-512) end entity certificates by default."
(https://www.eugridpma.org/documentation/hashrat/sha2-timeline) 

The implication is that after 01-12-2013, users who hold a SHA-2 X.509 
certificate will experience authentication failures when accessing services that 
are not SHA-2 capable, where the term "services" includes any grid services and 
community-specific services like gateways and portals requiring X.509 authentication. 

*Please note*: After 01-12-2013, services that are not SHA-2 enabled must be 
in *Downtime* until they are upgraded to a SHA-2 compliant version. 

Information about the minimum version supporting SHA-2 is available for EMI and IGE
products at: https://wiki.egi.eu/wiki/SHA-2_support_middleware_baseline

For CREAM, VOMS, WMS, StoRM and dCache the minimum software versions required to guarantee SHA-2 
support are:
- CREAM: UMD-2/EMI-2 v. 1.14.4 released in UMD update 2.4.1
- VOMS: UMD-2/EMI-2 v. 2.0.9 and beyond [*]  
- WMS: UMD-3/EMI-3 v. 3.5.0 and beyond 
- StoRM: UMD-2/EMI-3 v. 1.11.2 and beyond
- dCache: EMI-2: 2.2.17 and beyond, UMD-3/EMI-3: 2.6.5 and beyond

In order to avoid authentication errors for users holding a SHA-2 certificate, we 
strongly recommend that services that are not SHA-2 compliant be upgraded to a 
version of the software which supports SHA-2 by:

30-11-2013

Please provide information about your upgrade plans in this ticket within two weeks, and report any
problems with the proposed upgrade deadline.

Thanks for your cooperation.
ROD team 
[*] Due to a known problem [1], recent versions of VOMS may not publish themselves in the 
information system, and this may generate false alarms. 
If the ticket is related to a VOMS service, check whether it is published in the site BDII. 
If not, please follow the instructions in [2] and restart the BDII on the VOMS server. 
Once the VOMS service is publishing itself, if the release version is 2.0.9 or later, 
the alarm should disappear within 24 hours.
[1] https://ggus.eu/ws/ticket_info.php?ticket=95981
[2] http://www.eu-emi.eu/releases/emi-3-monte-bianco/updates/-/asset_publisher/8D6t/content/issues-with-new-version-of-sudo-package-1-7-2p1-14-el5_8-2
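Whether a given user or host certificate is SHA-2 signed can be read from the output of `openssl x509 -in usercert.pem -noout -text` (the file name is only an example), which contains a `Signature Algorithm:` line. The Python sketch below classifies that text; the function names are hypothetical, it looks only at the certificate's own signature (not at the CA chain), and it counts SHA-224 and SHA-384 as SHA-2 although the IGTF statement names SHA-256 and SHA-512.

```python
import re
from typing import Optional

# SHA-2 family digests as they appear in OpenSSL algorithm names,
# e.g. "sha256WithRSAEncryption" or "ecdsa-with-SHA384".
SHA2_DIGEST = re.compile(r"sha-?(224|256|384|512)", re.IGNORECASE)


def signature_algorithm(openssl_text: str) -> Optional[str]:
    """First "Signature Algorithm:" value in `openssl x509 -noout -text` output."""
    match = re.search(r"Signature Algorithm:\s*(\S+)", openssl_text)
    return match.group(1) if match else None


def is_sha2_signed(openssl_text: str) -> bool:
    """True if the certificate signature uses a SHA-2 digest."""
    algorithm = signature_algorithm(openssl_text)
    return bool(algorithm and SHA2_DIGEST.search(algorithm))
```

A `False` result for a user's certificate means the SHA-2 issue described above does not affect that user yet; it says nothing about the services the user accesses.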

[IMPORTANT]: Due to a problem currently under investigation, recent versions of VOMS may not publish themselves in the information system. A VOMS that is not published in the information system triggers a critical alarm in midmon for the SHA-2 monitoring; if the VOMS is a recent SHA-2 enabled version, this generates a false positive alarm in the operations dashboard. There are two possible scenarios for an alarm:

  • VOMS publishes itself in the information system: the alarm is valid, not a false positive.
  • VOMS does not publish itself in the information system: the ROD team should ask the site manager in the ticket to check the version of the deployed VOMS. If it is v2.0.9 or later, the alarm is a false positive and can be ignored; if it is older than v2.0.9, the alarm is valid and the VOMS is not SHA-2 compliant.

To solve the publishing issue, which is caused by a known sudo problem, site administrators must follow the instructions in the EMI known issue page, as suggested in the ticket template (reference [2]). Additional details are also available in the GGUS ticket discussion (reference [1]).
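The two scenarios above, together with the minimum versions listed in the SHA-2 ticket template, can be expressed as a small triage helper. This is a hypothetical sketch for illustration, not part of the operations portal: version strings are assumed to be dotted numbers with an optional `-release` suffix, "and beyond" is read as a plain numeric comparison, the CREAM entry uses 1.14.4 as the minimum, and dCache releases outside the 2.2 and 2.6 branches are reported as unknown (`None`) because the template lists no minimum for them.

```python
from typing import Optional

# Minimum SHA-2 capable versions, copied from the SHA-2 ticket template.
MIN_SHA2_VERSION = {
    "CREAM": (1, 14, 4),
    "VOMS": (2, 0, 9),
    "WMS": (3, 5, 0),
    "StoRM": (1, 11, 2),
}
# dCache minimums depend on the release branch (EMI-2: 2.2.x, EMI-3: 2.6.x).
DCACHE_MIN_BY_BRANCH = {(2, 2): (2, 2, 17), (2, 6): (2, 6, 5)}


def parse_version(text: str) -> tuple[int, ...]:
    """'2.0.8-1.el6' -> (2, 0, 8); the package release suffix is ignored."""
    return tuple(int(part) for part in text.split("-")[0].split(".") if part.isdigit())


def is_sha2_capable(product: str, version: str) -> Optional[bool]:
    """Compare against the table; None when the table lists no minimum."""
    parsed = parse_version(version)
    if product == "dCache":
        minimum = DCACHE_MIN_BY_BRANCH.get(parsed[:2])
    else:
        minimum = MIN_SHA2_VERSION.get(product)
    if minimum is None:
        return None
    return parsed >= minimum


def triage_voms_alarm(published_in_bdii: bool, deployed_version: Optional[str]) -> str:
    """Apply the two VOMS alarm scenarios to a single alarm."""
    if published_in_bdii:
        return "valid alarm"
    if deployed_version is None:
        return "ask the site manager for the deployed VOMS version"
    if is_sha2_capable("VOMS", deployed_version):
        return "false positive"
    return "valid alarm"
```

Tuples are compared element by element, so `(2, 0, 10)` correctly ranks above `(2, 0, 9)`, which a plain string comparison would get wrong.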

Sites not declaring eu.egi.MPI CREAM service instances

Dear site administrators,

The current MPI metric org.sam.WN-MPI is associated with three service types
(MPICH, MPICH2 and OPENMPI), which need to be replaced by a new service type *in GOCDB*: eu.egi.MPI.
eu.egi.MPI is a dummy service type needed in GOCDB to enable running the MPI tests associated with CREAM service instances that support MPI capabilities.

REQUIRED ACTION
The new service instances of type eu.egi.MPI need to be registered in GOCDB in preparation for Service Availability Monitoring (SAM) Update 22 (whose release is being finalized).
Site administrators of CREAM CEs that support MPI are requested to create one new service instance of type eu.egi.MPI for each CREAM CE instance that supports MPI.
The hostname of the service end-point must be that of the CREAM CE, as shown in the following examples:
https://goc.egi.eu/portal/gocdbpi/public/?method=get_service_endpoint&service_type=eu.egi.MPI

The service types currently used by SAM (MPICH, MPICH2 and OPENMPI) are not defined in GOCDB; SAM derives them from information published in the BDII. They will become obsolete automatically with SAM Update 22, and no action is required from site administrators.

eu.egi.MPI service instances can be registered in GOCDB at any time; this does not interfere with the current production SAM monitoring infrastructure. Please implement the requested changes in GOCDB and close this ticket when finished.

Thanks for your cooperation.

ROD team 
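Before closing the MPI ticket, a site administrator could compare the site's MPI-capable CREAM CEs with what GOCDB returns for the example query URL in the template. The sketch below builds that URL and parses a response; it assumes the XML returned by the GOCDB programmatic interface contains `SERVICE_ENDPOINT` elements with `HOSTNAME` and `SERVICE_TYPE` children, and that the list of MPI-capable CREAM CEs is supplied by the administrator, since GOCDB does not record which CEs support MPI. Fetching the document over HTTPS (for example with `urllib.request.urlopen`) is left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

GOCDB_PI = "https://goc.egi.eu/portal/gocdbpi/public/"


def endpoint_query_url(service_type: str) -> str:
    """Build a get_service_endpoint query for one service type."""
    params = {"method": "get_service_endpoint", "service_type": service_type}
    return GOCDB_PI + "?" + urlencode(params)


def registered_hostnames(xml_text: str, service_type: str) -> set[str]:
    """Hostnames of SERVICE_ENDPOINT elements with the given SERVICE_TYPE (assumed layout)."""
    root = ET.fromstring(xml_text)
    return {
        endpoint.findtext("HOSTNAME", default="")
        for endpoint in root.iter("SERVICE_ENDPOINT")
        if endpoint.findtext("SERVICE_TYPE") == service_type
    }


def missing_mpi_endpoints(xml_text: str, mpi_cream_hosts: list[str]) -> list[str]:
    """MPI-capable CREAM CE hosts that have no eu.egi.MPI service registered yet."""
    registered = registered_hostnames(xml_text, "eu.egi.MPI")
    return sorted(set(mpi_cream_hosts) - registered)
```

An empty list from `missing_mpi_endpoints` means every MPI-capable CREAM CE has a matching eu.egi.MPI entry with the same hostname, as the ticket requires.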

Old templates

EMI 1 retirement campaign (ARGUS, LFC, VOMS)

Dear site administrators,

According to our monitoring, your site is running EMI 1 software, 
which will reach end of security updates and support on 30-04-2013 
(http://www.eu-emi.eu/releases#MajRel). 
Please check the list of affected service end-points on the Operations 
Dashboard of your site: https://operations-portal.egi.eu/dashboard

EMI 1 software versions are detected through queries to the information 
discovery system (BDII). Information published should accurately reflect 
the version of the running software; however, if you think the software 
run by the service end-points indicated in the dashboard is not EMI 1, 
please report a false positive in this ticket.

Unsupported software MUST be retired no later than 1 month after its 
End of Security Updates and support (https://wiki.egi.eu/wiki/PROC16#Policy) 
- see the details below.

With this ticket we request two actions from you:

1. PROVIDE FEEDBACK ABOUT YOUR UPGRADE PLANS
Please provide information about your upgrade plans according to the 
following template as soon as possible.

Information about your upgrade plans must be supplied within 5 working 
days from the date this ticket is received.

*****************************************************
TEMPLATE

Your site name:

Affected service end-points (please list them):

Expected date for the completion of the upgrade of the end-points 
above (specify target date):

Technical issues preventing you from upgrading (provide input only 
if applicable):
*****************************************************

2. UPGRADE of EMI 1 SERVICES
Site administrators are requested to upgrade or retire EMI 1 software 
before it becomes unsupported (by 30-04-2013).

The decommissioning deadline is one month after the end of support of the 
software; for EMI 1 products it is: 

31-05-2013
 
* https://wiki.egi.eu/wiki/Software_Retirement_Calendar#Decommissioning_Calendar_EMI1
* https://wiki.egi.eu/wiki/PROC16#Decommissioning_deadline

Resource Centres that do not respond to this ticket, do not provide 
acceptable upgrade plans and timelines, or show no progress, risk suspension 
after the decommissioning deadline (https://wiki.egi.eu/wiki/PROC16#Policy). 

If you decide to decommission services, please follow the appropriate 
procedure PROC12 (http://wiki.egi.eu/wiki/PROC12), as the community must 
be informed when services are decommissioned.

HOW TO HANDLE THIS TICKET

Please provide immediate feedback about your upgrade plans.
Site administrators who provide information about their upgrade plans 
MUST NOT close this ticket until the alarm disappears from the site Operations 
Dashboard (see link above). Please keep the ticket in status "in progress"/"On hold"  
until all unsupported products are either decommissioned or upgraded.

Thanks for your cooperation.

ROD team 

DPM: Retirement campaign of EMI 1 versions and EMI 2 versions < 1.8.6

Dear site administrators,

According to our monitoring, your site is running a DPM software version 
that needs to be upgraded. The purpose of this ticket is to notify you about
the need to update your DPM software and to collect information about
your upgrade plans. DPM software versions are detected through queries 
to the information discovery system (BDII). Information published should 
accurately reflect the version of the running software; however, if you 
think the software run by the service end-points indicated in the 
dashboard is EMI 2 v. 1.8.6, please report a false positive in this ticket.

Why should your DPM service be updated? One of these cases may apply to you.

(1) You are running an EMI 1 DPM software version
EMI 1 software will reach end of security updates and support on 30-04-2013 
(http://www.eu-emi.eu/releases#MajRel). Unsupported software MUST be retired 
no later than 1 month after its End of Security Updates and support 
(https://wiki.egi.eu/wiki/PROC16#Policy) - see the details below.

(2) You are running an EMI 2 DPM software version affected by vulnerability
EGI-SVG-2012-2683 (https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2011-2683).
All DPM versions older than EMI 2 1.8.6 are affected. The advisory recommends 
that sites update their software as soon as possible. 

With this ticket we request two actions from you:

ACTION 1. PROVIDE FEEDBACK ABOUT YOUR UPGRADE PLANS
Please provide information about your upgrade plans according to the 
following template.

Information about your upgrade plans must be supplied within 10 working 
days from the date this ticket is received.

*****************************************************
TEMPLATE

Your site name:

Affected service end-points (please list them):

Expected date for the completion of the upgrade of the end-points 
above (specify target date):

Technical issues preventing you from upgrading (provide input only 
if applicable):
*****************************************************

ACTION 2. UPGRADE of DPM 
Site administrators are requested to upgrade/retire EMI 1 DPM instances and
EMI 2 DPM instances affected by vulnerability SVG-2011-2683 by 30-04-2013.
The decommissioning hard deadline is: 31-05-2013

Resource Centres that do not respond to this ticket, do not provide 
acceptable upgrade plans and timelines, or show no progress, risk suspension 
after the decommissioning deadline (https://wiki.egi.eu/wiki/PROC16#Policy). 

If you decide to decommission services, please follow the appropriate 
procedure PROC12 (http://wiki.egi.eu/wiki/PROC12), as the community must 
be informed when services are decommissioned.

HOW TO HANDLE THIS TICKET

Please provide feedback about your upgrade plans within 10 working days 
from the date this ticket is received. If you think the DPM instances
detected are a false positive, please report this in the ticket.
Also report any technical issue that may prevent you from upgrading
according to the proposed timeline.
Site administrators who provide information about their upgrade plans 
MUST NOT close this ticket until the alarm disappears from the site Operations 
Dashboard (https://operations-portal.egi.eu/dashboard). Please keep the 
ticket in status "in progress"/"On hold" until the DPM service is decommissioned 
or upgraded.

Thanks for your cooperation.

ROD team 

Information about the probes used to detect DPM software versions

DPM instances (EMI 1 + EMI 2 versions < 1.8.6) are detected through the following three tests:

- eu.egi.sec.DPM-EMI-1 - looking for EMI-1 DPM in GLUE1 branch (doesn't cover version 1.8.6)
- eu.egi.sec.DPM-GLUE2-EMI-1 - looking for EMI-1 DPM in GLUE2 branch (covers version 1.8.6)
- eu.egi.sec.DPM-GLUE2-EMI-2 - looking for EMI-2 DPM in GLUE2 branch with a version other than 1.8.6

The probes are documented at: https://wiki.egi.eu/wiki/MW_SAM_tests#Middleware_monitoring_SAM_instance
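The GLUE2 part of this detection can be approximated from plain LDIF text, for example `ldapsearch` output against the `o=glue` base of a site BDII. The Python sketch below is hypothetical and is not the code of the eu.egi.sec probes. It reads the GLUE 2.0 attributes `GLUE2EndpointURL`, `GLUE2EndpointImplementationName` and `GLUE2EndpointImplementationVersion`, and assumes that the EMI release is published as `MiddlewareName=EMI` and `MiddlewareVersion=...` pairs in `GLUE2EntityOtherInfo`. It applies the campaign criterion stated in the ticket (EMI 1, or EMI 2 older than 1.8.6), whereas the third probe is described as matching any EMI-2 version other than 1.8.6. Base64-encoded values (`attr:: ...`) are not decoded.

```python
from collections import defaultdict

FIXED_DPM_VERSION = (1, 8, 6)


def parse_ldif(text: str) -> list[dict[str, list[str]]]:
    """Split plain LDIF into entries mapping attribute name -> list of values."""
    entries: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = defaultdict(list)
    last_attr = None
    for raw in text.splitlines():
        if not raw.strip():  # a blank line ends the current entry
            if current:
                entries.append(dict(current))
            current, last_attr = defaultdict(list), None
        elif raw.startswith("#"):  # ldapsearch comment lines
            continue
        elif raw.startswith(" ") and last_attr:  # folded continuation of the previous value
            current[last_attr][-1] += raw[1:]
        else:
            attr, _, value = raw.partition(":")
            current[attr].append(value.strip())
            last_attr = attr
    if current:
        entries.append(dict(current))
    return entries


def parse_version(text: str) -> tuple[int, ...]:
    """'1.8.5-1.el6' -> (1, 8, 5); the package release suffix is ignored."""
    return tuple(int(part) for part in text.split("-")[0].split(".") if part.isdigit())


def other_info(entry: dict[str, list[str]]) -> dict[str, str]:
    """KEY=value pairs published in GLUE2EntityOtherInfo."""
    pairs = (item.split("=", 1) for item in entry.get("GLUE2EntityOtherInfo", []) if "=" in item)
    return {key: value for key, value in pairs}


def dpm_needs_upgrade(entry: dict[str, list[str]]) -> bool:
    """EMI 1 DPM, or EMI 2 DPM older than 1.8.6 (assumed attribute layout)."""
    if entry.get("GLUE2EndpointImplementationName", [""])[0] != "DPM":
        return False
    info = other_info(entry)
    if info.get("MiddlewareName") != "EMI":
        return False
    emi_major = parse_version(info.get("MiddlewareVersion", ""))[:1]
    version = parse_version(entry.get("GLUE2EndpointImplementationVersion", [""])[0])
    if emi_major == (1,):
        return True
    if emi_major == (2,):
        return version < FIXED_DPM_VERSION
    return False


def flag_dpm_endpoints(ldif_text: str) -> list[str]:
    """URLs of GLUE2 endpoints that the campaign criterion would flag."""
    return [
        entry.get("GLUE2EndpointURL", ["?"])[0]
        for entry in parse_ldif(ldif_text)
        if dpm_needs_upgrade(entry)
    ]
```

Because GLUE1 records are not parsed, this sketch does not reproduce the eu.egi.sec.DPM-EMI-1 probe.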

EMI 1 retirement campaign

Dear site administrators,

According to our monitoring, your site is running EMI 1 software, 
which will reach end of security updates and support on 30-04-2013 
(http://www.eu-emi.eu/releases#MajRel). 
Please check the list of affected service end-points on the Operations 
Dashboard of your site: https://operations-portal.egi.eu/dashboard

EMI 1 software versions are detected through queries to the information 
discovery system (BDII). Information published should accurately reflect 
the version of the running software; however, if you think the software 
run by the service end-points indicated in the dashboard is not EMI 1, 
please report a false positive in this ticket.

Unsupported software MUST be retired no later than 1 month after its 
End of Security Updates and support (https://wiki.egi.eu/wiki/PROC16#Policy) 
- see the details below.

With this ticket we request two actions from you:

1. PROVIDE FEEDBACK ABOUT YOUR UPGRADE PLANS
Please provide information about your upgrade plans according to the 
following template.

Information about your upgrade plans must be supplied within 10 working 
days from the date this ticket is received.

*****************************************************
TEMPLATE

Your site name:

Affected service end-points (please list them):

Expected date for the completion of the upgrade of the end-points 
above (specify target date):

Technical issues preventing you from upgrading (provide input only 
if applicable):
*****************************************************

2. UPGRADE of EMI 1 SERVICES
Site administrators are requested to upgrade or retire EMI 1 software 
before it becomes unsupported (by 30-04-2013).

The decommissioning deadline is one month after the end of support of the 
software; for EMI 1 products it is: 

31-05-2013
 
* https://wiki.egi.eu/wiki/Software_Retirement_Calendar#Decommissioning_Calendar_EMI1
* https://wiki.egi.eu/wiki/PROC16#Decommissioning_deadline

Resource Centres that do not respond to this ticket, do not provide 
acceptable upgrade plans and timelines, or show no progress, risk suspension 
after the decommissioning deadline (https://wiki.egi.eu/wiki/PROC16#Policy). 

If you decide to decommission services, please follow the appropriate 
procedure PROC12 (http://wiki.egi.eu/wiki/PROC12), as the community must 
be informed when services are decommissioned.

HOW TO HANDLE THIS TICKET

Please provide feedback about your upgrade plans within 10 working days 
from the date this ticket is received.
Site administrators who provide information about their upgrade plans 
MUST NOT close this ticket until the alarm disappears from the site Operations 
Dashboard (see link above). Please keep the ticket in status "in progress"/"On hold" 
until all unsupported products are either decommissioned or upgraded.

Thanks for your cooperation.

ROD team 

gLite 3.2 DPM, LFC and/or Worker Nodes campaign

Dear site administrators,

According to our monitoring, your site is running gLite 3.2 DPM, LFC and/or Worker Nodes, whose software reached end of life on 30-11-2012. 
Please check the list of affected service end-points on the Operations Dashboard of your site: https://operations-portal.egi.eu/dashboard

gLite software products are detected through queries to the information discovery system (BDII). 
Information published should accurately reflect the version of the running software; however, if you think the software run by the service 
end-points indicated in the dashboard is supported, please report a false positive in this ticket.

This ticket warns you about the deployment of unsupported gLite 3.2 DPM, LFC and/or WN software at your site and requests the 
upgrade or decommissioning of these products according to the following guidelines.

1. PROVIDE FEEDBACK ABOUT YOUR UPGRADE PLANS

Please provide information about your upgrade plans according to the following template.

Information about your upgrade plans must be supplied within 10 working days from the date this ticket is received.

TEMPLATE

Your site name:

Affected service end-points (please list them):

Expected date for the completion of the upgrade of the end-points above (specify target date):

Technical issues preventing you from upgrading (provide input only if applicable):

2. UPGRADE of gLite 3.2 UNSUPPORTED DPM/LFC/WN

DEADLINE for the upgrade/decommissioning is: 31-01-2013

End-points that cannot be upgraded by that time must be put in downtime as of: 01-02-2013

Resource Centres that do not respond to this ticket, do not provide acceptable plans and timelines, or show no progress, risk suspension. 
Suspension does *not* concern Resource Centres that are actively working to solve the issue, have documented plans with acceptable timelines,
or are facing technical issues that prevent them from decommissioning/upgrading unsupported products.

If you decide to decommission services, please follow procedure PROC12 (http://wiki.egi.eu/wiki/PROC12), as the community must be informed about the decommissioning of services.

HOW TO HANDLE THIS TICKET

Please provide feedback about your upgrade plans within 10 working days from the date this ticket is received.
Site administrators who provide information about their upgrade plans MUST NOT close this ticket until the alarm disappears from the site Operations Dashboard (see link above). Please keep the ticket in status "in progress"/"On hold" until all unsupported products are either decommissioned or upgraded.

Thanks for your cooperation.

ROD team