

Detailed agenda: Grid Operations Meeting 22 October 2012

EVO direct link Pwd: gridops
EVO details Indico page

1. Middleware releases and staged rollout

1.1. Update on the status of EMI updates

Cristina (EMI) sent apologies for this meeting. The products listed in the previous meeting are all confirmed and have passed EMI certification.

1.2. Staged Rollout

The UMD 1.9.0 release date is 29 October (next week), and the freeze date is today (late staged rollout reports may still arrive until tomorrow):

The UMD 2.3.0 release is planned for mid-November. It will include several leftovers from the initial EMI 2 release, plus updates. We also aim to include some components from this week's update, as well as from the IGE 3.0 release. The current status is the following:

The UNICOREX/6 entry is actually only the UNICORE Nagios probe, to be tested in collaboration with the SAM/Nagios teams.

Expected from this week's EMI update: clients (UI and WN) containing the new GFAL/lcg_utils 1.13.9, WMS 3.4 and LB 3.2.9.

We will also try to include the other components from this update in the next UMD, since most of the process can run in parallel.

2. Operational Issues

2.1 Monitoring of unsupported middleware

COD is currently opening GGUS tickets against sites deploying unsupported gLite middleware. Tickets have been opened against sites with critical alarms in the custom security dashboard.

2.1.1 The timeline of the process is the following
  1. Between October 8th and October 10th, COD opened the first batch of tickets.
    1. Some sites resolved the problem behind the critical alarms within a few days (by upgrading or decommissioning the service), so COD closed their tickets.
  2. On October 15th a new probe was put into production to monitor unsupported CREAM services (some CREAM instances did not publish their version correctly). COD opened tickets for sites with new alarms but no ticket in an open status.
    1. Unfortunately, this meant that some sites whose previous ticket had been closed received a new ticket.
  3. On October 19th a new probe to check WMS instances was released into production.
    1. The COD team will update the tickets already opened to warn site managers that a new problem has been detected at the site.
  4. On October 19th the false positives caused by CONDOR installations were removed.
  5. Today the GGUS ticket template is being updated; the next tickets opened will contain more information about the expected workflow.
  6. A probe for dCache is expected in the coming days.
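
The ticketing rule in step 2 (a ticket for every site with an alarm but no ticket in an open status) can be sketched as a set difference. This is only an illustration of the rule as described above, not the actual COD tooling; the function name, the record layout and the site names are hypothetical.

```python
def sites_needing_ticket(sites_with_alarms, tickets):
    """Return the sites that have an alarm but no ticket in an open status.

    `tickets` is a list of dicts with hypothetical keys "site" and "status".
    """
    open_sites = {t["site"] for t in tickets if t["status"] == "open"}
    return sorted(set(sites_with_alarms) - open_sites)


# Hypothetical data: SITE-A still has an open ticket, SITE-B had its
# ticket closed after the first batch, SITE-C only shows up with the new probe.
tickets = [
    {"site": "SITE-A", "status": "open"},
    {"site": "SITE-B", "status": "closed"},
]
alarms = ["SITE-A", "SITE-B", "SITE-C"]

print(sites_needing_ticket(alarms, tickets))
```

Because closed tickets are ignored, SITE-B is selected again, which is the duplicate-ticket effect noted in step 2.1.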
2.1.2 Additional information about the tickets
2.1.3 Unresponsive sites

On this separate wiki page NGIs can find the list of unresponsive sites as of Oct 22: sites that have not responded to the ticket opened by COD.

Note: unresponsive sites are eligible for suspension after November 1st.

2.1.4 Decommissioning of lcg-ce

Reminder: decommissioned lcg-ce instances have to be removed from GOCDB and from the site BDII!

2.2 Dependency problem with gridsite-apache and globus

2.2.1 gridsite-apache

The following UMD2 products:

depend on gridsite-apache, while the latest update of gridsite obsoletes gridsite-apache. The updates repositories of both EMI and UMD contain the latest version of gridsite without any gridsite-apache package, which breaks the lcgdm-dav-server shipped with DPM and LFC.


2.2.2 globus-gass-copy-progs

The following UMD-2 packages:

have a dependency on the globus-gass-copy-progs package, which is not in the UMD repositories but comes from EPEL. The globus-gass-copy-progs package in turn depends on other Globus libraries that are part of the IGE components currently released in UMD.

A recent EPEL upgrade released a new globus-gass-copy-progs package with updated dependencies on newer libraries (also released in EPEL). Unfortunately, the Globus libraries in UMD are too old. Because the UMD repositories are protected against EPEL, yum cannot download the newer libraries, and the installation fails.

globus-gass-copy-progs requires globus-gass-copy(x86-64) = 8.6-1.el6, while the UMD repository contains globus-gass-copy-8.4-1.el6.x86_64.
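
The mismatch can be made concrete with a minimal sketch that compares the exact "=" requirement against the package available in UMD. This is not how yum or rpm resolve dependencies internally (they also handle epochs and ordered version comparison); the helper names are hypothetical and only the two strings quoted above are taken from the report.

```python
def parse_requirement(req):
    """Split 'name(arch) = version-release' into (name, version-release)."""
    left, _, version_release = req.partition(" = ")
    name = left.split("(")[0].strip()
    return name, version_release.strip()


def parse_nvra(package):
    """Split 'name-version-release.arch' into (name, version-release)."""
    nvr, _, _arch = package.rpartition(".")
    name, version, release = nvr.rsplit("-", 2)
    return name, f"{version}-{release}"


def exact_requirement_met(req, package):
    """True only if the package has the same name and version-release."""
    return parse_requirement(req) == parse_nvra(package)


required = "globus-gass-copy(x86-64) = 8.6-1.el6"   # needed by the EPEL package
available = "globus-gass-copy-8.4-1.el6.x86_64"     # shipped in UMD

print(parse_requirement(required))
print(parse_nvra(available))
print(exact_requirement_met(required, available))
```

The check fails because 8.4-1.el6 is not 8.6-1.el6, and with the UMD repositories protected against EPEL there is no other candidate for yum to pick.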


2.3 Monitoring issues with WN SL6

SAM Update 17.1 fails to monitor some sites that are deploying UMD2 WN on SL6. The problems (segfault) are described in these tickets:

Emir produced a new build of the WN probe, and the patch will soon be made available to the NGI Nagios administrators. This patch fixes the problem for SL6 WNs, but it makes it impossible to monitor lcg-CE and, in general, 32-bit CEs.

2.4 New service types in GOCDB in production

The following service types will be put into production in GOCDB today:

These service types were previously removed from GOCDB because ATP was not able to handle them (the Nagios configuration returned errors). The SAM 17.1 update patches the problem, and ATP now handles the new service types properly. Please upgrade your SAM instance if your NGI is still deploying an older version.

3. AOB

3.1 UserDN publication

There are still sites not publishing the UserDN in their usage records: Missing UserDN 22 Oct 2012 (83 sites). Please follow up with these sites to fix their APEL configuration.

3.2 Next meeting

Proposal: November 5th 2012 14:00 Amsterdam time


Minutes are available here.
