
Difference between revisions of "Agenda-03-12-2012"

Latest revision as of 18:22, 7 December 2012




Detailed agenda: Grid Operations Meeting 03 December 2012

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1. Update on the status of EMI updates

Cristina Aiftimiei (EMI) reports on the EMI updates. Twiki page with more information

1.2. Staged Rollout

  • From EMI1 we have 3 products - to be included in the next UMD1 update:
    • VOMS Java API 2.0.9: to be verified; affects at least the UI and WN
    • Gridsite 1.7.23 under staged rollout
    • LB 3.2.9 has been released in UMD2 already
  • From IGE3 there are several products:
    • To be verified:
      • See the dashboard
    • In staged rollout:
      • Globus gridftp 5.2.2
      • Gridway 5.12.0
  • From EMI2 there are several products:
    • To be verified:
      • ARC 2.0.1
      • DPM and LFC 1.8.5
      • dCache 2.2.5: in fact only the dcap library solving some problems with the gsidcap access
      • glite-MPI 1.4.0
      • VOMS 2.0.9
    • In staged rollout:
      • Gridsite 1.7.24
      • WMS 3.4.0 - see below for more on this

2. Operational Issues

2.1 Unsupported middleware update

Middleware services planned to be upgraded by end of November

As of the last check on Dec 1st, 28 sites that declared a plan to upgrade their services by the end of November are still running unsupported middleware without a downtime declared on those services.
Today EGI Operations will open a new batch of NGI GGUS tickets, asking:

  • To open a downtime for the unsupported services by Friday COB
  • Sites with late plans (beyond November) should already be in downtime; any of these sites that have not done so must open the downtime as soon as possible, preferably by COB today
  • Sites with CLASSIC SE service types registered in GOCDB will be asked to remove those services.

Unsupported VOMS

VOMS is a critical service for the VOs, so the status of the VOMS tickets will be assessed one by one. Nevertheless, sites deploying unsupported VOMS must provide an upgrade plan or the technical reasons for delaying the upgrade.

DPM LFC and WN

The middleware services that have been unsupported since the end of November will raise critical alarms on the ROD dashboard by the end of this week. The probes are ready, testing is being finalized, and the Operations Portal team is working on integrating them into the operational dashboard.

ROD teams must follow the escalation procedure (https://wiki.egi.eu/wiki/PROC01#Escalation_for_operational_problem_with_unsupported_MW_at_site.C2.A0) to follow up on the unsupported middleware alarms. The overall procedure for decommissioning unsupported middleware is PROC16.

ARC unsupported middleware

There are still unsupported old versions of ARC in the infrastructure. They should be removed as well.

2.2 Updates from DMSU

FTS jobs abort with "No site found for host xxx.yyy" error

Details GGUS #87929

From time to time, some FTS transfers fail with the message above. The problem was reported at CNAF, IN2P3, and GRIDKA, and was noticed by the ATLAS, CMS, and LHCb VOs. It appears and disappears at rather short and unpredictable intervals.

The exact reasons are not yet understood; we keep investigating. Reports from sites affected by a similar problem will be appreciated.

Update Nov 20: The user reports that both problems disappeared, probably fixed together.

LCMAPS-plugins-c-pep in glexec fails on RH6-based WNs

Details GGUS #88520

Due to the replacement of OpenSSL with NSS in RH6-based distributions, LCMAPS-plugins-c-pep invoked from glexec fails when talking to the Argus PEP via curl.

This is a known issue, as mentioned in the EMI glexec release notes; however, the workaround is not described in a usable way there.

Once we are sure that we understand it properly and that the fix works, it will be documented on the UMD pages and passed to the developers to:

  1. fix the documentation
  2. try to deploy the workaround automatically when an NSS-poisoned system is detected
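
As an illustration of the detection step in item 2, the following minimal sketch checks whether the command-line curl on a node reports NSS as its SSL backend. It is not the workaround itself and is not taken from the EMI release notes. It assumes that the first line of the `curl --version` output lists the backend as a token such as `NSS/3.13.1.0`, and that the command-line curl reflects the libcurl used by the plugin. The function names are hypothetical.

```python
import subprocess


def curl_uses_nss(version_output):
    """Return True if the first line of `curl --version` lists an NSS backend.

    Assumption: the backend appears as a whitespace-separated token
    beginning with "NSS/" on the first line of the output.
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    return any(token.startswith("NSS/") for token in first_line.split())


def local_curl_uses_nss():
    """Run `curl --version` on this host and inspect its output (hypothetical helper)."""
    result = subprocess.run(
        ["curl", "--version"], capture_output=True, text=True, check=True
    )
    return curl_uses_nss(result.stdout)
```

A deployment script could call `local_curl_uses_nss()` on the worker node and apply the documented workaround only when it returns True.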

UPDATE Nov 19th: the fix is now well explained in the known issues section, and it will be included in a future YAIM update.

WMS does not work with ARC CE 2.0.1

Details GGUS #88630, further info Condor ticket #3062

The format of the jobid changed in ARC CE release 12. The new format is not recognised by Condor prior to version 7.8.3. However, the current EMI-1 WMS uses Condor 7.8.0, which breaks submission from WMS to ARC CE.

The problem therefore affects the CMS SAM tests as well as CMS production jobs.

Updates to ARC CE 12 should therefore be done with care until the Condor update is available from EMI.
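
The version constraint above can be expressed as a small check, sketched here only to make the comparison explicit. It assumes that Condor 7.8.3 and every later version recognise the new jobid format, which the text implies but does not state. The function and constant names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.8.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Assumption from the text above: 7.8.3 is the first Condor version
# that recognises the jobid format used by ARC CE release 12.
MIN_CONDOR_FOR_NEW_ARC_JOBID = (7, 8, 3)


def condor_handles_new_arc_jobid(condor_version):
    """Return True if this Condor version is at or above the assumed minimum."""
    return parse_version(condor_version) >= MIN_CONDOR_FOR_NEW_ARC_JOBID
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "7.10.0" as older than "7.8.3".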

UPDATE Nov 26th: Condor 7.8.6 was installed on a test WMS, and submission to ARC seemed to work fine. Since this WMS is no longer available, further and deeper tests should be performed, perhaps using the EMI-TESTBED infrastructure.

3. AOB

3.1 UMD documentation

(Alessandro Usai)

3.2 Next meeting

Two weeks' time would be Dec 17, the day before the OMB.

  • We would need to skip to January 7th

4. Minutes

Available on the Indico page: Minutes (https://indico.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=1266).