Difference between revisions of "Agenda-12-06-2017"

From EGIWiki

Revision as of 08:26, 7 June 2017


General information

Middleware

CMD

UMD

Preview repository

Operations

Testing FedCloud sites

Feedback from Helpdesk

yearly review of the information registered into GOC-DB

Failures with the updated CREAM probes

After the release of the updated CREAM probes on May 4th, several sites have been failing the JobCancel and/or JobPurge probes (GGUS 128151):

  • the error message is: "Received timeout while fetching results".

The main reason is that those CEs have no job slot reserved for the ops tests.

As explained in the CREAM probes wiki:

  • JobCancel: cancel an active job
    • This metric submits a job directly to the selected CREAM CE, waits until the job reaches the IDLE, RUNNING or REALLY-RUNNING state and then tries to cancel it. Finally it checks whether the job has been correctly cancelled.
  • JobPurge: purge a terminated job
    • This metric is analogous to cream_jobCancel.py. It submits a short job (e.g. hostname.jdl), waits for its termination (e.g. DONE-OK) and then tries to purge it. Finally, to verify that the purge was successful, the probe checks the job status by executing the glite-ce-job-status command, which in this scenario must fail because the job no longer exists.

Both probes have a timeout of 15 minutes, so if the test job is not executed within that time, the probes return a failure. Please assign the ops jobs a higher priority and reserve one job slot for them; they take only a few seconds to run.

These failures did not appear before May 4th because the first version of the probes returned the status "UNKNOWN" instead of the more appropriate "CRITICAL".
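
To make the timeout behaviour concrete, here is a minimal Python sketch of a JobCancel-style check. It is not the code of the real CREAM probes: the function names, the `submit`, `get_status` and `cancel` callables, the 30-second polling interval and the use of a CANCELLED final state are all assumptions made for illustration. Only the 15-minute limit, the list of active states and the timeout message come from the text above; the exit codes follow the usual Nagios plugin convention.

```python
import time

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

ACTIVE_STATES = {"IDLE", "RUNNING", "REALLY-RUNNING"}
TIMEOUT_SECONDS = 15 * 60  # the 15-minute limit described above


def wait_for_state(get_status, wanted, timeout=TIMEOUT_SECONDS, poll=30,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a state in `wanted`.

    Returns the matching state, or None once `timeout` seconds have passed.
    `poll`, `clock` and `sleep` are illustrative parameters (assumptions).
    """
    deadline = clock() + timeout
    while True:
        state = get_status()
        if state in wanted:
            return state
        if clock() >= deadline:
            return None
        sleep(poll)


def job_cancel_probe(submit, get_status, cancel, **wait_kwargs):
    """Hypothetical JobCancel flow: submit, wait until active, cancel, verify."""
    job_id = submit()
    active = wait_for_state(lambda: get_status(job_id), ACTIVE_STATES,
                            **wait_kwargs)
    if active is None:
        # No free slot for the ops job: the job never became active in time.
        return CRITICAL, "Received timeout while fetching results"
    cancel(job_id)
    # Assumption: a cancelled job ends up in a state named CANCELLED.
    done = wait_for_state(lambda: get_status(job_id), {"CANCELLED"},
                          **wait_kwargs)
    if done is None:
        return CRITICAL, "Job was not cancelled in time"
    return OK, "Job cancelled"
```

In this sketch, a CE where the ops job stays queued for the whole 15 minutes yields CRITICAL, which matches the failures reported since May 4th; the earlier probe version would have reported UNKNOWN in the same situation.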

List of failing CREAM-CEs from nagios (not all of them are affected by this problem):

  • 45 CREAM-CEs affected (13% of the total ones)
  • sites can request a recomputation of the May statistics

Monthly Availability/Reliability

Proposal to modify the declaration of scheduled interventions

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics; in general the usage is relatively low and mainly for local testing.

Moderate usage by a few VOs:

  • NGI_CZ: eli-beams.eu
  • NGI_GRNET: see
  • NGI_IT: calet.org, compchem, theophys, virgo
  • NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
  • NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

  • compchem is already testing DIRAC
  • calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
  • mice: enabled on the GridPP DIRAC server

We need feedback from the VOs to better define the technical details and timeline:

  • NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:

  • WMS will be removed from production starting from 1st January 2018.
    • VOs have 8 months to find alternatives or migrate to DIRAC
  • Considering that this is not an update, the decommissioning can be performed in a few weeks.

IPv6 readiness plans

    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan

Decommissioning of dCache 2.10 and 2.13 (to modify)

  • support for dCache 2.10 ended in December 2016; tickets were opened by EGI Operations to track the decommissioning
  • the dCache 2.13 decommissioning procedure has started: in June the probes will return CRITICAL, support from dCache ends in July, and upgrades are to be performed by August
  • please upgrade to 2.16, whose support ends in May 2018, or to 3.0
    • note that the dCache team does not support upgrading from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.13 instances and follow up with the NGIs/sites at the beginning of August
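
The supported upgrade steps listed above can be written down as a small lookup, sketched here in Python. The function name `upgrade_path` and the table `SUPPORTED_STEPS` are hypothetical; only the two transitions stated above (2.10->2.13 and 2.13->2.16) are encoded, and nothing is assumed about which transitions to 3.0 are supported.

```python
# Supported single-step transitions, as stated in the list above.
SUPPORTED_STEPS = {"2.10": "2.13", "2.13": "2.16"}


def upgrade_path(current, target="2.16"):
    """Return the chain of versions to install to go from `current` to `target`.

    Raises ValueError when the listed transitions do not lead to `target`.
    """
    path = [current]
    while path[-1] != target:
        next_version = SUPPORTED_STEPS.get(path[-1])
        if next_version is None:
            raise ValueError(
                "no listed upgrade step from %s towards %s" % (path[-1], target)
            )
        path.append(next_version)
    return path


if __name__ == "__main__":
    # A 2.10 instance has to pass through 2.13 before reaching 2.16.
    print(" -> ".join(upgrade_path("2.10")))
```

A site still on 2.10 therefore needs two upgrade steps, which is worth taking into account when planning against the August deadline.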

Testing the new webdav probes

Site | Host | GGUSID | Note
CYFRONET-LCG2 | se01.grid.cyfronet.pl | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325 | SOLVED
GRIF | node12.datagrid.cea.fr | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329 |
IGI-BOLOGNA | darkstorm.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930 | SOLVED
INFN-T1 | removed | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326 | SOLVED
NCG-INGRID-PT | gftp01.ncg.ingrid.pt | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327 | SOLVED
UKI-NORTHGRID-LIV-HEP | hepgrid11.ph.liv.ac.uk | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328 | SOLVED
egee.irb.hr | lorienmaster.irb.hr | |

Missing steps:

AOB

Next meeting