Agenda-13-06-2016


General information

UMD/CMD

Staged rollout updates

Preview repository

Released on 2016-05-17:

  • Preview 1.2.0
    • LCMAPS-plugins-vo-ca-ap 0.0.1-1
    • STORM 1.11.11
  • Preview 2.1.0
    • NorduGrid ARC 15.03 update 6
    • LCMAPS-plugins-vo-ca-ap 0.0.1-1

Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository

Note: EGI provides the preview repository without any additional quality assurance process; the products are released as provided by the product teams. EGI recommends using the UMD repositories, which contain software verified through the UMD quality assurance process.

Operations

EGI Operations Support activities stopped

  • The Operations Support core activity was not re-bid in phase 2 of the EGI core activities.
  • All Operations Support activities have been moved to EGI.eu Operations.
  • All operational procedures involving Operations Support have been updated to point to EGI Operations. Please let us know if we missed updating any documents.
  • The "Operations Support" support unit in GGUS has been decommissioned. Please use the "Operations" support unit instead from now on.

Monthly Availability/Reliability

Decommissioning SL5

Status and actions

  • Starting this week, EGI Operations will suspend sites that host SL5 services in production that are not declared in downtime.

NGI Argus servers not properly configured

Some time ago (more than a year, I think), EGI ran a campaign to have each NGI run an "NGI Argus" service. This campaign resulted in new services being added to GOCDB for each NGI.

Unfortunately, as explained in the OMB in February, our monitoring is currently unable to check the deployment of these services:

  • For 6 services, our monitoring cannot contact the NGI Argus
  • For 18 services, our monitoring is not authorized to get the right information from the NGI Argus
  • For 1 service, our monitoring indicates that the NGI Argus is not properly configured and does not pull the rules from argus.cern.ch

In the end, only 5 services are properly configured and monitored!

The changes are rather easy:

  • If we can't contact the NGI Argus, the site needs to make sure that no firewall blocks 195.251.55.111 from accessing the Argus PAP port
  • If we are not authorized, the site needs to add the right ACE to its Argus authorization:
pap-admin add-ace 'CN=srv-111.afroditi.hellasgrid.gr,OU=afroditi.hellasgrid.gr,O=HellasGrid, C=GR' 'POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ'
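
Before asking for a re-check, a site may want to confirm that its PAP port accepts TCP connections at all. The following is a minimal Python sketch, not part of the EGI monitoring: `port_reachable` is a hypothetical helper, and the port number to test is left to the caller (8150 is commonly the Argus PAP port, but that is an assumption to verify against the local configuration). A successful connection from another machine does not prove that the firewall admits 195.251.55.111; it only rules out a service that is down or not listening.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. It tests reachability from
    the machine running the script, not from the EGI monitoring host.
    """
    try:
        # create_connection resolves the host name and performs the TCP
        # handshake, raising OSError (or a subclass) on any failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("argus.example.org", 8150)` would report whether the (hypothetical) host `argus.example.org` accepts connections on port 8150 from where the script is run.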

The current status of the infrastructure can be found:

  • In the secmon Nagios (access may be restricted):

https://secmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_ngi.ARGUS&style=detail&sorttype=1&sortoption=3

  • On the security dashboard:

https://operations-portal.egi.eu/csiDashboard/ngi/any/tab/list/filter/monitoring/page/list?tsid=4

On the security dashboard, each NGI should have an "argus-ban" result:

  • "Ok" means the NGI Argus is properly configured
  • "Unknown" means that we can't contact the NGI Argus
  • "High" means that we are not authorized
  • "Critical" means that Argus is not pulling rules from argus.cern.ch
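
The result codes and the fixes listed earlier can be combined into a small lookup, which may be handy when triaging many NGIs at once. This is an illustrative Python sketch only: the names `ARGUS_BAN_RESULTS` and `describe_argus_ban` are hypothetical, and the remedy for "Critical" is inferred from the description above rather than stated as a procedure.

```python
# Meaning of each "argus-ban" result and the remedy suggested in these notes.
# The table is an illustration, not an interface of the security dashboard.
ARGUS_BAN_RESULTS = {
    "Ok": ("the NGI Argus is properly configured", "nothing to do"),
    "Unknown": (
        "the monitoring cannot contact the NGI Argus",
        "make sure no firewall blocks 195.251.55.111 on the Argus PAP port",
    ),
    "High": (
        "the monitoring is not authorized",
        "add the monitoring ACE with pap-admin add-ace",
    ),
    "Critical": (
        "Argus is not pulling rules from argus.cern.ch",
        # Inferred remedy: the notes describe the problem but give no command.
        "configure the NGI Argus to pull rules from argus.cern.ch",
    ),
}


def describe_argus_ban(result: str) -> str:
    """Return a one-line explanation for a dashboard "argus-ban" result."""
    try:
        meaning, action = ARGUS_BAN_RESULTS[result]
    except KeyError:
        raise ValueError(f"unexpected argus-ban result: {result!r}") from None
    return f"{result}: {meaning}; {action}"
```

Calling `describe_argus_ban("High")` returns a sentence naming the authorization problem and pointing to the `pap-admin add-ace` fix; any value outside the four listed results raises `ValueError`.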

The parent ticket is https://ggus.eu/?mode=ticket_info&ticket_id=120770

2016_05_09 UPDATE pending tickets:

FedCloud status

A/R Profile     March   April   May
improvements        2       6     5
unchanged          11       7     5
worsening           9      10    12

  • CYFRONET-CLOUD (+100%): it was failing the accounting test in the old profile
  • GoeGRID (+80.7%): it was failing the cdmi test in the old profile
  • TR-FC1-ULAKBIM (+47.59%): it was failing the accounting test in the old profile
  • HG-09-Okeanos-Cloud: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122012 (SOLVED, updated the cert)
    • failures with the probes:
    • eu.egi.cloud.OCCI-Context-ops: CATEGORIES CRITICAL - SSL_connect returned=1 errno=0 state=error: certificate verify failed
    • eu.egi.cloud.OCCI-VM-ops: CRITICAL - SSL connection with "https://okeanos-occi2.hellasgrid.gr:9000/" could not be established! SSL_connect
  • MK-04-FINKICLOUD unreachable
  • NCG-INGRID-PT (+26.74%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122013 (a new server is going to be put into production, and the old one decommissioned)
    • failures mainly with the cloud probes:
    • eu.egi.cloud.OCCI-VM-ops (sometimes warning, sometimes critical): WARNING - "http://aurora.ncg.ingrid.pt:8787" failed to instantiate a COMPUTE instance in the given timeframe! Timeout: 300s
    • eu.egi.cloud.OpenStack-VM-ops: Critical: could not fetch flavor ID, endpoint does not correctly exposes available flavors: 110 Connection timed out
  • SCAI (-21.61%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122015 (CAs not completely updated)
    • some repeated failures with the CA probes
    • also eu.egi.cloud.OCCI-VM-ops CRITICAL - Unexpected response from https://fc.scai.fraunhofer.de:8787/! Net::HTTP::Post failed! HTTP Response status: [500] Internal Server Error : The server has either erred or is incapable of performing the requested operation.
  • UPV-GRyCAP (-24.56%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122014 (SOLVED, CAs updated)
    • it is still failing the eu.egi.OCCI-IGTF probe
    • org.nagios.OCCI-TCP: 05-11-2016 17:56:27 Connection refused

AOB

Next meeting