EGI-InSPIRE:Sa1 2012-12-18

Latest revision as of 17:19, 6 January 2015

Progress of SA1 issues

Nothing new to report.

Milestones/Deliverables

  • D4.7 Operations Sustainability: started internal review

SA1.1 Activity Management

MEETINGS

  • TF-NOC meeting and contacts established with GEANT operations and EduPERT
  • OMB meeting: chairing and preparation of material
  • JRA1 meeting and requirement for change of availability computation
  • PC meeting
  • EGI-CSIRT meeting on operational implications of having central banning (according to the proposed change in the policy for service operations)
  • GDB meeting: update on status and progress of obsolete middleware decommissioning
  • Monday weekly meeting on coordination of support and tools for the mw upgrade campaign
  • EGI Champions meeting
  • contribution to TCB meeting (TF-accounting task force, status of adoption of SSM, adoption of gridftp, advancement of actions on BDII)

ACTIVITIES

  • finalization of first D4.7 draft
  • handling of the EGI.eu domain nameserver incident (15-12-2012)
  • handling of incident concerning GGUS (17-12-2012)
  • planning of activities for testbed on central allocation of resources
  • preparation activities for NGI and Global operations task sustainability, for the Evolving EGI workshop
  • assessment of IGE release support plans
  • preparation of document for TCB about classification of products (integrated, contributed, community) and changes around the software provisioning activities of EGI
  • planning of ARC CE decommissioning campaign
  • coordination of issues around DPM version monitoring for mw decommissioning
  • assessment of status of VOMS upgrade, follow-up of sites not responding to upgrade requests or failing to put service endpoints in downtime, and definition of list of sites eligible for suspension this week
  • preparation work for the extension of the NGI availability monthly reports
  • assessment of QCG information system use cases

SA1.2 Security

  • Monthly team meeting was held on Thursday 13th Dec.
  • Started defining SA1.2 detailed workplan for 2013
  • Planning for best-efforts CSIRT cover during the Christmas/New Year holidays
  • Participation in WLCG security meeting at FNAL (17/18 Dec)
  • Continued work on Operations/Infrastructure (including security) track at ISGC 2013
  • We will release an SVG advisory for a 'Low' risk issue soon, as the issue is now fixed
  • Work on a procedure for handling compromised certificates
  • Organised and held meeting on central user banning (13th Dec) for presentation at OMB on 18th Dec

SA1.3 Staged rollout

  • Final release candidate of UMD 2.3.1, containing:
    • ige gridftp 5.2.2
    • dpm and lfc 1.8.5
    • dcache 2.2.5 (contains only the dcap library)
    • gridsite 1.7.24
  • Preparing the UMD 2.4.0 release, taking into account what is left from IGE 3.0 and the several EMI updates, as well as what should come out in the next EMI 2 updates of December 2012 and January 2013. Components already in Staged Rollout:
    • IGE.gridway.sl5.x86_64-5.12.0
    • EMI.wms.sl6.x86_64-3.4.0

SA1.3 Integration

  • no progress

SA1.4 Central tools

  • On Saturday 15th from 04:00 to 10:30 (CET) the *egi.eu domain was unreachable. That caused, among other issues, a GOCDB outage. Based on the information available to me, this should not have directly caused problems for service monitoring.
  • Middleware monitoring
    • first version of GLUE2 probe deployed on middleware monitoring instance (https://rt.egi.eu/rt/Ticket/Display.html?id=4733)
    • GLUE2 probe should start raising alarms in Dashboard from December 19th
    • DPM probe improvement related to overlapping gLite/EMI versions (https://rt.egi.eu/rt/Ticket/Display.html?id=4810)
    • ARC-CE version analysis (https://rt.egi.eu/rt/Ticket/Display.html?id=4768)
  • Presentation of operational tools and middleware monitoring instances at the OMB.
  • New InterNGI usage functionality released on Accounting Portal (https://operations-portal.egi.eu/broadcast/archive/id/840)
  • Central operational tools outages
    • the *egi.eu domain was unreachable on Saturday 15th from 04:00 to 10:30 (CET)
    • GGUS was unreachable on Monday 17th from 10:40 to 17:20 (CET) due to a network outage. The network failure on Monday, 17.12.2012, was caused by two independent, almost simultaneous faults in the KIT network. Their interaction gave a very unclear picture of the real causes. On North Campus, there was a hardware failure in one of the core backbone routers; the redundant hardware part rebooted completely unexpectedly, without any event or configuration change. On South Campus, there was another fault caused by the network in a building. Because of the large impact, locating the cause was very difficult. The faulty network components in the wiring closets were replaced on the afternoon of Dec 17th.
    • GOCDB and APEL were unreachable on Tuesday 18th from 07:50 to 11:00 (CET) due to network outage (

SA1.5 Accounting

  • Republishing of user DNs for historical information stopped; instructions given to site administrators
  • Request to move Nikhef to SSM production deployment

SA1.6 Helpdesk

  • Shopping list meeting to prioritise requests for GGUS
  • Implementation and maintenance work on GGUS including report generator
  • Migration of GGUS mail boxes to new infrastructure, see https://rt.egi.eu/rt/Ticket/Display.html?id=4700
  • GGUS release

SA1.7 Support

  • preparation of proposal for revision of GOCDB business logic
  • preparation of wg about revision of nagios probes released by EMI

Software Support

  • no report received

Network Support

  • no report received

SA1.8 Availability and core services

  • A/R recomputation requests handling
    • GGUS 89418: informing SAM Nagios about suspended sites in the reports
    • Received final A/R reports from SAM Nagios SU. Communication with them regarding the removal of the test profile name from the title.
    • Communication with SAM Nagios SU regarding some issues in the reports.
  • Issues with VOMS registration procedure (regarding Dteam VO migration) sent to VOMS development team
  • Setup of EGI Catch All CA Registration Authority in Nigeria
  • Changeover of EGI Catch All CA Registration Authority in Tirana, Albania
  • Issue in certification infrastructure resolved

Documentation

  • new subforums for new NGIs and sites:
    • sites: http://go.egi.eu/NewSiteForum
    • NGIs: http://go.egi.eu/NewNGIForum
  • final version of EGI OLA: https://documents.egi.eu/document/1093
  • reorganization of the Middleware wiki page

Meetings

  • Evolving EGI workshop