Difference between revisions of "EGI-InSPIRE:Sa1 2012-12-18"
Latest revision as of 18:19, 6 January 2015
Progress of SA1 issues
Nothing new to report.
Milestones/Deliverables
- D4.7 Operations Sustainability: started internal review
SA1.1 Activity Management
MEETINGS
- TF-NOC meeting and contacts established with GEANT operations and EduPERT
- OMB meeting: chairing and preparation of material
- JRA1 meeting and requirement for change of availability computation
- PC meeting
- EGI-CSIRT meeting on operational implications of having central banning (according to the proposed change in the policy for service operations)
- GDB meeting: update on status and progress of obsolete middleware decommissioning
- Monday weekly meeting on coordination of support and tools for the mw upgrade campaign
- EGI Champions meeting
- contribution to TCB meeting (Tf-accounting task force, status of adoption of SSM, adoption of gridftp, advancement of actions on BDII)
ACTIVITIES
- finalization of first D4.7 draft
- handling of the EGI.eu domain nameserver incident (15-12-2012)
- handling of incident concerning GGUS (17-12-2012)
- planning of activities for testbed on central allocation of resources
- preparation activities for NGI and Global operations task sustainability, for the Evolving EGI workshop
- assessment of IGE release support plans
- preparation of document for TCB about classification of products (integrated, contributed, community) and changes around the software provisioning activities of EGI
- planning of ARC CE decommissioning campaign
- coordination of issues around DPM version monitoring for mw decommissioning
- assessment of the status of the VOMS upgrade, follow-up of sites not responding to upgrade requests or failing to put service endpoints in downtime, and definition of the list of sites eligible for suspension this week
- preparation work for the extension of the NGI availability monthly reports
- Assessment of QCG information system use cases
SA1.2 Security
- Monthly team meeting was held on Thursday 13th Dec.
- started defining SA1.2 detailed workplan for 2013
- Planning for best-efforts CSIRT cover during the Christmas/New Year holidays
- Participate in WLCG security meeting at FNAL (17/18 Dec)
- Continue work on Operations/Infrastructure (including security) track at ISGC 2013
- An SVG advisory for a 'Low' risk issue will be released soon, as the issue is now fixed
- Work on a procedure for handling compromised certificates
- Organised and held meeting on central user banning (13th Dec) for presentation at OMB on 18th Dec
SA1.3 Staged rollout
- Final release candidate of UMD 2.3.1, containing:
  - ige gridftp 5.2.2
  - dpm and lfc 1.8.5
  - dcache 2.2.5 (contains only the dcap library)
  - gridsite 1.7.24
- Preparing the UMD 2.4.0 release, taking into account what is left from IGE 3.0 and the several EMI updates, as well as what should be out in the next emi2 updates of December 2012 and January 2013. Components already in Staged Rollout:
  - IGE.gridway.sl5.x86_64-5.12.0
  - EMI.wms.sl6.x86_64-3.4.0
SA1.3 Integration
no progress
SA1.4 Central tools
- On Saturday from 04:00 to 10:30 (CET) the *egi.eu domain was unreachable. That caused, among other issues, a GOCDB outage. Based on my information, this should not have directly caused problems for service monitoring.
- Middleware monitoring
  - first version of GLUE2 probe deployed on middleware monitoring instance (https://rt.egi.eu/rt/Ticket/Display.html?id=4733)
  - GLUE2 probe should start raising alarms in Dashboard from December 19th
  - DPM probe improvement related to overlapping gLite/EMI versions (https://rt.egi.eu/rt/Ticket/Display.html?id=4810)
  - ARC-CE version analysis (https://rt.egi.eu/rt/Ticket/Display.html?id=4768)
- Presentation of operational tools and middleware monitoring instances at the OMB.
- New InterNGI usage functionality released on Accounting Portal (https://operations-portal.egi.eu/broadcast/archive/id/840)
- Central operational tools outages
  - the *egi.eu domain was unreachable on Saturday 15th from 04:00 to 10:30 (CET)
  - GGUS was unreachable on Monday 17th from 10:40 to 17:20 (CET) due to a network outage. The network failure on Monday, 17.12.2012, was caused by two independent, almost simultaneous faults in the KIT network. Their interaction gave a very unclear picture of the real causes. On North Campus, there was a hardware failure in one of the core backbone routers; the redundant hardware part rebooted completely unexpectedly, without any event and without any configuration. On South Campus, there was another fault caused by the network in a building. Because of the large impact, localizing the cause was very difficult. The faulty network components in the wiring closets were replaced in the afternoon of Dec 17th.
  - GOCDB and APEL were unreachable on Tuesday 18th from 07:50 to 11:00 (CET) due to network outage (
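The length of each outage window listed above can be worked out with a few lines of Python. This is only an illustrative sketch: the helper names (`outage_minutes`, `summarize`) are hypothetical, the year 2012 is taken from the date quoted in the GGUS report, and all times are treated as naive CET timestamps exactly as written in the list.

```python
from datetime import datetime

# Outage windows as reported in the list above (all times CET, December 2012).
OUTAGES = [
    ("*egi.eu domain", "2012-12-15 04:00", "2012-12-15 10:30"),
    ("GGUS", "2012-12-17 10:40", "2012-12-17 17:20"),
    ("GOCDB and APEL", "2012-12-18 07:50", "2012-12-18 11:00"),
]

FMT = "%Y-%m-%d %H:%M"


def outage_minutes(start: str, end: str) -> int:
    """Return the length of an outage window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def summarize(outages):
    """Map each affected service to its outage duration in minutes."""
    return {name: outage_minutes(start, end) for name, start, end in outages}


if __name__ == "__main__":
    for name, minutes in summarize(OUTAGES).items():
        print(f"{name}: {minutes // 60} h {minutes % 60:02d} min")
```

Under these assumptions the three windows come to 6 h 30 min, 6 h 40 min and 3 h 10 min respectively.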
SA1.5 Accounting
- Stop of republishing of user DN for historical information and instructions given to site administrators
- Request to move nikhef to SSM production deployment
SA1.6 Helpdesk
- Shopping list meeting to prioritise requests for GGUS
- Implementation and maintenance work on GGUS including report generator
- Migration of GGUS mail boxes to new infrastructure, see https://rt.egi.eu/rt/Ticket/Display.html?id=4700
- GGUS release
SA1.7 Support
- preparation of proposal for revision of GOCDB business logic
- preparation of a working group on the revision of Nagios probes released by EMI
Software Support
- no report received
Network Support
- no report received
SA1.8 Availability and core services
- A/R Recomputation requests handling
  - GGUS 89418: informing Sam Nagios about suspended sites in the reports
  - Received final A/R reports from Sam Nagios SU. Communication with them regarding the removal of the test profile name from the title.
  - Communication with Sam Nagios SU regarding some issues in the reports.
- Issues with VOMS registration procedure (regarding Dteam VO migration) sent to VOMS development team
- Setup of EGI Catch All CA Registration Authority in Nigeria
- Changeover of EGI Catch All CA Registration Authority in Tirana, Albania
- Issue in certification infrastructure resolved
Documentation
- new subforums for new NGIs and sites:
  - sites http://go.egi.eu/NewSiteForum
  - NGIs http://go.egi.eu/NewNGIForum
- final version of EGI OLA https://documents.egi.eu/document/1093
- reorganization of Middleware wiki page
Meetings
- Evolving EGI workshop