EGI-InSPIRE:Sa1 2012-12-18
Progress of SA1 issues
Nothing new to report.
Milestones/Deliverables
- D4.7 Operations Sustainability: started internal review
SA1.1 Activity Management
MEETINGS
- TF-NOC meeting; contacts established with GEANT operations and eduPERT
- OMB meeting: chairing and preparation of material
- JRA1 meeting: requirements for a change to the availability computation
- PC meeting
- EGI-CSIRT meeting on the operational implications of central banning (as foreseen by the proposed change to the service operations policy)
- GDB meeting: update on status and progress of obsolete middleware decommissioning
- Weekly Monday meeting coordinating support and tools for the middleware upgrade campaign
- EGI Champions meeting
- contribution to the TCB meeting (TF-accounting task force, status of SSM adoption, GridFTP adoption, progress of the BDII actions)
ACTIVITIES
- finalization of first D4.7 draft
- handling of the EGI.eu domain nameserver incident (15-12-2012)
- handling of the GGUS incident (17-12-2012)
- planning of activities for testbed on central allocation of resources
- preparation of material on the sustainability of NGI and global operations tasks for the Evolving EGI workshop
- assessment of IGE release support plans
- preparation of document for TCB about classification of products (integrated, contributed, community) and changes around the software provisioning activities of EGI
- planning of ARC CE decommissioning campaign
- coordination of issues around DPM version monitoring for middleware decommissioning
SA1.2 Security
- Monthly team meeting was held on Thursday 13th Dec.
- Started defining the detailed SA1.2 workplan for 2013
- Planning for best-efforts CSIRT cover during the Christmas/New Year holidays
- Participation in the WLCG security meeting at FNAL (17/18 Dec)
- Continue work on Operations/Infrastructure (including security) track at ISGC 2013
- An SVG advisory for a 'Low' risk issue will be released soon, as the issue is now fixed
- Work on a procedure for handling compromised certificates
- Organised and held a meeting on central user banning (13th Dec), in preparation for its presentation at the OMB on 18th Dec
SA1.3 Staged rollout
- Final release candidate of UMD 2.3.1, containing:
- IGE GridFTP 5.2.2
- DPM and LFC 1.8.5
- dCache 2.2.5 (contains only the dcap library)
- GridSite 1.7.24
- Preparing the UMD 2.4.0 release, taking into account what remains from IGE 3.0 and the various EMI updates, as well as what is expected in the EMI 2 updates of December 2012 and January 2013. Components already in Staged Rollout:
- IGE.gridway.sl5.x86_64-5.12.0
- EMI.wms.sl6.x86_64-3.4.0
SA1.3 Integration
no progress
SA1.4 Central tools
- On Saturday 15th, from 4am to 10.30am (CET), the *egi.eu domain was unreachable. Among other issues, this caused a GOCDB outage. Based on the information available, it should not have caused problems directly to service monitoring.
- Middleware monitoring
- first version of the GLUE2 probe deployed on the middleware monitoring instance (https://rt.egi.eu/rt/Ticket/Display.html?id=4733)
- GLUE2 probe should start raising alarms in Dashboard from December 19th
- DPM probe improvement related to overlapping gLite/EMI versions (https://rt.egi.eu/rt/Ticket/Display.html?id=4810)
- ARC-CE version analysis (https://rt.egi.eu/rt/Ticket/Display.html?id=4768)
- Presentation of operational tools and middleware monitoring instances at the OMB.
- New InterNGI usage functionality released on Accounting Portal (https://operations-portal.egi.eu/broadcast/archive/id/840)
- Central operational tools outages
- the *egi.eu domain was unreachable on Saturday 15th from 4am to 10.30am (CET)
- GGUS was unreachable on Monday 17th from 10:40 to 17:20 (CET) due to a network outage. The network failure on Monday 17.12.2012 was caused by two independent, almost simultaneous faults in the KIT network, and their interaction made the real cause very hard to identify. On North Campus, there was a hardware failure in one of the core backbone routers: the redundant hardware part rebooted completely and unexpectedly, without any triggering event, and came back without any configuration. On South Campus, a second fault originated in the network of one building. Because of the large impact, locating the cause was very difficult. The faulty network components in the wiring closets were replaced in the afternoon of Dec 17th.
- GOCDB and APEL were unreachable on Tuesday 18th from 07:50 to 11:00 (CET) due to network outage (
SA1.5 Accounting
Repository
Portal
SA1.6 Helpdesk
- Shopping list meeting to prioritise requests for GGUS
- Implementation and maintenance work on GGUS including report generator
- Migration of GGUS mail boxes to new infrastructure, see https://rt.egi.eu/rt/Ticket/Display.html?id=4700
- GGUS release
SA1.7 Support
Software Support
Network Support
SA1.8 Availability and core services
- Handling of A/R recomputation requests
- GGUS 89418: informing SAM Nagios about suspended sites in the reports
- Received the final A/R reports from the SAM Nagios SU. Communicated with them regarding the removal of the test profile name from the title.
- Communication with the SAM Nagios SU regarding some issues in the reports.
- Issues with the VOMS registration procedure (related to the dteam VO migration) reported to the VOMS development team
- Setup of EGI Catch All CA Registration Authority in Nigeria
- Changeover of EGI Catch All CA Registration Authority in Tirana, Albania
- Issue in certification infrastructure resolved
Documentation
- new subforums for new NGIs and sites
- final version of EGI OLA https://documents.egi.eu/document/1093
- reorganization of the Middleware wiki page