Difference between revisions of "Agenda-13-02-2017"
Revision as of 17:09, 9 February 2017
General information
- the Operations meeting will be on the 2nd Monday of the month
- the EGI Operations Meeting schedule for the first half of 2017 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
UMD/CMD/Preview
- CMD-OS (OpenStack) released http://repository.egi.eu/category/os-distribution/cmd-os-1/
- Keystone-VOMS 9.0.3
- ooi 0.3.2
- gridsite 2.3.3
- Cloud BDII Information provider 0.6.12
- XRootD 4.5.0 is in EPEL-testing; looking for sites to test it
- Update to frontier-squid-3 in UMD4
- this is a major upgrade with some incompatibilities with frontier-squid-2 based versions, as detailed here: https://twiki.cern.ch/twiki/bin/view/Frontier/InstallSquid#Upgrading
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=125691
Preview repository
Released on 2017-01-19:
- Preview 1.8.0 AppDB info (sl6): dpm-dsi 1.9.11, frontier-squid 3.5.23-2.1, LFC 1.9.0
- Preview 2.8.0 AppDB info (CentOS 7): dpm-dsi 1.9.11, frontier-squid 3.5.23-2.1, LFC 1.9.0
Note: EGI provides the preview repository without any additional quality assurance process: the products are released as they are provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the quality assurance process of UMD.
Operations
Feedback from Helpdesk
- [2016-12-13] Services using JGlobus fail with RFC proxies from certificates from some CAs
- Affecting dCache < v2.14, BeStMan
- Services using JGlobus fail with RFC proxies that have the Non-Repudiation key usage flag set, e.g. those created by the usual voms-proxy-init from a Grid Canada certificate
- https://ggus.eu/?mode=ticket_info&ticket_id=124650
IPv6 readiness plans
Decommissioning of dCache 2.10
- start decommissioning campaign
- instruction on how to migrate
Testing of the storage accounting
As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.
More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage
List of sites available for test.
Software upgrades for OpenStack cloud RCs (TO BE UPDATED)
- keystone-VOMS and cloud-info-provider updates available, need to be installed on all OpenStack sites
- as the latest keystone-VOMS version is only compatible with Liberty and Mitaka, sites running OpenStack Kilo (or older) have been asked for an OpenStack upgrade plan
- according to EGI policies (https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software), OpenStack Kilo or older should NOT be running on the infrastructure; we are asking to discuss this point at the next OMB (October 27)
- as many sites find it difficult to plan upgrades against the very tight OpenStack release cycle, please bring suggestions and reply with details in the tickets, in order to shape the best (shared) proposal
- ticket campaign ONGOING for all OpenStack sites, asked to upgrade to keystone-VOMS >=8.0.3, cloud-info-provider >=0.6, and plans for the future (OpenStack version currently deployed, plans for upgrades, usual specific RC upgrade schedule), UPDATE:
- INDIGO-CATANIA-STACK and INFN-CATANIA-STACK moving to Mitaka (no plan)
- IISAS-GPUCloud Liberty
- FZJ user isolation bug fixed in Newton, not in Mitaka (investigating a backport to Mitaka), waiting for solution
- IN2P3-IRES Mitaka
- CETA-GRID using Icehouse, planning mid-term upgrade (Newton?)
- IISAS-FedCloud Mitaka from Ubuntu 16.04 LTS installed
- BIFI upgrading to Mitaka
- SCAI upgraded to Mitaka
- INFN-PADOVA-STACK FIXED using Liberty
- IFCA-LCG2 using Liberty
- CYFRONET-CLOUD running Juno, evaluating Mitaka
- TR-FC1-ULAKBIM FIXED using Liberty
- NCG-INGRID-PT, using Mitaka, up to date
- RECAS-BARI preparing upgrade to Mitaka from Ubuntu 16.04 LTS (deadline by end of year)
- 100IT Liberty (evaluating Mitaka)
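The minimum versions requested in the ticket campaign (keystone-VOMS >=8.0.3, cloud-info-provider >=0.6) can be checked with a simple comparison of dotted version numbers. The following is only a minimal sketch: the function names and the `MINIMUM_VERSIONS` table are hypothetical, and it assumes purely numeric versions such as `9.0.3` or `0.6.12` (no suffixes like `-rc1`).

```python
# Minimum versions as stated in the ticket campaign above.
# The dictionary keys are illustrative labels, not official package names.
MINIMUM_VERSIONS = {
    "keystone-voms": "8.0.3",
    "cloud-info-provider": "0.6",
}


def parse_version(version):
    """Turn a dotted numeric version such as '0.6.12' into (0, 6, 12)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(product, installed):
    """Return True if the installed version is at least the requested minimum."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[product])


if __name__ == "__main__":
    # Versions released in CMD-OS, taken from the list earlier in this agenda.
    print(meets_minimum("keystone-voms", "9.0.3"))
    print(meets_minimum("cloud-info-provider", "0.6.12"))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "0.10" would sort before "0.6".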
Proposal to modify the declaration of scheduled interventions
Currently (see MAN02 Service intervention management) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.
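WLCG proposed the following modification (https://indico.cern.ch/event/607744/contributions/2449767/subcontributions/218703/attachments/1402467/2141097/LongDowntimes-170126.pdf):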
- a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
- a scheduled intervention longer than 5 days must be declared at least 1 month in advance
- any other intervention that doesn't fulfill the rules above will be considered unscheduled
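To make the difference between the current rule and the WLCG proposal concrete, the two can be sketched as small functions. This is an illustration only, not an implementation used by GOCDB or any EGI tool. Two points are assumptions, because the proposal does not define them: "1 month" is taken as 30 days, and an intervention lasting exactly 5 days is treated under the 24-hour rule. All names are hypothetical.

```python
from datetime import datetime, timedelta

MIN_NOTICE_SHORT = timedelta(hours=24)
MIN_NOTICE_LONG = timedelta(days=30)   # assumption: "1 month" taken as 30 days
LONG_THRESHOLD = timedelta(days=5)


def is_scheduled_current(declared_at, start):
    """Current rule: any duration, at least 24 hours of notice."""
    return start - declared_at >= MIN_NOTICE_SHORT


def is_scheduled_proposed(declared_at, start, end):
    """Proposed rule: the required notice depends on the duration."""
    notice = start - declared_at
    duration = end - start
    if duration > LONG_THRESHOLD:
        return notice >= MIN_NOTICE_LONG
    # Assumption: exactly 5 days falls under the 24-hour rule.
    return notice >= MIN_NOTICE_SHORT


if __name__ == "__main__":
    # A 7-day intervention declared 9 days in advance.
    declared = datetime(2017, 2, 1, 9, 0)
    start = datetime(2017, 2, 10, 9, 0)
    end = datetime(2017, 2, 17, 9, 0)
    print(is_scheduled_current(declared, start))
    print(is_scheduled_proposed(declared, start, end))
```

In this example the 7-day intervention counts as scheduled under the current rule but as unscheduled under the proposal, since it was declared less than one month in advance.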
Monthly Availability/Reliability
- Underperformed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific GGUS 125427
- TW-NCUHEP: site-bdii unstable
- NGI_DE GGUS 125430
- UNI-SIEGEN-HEP: waiting for the fix for CREAM probe.
- NGI_NL: GGUS 123532
- BelGrid-UCL: UNKNOWN status returned by CREAM probes, waiting for the fix for CREAM probe.
- NGI_UA:
- UA-NSCMBR GGUS 125839: on nagios the ARC-CE tests are OK, while ARGO reports an UNKNOWN status
- Sites suspended after past A/R reports:
- TUDresden-ZIH (NGI_DE)
- Underperformed sites after 3 consecutive months and underperformed NGIs:
- WUT (NGI_PL) GGUS 126367
ARGO proposal to use GOCDB as the only source of topology information
- slides in October Operations Meeting agenda
- ARGO Proposal (September OMB)
- ARGO and GOC-DB updates from November OMB
- Timescale:
- New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints
- Then creation of a web UI view for uncertified sites in ARGO
- Uncertified sites will be asked to fill in the service endpoints information, following the guide "How to add URL service endpoint information into GOC-DB"
- (OPTIONAL) use the GOC-DB test instance for testing the procedure
- As information is added in the GOCDB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and they will start to be monitored
- By Q2 2017: support for multiple service endpoints
VAPOR
- VAPOR 2.1 was released in September and replaced GSTAT
- important for presenting the amount of computing and storage resources of the infrastructure
- new version 2.2 is about to be released:
- please test it on the dev instance http://operations-portal.egi.eu/vapor_dev
- working on some known issues of the previous version
- each NGI should review the information provided by their sites and let us know of any inconsistency: https://operations-portal.egi.eu/vapor_dev/resources/GL2ResSummary
- we need your feedback to improve the service
- report any comments in https://ggus.eu/index.php?mode=ticket_info&ticket_id=124872
AOB
Next meeting
- Feb 13th, 2017 https://indico.egi.eu/indico/event/3140/
- new calendar available until June 2017 https://indico.egi.eu/indico/category/32/