Difference between revisions of "Agenda-11-04-2016"

(Created page with "{{TOC right}} = General information = * the Operations meeting will be on the '''2nd Monday of the month''' * the EGI Operations Meeting schedule for '''first half of 2016''' ...")
 
 
(17 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{TOC right}}  
{{TOC right}}  


'''POSTPONED TO April 18th https://wiki.egi.eu/wiki/Agenda-18-04-2016'''
= General information =
= General information =


Line 7: Line 8:


= News from URT =
= News from URT =
* A critical bug that causes file loss has been discovered in the new DPM dmlite-shell drain command released in DPM 1.8.10. One site in production has been affected https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Shell#Newfunctionality:Drain
** broadcast sent on March 10th
** if you have run the new drain commands at your site, contact the DPM Development team through GGUS (data consistency check is needed)
** '''DO NOT use the new drain commands''' (documented at https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Shell#Newfunctionality:Drain) and until the fixed components are released please continue to use the old dpm-drain command
** UMD 3.14.1 release notes updated


== Staged rollout updates  ==
== Staged rollout updates  ==
* frontier-squid 2.7.24.2 (centos7)
* voms-admin 3.4.1 (sl6)
* storm 1.8.10 (sl6)


== Next releases  ==
== Next releases  ==


= Preview repository =
= Preview repository =
'''[[Preview 2.0.0]]''' was released on April 1st.
The second major release of Preview was created for releasing the products available on '''CentOS 7''' and '''Scientific Linux 6''' platforms that are about to be included in UMD4.


On March 9th it was released the first update of Preview:
The products available in this first release are only for CentOS 7 platform:


* STORM 1.11.10
* ARC
* VOMS Admin server 3.4.2
* Argus
* VOMS Server 2.0.13
* dcache
* VOMS API Java 3.0.6, 3.1.0, 3.2.0
* fts3
* site-bdii
* top-bdii


see details in https://wiki.egi.eu/wiki/Preview_1.1.0
The '''Scientific Linux 6''' products will be available in one of the next updates.


Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository
Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository
Line 38: Line 34:


= Operational issues  =
= Operational issues  =
== Globus GSI clients moving to STRICT_RFC2818 by default ==
* the release of the update that will change the default name compatibility mode from "HYBRID" to "STRICT_RFC2818" is '''planned for April 1, 2016'''.
* EGI Broadcast sent in August already warning about the change, already advising "'''site managers to make sure that all the hostnames and aliases used to connect to a service are included in its host certificate Subject Alternative Name field''', at the latest by the end of the year"
* sites that could be affected by this future change are the ones running services whose clients may use globus-gssapi-gsi for authentication (CE, FTS, SRM, GridFTP, MyProxy, WMS) and using DNS aliases which are not included within the SAN (Subject Alternative Name) field of the certificate (including the host name itself)


== Aligning Fedcloud sites to the A/R procedures ==
== Aligning Fedcloud sites to the A/R procedures ==
Line 87: Line 78:
** TR-FC1-ULAKBIM (OpenStack)
** TR-FC1-ULAKBIM (OpenStack)
*** Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15 '''IN PROGRESS'''
*** Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15 '''IN PROGRESS'''
* New tickets opened to track issues in publishing appliances on AppDB for fedcloud.egi.eu: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120010
* Issue with OCCI and fedcloud.egi.eu VO at MK-04-FINKICLOUD (NGI_MARGI): https://ggus.eu/index.php?mode=ticket_info&ticket_id=120027


=== New issues ===
=== New issues ===


* New tickets opened to track issues in publishing appliances on AppDB for fedcloud.egi.eu: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120010
* Issue with OCCI and fedcloud.egi.eu VO at MK-04-FINKICLOUD (NGI_MARGI): https://ggus.eu/index.php?mode=ticket_info&ticket_id=120027


=== Actions ===
=== Actions ===
Line 107: Line 99:
** TBD: review the support units associated with FedCloud (in progress)
** TBD: review the support units associated with FedCloud (in progress)


== Decommissioning Debian ==  
== Decommissioning SL5 ==
* Tracked on [https://wiki.egi.eu/wiki/SL5_retirement SL5_retirement wiki]
* No checks for dCache, DPM, ARC, UNICORE --> ''' Action on NGIs/ROCs to follow up directly with sites'''
 
== NGIs argus server not properly configured ==
 
Some time ago (more than a year I think), EGI ran a campaign to have
NGIs run an "NGI Argus" service. This campaign resulted in new services
being added to goc-db for each NGI.
 
Unfortunately, as explained in the OMB in February, our monitoring is
currently unable to check the deployment of these services:
- For 6 services, our monitoring cannot contact the NGI Argus
- For 18 services, our monitoring is not authorized to get the right
information from the NGI Argus
- For 1 service, our monitoring indicates that the NGI Argus is not
properly configured and does not pull the rules from argus.cern.ch


* Debian support for squeeze (6.0) has been reached (Feb2016) https://www.debian.org/News/2016/20160212
In the end, only 5 services are properly configured and monitored!
* only one service published on BDII and production in GOCDB, but in GOCDB it is indicated as SL5.8, site is UA-MHI (NGI_UA)


<pre>
The changes are rather easy:
* If we can't contact them, the site needs to make sure that there is no firewall blocking 195.251.55.111 from accessing the argus 'pap' port
* If we are not authorized, the site needs to add the right ACE to their argus authorization
pap-admin add-ace 'CN=srv-111.afroditi.hellasgrid.gr,OU=afroditi.hellasgrid.gr,O=HellasGrid, C=GR' 'POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ'
* If the argus server is not properly configured (no rule pulled), the site has to follow http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview#NGI_Argus


dn: GlueSubClusterUniqueID=arc.hpc-mhi.org,GlueClusterUniqueID=arc.hpc-mhi.org
The '''current status''' of the infrastructure can be found:
,Mds-Vo-name=UA-MHI,Mds-Vo-name=local,o=grid
* In the secmon nagios (not sure you have access to this):
GlueHostOperatingSystemName: Debian
https://secmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_ngi.ARGUS&style=detail&sorttype=1&sortoption=3
GlueHostOperatingSystemRelease: 0
* On the security dashboard:
GlueHostOperatingSystemVersion: 0
https://operations-portal.egi.eu/csiDashboard/ngi/any/tab/list/filter/monitoring/page/list?tsid=4


</pre>
On the security dashboard, each NGI should have an "argus-ban" result:
* "Ok" means ok
* "Unknown" means that we can't contact them
* "High" means that we are not authorized
* "Critical" means that argus is not pulling rules from argus.cern.ch


== Decommissioning SL5 ==
The parent ticket is https://ggus.eu/?mode=ticket_info&ticket_id=120770
* Tracked on [https://wiki.egi.eu/wiki/SL5_retirement SL5_retirement wiki]
* No checks for dCache, DPM, ARC, UNICORE --> ''' Action on NGIs/ROCs to follow up directly with sites'''


= AOB  =
= AOB  =
Line 135: Line 148:


* AfricaArabia https://ggus.eu/?mode=ticket_info&ticket_id=117094:
* AfricaArabia https://ggus.eu/?mode=ticket_info&ticket_id=117094:
** EG-ZC-T3: unresponsive since months, must be suspended
** DZ-01-ARN
** EG-ZC-T3: unresponsive since too months, must be suspended
** ZA-UJ
** ZA-UJ
* AsiaPacific:
 
** MY-UM-SIFIR
* AsiaPacific: (since February) https://ggus.eu/index.php?mode=ticket_info&ticket_id=120180
* NGI_DE https://ggus.eu/?mode=ticket_info&ticket_id=117099:
** MY-UM-SIFIR: network and power failure
** LRZ-LMU
 
** UNI-DORTMUND
* NGI_DE: (since February) https://ggus.eu/index.php?mode=ticket_info&ticket_id=120181
* NGI_GRNET:
** LRZ-LMU no feedback
** GR-04-FORTH-ICS
 
* NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=118846:
* NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120577
** INFN-NAPOLI-PAMELA: in decommissioning
** FBF-Brescia-IT working for improving the behaviour
* NGI_MARGI https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 no monitoring data since January
 
* ROC_LA:
* NGI_MARGI https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 no monitoring data since January  
** UFAL: new site but the monitoring data are missing
 
* NGI_MD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120578
** the only site MD-02-IMI was suspended in March for security reasons, asked for news
 
* ROC_LA
** UFAL: suspended by the NGI manager


== Next meeting ==
== Next meeting ==


* '''11 Apr 2016''' https://indico.egi.eu/indico/event/2738/
* '''9 May 2016''' https://indico.egi.eu/indico/event/2739/

Latest revision as of 09:58, 18 April 2016


POSTPONED TO April 18th https://wiki.egi.eu/wiki/Agenda-18-04-2016

General information

News from URT

Staged rollout updates

Next releases

Preview repository

Preview 2.0.0 was released on April 1st.

The second major release of Preview was created for releasing the products available on CentOS 7 and Scientific Linux 6 platforms that are about to be included in UMD4.

The products available in this first release are only for the CentOS 7 platform:

  • ARC
  • Argus
  • dcache
  • fts3
  • site-bdii
  • top-bdii

The Scientific Linux 6 products will be available in one of the next updates.

Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository

Note: EGI provides the preview repository without any additional quality assurance process; the products are released as provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the quality assurance process of UMD.

Operational issues

Aligning Fedcloud sites to the A/R procedures

  • EGI Operations proposal to align Fedcloud sites to the A/R related procedures used for the grid sites
    • based on the availability and reliability of monitored services in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for grid sites
    • sites will NOT be suspended for A/R performance at least until the end of May
  • in parallel EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)

The proposed timeline is:

  • February 2016:
    • EGI Operations will check the status of the production cloud services in order to understand which issues (if any) the site has and provide help to NGIs and sites;
    • Start of the integration of cloud probes in the EGI CRITICAL profile (current set + OpenStack): to be agreed with the ARGO team; PROC08 will be followed
  • June 2016:
    • Starting notification of sites eligible for suspension

FedCloud status

Old issues

Grouped by NGI, please follow up with sites.

New issues

Actions

  • EGI Operations have been asked by user support to contact sites that have had unresolved technical problems supporting the fedcloud.egi.eu VO for a long time
    • if issues cannot be fixed quickly, sites will be asked to remove support for fedcloud.egi.eu
    • they will re-enable the VO support as soon as they are able to fix the issues
    • sites will be contacted directly by EGI Operations

Getting help

Decommissioning SL5

  • Tracked on SL5_retirement wiki
  • No checks for dCache, DPM, ARC, UNICORE --> Action on NGIs/ROCs to follow up directly with sites

NGIs argus server not properly configured

Some time ago (more than a year I think), EGI ran a campaign to have NGIs run an "NGI Argus" service. This campaign resulted in new services being added to goc-db for each NGI.

Unfortunately, as explained in the OMB in February, our monitoring is currently unable to check the deployment of these services:

  • For 6 services, our monitoring cannot contact the NGI Argus
  • For 18 services, our monitoring is not authorized to get the right information from the NGI Argus
  • For 1 service, our monitoring indicates that the NGI Argus is not properly configured and does not pull the rules from argus.cern.ch

In the end, only 5 services are properly configured and monitored!

The changes are rather easy:

  • If we can't contact them, the site needs to make sure that there is no firewall blocking 195.251.55.111 from accessing the argus 'pap' port
  • If we are not authorized, the site needs to add the right ACE to their argus authorization
pap-admin add-ace 'CN=srv-111.afroditi.hellasgrid.gr,OU=afroditi.hellasgrid.gr,O=HellasGrid, C=GR' 'POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ'
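
For the first case, a site can get a rough indication by testing whether the PAP port accepts TCP connections at all. The following is only a minimal sketch, not the monitoring probe itself: the function name `pap_port_reachable` is hypothetical, and port 8150 is an assumption (commonly the Argus PAP default), so substitute the port your service actually uses. It tests reachability from the machine it runs on; a firewall rule that blocks only 195.251.55.111 can be confirmed only from that host.

```python
import socket


def pap_port_reachable(host: str, port: int = 8150, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The default port 8150 is an assumption, not taken from the agenda.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A successful connection shows only that the port is open; authorization still depends on the ACE added with the pap-admin command above.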

The current status of the infrastructure can be found:

  • In the secmon nagios (not sure you have access to this):

https://secmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_ngi.ARGUS&style=detail&sorttype=1&sortoption=3

  • On the security dashboard:

https://operations-portal.egi.eu/csiDashboard/ngi/any/tab/list/filter/monitoring/page/list?tsid=4

On the security dashboard, each NGI should have an "argus-ban" result:

  • "Ok" means ok
  • "Unknown" means that we can't contact them
  • "High" means that we are not authorized
  • "Critical" means that argus is not pulling rules from argus.cern.ch
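
The four results and the corresponding follow-up can be kept in a small lookup table, for example when triaging dashboard results for several NGIs. This is an illustrative sketch: the names `ARGUS_BAN_RESULTS` and `describe` are hypothetical, and the action texts paraphrase the changes described in this section rather than quoting any tool output.

```python
# Maps each "argus-ban" dashboard result to (meaning, suggested action).
# The actions paraphrase the agenda text; None means nothing to do.
ARGUS_BAN_RESULTS = {
    "Ok": ("ok", None),
    "Unknown": (
        "monitoring cannot contact the NGI Argus",
        "check that no firewall blocks 195.251.55.111 from the argus 'pap' port",
    ),
    "High": (
        "monitoring is not authorized",
        "add the ACE for the monitoring host with pap-admin add-ace",
    ),
    "Critical": (
        "argus is not pulling rules from argus.cern.ch",
        "configure the NGI Argus to pull the rules from argus.cern.ch",
    ),
}


def describe(result: str) -> str:
    """Return a one-line explanation for a dashboard result."""
    meaning, action = ARGUS_BAN_RESULTS.get(result, ("unrecognised result", None))
    if action is None:
        return f"{result}: {meaning}"
    return f"{result}: {meaning}; {action}"
```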

The parent ticket is https://ggus.eu/?mode=ticket_info&ticket_id=120770

AOB

Monthly Availability/Reliability

A/R report on ARGO: http://argo.egi.eu/lavoisier/ngi_reports?accept=html

List of the underperforming RCs for (at least) 3 consecutive months:

  • ROC_LA
    • UFAL: suspended by the NGI manager

Next meeting