Difference between revisions of "Agenda-10-10-2016"

= Operations =


== EGI central monitoring instance (ARGO) ==
* keystone-VOMS update available https://ggus.eu/?mode=ticket_info&ticket_id=124217
 
Since July 1st, the EGI infrastructure has been monitored by two monitoring instances, available at these addresses:
 
https://argo-mon.egi.eu/nagios
https://argo-mon2.egi.eu/nagios
 
Both instances are running the same set of tests and results provided are equivalent.
 
Starting from the same date, the central ARGO Web UI (http://argo.egi.eu/lavoisier ) provides information from these two instances and the Operations Portal was reconfigured to raise alarms based on information from ARGO central instances.
 
Results coming from NGI SAM instances are no longer consumed by the central ARGO or Operations Portal so NGIs can eventually decommission them following the standard decommissioning procedures (https://wiki.egi.eu/wiki/PROC12 ).
 
The FedCloud sites will be monitored by the new system starting from Aug 1st.
 
== New set of CREAM probes ==
 
A new set of probes is being used for monitoring the CREAM CEs and the A/R computation: https://wiki.italiangrid.it/twiki/bin/view/CREAM/DjsCreamProbeNew
Unlike the old WN monitoring framework, this set of probes does not use the BDII, the WMS, or the messaging infrastructure.
 
== RFC proxy will be default ==
 
* moving to RFC proxy instead of legacy proxy
* RFC proxies have been in production for a while and everybody is using them
* we will ask VOMS TP to make a small modification to the VOMS client so that RFC becomes the default
 
== New configuration for DTEAM VO ==
 
The HellasGrid Certification Authority changed its DN from "/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006" to '''"/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016"'''
 
Since the certificates of the 2 VOMS servers hosting the dteam VO have also changed, the settings of this VO need to be updated accordingly on *ALL THE (grid and cloud) SERVICES*
 
- New yaim settings (for the ''../vo.d/dteam'' file):
 
VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016'"
 
- .lsc files:
 
# cat /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
 
# cat /etc/grid-security/vomsdir/dteam/voms2.hellasgrid.gr.lsc
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
 
- configuration information:
 
https://voms.hellasgrid.gr:8443/voms/dteam/configuration/configuration.action
https://voms2.hellasgrid.gr:8443/voms/dteam/configuration/configuration.action
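
The yaim VOMSES value and the .lsc files above carry the same server DNs, so one can be derived from the other. The following is a minimal Python sketch, not an official EGI or yaim tool: the helper names (parse_vomses, lsc_path, expected_lsc_files) are hypothetical, and it assumes each VOMSES entry has the field layout shown above (VO, host, port, server DN, then two trailing fields). It only builds the expected file contents in memory, which can then be compared with what is deployed on a service.

```python
import shlex

# Values copied from the yaim settings above.
VOMSES = (
    "'dteam voms.hellasgrid.gr 15004 "
    "/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' "
    "'dteam voms2.hellasgrid.gr 15004 "
    "/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
)
CA_DN = "/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016"


def parse_vomses(value):
    """Split a VOMSES value into (vo, host, port, server_dn) tuples.

    Assumption: each quoted entry is
    '<vo> <host> <port> <server DN> <two trailing fields>'.
    """
    entries = []
    for item in shlex.split(value):  # honours the single-quoted entries
        fields = item.split()
        vo, host, port = fields[0], fields[1], int(fields[2])
        # A DN may contain spaces, so rejoin everything before the last two fields.
        server_dn = " ".join(fields[3:-2])
        entries.append((vo, host, port, server_dn))
    return entries


def lsc_path(vo, host, vomsdir="/etc/grid-security/vomsdir"):
    """Path of the .lsc file for one VOMS server, as in the listings above."""
    return f"{vomsdir}/{vo}/{host}.lsc"


def expected_lsc_files(vomses, ca_dn):
    """Map each .lsc path to its expected content: server DN, then CA DN."""
    return {
        lsc_path(vo, host): f"{server_dn}\n{ca_dn}\n"
        for vo, host, _port, server_dn in parse_vomses(vomses)
    }


if __name__ == "__main__":
    for path, content in expected_lsc_files(VOMSES, CA_DN).items():
        print(f"# {path}")
        print(content, end="")
```

Comparing this output with the files under /etc/grid-security/vomsdir/dteam/ is a quick way to spot a service that still references the HellasGrid CA 2006 DN.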
 


== Monthly Availability/Reliability ==
A/R report on ARGO: http://argo.egi.eu/lavoisier/ngi_reports?accept=html
List of the underperforming RCs for (at least) 3 consecutive months:
* AfricaArabia https://ggus.eu/?mode=ticket_info&ticket_id=117094: main problems with the monitoring system, waiting for the release of the central one
** ASRT
** EG-ZC-T3: unresponsive for two months, must be suspended
** ZA-UJ
* AsiaPacific: (since February) https://ggus.eu/index.php?mode=ticket_info&ticket_id=121222
** KR-UOS-SSCC
** MY-UPM-BIRUNI-01
* CERN: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122596 SRM issues
* NGI_DE: https://ggus.eu/?mode=ticket_info&ticket_id=121975
** UNI-SIEGEN-HEP: flapping behaviour
* NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122599 (SOLVED)
** UB-LCG2 site suspended
* '''NGI_MARGI https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 no monitoring data since January'''
* NGI_NL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122607 (SOLVED)
** EENet: site suspended
* NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122611 (SOLVED) monitoring data missing for:
** ICM
** IFJ-PAN-BG
* NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122612
** RO-14-ITIM miscellaneous issues, site is recovering
* NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122613 (SOLVED)
** UA_KNU: many power cuts
* Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122615
** Ru-Troitsk-INR-LCG2


== Decommissioning SL5 ==
* Tracked on [https://wiki.egi.eu/wiki/SL5_retirement SL5_retirement wiki]
* Sites still deploying unsupported service end-points risk suspension, unless documented technical reasons prevent a Site Admin from updating these end-points https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software#Escalation_phase see step 7
* '''Status https://wiki.egi.eu/wiki/SL5_retirement#2016-06-13_Overall_status''' reported below.
* tickets track specific status on GGUS
* '''from this week on, EGI Operations can suspend sites hosting SL5 services that are in production and not in downtime'''
** tickets will be opened
 
=== Status and actions (Jul 14th)===
* 1 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_Top-BDII&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 Top-BDII] bdii.hpgcc.finki.ukim.mk
* 11 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_Site-BDII&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 Site-BDII] apel.indiacms.res.in arc.hpc-mhi.org ce-enmr.chemie.uni-frankfurt.de ce.hpgcc.finki.ukim.mk ce01.grid.etf.rtu.lv glite-bdii.scai.fraunhofer.de is.biruni.upm.my sbdii.grid.uni-sofia.bg uagrid.org.ua uosaf0006.sscc.uos.ac.kr west.icmp.lviv.ua
* 4 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_MyProxy&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 MyProxy] kek2-px.cc.kek.jp myproxy.cat.cbpf.br wipp-rb.weizmann.ac.il
* 7 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_WMS&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 WMS] and 6 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_LB&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 LB] glite-wms.scai.fraunhofer.de graspol.nikhef.nl graszode.nikhef.nl graskant.nikhef.nl grasveld.nikhef.nl lb.biruni.upm.my kek2-wms.cc.kek.jp kek2-lb.cc.kek.jp mb-enmr.chemie.uni-frankfurt.de marwmsmr.in2p3.fr (downtime)
* 1 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_VOMS&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 VOMS] glite-io.scai.fraunhofer.de
* 1 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_emi.ARGUS&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 ARGUS] argus.indiacms.res.in
* 7 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_CREAM-CE&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 CREAM-CE] ce-enmr.chemie.uni-frankfurt.de ce.hpgcc.finki.ukim.mk ce2.particles.ipm.ac.ir glite-cream.scai.fraunhofer.de haitham.biruni.upm.my kek2-ce01.cc.kek.jp razi.biruni.upm.my
* 0 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_QCG.Computing&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 QCG Computing]
* 1 [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_SRM&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 STORM] grid-se2.pr.infn.it (downtime)


=== RCs about to be suspended ===
{| border=1
| '''Site'''
| '''Hostname'''
| '''Service'''
| '''Downtime'''
| '''Ticket'''
| '''Note'''
|-
| <strike>MK-03-FINKI</strike>
| bdii.hpgcc.finki.ukim.mk, ce.hpgcc.finki.ukim.mk, se.hpgcc.finki.ukim.mk
| Top-BDII, Site-BDII/CREAM-CE, MyProxy/SRM
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122885
| upgrade by Jul 22nd. SOLVED
|-
| <strike>INDIACMS-TIFR</strike>
| apel.indiacms.res.in, argus.indiacms.res.in
| Site-BDII, ARGUS
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122886
| downtime to upgrade these services scheduled for Saturday and Sunday (16th and 17th); UPGRADED
|-
| <strike>UA-MHI</strike>
| arc.hpc-mhi.org
| Site-BDII/ARC-CE
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122887
| upgrade in a couple of weeks. Jul 28th: UPGRADED
|-
| <strike>BMRZ-FRANKFURT</strike>
| ce-enmr.chemie.uni-frankfurt.de, mb-enmr.chemie.uni-frankfurt.de
| Site-BDII/CREAM-CE, WMS/LB
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122891
| the site will be suspended and decommissioned. Jul 21st: SUSPENDED. SOLVED
|-
| <strike>RTUEF</strike>
| ce01.grid.etf.rtu.lv
| Site-BDII
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122893
| Site suspended by NGI
|-
| SCAI
| glite-bdii.scai.fraunhofer.de, glite-io.scai.fraunhofer.de, glite-cream.scai.fraunhofer.de, glite-wms.scai.fraunhofer.de
| Site-BDII, VOMS, CREAM-CE, WMS/LB
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122894
| some services will be decommissioned, others will be upgraded
|-
| <strike>MY-UPM-BIRUNI-01</strike>
| is.biruni.upm.my, haitham.biruni.upm.my and razi.biruni.upm.my, px.biruni.upm.my, lb.biruni.upm.my
| Site-BDII, CREAM-CE, MyProxy, LB
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122895
| the upgrade will last a couple of weeks. Sep 22nd: site suspended for running SL5 software.
|-
| <strike>BG05-SUGrid</strike>
| sbdii.grid.uni-sofia.bg
| Site-BDII
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122896
| upgrade scheduled; UPGRADED
|-
| <strike>UA_ICYB_ARC</strike>
| uagrid.org.ua
| Site-BDII/ARC-CE
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122897
| upgrade scheduled for next week. Jul 22nd: UPGRADED
|-
| <strike>KR-UOS-SSCC</strike>
| uosaf0006.sscc.uos.ac.kr
| Site-BDII/CREAM
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122898
| will update the machine
|-
| <strike>UA_ICMP_ARC</strike>
| west.icmp.lviv.ua
| Site-BDII/ARC-CE
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122899
| planning to start the upgrade next week; the corresponding downtime is scheduled to start on Sunday evening (Jul 17th 20:00) and last 2 weeks. Aug 8th: DONE
|-
| <strike>IR-IPM-HEP</strike>
| ce2.particles.ipm.ac.ir
| CREAM-CE
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122900
| upgrading... Jul 20th: DONE
|-
| <strike>JP-KEK-CRC-02</strike>
| kek2-ce01.cc.kek.jp, kek2-px.cc.kek.jp, kek2-wms.cc.kek.jp, kek2-lb.cc.kek.jp
| CREAM-CE, MyProxy, WMS, LB
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122901
| will be out of service on Friday 5th August 2016 at 4:00 UTC. SOLVED
|-
| <strike>CBPF</strike>
| myproxy.cat.cbpf.br
| MyProxy
| yes
| https://ggus.eu/?mode=ticket_info&ticket_id=122902
| will be decommissioned together with nagios, downtime created https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=21306 . SOLVED
|-
| <strike>WEIZMANN-LCG2</strike>
| wipp-rb.weizmann.ac.il
| WMS/LB/MyProxy
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122905
| site admin on vacation; the WMS service is used only by their nagios server; they tried to migrate to SL6 but found some issues in the interaction with ARGUS. SOLVED
|-
| NIKHEF-ELPROD
| graspol.nikhef.nl, graszode.nikhef.nl, graskant.nikhef.nl, grasveld.nikhef.nl
| WMS, LB
| No
| https://ggus.eu/?mode=ticket_info&ticket_id=122903
| they want to keep the services, ticket UNSOLVED
|-
|}


== FedCloud status ==

Revision as of 16:18, 7 October 2016


General information

UMD/CMD/Preview

  • UMD4/CentOS7 regular update in preparation (October release, 4.3.0)
    • UI/WN for CentOS7 -> ONGOING
    • ARGUS, DPM, lcas/lcas-lcmaps-gt4, davix, glexec, edg-mkgrid, ARC, XROOTD, GFAL2
  • please start using UMD4/SL6 or UMD4/CentOS7 instead of UMD3/SL6
    • schedule for deprecation of UMD3 under preparation (Debian not used anymore, SL5 only security fixes, SL6 available in UMD4)
    • UMD4/SL6 contains the UMD3/SL6 products that will be supported for at least the next year; unsupported products are not included in UMD4/SL6 (please let us know if we have skipped a specific product that you need!)
  • please don't use EMI3 anymore; use Preview instead!
    • EMI3 not supported anymore, new message will soon redirect from EMI3 to UMD/Preview
    • more information will soon appear on http://repository.egi.eu/

Preview repository

Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository

Note: EGI provides the Preview repository without any additional quality assurance process: the products are released as provided by the product teams. EGI recommends using the UMD repositories, which contain software verified through the UMD quality assurance process.

Operations

Monthly Availability/Reliability

Decommissioning SL5

RCs about to be suspended

FedCloud status

AOB

Next meeting