NGI DE CH Operations Center:Operations Meeting:13042012
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
- Availability/reliability statistics
https://documents.egi.eu/public/ShowDocument?docid=1091
BDII: 92%, but see https://ggus.eu/ws/ticket_info.php?ticket=81094 (caused by a downtime of the NGI-DE Nagios; the values were not recalculated).
NGI_DE: A: 92%, R: 96%. First green month.
- Monitoring
ntr
- Staged rollout/updates
sites with CREAM CEs: enabling glexec in GOC-DB
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
- DESY-ZN
- FZJuelich (Mathilda Romberg)
ntr
- Goegrid
- GSI
- ITWM
- KIT (GridKa, FZK-LCG2)
auger SoftwareManager role.
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU
- RWTH Aachen
- SCAI (Andre Gemuend)
ntr
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS (via Email)
Everything was working fine until yesterday, when a network problem caused the GPFS scratch filesystem to die. We were unable to recover it, and today we rebuilt it from scratch. ARC is still not working, but all gLite/EGI services are up and running. This was an unscheduled downtime that will certainly affect this month's A&R.
Next week the grid cluster at CSCS enters a scheduled downtime. We will move the hardware from the old building to the new datacentre and introduce major changes in the infrastructure: new WNs to replace the old Sun Blades, and a new network design in which we move from hybrid Ethernet/InfiniBand to all-InfiniBand. The downtime should last no longer than 3 weeks.
- PSI
- Switch (Alessandro Ussai)
- Not much to report. We decommissioned our site at SWITCH in 2011; we only have a GIIS, which is run for the ARC sites in NGI_CH. Since we no longer have resources, we do not attend the operations meeting regularly anymore. We will still attend sporadically as part of the monitoring tasks, when necessary.
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
There were again tickets for ROD.
- Any problematic tickets? There was a ticket from central COD about alarms that are older than 72 hours. The situation is unclear: who takes action? The ROD shifter checked the dashboard, but the alarm had disappeared. This is strange behaviour of the dashboard. There was a thread on our email list. We/Dimitri/KIT will report this in the escalated tickets.
- We handle our tickets (user tickets in the NGI-DE helpdesk) very leniently. We have to think about escalation procedures, for example an escalation table with expiration dates that depend on the priority of the ticket.
- Handover of the ROD shift
- The ROD shift schedule was updated by Dimitri: https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.