NGI DE CH Operations Center:Operations Meeting:13042012
Revision as of 10:13, 23 April 2012 by Tkoenig
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
- Availability/reliability statistics
https://documents.egi.eu/public/ShowDocument?docid=1091
BDII: 92%, but see https://ggus.eu/ws/ticket_info.php?ticket=81094 (caused by a downtime of the NGI-DE Nagios; the values were not recalculated).
NGI_DE: A: 92%, R: 96%. First green month.
- Monitoring
ntr
- Staged rollout/updates
Sites with CREAM CEs: enable glexec in GOC-DB.
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
- DESY-ZN
- FZJuelich (Mathilda Romberg)
ntr
- Goegrid
- GSI
- ITWM
- KIT (GridKa, FZK-LCG2)
auger SoftwareManager role.
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU
- RWTH Aachen
- SCAI (Andre Gemuend)
ntr
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS (via Email)
Everything had been working fine until yesterday, when a network problem caused the GPFS scratch filesystem to fail. We were unable to recover it, so today we rebuilt it from scratch. ARC is still not working, but all gLite/EGI services are up and running. This was an unscheduled downtime and will certainly affect this month's A&R.
Next week the grid cluster at CSCS enters a scheduled downtime. We will move the hardware from the old building to the new datacentre and introduce major changes to the infrastructure: new WNs will replace the old Sun Blades, and a new network design will move us from hybrid Ethernet/InfiniBand to all-InfiniBand. The downtime should last no longer than 3 weeks.
- PSI
- Switch (Alessandro Ussai)
- Not much to report. We decommissioned our site at SWITCH in 2011 and now only run a GIIS for the ARC sites in NGI_CH. Since we no longer have resources, we no longer attend the operations meeting regularly; we will still attend sporadically as part of the monitoring tasks, when necessary.
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
Once again there were tickets for ROD.
- Any problematic tickets? A ticket from central COD about alarms older than 72 hours. The situation is unclear: who takes action? The ROD shifter checked the dashboard, but the alarm had disappeared, which points to strange behaviour of the dashboard. There was a thread on our email list. We/Dimitri/KIT will report this in the escalated tickets.
- We handle our tickets (user tickets in the NGI-DE helpdesk) too leniently. We need to think about escalation procedures, e.g. an escalation table with expiration dates depending on the ticket priority.
- Handover of the ROD shift
- The ROD shift schedule was updated by Dimitri: https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.