Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:13042012"
Revision as of 13:58, 13 April 2012
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
- Availability/reliability statistics
https://documents.egi.eu/public/ShowDocument?docid=1091
BDII: 92%, but see: https://ggus.eu/ws/ticket_info.php?ticket=81094
NGI_DE: A: 92%, R: 96%. First green month.
- Monitoring
- Staged rollout/updates
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
- DESY-ZN
- FZJuelich
- Goegrid
- GSI
- ITWM
- KIT (GridKa, FZK-LCG2)
auger SoftwareManager role.
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU
- RWTH Aachen
- SCAI
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS
Everything was working fine until yesterday, when a network problem caused the GPFS scratch filesystem to fail. We were unable to recover it and rebuilt it from scratch today. ARC is still not working, but all gLite/EGI services are up and running. This was an unscheduled downtime that will certainly affect this month's availability and reliability (A&R) figures.
Next week the grid cluster at CSCS enters a scheduled downtime. We will move the hardware from the old building to the new datacentre and introduce major infrastructure changes: new WNs to replace the old Sun Blades, and a new network design that moves from hybrid Ethernet/InfiniBand to all-InfiniBand. The downtime should last no longer than 3 weeks.
- PSI
- Switch
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
There are tickets for ROD again.
- Any problematic tickets?
- Handover of the ROD shift
- ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.