NGI DE CH Operations Center:Operations Meeting:13042012
Latest revision as of 10:13, 23 April 2012
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
- Availability/reliability statistics
https://documents.egi.eu/public/ShowDocument?docid=1091
BDII: 92%, but see https://ggus.eu/ws/ticket_info.php?ticket=81094 (caused by a downtime of the NGI-DE Nagios; the values were not recalculated).
NGI_DE: A: 92 %, R: 96 %. First green month.
- Monitoring
ntr
- Staged rollout/updates
sites with CREAM CEs: enabling glexec in GOC-DB
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
- DESY-ZN
- FZJuelich (Mathilda Romberg)
ntr
- Goegrid
- GSI
- ITWM
- KIT (GridKa, FZK-LCG2)
auger SoftwareManager role.
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU
- RWTH Aachen
- SCAI (Andre Gemuend)
ntr
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS (via Email)
Everything was working fine until yesterday, when a network problem caused the GPFS scratch filesystem to die. We were unable to recover it, and today we rebuilt it from scratch: ARC is still not working, but all gLite/EGI services are up and running. This was an unscheduled downtime that will certainly affect this month's A&R.
Next week the grid cluster at CSCS enters a scheduled downtime. We will move the hardware from the old building to the new datacentre and introduce major changes in the infrastructure: new WNs to replace the old Sun Blades and a new network design, moving from hybrid Ethernet/InfiniBand to all-InfiniBand. The downtime should last no longer than 3 weeks.
- PSI
- Switch (Alessandro Ussai)
- Not much to report: we decommissioned our site at SWITCH in 2011.
- We only run a GIIS for the ARC sites in NGI_CH, which is why we no longer attend the operations meeting regularly. As we have no resources, we will attend sporadically, as part of the monitoring tasks, when necessary.
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
Again tickets for ROD.
- Any problematic tickets? A ticket from the central COD concerned alarms older than 72 hours. The situation is unclear: who takes action? The ROD shifter checked the dashboard, but the alarm had disappeared. Strange behaviour of the dashboard. There was a thread on our email list. We/Dimitri/KIT will report this in the escalated tickets.
- We handle our tickets (user tickets in the NGI-DE helpdesk) very leniently. We have to think about escalation procedures, e.g. an escalation table with expiration dates that depend on the ticket priority.
- Handover of the ROD shift
- ROD shift schedule was updated by Dimitri: https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
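As a starting point for the discussion about an escalation table, the idea of expiration dates that depend on ticket priority could be sketched roughly as follows. This is only an illustration: the priority names, the time limits and the function names are all placeholders chosen for the example, not agreed NGI-DE policy and not part of any helpdesk API.

```python
from datetime import datetime, timedelta

# Hypothetical escalation table: ticket priority -> time a ticket may stay
# without an update before it is escalated. All values are placeholders.
ESCALATION_AFTER = {
    "top priority": timedelta(days=1),
    "very urgent": timedelta(days=2),
    "urgent": timedelta(days=5),
    "less urgent": timedelta(days=10),
}


def escalation_deadline(priority, last_update):
    """Return the point in time at which the ticket should be escalated."""
    return last_update + ESCALATION_AFTER[priority]


def needs_escalation(priority, last_update, now):
    """Return True if the ticket has passed its escalation deadline."""
    return now >= escalation_deadline(priority, last_update)


if __name__ == "__main__":
    last_update = datetime(2012, 4, 13, 10, 0)
    now = datetime(2012, 4, 20, 10, 0)
    for priority in ESCALATION_AFTER:
        print(priority, needs_escalation(priority, last_update, now))
```

Keeping the limits in a single table means the actual values can be agreed at a later meeting without changing the logic.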
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.