Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:11112011"
Revision as of 14:24, 17 November 2011
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
- Availability/reliability statistics
https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics
Oct 2011: not yet published
Last: Sep 2011, availability 97 %, reliability 98 %
UNI-KARLSRUHE N/A N/A N/A 68 % 68 % Severe Lustre FS problems on the cluster over 3 weeks prevented stable operation of the storage element. https://helpdesk.ngi-de.eu/index.php?mode=ticket_info&ticket_id=1622
- Monitoring
Update 14: robot certificates are now in use.
- Staged rollout/updates
UPDATE 32 for gLite 3.2 is now ready for production use. The priority of the updates is: Normal
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH (Andreas Gellrich, Dmitri Ozerov)
- DESY-ZN
- FZJuelich
- Goegrid
- GSI
- ITWM (Martin Braun)
- KIT (GridKa, FZK-LCG2)
15-11-2011 08:00 -> 15-11-2011 18:00 Upgrade of dCache to 1.9.12-x. Affected: atlassrm-fzk.gridka.de
15-11-2011 08:00 -> 16-11-2011 13:00 ATLAS LFC migration to CERN. Note: putting DE cloud offline in DDM/Panda and queues draining on Nov 14 already. Affected: All ATLAS users (Local-LFC, atlas-lfc-fzk.gridka.de). Possibly COMPASS.
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU (Cesare Delle Fratte)
- DONE: completed downtime and gLite-cream update (UPDATE 32 for gLite 3.2)
- nothing important to report
- RWTH Aachen
- SCAI
- Uni Bonn
- Uni Dortmund
- Uni Dresden (Ralph Müller-Pfefferkorn)
- Uni Freiburg (Anton Gamel)
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS (Miguel Gila via Email)
- This week we performed major maintenance on the cluster: we migrated the shared FS from Lustre to GPFS. So far, the performance of GPFS is much better than that of Lustre, thanks to the use of SSDs for the metadata. We also updated the firmware of the controllers and systems running GPFS.
- Update of Argus packages to latest versions.
- Update of CREAM-CEs to latest versions.
- Upgraded 10 Supermicro WNs to the new AMD 16-core CPU. So, in total, we have 10x32core machines plus all the old Sun WNs. Unfortunately we were unable to boot them with the SL5.5 kernel, so we're upgrading them to SL5.7 with the latest kernel (this means recompiling the InfiniBand and GPFS drivers). It's taking more time than expected to deploy these servers, but once done, this would mean 80 cores more with approximately the same power consumption.
- Fixed some issues with the site BDII publication system (outdated publishing).
Unfortunately we are still seeing issues with CREAM services (mostly failed scp transfers and problems with all pool accounts being in use).
- PSI
- Switch
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
October: complaints about ROD metrics
No. of tickets expired: 19
No. of alarms older than 72h: 0
- Any problematic tickets?
MaiGrid has very slow response time
- Handover of the ROD shift
Week 45: 07.11 - 13.11, Team6, CSCS/NGI_CH
Week 46: 14.11 - 20.11, Team1, DESY
Week 47: 21.11 - 27.11, Team2, FhG (SCAI)
Week 48: 28.11 - 04.12, Team3, KIT
- ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.