NGI DE CH Operations Center:Operations Meeting:14092012

Latest revision as of 14:55, 28 September 2012

Operations Meeting Main

Introduction

  • Minutes of last meeting

Announcements

  • Meetings/conferences
EGI: the Technical Forum takes place in Prague next week
  • Availability/reliability statistics
https://documents.egi.eu/public/RetrieveFile?docid=1332&version=2&filename=EGI_Aug2012.pdf
90%

Three sites did not hit the target:
RWTH-Aachen 52%
UNI-SIEGEN-HEP 69%
  • Monitoring
Update 17 has been tested and seems to work fine. Next week we will update our production system. Update 17 includes sensors for Globus, UNICORE
and EMI 2 WNs.
There were some problems with monitoring the WMSs, but these should be fixed now.
  • Staged rollout/updates
DN Publishing
-----Original Message-----
From: Operations of NGI-DE (NGI-DE-OPERATIONS@LISTSERV.DFN.DE) on behalf of Dimitri Nilsen
Sent: Tuesday, 31 July 2012 18:20
To: NGI-DE-OPERATIONS@LISTSERV.DFN.DE
Subject: Publishing User DNs
Dear Sites,
according to the "Grid Policy on the Handling of User-Level Job Accounting Data", sites should publish User DNs for accounting.
Please ensure you have publishGlobalUserName="yes" in publisher-config.xml on your APEL box.
Status of releases
Sites that support WLCG VOs should update to an EMI release, at least EMI 1, by 1 October. gLite releases should no longer be
supported. We at KIT are currently updating our services (WMSs, sBDIIs, CREAMs); WNs will follow. Dimitri will send around a list
of versions and deadlines. The 1 October deadline is probably a little unrealistic.
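The DN-publishing request above can be checked with a small script. The following is a minimal sketch, not an official APEL tool: it only assumes that publisher-config.xml is well-formed XML and that publishGlobalUserName appears as an attribute somewhere in it. The element names in the sample (ApelConfiguration, JoinProcessor) and the function name are illustrative assumptions; the real file on your APEL box may be laid out differently.

```python
import xml.etree.ElementTree as ET


def publishes_user_dns(xml_text):
    """Return True if publishGlobalUserName is present and every occurrence is "yes"."""
    root = ET.fromstring(xml_text)
    # Collect the attribute from any element, so no element name is hard-coded.
    values = [
        el.get("publishGlobalUserName")
        for el in root.iter()
        if el.get("publishGlobalUserName") is not None
    ]
    return bool(values) and all(v.strip().lower() == "yes" for v in values)


# Hypothetical fragment for illustration only; element names are assumptions.
SAMPLE = """<ApelConfiguration>
  <JoinProcessor publishGlobalUserName="yes"/>
</ApelConfiguration>"""

if __name__ == "__main__":
    print(publishes_user_dns(SAMPLE))
```

To check a real installation, read the contents of your site's publisher-config.xml and pass the text to publishes_user_dns; the file location depends on the installation and is deliberately not assumed here.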

Round the sites

NGI-DE
  • BMRZ-FRANKFURT (Uni Frankfurt)
  • DESY-HH
  • DESY-ZN
  • FZJuelich (Mathilda)
nothing to report
  • Goegrid
  • GSI
  • ITWM (Martin)
 - all WNs and CEs updated to SL6 and EMI-2/UMD-2
 - one SE node, APEL node and site BDII still running gLite 3.2
 - What is the status of this EGI broadcast (https://operations-portal.egi.eu/broadcast/archive/id/725)?
  • KIT (GridKa, FZK-LCG2)
  • KIT (Uni Karlsruhe, Dimitri, Tobias)
EMI migration. Plans to update WNs to EMI 2. Any experience with emi-wn?
  • LRZ
  • MPI-K
  • MPPMU (Cesare)
- deployment of CVMFS
  • RWTH Aachen
  • SCAI
  • Uni Bonn
  • Uni Dortmund
  • Uni Dresden
  • Uni Freiburg (Anton)
- One reason for the low performance/availability/reliability: SAM tests failed over several days, so the site was offline. This was caused by a monitoring
problem; Aachen had the same problem. It is working again now.
- The file system of one of our pools crashed; we lost 15 TB of data, which we were able to partially restore. Interesting:
afterwards we had to put the site offline from time to time because the data flow of the restore process was so high that jobs were
blocked. Additional downtime was needed to restore the files.
- We need a downtime at the end of the month to update dCache to 1.9.12 and to install CernVM-FS. We also did a TORQUE update on the CREAM CEs,
but this TORQUE version blocked proxies, so we downgraded. For the old gLite and EMI versions there is still an old version of a
TORQUE package in the repository. Recommendation from Dimitri: write an email to the rollout list. Next week in Prague
Dimitri can also ask the people from EMI.
- Added some WNs
- Migration to EMI 2 started; CREAM 3 in test phase
  • Uni Mainz-Maigrid
  • Uni Siegen
  • Uni Wuppertal
SwiNG
  • CSCS (Paulo)
- increased capacity of compute to 2200 cores
- preparing maintenance for next Tuesday to fix dCache pool nodes
  • PSI
  • Switch

Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.

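Status ROD

  • Handover of the ROD shift
  • ROD shift schedule: https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
  • The ROD shift metrics were bad over the last months. The problem was the handling of tickets in expired state. Please handle tickets more carefully to avoid such situations.
  • The rotation table was updated.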

AOB

  • The next meeting will be in two weeks, after the Prague meeting, on 28 September


If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.