NGI DE CH Operations Center:Operations Meeting:06072012


Operations Meeting Main

Introduction

  • Minutes of last meeting

Announcements

  • Meetings/conferences
EGI Tech.Forum 2012 in Prague
http://tf2012.egi.eu/
  • Availability/reliability statistics
https://documents.egi.eu/public/RetrieveFile?docid=1251&version=2&filename=EGI_Jun2012.pdf
Core services: 100%
NGI_DE: availability 95 %, reliability 95 %
All sites met the availability/reliability targets :)
DESY reported that the region would exceed 95% if KIT published the number of HepSpecs per CPU correctly.
  • Monitoring
waiting for update 17 for NGI-DE Nagios probes
  • Staged rollout/updates
Middleware Baseline Versions for WLCG
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
@all: We all should try to update to the listed versions. In our region a lot of sites are running older versions.

Wrong value of site location in GOC-DB
https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=2386
Dimitri from KIT sent an email around. The affected sites are:
DESY-ZN
DGI-REF
FZK-LCG2
GSI-LCG2
GoeGrid
MPI-K
RWTH-Aachen
UNI-BONN
UNI-DORTMUND
ZIB
Dimitri/KIT: Will send around a survey about the baseline versions before the Technical Forum in Prague.

Round the sites

NGI-DE
  • BMRZ-FRANKFURT (Uni Frankfurt)
  • DESY-HH (Dmitri Ozerov)
- running smoothly
- over the next two weeks we will increase the number of WNs
  • DESY-ZN
  • FZJuelich
  • Goegrid
  • GSI
  • ITWM (Martin Braun)
 - running smoothly
 - Update of one CE, SE, UI and WNs to SL6 x86_64 and EMI-2
 - Staged rollout of DPM_mysql, WN +TORQUE_client for UMD-2
 - SE (DPM_mysql) updated without problems; for other products, checking
   the Known Problems and GGUS is strongly recommended
   (e.g. GGUS #82746, #82899, #83398, #83548, #83692, ...)
  • KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
 - EMI update ongoing
 - We plan to have some WNs connected to Grid Engine (batch system)
  • KIT (Uni Karlsruhe)
  • LRZ
  • MPI-K
  • MPPMU (Cesare Delle Fratte)
 - running smoothly
 - decommissioning a few old pools, migrating data to new pools
 - both CREAM CEs (BDII service) stopped working; this is fixed now
  • RWTH Aachen
  • SCAI (Andre Gemuend)
 - problems with DECH VOMS, especially with a Java dependency.
   I apologize for not sending notifications about registration problems.
   @all: Please register again if someone is missing.
  • Uni Bonn
  • Uni Dortmund
  • Uni Dresden
  • Uni Freiburg (Anton Gamel)
 - so far everything is ok
 - updated the TORQUE server to version 2.5.12. No new packages were available in the APEL repository, so we did it on our own,
   and everything runs smoothly now
  • Uni Mainz-Maigrid
  • Uni Siegen
  • Uni Wuppertal
SwiNG
  • CSCS (Paolo)
 - last Wednesday we had a maintenance:
 -- enabled CERNVMFS for ATLAS
 -- installed third CREAM for rolling update mechanism
 -- updated dCache to SL 5.7
 -- updated TORQUE to latest version 2.4.17
 - rest is fine 
  • PSI
  • Switch
 - on vacation until 1st of August; looking with Dimitri (KIT) for a deputy for the ROD shift.

Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.

Status ROD

  • Any problematic tickets?
MPI-K is asking for help (GGUS ticket 83822). Tests for CREAM CEs are failing; the cause is not known. The job completes, but
not with exit code 0. As a consequence, Nagios tries to restart the job.

Uni Bonn is suspended. It seemed to have a similar problem, but the cause there was a wrong setup of the infosystem.
Is there a problem with masking alarms? We have many tickets for the same problem. Conclusion: tickets were not handled in the
right way. It is more convenient to mask the alarm.
Performance:
First number: alarms over 72 hours
Second number: tickets closed, expiration date expired
Month   	NGI   	Alarms > 72 h 	Tickets expired
2012-01 	NGI_DE 	6 	12
2012-02 	NGI_DE 	2 	1
2012-03 	NGI_DE 	0 	4
2012-04 	NGI_DE 	2 	14
2012-05 	NGI_DE 	0 	37
2012-06 	NGI_DE 	0 	8
2012-07 	NGI_DE 	0 	4
Conclusion: The tickets counted in the second number (expiration date expired) should be monitored more closely.
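
The conclusion can be checked against the figures with a short tally. This is only a minimal sketch: the numbers are copied from the performance list above, while the names ROD_STATS and summarize are made up for illustration and are not part of any ROD or Nagios tooling.

```python
# Monthly ROD performance figures for NGI_DE, copied from the minutes.
# Each row: (month, alarms over 72 hours, tickets with expired expiration date)
ROD_STATS = [
    ("2012-01", 6, 12),
    ("2012-02", 2, 1),
    ("2012-03", 0, 4),
    ("2012-04", 2, 14),
    ("2012-05", 0, 37),
    ("2012-06", 0, 8),
    ("2012-07", 0, 4),
]


def summarize(stats):
    """Return (total alarms, total expired tickets, month with most expired tickets)."""
    total_alarms = sum(alarms for _, alarms, _ in stats)
    total_expired = sum(expired for _, _, expired in stats)
    worst_month = max(stats, key=lambda row: row[2])[0]
    return total_alarms, total_expired, worst_month


if __name__ == "__main__":
    alarms, expired, worst = summarize(ROD_STATS)
    print(f"alarms over 72 h: {alarms}")
    print(f"expired tickets:  {expired}")
    print(f"worst month:      {worst}")
```

The second number is far larger than the first over the listed months, which is consistent with the conclusion that expired tickets need closer monitoring.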

AOB

Next meeting on 27 July or 3 August.

If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.