Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:06072012"

Revision as of 13:05, 13 July 2012

Operations Meeting Main

Introduction

  • Minutes of last meeting

Announcements

  • Meetings/conferences
EGI Technical Forum 2012 in Prague
http://tf2012.egi.eu/
  • Availability/reliability statistics
https://documents.egi.eu/public/RetrieveFile?docid=1251&version=2&filename=EGI_Jun2012.pdf
Core services: 100%
NGI_DE: 95 % / 95 % (availability/reliability)
all sites met their availability/reliability targets :)
DESY reported that the region would have exceeded 95% if KIT had not published the wrong number of HepSpecs.
  • Monitoring
waiting for update 17 for NGI-DE Nagios probes
  • Staged rollout/updates
Middleware Baseline Versions for WLCG
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
@all: We should all try to update to the listed versions. Many sites in our region are running older versions.

Wrong value of site location in GOC-DB
https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=2386
Dimitri from KIT circulated an email. The affected sites are DESY-Zeuthen, DGI-Referenceinstallation, KIT (already fixed), GSI,
Maigrid, MPI-K, RWTH Aachen, Uni-Bonn, Uni-Dortmund
Dimitri/KIT: Will circulate a survey about the baseline versions before the Technical Forum in Prague

Round the sites

NGI-DE
  • BMRZ-FRANKFURT (Uni Frankfurt)
  • DESY-HH (Dmitri Ozerov)
- running smoothly
- we will increase the number of WNs over the next two weeks
  • DESY-ZN
  • FZJuelich
  • Goegrid
  • GSI
  • ITWM (Martin Braun)
 - running smoothly
 - Update of one CE, SE, UI and WNs to SL6 x86_64 and EMI-2
 - Staged rollout of DPM_mysql, WN + TORQUE_client for UMD-2
 - SE (DPM_mysql) updated without problems; for other products, checking Known Problems
   and GGUS is strongly recommended
   (e.g. GGUS #82746, #82899, #83398, #83548, #83692, ...)
  • KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
 - EMI update ongoing
 - We plan to have some WNs connected to Grid Engine (batch system)
  • KIT (Uni Karlsruhe)
  • LRZ
  • MPI-K
  • MPPMU (Cesare Delle Fratte)
 - running smoothly
 - decommissioning a few old pools, migrating data to new pools
 - both CREAM CEs (BDII service) stopped working; this is fixed now
  • RWTH Aachen
  • SCAI (Andre Gemuend)
 - problems with DECH VOMS, especially with a Java dependency.
   I apologize for not sending notifications about registration problems.
   @all: Please register again if your registration is missing
  • Uni Bonn
  • Uni Dortmund
  • Uni Dresden
  • Uni Freiburg (Anton Gamel)
 - so far everything is OK
 - updated the TORQUE server to version 2.5.12. No new packages were available in the Apel repository, so we did the update on our own,
   and everything runs smoothly now
  • Uni Mainz-Maigrid
  • Uni Siegen
  • Uni Wuppertal
SwiNG
  • CSCS (Paolo)
 - last Wednesday we had a maintenance:
 -- enabled CERNVMFS for ATLAS
 -- installed third CREAM for rolling update mechanism
 -- updated dCache to SL 5.7
 -- updated TORQUE to latest version 2.4.17
 - rest is fine 
  • PSI
  • Switch
 - on vacation until 1 August; looking with Dimitri (KIT) for a deputy for the ROD shift.

Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.

Status ROD

  • Any problematic tickets?
MPI-K is asking for help (GGUS ticket 83822). Tests for the CREAM CEs are failing and the cause is not known. The job
finishes, but not with exit code 0. As a consequence, Nagios tries to restart the job.

Uni Bonn is suspended. It seemed to have a similar problem, but the cause there was a wrong setup of the infosystem.
Is there a problem with masking alarms? We have many tickets for the same problem. Conclusion: the tickets were not handled in the
right way. It is more convenient to mask the alarm.
Performance:
First number: alarms older than 72 hours
Second number: tickets closed with an expired expiration date
Month 	NGI 	Alarms > 72 h 	Expired tickets
2012-01 	NGI_DE 	6 	12
2012-02 	NGI_DE 	2 	1
2012-03 	NGI_DE 	0 	4
2012-04 	NGI_DE 	2 	14
2012-05 	NGI_DE 	0 	37
2012-06 	NGI_DE 	0 	8
2012-07 	NGI_DE 	0 	4
Conclusion: Tickets counted in the second number (expired tickets) should be monitored more closely.
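
As a rough illustration of how the second number could be watched month by month, here is a minimal Python sketch using the figures listed above. The threshold of 10 expired tickets and the function names are assumptions made for this example; no threshold was agreed in the meeting.

```python
# Monthly ROD figures for NGI_DE, copied from the performance list above:
# (month, alarms older than 72 hours, tickets with expired expiration date)
ROD_STATS = [
    ("2012-01", 6, 12),
    ("2012-02", 2, 1),
    ("2012-03", 0, 4),
    ("2012-04", 2, 14),
    ("2012-05", 0, 37),
    ("2012-06", 0, 8),
    ("2012-07", 0, 4),
]

# Hypothetical threshold, chosen only for illustration.
EXPIRED_THRESHOLD = 10


def months_to_review(stats, threshold=EXPIRED_THRESHOLD):
    """Return the months whose expired-ticket count exceeds the threshold."""
    return [month for month, _alarms, expired in stats if expired > threshold]


def totals(stats):
    """Return (total alarms over 72 hours, total expired tickets)."""
    total_alarms = sum(alarms for _month, alarms, _expired in stats)
    total_expired = sum(expired for _month, _alarms, expired in stats)
    return total_alarms, total_expired


if __name__ == "__main__":
    print("Months to review:", months_to_review(ROD_STATS))
    print("Totals (alarms, expired):", totals(ROD_STATS))
```

With the assumed threshold, this flags 2012-01, 2012-04 and 2012-05, with May (37 expired tickets) standing out most clearly.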

AOB

If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.