NGI DE CH Operations Center:Operations Meeting:06072012
- Minutes of last meeting
EGI Tech.Forum 2012 in Prague http://tf2012.egi.eu/
- Availability/reliability statistics
https://documents.egi.eu/public/RetrieveFile?docid=1251&version=2&filename=EGI_Jun2012.pdf
Core services: 100%
NGI_DE: 95% / 95%
All sites hit their a/r targets :)
DESY reported that we would have more than 95% for the region if KIT properly published the number of HepSpecs per CPU.
Waiting for update 17 of the NGI-DE Nagios probes.
- Staged rollout/updates
Middleware Baseline Versions for WLCG: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
@all: We should all try to update to the listed versions. Many sites in our region are running older versions.
Wrong value of site location in GOC-DB: https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=2386
Dimitri from KIT sent an email around. Affected are DESY-ZN, DGI-REF, FZK-LCG2, GSI-LCG2, GoeGrid, MPI-K, RWTH-Aachen, UNI-BONN, UNI-DORTMUND, ZIB.
Dimitri/KIT: Will send around a survey about the baseline versions before the Technical Forum in Prague.
Round the sites
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH (Dmitri Ozerov)
- running smoothly
- in the next two weeks we will increase the number of WNs
- ITWM (Martin Braun)
- running smoothly
- Update of one CE, SE, UI and WNs to SL6 x86_64 and EMI-2
- Staged rollout of DPM_mysql, WN + TORQUE_client for UMD-2
- SE (DPM_mysql) without problems; for other products, checking the Known Problems and GGUS is strongly recommended (e.g. GGUS #82746, #82899, #83398, #83548, #83692, ...)
- KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
- EMI update ongoing
- We plan to have some WNs connected to Grid Engine (batch system)
- KIT (Uni Karlsruhe)
- MPPMU (Cesare Delle Fratte)
- running smoothly
- decommissioning a few old pools, migrating data to new pools
- both CREAMs (BDII service) stopped working; this is fixed now
- RWTH Aachen
- SCAI (Andre Gemuend)
- problems with DECH VOMS, especially with the Java dependency. I apologize for not sending notifications about registration problems. @all: Please register again if you are missing.
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg (Anton Gamel)
- so far everything is OK
- updated the TORQUE server to version 2.5.12. No new packages were available in the APEL repository, so we did it on our own, and everything runs smoothly now
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- CSCS (Paolo)
- last Wednesday we had a maintenance:
-- enabled CERNVMFS for ATLAS
-- installed third CREAM for rolling update mechanism
-- updated dCache to SL 5.7
-- updated TORQUE to latest version 2.4.17
- rest is fine
- on vacation until 1st of August; looking with Dimitri (KIT) for a deputy for the ROD shift.
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
- Any problematic tickets?
MPI-K is asking for help (GGUS ticket 83822). Tests for CREAM CEs are failing. The cause is not known. The job completes, but not with exit code 0. As a consequence, Nagios tries to restart the job. Uni Bonn is suspended; it seems to have had a similar problem, but there the cause was a wrong setup of the infosystem.
- Handover of the ROD shift
- ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
Is there a problem with masking alarms? We have many tickets for the same problem. Conclusion: Tickets were not handled in the right way. It is more convenient to mask the alarm.
Performance:
1st number: alarms over 72 hours
2nd number: tickets closed, expiration date expired

Month    NGI     1st  2nd
2012-01  NGI_DE    6   12
2012-02  NGI_DE    2    1
2012-03  NGI_DE    0    4
2012-04  NGI_DE    2   14
2012-05  NGI_DE    0   37
2012-06  NGI_DE    0    8
2012-07  NGI_DE    0    4
Conclusion: Tickets counted in the second number (expiration date expired) should be monitored more closely.
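To make the trend in the performance figures easier to follow, the monthly numbers can be summed up with a few lines of Python. This is only a sketch: the figures are copied from the list above, while the names `ROD_STATS`, `summarize` and `months_over`, as well as the threshold of 10 used in the example, are assumptions made for illustration and not part of any ROD tooling.

```python
# Monthly ROD figures for NGI_DE, copied from the minutes:
# (month, alarms over 72 hours, tickets with expired expiration date)
ROD_STATS = [
    ("2012-01", 6, 12),
    ("2012-02", 2, 1),
    ("2012-03", 0, 4),
    ("2012-04", 2, 14),
    ("2012-05", 0, 37),
    ("2012-06", 0, 8),
    ("2012-07", 0, 4),
]


def summarize(stats):
    """Return the totals and the month with the most expired tickets."""
    total_alarms = sum(alarms for _, alarms, _ in stats)
    total_tickets = sum(tickets for _, _, tickets in stats)
    worst_month = max(stats, key=lambda row: row[2])[0]
    return {
        "total_alarms": total_alarms,
        "total_expired_tickets": total_tickets,
        "worst_month": worst_month,
    }


def months_over(stats, threshold):
    """List the months whose expired-ticket count exceeds a (hypothetical) threshold."""
    return [month for month, _, tickets in stats if tickets > threshold]


if __name__ == "__main__":
    print(summarize(ROD_STATS))
    print(months_over(ROD_STATS, 10))
```

With these figures the alarms over 72 hours add up to 10 and the expired tickets to 80, with 2012-05 as the peak month, which matches the conclusion that the second number needs closer monitoring.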
Next meeting on 27.7. or 3.8.
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.