
Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:02122011"

Latest revision as of 17:20, 13 February 2012

Operations Meeting Main

Introduction

  • Minutes of last meeting

Announcements

  • Meetings/conferences
NGI-DE/NGI-CH/D-Grid Workshop in April
Note: There is also a dCache Workshop in April. Date should be chosen carefully.
The EGI Community Forum (http://go.egi.eu/cf12) will be held in Munich on 26-30 March 2012, in conjunction with the 2nd EMI
Technical Conference. Abstract submission was open until 2/12/11.
  • Availability/reliability statistics
Last:
https://documents.egi.eu/public/ShowDocument?docid=959
recomputation done
https://helpdesk.ngi-de.eu/index.php?mode=ticket_info&ticket_id=1720
  • Monitoring
https://tomtools.cern.ch/confluence/display/SAMDOC/Update-15
  • Staged rollout/updates
UMD
https://wiki.egi.eu/wiki/UMD-1:UMD-1.5.0
EMI
http://www.eu-emi.eu/emi-1-kebnekaise-updates
gLite3.1
http://glite.web.cern.ch/glite/packages/R3.1/updates.asp
gLite3.2
http://glite.web.cern.ch/glite/packages/R3.2/sl5_x86_64/updates.asp

other topics

EMI release / possible infosys errors (UNI-SIEGEN)
https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=1722
Gstat
https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=1930

Gstat
Sites with CRITICAL gstat status: wuppertalprod Uni-Bonn DESY-HH SCAI MaiGRID LRZ-LMU

Round the sites

NGI-DE
  • BMRZ-FRANKFURT (Uni Frankfurt)
  • DESY-HH
We updated all our WNs to torque 2.5.7-2 (glite-WN-version-3.2.12-1), and this works fine with the old torque server (2.3.13-1).
We did not update the server because of the memory problem in the new version.
The dcache-cms instance was updated to 1.9.12-13 this week.
  • DESY-ZN
  • FZJuelich
  • Goegrid
  • GSI
  • ITWM (Martin Braun)
ntr (nothing to report)
  • KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
gLexec updated at WNs
Role-based mapping for gLexec was requested by ATLAS
WMS disk full: Problems with ngi-de-nagios portal
  • KIT (Uni Karlsruhe)
  • LRZ
  • MPI-K
  • MPPMU (Cesare Delle Fratte)
   - DOWNTIME 29/11 to 01/12: dCache upgrade from 1.9.5-28 to 1.9.12-13
   - problems with GridFTP doors (solved by updating the Java JDK to the latest packages)
   - dCache: number of movers was increased
   - updated one of the two CREAM
   - installed security fix on APEL box
   - strange LFC failures caused by GPFS partition problems
  • RWTH Aachen
  • SCAI
  • Uni Bonn
services online
  • Uni Dortmund
  • Uni Dresden (Ralph Mueller-Pfefferkorn)
- For about two months we have had problems with our file system, especially with the central NFS file system. The NFS system becomes
  overloaded: 100s of jobs with 100s of files.
  Paolo/CSCS: We had the same problems. They were fixed by changing the CREAM grubber and by moving from Lustre to GPFS, with SSD disks
  for the metadata and the inode table.
  • Uni Freiburg (Anton Gamel)
- problems with gsi ssh, addressed by increasing the number of movers
- installed additional dCache servers
  • Uni Mainz-Maigrid
  • Uni Siegen
  • Uni Wuppertal
SwiNG
  • CSCS (Paolo)
- maintenance two days ago: firmware update of the disks, lost 4 disks/CMS pool (in contact with CMS)
- test CernVM-FS (CVMFS) in preproduction
  • PSI
  • Switch

Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.

Status ROD

LRZ from 02.2012. 2*2 Shifts
ROD Newsletter Nov. 2011
https://documents.egi.eu/secure/RetrieveFile?docid=298&version=1&filename=ROD%20newsletter%2011-2011.pdf
tickets were not mentioned within 10 days. Be aware of the ROD statistics.
Please pay attention to the Escalation Procedures
https://wiki.egi.eu/wiki/Operations:COD_Escalation_Procedure#Escalation_for_operational_problem_at_site

AOB

If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.