Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:02122011"
Latest revision as of 17:20, 13 February 2012
Introduction
- Minutes of last meeting
Announcements
- Meetings/conferences
NGI-DE/NGI-CH/D-Grid Workshop in April. Note: there is also a dCache Workshop in April, so the date should be chosen carefully.
The EGI Community Forum (http://go.egi.eu/cf12) will take place in Munich on 26-30 March 2012, in conjunction with the 2nd EMI Technical Conference. Abstract submission was open until 2/12/11.
- Availability/reliability statistics
Last: https://documents.egi.eu/public/ShowDocument?docid=959
Recomputation done: https://helpdesk.ngi-de.eu/index.php?mode=ticket_info&ticket_id=1720
- Monitoring
https://tomtools.cern.ch/confluence/display/SAMDOC/Update-15
- Staged rollout/updates
- UMD
https://wiki.egi.eu/wiki/UMD-1:UMD-1.5.0
- EMI
http://www.eu-emi.eu/emi-1-kebnekaise-updates
- gLite3.1
http://glite.web.cern.ch/glite/packages/R3.1/updates.asp
- gLite3.2
http://glite.web.cern.ch/glite/packages/R3.2/sl5_x86_64/updates.asp
Other topics
EMI release / possible infosys errors (UNI-SIEGEN) https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=1722
Gstat https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=1930
Sites with CRITICAL gstat status: wuppertalprod, Uni-Bonn, DESY-HH, SCAI, MaiGRID, LRZ-LMU
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
We updated all our WNs to torque 2.5.7-2 (glite-WN-version-3.2.12-1), and this works fine with the old torque server (2.3.13-1). We did not update the server because of the memory problem in the new version.
This week the dcache-cms instance was updated to 1.9.12-13.
- DESY-ZN
- FZJuelich
- Goegrid
- GSI
- ITWM (Martin Braun)
ntr (nothing to report)
- KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
  - gLexec updated on the WNs
  - role-based mapping for gLexec was requested by ATLAS
  - WMS disk full: problems with the ngi-de-nagios portal
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU (Cesare Delle Fratte)
  - DOWNTIME 29/11 to 01/12: dCache upgrade from 1.9.5-28 to 1.9.12-13
  - problems with GridFTP doors (solved by updating the Java JDK to the latest packages)
  - dCache: number of movers was increased
  - updated one of the two CREAM CEs
  - installed security fix on the APEL box
  - strange LFC failures caused by GPFS partition problems
- RWTH Aachen
- SCAI
- Uni Bonn
services online
- Uni Dortmund
- Uni Dresden (Ralph Mueller-Pfefferkorn)
  - For about two months we have had problems with our file system, especially the central NFS file system, which becomes overloaded: hundreds of jobs with hundreds of files.
  - Paolo/CSCS: We had the same problems. They were fixed by changing the CREAM grubber, and we moved from Lustre to GPFS, with SSD disks for the metadata and the inode table.
- Uni Freiburg (Anton Gamel)
  - problems with gsi ssh -> increased movers
  - installed additional dCache servers
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS (Paolo)
  - maintenance two days ago: firmware update of the disks; lost 4 disks/CMS pool (in contact with CMS)
  - testing CernVM-FS in preproduction
- PSI
- Switch
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
- Any problematic tickets?
- Handover of the ROD shift
- ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
LRZ from 02.2012. 2*2 Shifts
ROD Newsletter Nov. 2011 https://documents.egi.eu/secure/RetrieveFile?docid=298&version=1&filename=ROD%20newsletter%2011-2011.pdf
Tickets were not mentioned within 10 days. Be aware of the ROD statistics.
Please pay attention to the Escalation Procedures: https://wiki.egi.eu/wiki/Operations:COD_Escalation_Procedure#Escalation_for_operational_problem_at_site
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.