Difference between revisions of "NGI DE CH Operations Center:Operations Meeting:30112012"
Revision as of 15:00, 30 November 2012
Introduction
- Minutes of last meeting
Announcements
UNI-FREIBURG
still error, update plan requested. https://ggus.eu/ws/ticket_info.php?ticket=87414
MAIGRID
https://ggus.eu/ws/ticket_info.php?ticket=87418 currently downtime
FZJ
update plan requested. https://ggus.eu/ws/ticket_info.php?ticket=88681
GoeGrid
https://ggus.eu/ws/ticket_info.php?ticket=87415
errors: FZK-LCG2, MPPMU, RWTH-Aachen, LRZ-LMU
- Meetings/conferences
- Availability/reliability statistics
- Monitoring
- Staged rollout/updates
Round the sites
- NGI-DE
- BMRZ-FRANKFURT (Uni Frankfurt)
- DESY-HH
All services are running EMI.
We are now migrating the worker nodes to the EMI2 release (SL5).
For WLCG and other VOs we have a test queue with EMI2-SL6 worker nodes.
We recently upgraded or replaced part of the worker nodes. Most of the recent worker node purchases use AMD Interlagos processors.
We observe some discrepancy between the HEPSPEC results and the performance of real VO jobs on Intel and AMD processors. (Does anyone know about plans for a new benchmark to replace HEPSPEC?)
We recently added storage space to the SEs. There will be another storage update this month.
Since August we have been running the cvmfs-nfs version for the ATLAS software directory. It works rather nicely for our case (the cvmfs-nfs server is an extremely old machine, but it has an SSD). We have had up to 4500 running ATLAS jobs.
The only problem we observe occurs when too many jobs (more than 5-10) set up the ATLAS software at the same time on the same worker node. The setup time then increases from 15 seconds to more than 30 minutes. ATLAS has an internal timeout on the software setup, so such jobs are killed by the timeout watcher. Jobs on other worker nodes are not affected, so this is an NFS client issue.
Since we have a lot of worker nodes, the probability that many jobs land on the same worker node at the same time is low, so only a small fraction of jobs is killed by the software setup timeout.
- DESY-ZN
- FZJuelich
- Goegrid
- GSI
- ITWM
- KIT (GridKa, FZK-LCG2)
- KIT (Uni Karlsruhe)
- LRZ
- MPI-K
- MPPMU
- RWTH Aachen
- SCAI
- Uni Bonn
- Uni Dortmund
- Uni Dresden
- Uni Freiburg
- Uni Mainz-Maigrid
- Uni Siegen
- Uni Wuppertal
- SwiNG
- CSCS
- PSI
- Switch
Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.
Status ROD
- Any problematic tickets?
- Handover of the ROD shift
- ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table
AOB
If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.