Difference between revisions of "EGI-InSPIRE:Switzerland-QR5"
Line 106:
{| border="2" cellspacing="0" cellpadding="2"
|-
|| 1. A failure in one of our Lustre scratch servers caused a few jobs to fail immediately and many others to stall or not get queued.
|| 2. We are starting to see an excessive rate of failed disks in our Lustre servers (Sun J4400).
|| 3. The Torque pbs_mom process suffers random segfaults.
|| 4. The Torque batch system did not fail over when it was supposed to.
|| 5. We still suffer from user jobs doing excessive I/O, which slows down other jobs running in the shared Lustre or GPFS filesystem.
|}
|| CSCS
Line 112 (old) / Line 116 (new):
|-
|| 1. Deactivated the Lustre server and removed the hung jobs after a long period of time.
|| 2. We replace disks as soon as we can, but since all disks are of the same age, sooner or later we will have to replace some machines/disks.
|| 3. Contacted Adaptive Computing, but they have not been very responsive on this issue. We are working on getting pbs_mom to dump core files.
|| 4. Replaced Torque with a newer version.
|| 5. Penalized excessive use of certain operations (such as the commands 'find', 'du', etc.).
|}
|}
Revision as of 09:10, 28 July 2011
Quarterly Report Number | NGI Name | Partner Name | Author |
---|---|---|---|
5 | NGI_CH | Switzerland | Andres Aeschlimann |
1. MEETINGS AND DISSEMINATION
1.1. CONFERENCES/WORKSHOPS ORGANISED
Date | Location | Title | Participants | Outcome (Short report & Indico URL) |
---|---|---|---|---|
2011-06-10 | Bern, Switzerland | Swiss National Grid Association - Scientific Advisory Council 2011 | 20 | http://www.swing-grid.ch/event/306650-swing-scientific-advisory-council-2011 |
1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED
Date | Location | Title | Participants | Outcome (Short report & Indico URL) |
---|---|---|---|---|
2011-05-09 - 2011-05-12 | Sundvolden, Norway | Annual Nordugrid Conference | 1 | http://indico.hep.lu.se/conferenceDisplay.py?confId=1047 |
1.3. PUBLICATIONS
Publication title | Journal / Proceedings title | Journal references (volume number, issue, pages from - to) | Authors (1., 2., 3., et al.) |
---|---|---|---|
Swiss National Grid Association - Annual Report 2011 | http://www.swing-grid.ch | | |
2. ACTIVITY REPORT
2.1. Progress Summary
2.2. Main Achievements
CSCS
- Participated in the staged rollout of the following components:
  - EMI1 ARGUS 1.3.0-6
  - EMI1 APEL 1.0.0-0
  - EMI1 CREAM 1.3
  - EMI1 glexec_wn 1.0.0-1
  - gLite glexec_wn 3.2.5-1
  - gLite glexec_wn 3.2.6-3
- Deployed 2 new Argus servers. Replaced the CREAM installation on 1 machine with a fresh installation of EMI1 CREAM.
2.3. Issues and mitigation
Issue Description | Mitigation Description |
---|---|
CSCS | CSCS |