EGI-InSPIRE:Switzerland-QR5

Latest revision as of 17:38, 22 January 2015

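{| border="1" cellspacing="0" cellpadding="2"
|-
!scope="col"| Quarterly Report Number
!scope="col"| NGI Name
!scope="col"| Partner Name
!scope="col"| Author
|-
|5
||NGI_CH
||Switzerland
||Andres Aeschlimann
|}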


1. MEETINGS AND DISSEMINATION

1.1. CONFERENCES/WORKSHOPS ORGANISED

{| border="1" cellspacing="0" cellpadding="2"
|-
!scope="col"| Date
!scope="col"| Location
!scope="col"| Title
!scope="col"| Participants
!scope="col"| Outcome (Short report & Indico URL)
|-
|2011-06-10
||Bern, Switzerland
||Swiss National Grid Association - Scientific Advisory Council 2011
||20
||http://www.swing-grid.ch/event/306650-swing-scientific-advisory-council-2011
|-
|2011-06-14 - 2011-06-16
||Zurich, Switzerland
||Documentation Task Force meeting
||9
||https://www.egi.eu/indico/conferenceDisplay.py?confId=481
|}

1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED

{| border="1" cellspacing="0" cellpadding="2"
|-
!scope="col"| Date
!scope="col"| Location
!scope="col"| Title
!scope="col"| Participants
!scope="col"| Outcome (Short report & Indico URL)
|-
|2011-05-09 - 2011-05-12
||Sundvolden, Norway
||Annual Nordugrid Conference
||3
||http://indico.hep.lu.se/conferenceDisplay.py?confId=1047
|-
|2011-05-30 - 2011-06-01
||Lund, Sweden
||EMI All hands meeting
||4
||http://indico.cern.ch/conferenceDisplay.py?confId=124206
|}


1.3. PUBLICATIONS

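{| border="1" cellspacing="0" cellpadding="2"
|-
!scope="col"| Publication title
!scope="col"| Journal / Proceedings title
!scope="col"| Journal references (volume number, issue, pages from - to)
!scope="col"| Authors (1., 2., 3., et al.?)
|-
|Swiss National Grid Association - Annual Report 2011
||http://www.swing-grid.ch
||
||
|}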

2. ACTIVITY REPORT

2.1. Progress Summary

2.2. Main Achievements

CSCS

  • Participated in the staged rollout of the following components:
    • EMI1 ARGUS 1.3.0-6
    • EMI1 APEL 1.0.0-0
    • EMI1 CREAM 1.3
    • EMI1 glexec_wn 1.0.0-1
    • gLite glexec_wn 3.2.5-1
    • gLite glexec_wn 3.2.6-3
  • Deployed 2 new Argus servers. Replaced the CREAM installation on 1 machine with a fresh installation of EMI1 CREAM.

UZH

  • Hosted a site visit by a Comp.chem representative from UniPe: Dr. Alessandro Constantini visited UZH to learn and implement a workflow use case using GC3Pie, a tool developed by GC3/UZH.
  • Participated in EMI Early Adopter activities for EMI ARC-CE and ARC-client.

SWITCH

  • Two people are now capable of operating a monitoring instance for an NGI.

2.3. Issues and mitigation

{| border="1" cellspacing="0" cellpadding="2"
|-
!scope="col"| Issue Description
!scope="col"| Mitigation Description
|-
|CSCS
# A failure in one of our Lustre scratch servers caused a few jobs to fail immediately and many others to stall or not be queued.
# We are starting to see an excessive rate of failed disks in our Lustre servers (Sun J4400).
# Random segfaults of the Torque pbs_mom process.
# The Torque batch system did not fail over when it was supposed to.
# User jobs doing excessive I/O still slow down other jobs running on the shared Lustre or GPFS filesystems.
|| CSCS
# Deactivated the Lustre server and removed hung jobs after a long period of time.
# Replacing disks as soon as we can; since all disks are of the same age, sooner or later we will have to replace some machines/disks.
# Contacted Adaptive Computing, but they have not been very responsive on this issue. We are trying to make pbs_mom dump core files.
# Replaced Torque with a newer version.
# Penalized excessive use of certain operations (such as the commands 'find', 'du', etc.).
|-
|UZH
# Validation of EMI ARC components was more complicated than anticipated; fewer Early Adopters (EA) actually participated in the test phase.
# Several EMI ARC components did not pass the validation phase.
|| UZH
# Acknowledged by Mario David.
# We had to get back to the developers (through several channels) to clarify the issues and bugs.
|}