EGI-InSPIRE:Switzerland-QR9



Quarterly Report Number | NGI Name | Partner Name | Author
9 | NGI_CH | SWITCH | Simon Leinen


1. MEETINGS AND DISSEMINATION

1.1. CONFERENCES/WORKSHOPS ORGANISED

Date | Location | Title | Participants | Outcome (Short report & Indico URL)

1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED

Date | Location | Title | Participants | Outcome (Short report & Indico URL)
May 30 - June 1 | Uppsala, SE | NorduGrid 2012 conference | Sigve Haug, Gianfranco Sciacca (UNIBE-LHEP), Tyanko Aleksiev, Sergio Maffioletti (UZH) | Defined way forward for establishing operational solutions still missing from ARC middleware for: APEL accounting (affects CSCS, Unibe, Unige), information system, full integration with ATLAS operations.
June 25 - June 29 | Cetraro, IT | Int. Adv. Res. Workshop on High Performance Computing, Grid and Clouds | Sergio Maffioletti | The main focus was on how to prepare providers and community support for next generation large scale data analysis, with an emphasis on cloud computing.


1.3. PUBLICATIONS

Publication title | Journal / Proceedings title | Journal references (volume number, issue, pages from - to) | Authors (1., 2., 3., et al.)
AppPot: bridging the Grid and Cloud worlds | Proc. EGI Community Forum 2012 | | R. Murri, S. Maffioletti, T. Aleksiev
A Grid execution model for Computational Chemistry Applications using GC3Pie and AppPot | Proc. EGI Community Forum 2012 | | A. Costantini, A. Laganà, S. Maffioletti, R. Murri, O. Gervasi
Computational workflows with GC3Pie | Proc. EGI Community Forum 2012 | | S. Maffioletti, R. Murri, T. Aleksiev
GC3Pie: A Python framework for high-throughput computing | Proc. EGI Community Forum 2012 | | S. Maffioletti, R. Murri, T. Aleksiev
Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud | Proc. CHEP 2012 | | J. Elmsheuser, R. Medrano Llamas, F. Legger, A. Sciabá, G. Sciacca, M. Ubeda Garcia, D. van der Ster

2. ACTIVITY REPORT

2.1. Progress Summary

CSCS

The entire site moved from its old location in Manno to a newly built centre in Lugano-Cornaredo. The move was used as an opportunity to upgrade the T2 cluster; see "Main Achievements" below.

2.2. Main Achievements

CSCS

  1. Full network redesign, from a mixed environment with 2 networks (IB and ethernet) to a fully integrated IB and 10G ethernet network with Voltaire bridges.
  2. Replacement of the remaining old Sun Thumper storage servers with newer IBM-based storage (DS 3500).
  3. Replacement of the remaining old Sun worker nodes (96 blade machines) with new Sandy Bridge systems with 32 job slots per host.
  4. Reorganization and simplification of the GPFS-based scratch filesystem on which the jobs at CSCS run.
  5. Increased the bandwidth available to VM-based grid services from 1G ethernet to 10G ethernet.
  6. Replaced 2 VM-based CREAM-CE machines with 3 physical hosts. These systems now run the latest EMI UMD1 update.
  7. Replaced 2 VM-based ARGUS servers with 3 physical hosts and integrated them with the CREAM-CE on the same nodes. These systems now run the latest EMI UMD1 update.
  8. Installed CernVMFS for all the VOs supported by CSCS. It is currently used only by ATLAS and LHCb.
  9. General upgrade of service nodes from SL 5.4/5 to SL 5.7 and the latest kernel available at the time (308).

PSI

  1. Added 11 worker nodes (2 * 8 core Xeon "Sandy Bridge" E5-2670 2.6 GHz, 48GB DDR3) and put them into production.

UNIGE

  1. Stable operation
  2. Upgraded network hardware
  3. Preparation of next upgrade: replacement of the oldest CPU nodes.

UNIBE-LHEP

  1. Very stable cluster operation
  2. Successful relocation of T2 hardware from CSCS to Bern (~10500 HEP-SPEC06, storage). Commissioning is underway.

UZH

  1. Involvement in NA activities, in particular Virtual Teams, specifically:
    1. EGI Champions
    2. Science gateway primer


2.3. Issues and mitigation

CSCS

  1. We are seeing soft-lockup CPU errors on our old Sun Thors, which cause the systems (dCache pools) to block the available software RAIDs. This appears to be related to a known Sun bug that is not going to be fixed. A replacement plan is being drafted.
  2. Initially we saw ARP problems with the ethernet/IB bridge. These now seem to have been fixed.
  3. CSCS status on the WLCG Dashboard is not always green. This seems to be caused by issues related to user jobs. We suspect CVMFS may be involved, but it is difficult to be sure at this point. Debugging is in progress.

UNIGE

  1. Some disk space management: identifying and removing data we no longer need.
  2. Solaris is no longer supported by the DPM software team, and our SE includes Solaris machines. The mitigation is to refrain from updating DPM and hope that the current installation lasts another year or two.
  3. The DPM software on the Solaris disk servers does not support RFC proxies. For this reason our SE is no longer a data source for NorduGrid jobs on other sites.

