EGI-InSPIRE:Germany-QR4




Quarterly Report Number | NGI Name | Partner Name | Author
QR 4 | NGI_DE | Germany | Jie Tao


1. MEETINGS AND DISSEMINATION

1.1. CONFERENCES/WORKSHOPS ORGANISED

Date | Location | Title | Participants | Outcome (Short report & Indico URL)
April 6-7, 2011 | Karlsruhe, Germany | CSIRT face-to-face meeting | EGI, NGIs security staff | Activity update, security monitoring plan, security training, discussions. https://www.egi.eu/indico/conferenceDisplay.py?confId=438
April 28-29, 2011 | Freiburg, Germany | 7th BFG Workshop | NGI_DE staff | Talks and hands-on cluster computing. http://www.bfg.uni-freiburg.de/announcements/7th-bfg-workshop

1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED

Date | Location | Title | Participants | Outcome (Short report & Indico URL)
February 2-3, 2011 | Karlsruhe, Germany | LSDF-Kolloquium | NGI_DE sites (KIT, DESY-HH, etc.) | http://www.scc.kit.edu/forschung/lsdf-kolloquium.php
March 17, 2011 | Göttingen, Germany | dCache workshop | NGI_DE sites (KIT, FZJ, LRZ-LMU, MPPMU, etc.) | Discussion of dCache issues
April 11-15, 2011 | Vilnius, Lithuania | EGI User Forum/EMI technical conference | NGI_DE sites (KIT, FZJ, SCAI, TUDresden-ZIH, MPPMU, DESY-HH, etc.) | Presentations, e.g. the UNICORE Tutorial and "The DESY Grid Centre"; participation in various workshops
April 13-14, 2011 | Helsinki, Finland | DEISA/PRACE symposium | NGI_DE staff of site LRZ-LMU | http://www.deisa.eu/news_press/symposium

1.3. PUBLICATIONS

Publication title | Journal / Proceedings title | Journal references (volume number, issue, pages from - to) | Authors
DESY Grid Centre | EGI User Forum poster | - | 1. A. Gellrich 2. D. Ozerov

2. ACTIVITY REPORT

2.1. Progress Summary

During the reporting period, NGI_DE kept the Grid running smoothly in the region. Availability and reliability were kept high (96% on average). Operational problems were discussed in the regular coordination meetings. The ROD team worked well. Grid services were kept up to date. Security updates were applied by all sites. The sites regularly participated in Grid operations meetings as well as in the GOCDB regionalization and UNICORE integration task force meetings.

Two VO subgroups, /dteam/NGI_DE and /dteam/NGI_CH, were created. Most of the sites already support these VOs and have passed the test. A new support unit for NGI_DE monitoring was integrated into the helpdesk to cover problems with the common monitoring tools such as Nagios, the dashboard, and MyEGI. The regional Nagios monitoring instance now uses the VO subgroup "/ops/NGI/Germany". All NGI_DE and NGI_CH sites have enabled support for this subgroup.

2.2. Main Achievements

The main achievement of NGI_DE in the reporting period is the successful operation of the Grid infrastructure in the entire region. Some sites participated in Staged Rollout. All sites patched the security vulnerabilities in time. The regional Nagios instance was updated to the latest version, SAM09. New CREAM CEs were installed. The concrete achievements of individual sites are as follows:

  • MPPMU replaced the monbox, migrated the CREAM CE and sBDII to new hardware, added some WNs and storage systems, and installed 240 TB of disk space. The site is currently moving towards a gLite 3.2 on SL(C)5 infrastructure (until now SLES has been used on the CEs). The migration/upgrade is planned to be finished in May.
  • DESY-HH increased its storage capacity, migrated one SE (desy) to the 1.9.10 release, and commissioned the next part of the storage hardware.
  • At TUDresden-ZIH, the production infrastructure ran stably; some minor problems with dCache were fixed. The site also migrated its monbox to glite-APEL and started to support the biomed VO in the reporting period.
  • Uni-Freiburg installed a set of new worker nodes with 240 cores and a second CREAM CE. It also extended its storage from 280 TB to 670 TB.
  • KIT put new CREAM CEs and WMS into production. It also successfully updated the top-level BDII and migrated its monbox to glite-APEL. The site prepared the annual NGI_DE conference in May.
  • WUPPERTAL successfully upgraded to the new dCache release 1.9.12-1 (the new "golden release") and upgraded the cluster file system Lustre to version 1.8.4 (SFS 3.2-3).
  • FZJ contributed to the UNICORE Integration Task Force and organized the UNICORE Tutorial for resource providers at the EGI User Forum. An early adoption of EMI 1.0 UNICORE services is planned for the next quarter.
  • BMRZ-Frankfurt updated CEs, WNs, sBDII, etc.
  • SCAI migrated its site BDII to gLite 3.2, decommissioned glite-wms2, and prepared the VOMS, SE and LFC migration. Torque/Maui was also upgraded to 2.5.x.
  • At LRZ-LMU, a Dell storage server was brought into production in a heterogeneous environment (SLES10 / SLES11); dCache was kept up to date in the current stable release branch (1.9.5). LRZ-LMU also deployed dCache on the new pools and migrated the whole dCache cluster to BDB-based file metadata. A dCache migration to the next golden release and the use of a new high-performance (1Tb RAM) service node for dCache are planned for the near future.
  • Fraunhofer ITWM replaced 32 old worker nodes with new hardware and installed new hardware for fornax-ce and fornax-se. All nodes have been updated to gLite 3.2. ITWM also performed staged rollout for patch #4198 (glite-WN) and for patches #4408-#4410 (glite-TORQUE_server, glite-TORQUE_client, glite-TORQUE_utils). The site also prepared for the global GOCDB4 failover system (going into production next quarter).

2.3. Issues and mitigation

Issue Description | Mitigation Description
ITWM encountered a small problem with the gLite APEL migration. | The problem was solved by replacing the node type “apel” in GOCDB for the monbox with “glite-apel”.
FZJ: ticket https://rt.egi.eu/rt/Ticket/Display.html?id=975 is still open. | A solution is needed for properly adding UNICORE services to the EGI infrastructure.
SCAI: in the default configuration, the gLite 3.2 bdii_site sometimes had slapd stuck at 100% CPU, not answering any requests. | As suggested on the LCG-ROLLOUT mailing list, a change in the cache parameters seems to have helped.
MPPMU encountered problems getting its new CREAM node into a pre-production system. The goal is to first have a pre-production setup in which the site can run some standard Nagios tests and verify that everything works before the system is put into production. | The site continues to observe the problem in order to find a solution.
DESY-HH expects problems in the near future with customers who run a 32-bit OS on their computers, because EMI-1 does not support a 32-bit UI. | -