
NGI International Task Review MS109 Portugal


This page contains the assessment of the NGI International Task at year 1 of the EGI-InSPIRE project (one page per NGI). The NGI representatives are required to fill in the tables with the requested information. The content will be an integral part of the EGI-InSPIRE milestone MS109 "NGI International Task Review".


User Services

Human Services (Table 1)

Table 1: NGI Assessment: User Services >> Human Services
EGI-InSPIRE EGI_DS Name Assessment Score How to Improve
NA3.3N U-N-3 U-N-13 Requirements Gathering Requirements collection is done in the Ibergrid context via user support mailing contacts. It is also done via operational channels (weekly contacts, mailing contacts), since operational staff at sites are the users' closest point of contact with the infrastructure and middleware. 3 Document and enhance the present "gathering requirements" procedure in the wiki to make it more formal and visible to users. Properly broadcast the communication channels available.
NA3.3N U-N-14 U-N-15 Application Database The EGI AppDB was disseminated inside the Portuguese and Ibergrid community via emails, via the weekly operations meetings and inside the user communities. The first Portuguese application was registered. 3 The added value of the tool is not in question. However, its impact on the user community is not directly observed. AppDB should show a ranking of the most popular applications and, on a weekly basis, highlight some applications on the main web page. We will continue to disseminate the AppDB inside our national community, either through emails or via presentations.
NA3.3N U-N-16 U-N-17 Training A major training event is held by the Portuguese and Spanish teams during the Ibergrid conference on a yearly basis. Other training sessions are held on request. Worthless CAs are supported in Portugal to allow tutorials in the production infrastructure. Tutorials and training on administration are given in person or using messaging communication systems. 4 Improve the training materials which were inherited from previous activities and projects.
NA3.3N U-N-12 U-N-18 U-N-19 Consultancy Portuguese NGI staff provide consultancy in the Iberian region. Consultancy may be requested via helpdesk, mail or messaging system. A first reply normally comes in less than a couple of hours during working days. More dedicated consultancy sessions are scheduled on request. 4 The adopted model ensures that the load is properly shared by all involved staff and provides redundancy.

Operations Services

Human Services (Table 2)

Table 2: NGI Assessment: Operations Services >> Human Services
EGI-InSPIRE EGI_DS Name Assessment Score How to Improve
SA1.4N O-N-9 Requirements Gathering Operational requirements collection is done in the Ibergrid context via weekly operations meetings and operations mailing contacts. Requirements for EMI 1.0 and 2.0 were already collected using these mechanisms and inserted in the EGI RT system. 3 Document the present "gathering requirements" procedure in the wiki to make it more formal, and properly broadcast the communication channels available.
SA1.1N O-N-9 Operations Coordination A representative for operations in the Iberian region is present in the OMB meetings and follows all the proposed requests. All actions are forwarded to the regional operations contacts and discussed in weekly meetings. Depending on the urgency, GGUS tickets may be opened against regional sites. This guarantees that sites react even if they missed emails or did not take part in regional operations meetings. An operations model has been established within the Ibergrid community with clear escalation steps. 4 The amount of actions and documentation that NGI managers are requested to read and provide feedback on is too high given the deadlines that are imposed. This results in overloading the regional and site operations staff with a large number of simultaneous requests. NGI managers have to prioritize according to their own criteria, and most of the time reply rather late. Given the amount of work, the effort is probably not adequate.
SA1.2N O-N-9 Security A Portuguese representative attends the weekly and monthly EGI security meetings and participates in EGI CSIRT activities, such as shifts as EGI security officer. On a daily basis, the Portuguese NGI security officer monitors the infrastructure for security vulnerabilities and coordinates the response to any incidents in Portugal. The officer also ensures that the EGI security directives are followed in due time by all Portuguese sites. The Portuguese NGI security officer has privileged communication channels with the Spanish teams under the Ibergrid context. 4 The effort on this task tends to be underestimated, since the infrastructure tends to grow, the technology used in the incidents tends to become more complicated, and the investigation, handling and coordination tend to become more time consuming.

Infrastructure Services (Table 3)

Table 3: NGI Assessment: Operations Services >> Infrastructure Services
EGI-InSPIRE EGI_DS Name Assessment Score How to Improve
SA1.3N O-N-9 Software Rollout The Portuguese NGI contributes to SR rollout activities and has deployed several gLite components before they reached production. Some examples are several versions of gLite top-BDII, site-BDII, CreamCE, SGE_utils and VOMS server. 4
SA1.4N O-N-3 Monitoring The SAM service is shared between Portugal and Spain. The SAM instance serving the ops VO is deployed in Spain, while the SAM instance for monitoring application VOs is deployed in Portugal. The effort of maintaining such a service is not small, especially at the beginning of the service operation. Although the service is now in a mature state, sites supporting WLCG still tend to continuously compare it with the WLCG SAM monitoring, and since results are sometimes not synchronized, this gives rise to complaints and trust issues. 4 The general feeling is that the SAM staff are always under such a heavy load that they take some time to react to SAM regional problems, which are normally quite urgent and have a direct impact on the infrastructure A/R metrics. Therefore, we tend to suspect that more effort should be put into the support and deployment of SAM instances by the central teams.

SAM is the most important service for NGIs. Failover guidelines for the service itself, as well as for the services it interoperates with, should be provided.

SA1.5N O-N-2 Accounting All Portuguese sites migrated to ActiveMQ before the end of January 2011. This activity was enforced through the Ibergrid weekly operations meetings, and tickets were opened against sites to trigger that action. Documentation was written to allow sites to publish their accounting data correctly. Enforcing accounting is a daily task performed by the regional operations teams. 4 Regional sites still complain that, after a long failure, the central registry takes a long time to gather and process large APEL record sets. The fix of a long accounting issue may take 1-2 days to show successful results in the monitoring tools, which is somewhat annoying. Sometimes the NGI staff have to link the two parties involved (site admins and APEL registry staff) to understand the source of the problem.
SA1.4N O-N-1 O-N-4 Configuration Repository and Operations Portal The Portuguese NGI started to use the Spanish HGSM service as a regional instance. However, given the lack of integration of HGSM, its use was dropped and we returned to the central GOCDB instance, which suffers from some drawbacks for regional purposes. We are participating on a best-effort basis in the regionalization task force, with the goal of defining use cases for regional services such as GOCDB.

The regional operations dashboard was installed in the Portuguese NGI and serves the Ibergrid operations staff. We are one of the few NGIs that decided to test and use their own operations dashboard.

4 We would like to evaluate a release of a regional GOCDB, something that was foreseen since the end of EGEE but never accomplished. It does not make much sense that sites that wish to remain local have to be registered in a central EGI GOCDB instance, even if some filtering is applied afterwards.

Regarding the regional dashboard, its operation is not time consuming except during updates/upgrades. The regional dashboard releases seem to lack sufficient testing, and the documentation does not cover all the necessary actions and changes. Nevertheless, the development team is always very helpful in solving problems. Some sync problems have also been observed between the regional and central dashboards.

SA1.6N SA1.7N O-N-6 O-N-7 Helpdesk The Portuguese NGI uses GGUS and the Ibergrid helpdesk, which is integrated with GGUS and deployed in Spain. 3 We expect to migrate to RT as a local helpdesk. This service is being installed, configured and adapted to GGUS by the Ibergrid Spanish partner.
SA1.8N O-N-5 O-N-8 Core Services The Portuguese NGI offers all kinds of central services to national, Iberian and international communities: WMS, LB, PX, Top-BDII, LFC, VOMS Server, AMGA, ... Together with Spanish partners, we are developing failover / replication mechanisms for some of these services. 4 The effort of operating such systems may increase periodically, especially for WMS services, which fail periodically when large numbers of jobs are submitted. Guidelines to improve the reliability, availability and performance of such services should be available.

Other (Table 4)

Table 4: NGI Assessment: Other
EGI-InSPIRE EGI_DS Name Assessment Score How to Improve
NA2.3N Policy Development Portuguese representatives attend several policy meetings (OMB, UCB, PMB, security meetings, EUGridPMA, ...), assess the status of the directives provided, and adapt them to the regional policies as much as possible so that those policies become EGI compliant. 3
NA2.2N E-N-2 Dissemination The web pages of the main Portuguese NGI partner (LIP) were adapted to reflect the transition to EGI. The official Portuguese NGI web pages also have very strong dissemination content. Remaining dissemination activities include national and international presentations (at government and academic level) delivered throughout the year and the production of dissemination materials (flyers, posters, ...). 3