VT MPI within EGI

General Project Information

  • Leader: Alvaro Simon (CESGA, Spain) and Zdenek Sustr (CESNET, Czech Republic)
  • Mailing list: vt-mpi at mailman.egi.eu
  • Status: FINISHED
  • Start date: 10/Nov/2011
  • End date: 27/Jul/2012

Motivation

Despite a dedicated SA3 activity to support MPI, there still seem to be significant issues in uptake and satisfaction among the user communities. This VT:

  • Works with user communities and projects that use MPI resources (e.g. ITER, MAPPER, A&A, etc) to demonstrate that MPI can work successfully in EGI.
  • Sets up a VO on EGI with sites committed to support MPI jobs.
  • Improves communication between MPI users and the developers of MPI support within EGI SA3.

Output

The output of this project is a report (https://documents.egi.eu/document/1260) that describes the work carried out by the project and the achievements of its activities, and captures the issues and actions that have been identified by the project but will be dealt with by EGI members outside of the Virtual Team project.

The project and its report cover six main areas of work to improve MPI within EGI:

  1. Documentation: Improved documentation has been prepared in the EGI wiki for site administrators and for application developers. It provides guidance on how to configure and use MPI resources correctly.
  2. Nagios probes: New monitoring probes for the EGI Service Availability Monitor (SAM) have been defined. These will be implemented and put into production by the Heavy User Community and Operations teams.
  3. Information system: The typical problems with the registration of MPI resources have been collected and reported to Operations. The Nagios probes have been designed to be able to detect these problems.
  4. Accounting: Issues with collecting accounting information about parallel applications have been collected and reported to the responsible technology developers and providers with a request for addressing them.
  5. Batch system integration: Issues with interfacing MPI applications with some of the local batch job schedulers of EGI have been collected and addressed.
  6. MPI VO: A new VO which includes only correctly configured MPI sites has been set up on the production infrastructure. The VO can be used to port MPI applications to EGI. During the demo, MPI VT members will show how many MPI resources are available in EGI and how to use them. Real MPI applications will be submitted to show the capabilities of the VO.

The list of open actions below lists those MPI-related issues that have to be followed up by the community outside of this VT project. These actions have already been submitted to the responsible parties in EGI in the form of feedback, recommendations and software bugs. The EGI-InSPIRE SA3 MPI team will supervise overall progress with these actions and will record it in the table as well as in the EGI-InSPIRE project quarterly reports.

Report: MPI within EGI - https://documents.egi.eu/document/1260


Open Actions after MPI VT

Open Actions after MPI VT lifetime

Tasks

Task 1: MPI documentation

  • Assigned to: Enol / Paschalis Korosoglou

This documentation will be reviewed and we will decide what needs updating or extending.

  • Gergely comments:
For users:
https://wiki.egi.eu/wiki/MPI_User_Guide
https://wiki.egi.eu/wiki/MPI_User_manual
https://wiki.egi.eu/wiki/Parallel_Computing_Support_User_Guide
These should be merged into a single wiki page.
For site admins:
https://wiki.egi.eu/wiki/MAN03

Actions

  • [Done] Action 1.1 (Enol): Check and update the MPI wiki to include Zdenek's comments next week.
  • [Done] Action 1.2 (Alvaro/all): Put the current MPI issues, technical information and mitigation plan into the MPI VT wiki.
  • [Open] Action 1.3 (Enol): Include an MPI users section.

Task 2: Nagios probes

  • Assigned to: Gonçalo Borges / John Walsh / Paschalis Korosoglou

Current nagios probes should be reviewed to test the EGI MPI infrastructure.

  • New nagios probe requirements: https://wiki.egi.eu/wiki/Nagios-requirements.html

John Walsh comments:

a) A non-critical test that tests MPI scalability above two nodes.
Ideally, I would like to see this test set to ceiling(average number of cores) x 2 + 1.
This should increase the likelihood that the job runs on multiple nodes.
This test should only be run maybe once or twice a week, and should allow at least a day for scheduling
(so as to be non-intrusive on valuable site resources).
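
As a toy example of the sizing rule (illustrative only, assuming the average is taken per worker node): for a site averaging 8.4 cores per node, ceiling(8.4) x 2 + 1 = 19 slots, which cannot fit on two 9-core nodes and so forces the job onto at least a third node.

import math

def probe_slots(avg_cores_per_node):
    # ceiling(average number of cores) x 2 + 1, per John's suggestion above.
    return math.ceil(avg_cores_per_node) * 2 + 1

print(probe_slots(8.4))  # -> 19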

b) Improve the baseline MPI tests.
We should test basic MPI API functionality (scatter, gather, etc.), rather than the simpler "hello world".
I will try to see whether I can assemble a basic test suite; a minimal sketch follows below.
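
A minimal sketch of such a baseline probe (illustrative only; it assumes mpi4py is available on the worker nodes, which the VT did not mandate):

# baseline_mpi_probe.py -- hypothetical baseline test exercising
# scatter/gather instead of a plain "hello world".
# Run with e.g.: mpirun -np 4 python baseline_mpi_probe.py
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Root scatters one chunk of work per rank.
chunks = [[i, i * 2] for i in range(size)] if rank == 0 else None
chunk = comm.scatter(chunks, root=0)

# Each rank reduces its chunk locally.
partial = sum(chunk)

# Root gathers the partial results and verifies them.
results = comm.gather(partial, root=0)
if rank == 0:
    expected = [3 * i for i in range(size)]  # i + 2*i for rank i
    if results != expected:
        print("CRITICAL: unexpected results %r" % results)
        sys.exit(2)  # nagios CRITICAL exit code
    print("OK: scatter/gather across %d ranks" % size)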

c) Following up on https://ggus.eu/ws/ticket_info.php?ticket=76755, I have suggested that we may not be using GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxObtainableCPUTime properly, and that site queues may not be correctly set up for MPI jobs
(i.e. not setting the torque queue resources_max.cput and resources_max.pcput values). Perhaps we can develop a (BDII?) sanity "warning" check for this?

Actions

  • [Done] Action 2.1 (John W./Enol/Paschalis/Alvaro/Gonçalo): Create a new wiki section to include the new MPI nagios probe specifications to be developed by SA3. Follow the nagios wiki procedure to include the new probes in production.
    • New Nagios MPI specifications wiki: https://wiki.egi.eu/wiki/VT_MPI_within_EGI:Nagios
  • [In progress] Action 2.2 (Alvaro/Enol): Create a new GOCDB requirement to include an MPI service in GOCDB. Check whether separate MPI services are needed for each flavour.
    • New requirement created in RT: https://rt.egi.eu/rt/Ticket/Display.html?id=3396
    • New GOCDB MPI tag naming TBD.
  • [Done] Action 2.3 (Alvaro): Submit a doodle to schedule the Nagios MPI probes meeting.
  • [Done] Action 2.4 (All) Deadline 12/03/12: Review and comment on the new nagios specifications: https://wiki.egi.eu/wiki/VT_MPI_within_EGI:Nagios

Task 3: Information system

  • Assigned to: Gonçalo Borges

Problems detecting MPI resources.

  • Checking for MPI availability -- mostly decided by checking installed applications.
  • Not all sites report MPI capability correctly.

Ivan comments:

For BDII, the metrics portal checks the GlueHostApplicationSoftwareRunTimeEnvironment property for the *MPI* regular expression.
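
For illustration, the portal-side check amounts to something like this (a sketch; the tag values would come from a BDII query, the ones below are made up):

import re

# A resource counts as MPI-capable if any RunTimeEnvironment tag matches *MPI*.
tags = ["MPI-START", "OPENMPI-1.4.3", "GLITE-3_2_0"]
supports_mpi = any(re.search(r"MPI", tag) for tag in tags)
print(supports_mpi)  # -> True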

John Walsh comments:

GGUS ticket: https://ggus.eu/ws/ticket_info.php?ticket=76755
The problem seems to be related to the torque settings for pcput and cput on each of the queues.

cput = Maximum amount of CPU time used by all processes in the job.
pcput = Maximum amount of CPU time used by any single process in the job.
walltime = Maximum amount of real time during which the job can be in the running state.

So, for example, on one of the "medium" queues on
deimos.htc.biggrid.nl, the config is:
set queue medium resources_max.cput = 24:00:00
set queue medium resources_max.pcput = 24:00:00
set queue medium resources_max.walltime = 36:00:00

This would not be sufficient to allow a 6-core job to run for a full
24 hours; the job is likely to be removed after it has run for 4 hours.
We need to check that these queue settings are sensible for MPI jobs.

This is an interesting ticket that summarizes the handling of torque pcput (or lack thereof) in the infosys.
https://savannah.cern.ch/bugs/?49653
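
The arithmetic behind the 4-hour figure can be sketched as follows (an illustrative helper, not an official probe; cput is aggregate CPU time over all processes, so an MPI job that keeps N cores busy exhausts it N times faster than a serial job):

def max_mpi_runtime_hours(cput_h, walltime_h, ncores):
    # Wall-clock hours an ncores-wide MPI job can run before a limit hits.
    return min(walltime_h, cput_h / float(ncores))

# The "medium" queue above: cput=24h, walltime=36h, 6-core job.
print(max_mpi_runtime_hours(24, 36, 6))  # -> 4.0, far short of 24h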

Goncalo Borges comments:

The idea of the task was to assess the status of the information published while the new SAM MPI probes are not yet around.
I've developed a simple (non-optimized) perl script to check the status of the most important variables published
for MPI. The algorithm is the following:
   1) Get certified sites from GOCDB
   2) Get GlueClusterUniqueID for the different sites
   3) Check which GlueClusterUniqueIDs support MPI.
       3.1) Inspect the RunTimeEnvironment
   4) Check which CEs are under a given GlueClusterUniqueID supporting MPI
       4.1) Inspect relevant GlueCE information

The script produces two files:
   - info.txt: with the relevant information for MPI per GlueClusterUniqueID / site 
               and per GlueCEInfoHostName / GlueClusterUniqueID
   - warn.txt: with the issues found per GlueClusterUniqueID / site and per GlueCEInfoHostName / GlueClusterUniqueID.

A warning entry is added to warn.txt following the directives we have agreed for the NAGIOS probes:
   - MPI-START tag is not published for a given GlueClusterUniqueID
   - One MPI flavour tag (OPENMPI or MPICH(2), following any of the proposed formats) is not present
               <MPI flavour>
               <MPI flavour>-<MPI version>
               <MPI flavour>-<MPI version>-<Compiler>
   - GlueCEPolicyMaxSlotsPerJob is 0 or 1 or the default 9999999 
   - GlueCEPolicyMaxWallClockTime is 0 or 1 or the default 9999999
   - GlueCEPolicyMaxCPUTime < GlueCEPolicyMaxWallClockTime
   - GlueCEPolicyMaxCPUTime / GlueCEPolicyMaxWallClockTime < 4

The script and the produced outputs were sent to the VT-MPI mailing list.

I would say the next steps here are:
   1./ Update MPI Wiki page on what should be published under GlueCEPolicyMaxSlotsPerJob.
   2./ Update MPI wiki page on the recommendation for GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime
       -GlueCEPolicyMaxCPUTime > GlueCEPolicyMaxWallClockTime
       -GlueCEPolicyMaxCPUTime / GlueCEPolicyMaxWallClockTime >= 4
       -GlueCEPolicyMaxWallClockTime not equal to 0 , 1 or 9999999
   3./ It seems the right wiki where this information should be available is: 
       https://wiki.egi.eu/wiki/MAN03_MPI-Start_Installation_and_Configuration
   4./ Deliver the list of problems to SA1 together with the pointers to documentation. SA1 should then bring
       the issues to the right forum.
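
For reference, the agreed warning directives condense to something like the following Python sketch (the original perl script was only circulated on the VT-MPI list and is not reproduced here; the dictionary layout below is an assumption for illustration):

DEFAULTS = {0, 1, 9999999}

def warnings_for_ce(ce):
    # ce: dict with the GlueCE attributes gathered for one CE / cluster.
    warns = []
    tags = ce.get("RunTimeEnvironment", [])
    if "MPI-START" not in tags:
        warns.append("MPI-START tag not published")
    if not any(t.startswith(("OPENMPI", "MPICH")) for t in tags):
        warns.append("no MPI flavour tag (OPENMPI or MPICH(2))")
    if ce.get("GlueCEPolicyMaxSlotsPerJob") in DEFAULTS:
        warns.append("MaxSlotsPerJob is 0, 1 or the 9999999 default")
    wall = ce.get("GlueCEPolicyMaxWallClockTime")
    cpu = ce.get("GlueCEPolicyMaxCPUTime")
    if wall in DEFAULTS:
        warns.append("MaxWallClockTime is 0, 1 or the 9999999 default")
    elif cpu is not None and wall:
        if cpu < wall:
            warns.append("MaxCPUTime < MaxWallClockTime")
        if float(cpu) / wall < 4:
            warns.append("MaxCPUTime / MaxWallClockTime < 4")
    return warns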

Actions

  • [Done] Action 3.1 (John Walsh/Gonçalo Borges): Until we have nagios probes for this, Gonçalo will contact John to open GGUS tickets to MPI sites that are not publishing batch system info correctly.
  • [Done] Action 3.2 (John Walsh/Enol Fernandez): Check whether the current GLUE2 schema includes MPI static values.
    • MaxSlotsPerJob can be used for MPI jobs: the maximum number of slots which could be allocated to a single job. This value is not filled by the current LRMS information providers.
  • [Done] Action 3.3 (Roberto Rosende): Raise a request to EMI to include MaxSlotsPerJob as a new value to be published by the batch system information providers.

Task 4: Accounting system

  • Assigned to: John Gordon, Iván Díaz

Implement an MPI accounting system (JRA1.4).

Ivan comments:

No special accounting support. The only way to recognize MPI jobs is to check for jobs with >100% efficiency.
- Still development to be done.
- APEL needs to give data for each batch system.

Enol Comments:

>100% efficiency may not hold for all MPI jobs. What must be checked is the number of slots.
That would also include other parallel jobs, but I don't think that's a major issue.
APEL should already give the number of slots used by the job; this data is easily available for all batch systems.

John's Comments:

How many cores/CPUs are used by a job is not under the control of the user. The OS may move a job/process
between CPUs/cores for its own reasons. It may also spawn system threads which run in parallel with the user process.
By these means a superficially serial job could record in its accounting that it used multiple cores/CPUs.
The requirement below, that the accounting record contain the serial/parallel nature of the job, begs
the question: how does the accounting parser find this information? Is it recorded in the batch logs
so that the parser could find it?

Requirement: #3328

"Accounting system should keep track of the type of the job: MPI or serial.
This should be recorded in the Usage Record in order to be easily queried in
the accounting repository."
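
A sketch of the slot-count approach Enol suggests above (field names are illustrative, not the actual APEL/Usage Record schema):

def job_type(record):
    # Classify by slot count rather than the >100% efficiency heuristic.
    return "MPI/parallel" if record.get("slots", 1) > 1 else "serial"

print(job_type({"slots": 8, "cpu_time": 400, "wall_time": 60}))  # MPI/parallel
print(job_type({"slots": 1, "cpu_time": 55, "wall_time": 60}))   # serial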

Actions

  • Action 4.1: Create an MPI accounting system (APEL and Accounting Portal).

Task 5: Batch system status

  • Assigned to: Roberto Rosende/Enol Fernandez

All batch systems must support MPI jobs. Check the current batch system status and issues.

Roberto Rosende comments:

Starting work on MPI support for SGE, to be ready for EMI 2.
The main problem with the batch system is that it is not receiving reliable info from the information system (not truly a batch system matter).

Alvaro Simon comments:

Two bugs were found during the first UMD verification of
WN/Torque + EMI-MPI.1.0. It is a torque/maui problem that affects all MPI jobs. Maui versions
prior to 3.3.4 do not correctly allocate all the nodes for job execution. GGUS tickets:
- https://ggus.eu/ws/ticket_info.php?ticket=57828
- https://ggus.eu/ws/ticket_info.php?ticket=67870

Actions

  • [Done] Action 5.1 (Alvaro): Ask about the batch system support issue in EMI. Raise this issue to EGI SA1/2.


Task 6: Gather information from MPI sites

  • Assigned to: Zdenek Sustr

After establishing the VO and contacting sites for resources, more requests for information can be added.

Zdenek comments:

- MPI VO -- bring together sites and users interested in MPI
- This VO is NOT intended for everyday use by all users wishing to use MPI
- This VO IS intended for users who wish to cooperate with the VT to make MPI support in EGI better
- The main reason for its establishment is to collect experience that will be later adopted by regular VOs

Ivan comments:

A User Community under SA3 would also be a good idea.

Actions

  • [Done] Action 6.1 (Zdenek): Distribute the new MPI VO endpoint among MPI VT members, ask MPI sites to support the new VO, and include new VO users to test MPI sites.
    • Participating resource providers:
      • NGI_CZ (20 cores) Status: configured, untested
      • NGI_NL Status: contacted
      • NGI_IT Status: contacted
  • Action 6.2 (Zdenek): Inform OMB about MPI VT status and work progress.
  • [Done] Action 7.1 (Zdenek/Alvaro): Set an estimated end date for MPI VT.

Members

  • NGIs - confirmed:
    • BG: Aneta Karaivanova
    • CZ: Zdenek Sustr (leader)
    • ES/IBERGRID: Alvaro Simon (leader), Enol Fernandez, Iván Díaz, Alvaro Lopez, Pablo Orviz, Isabel Campos, Roberto Rosende Dopazo
    • GR: Dimitris Dellis, Marios Chatziangelou, Paschalis Korosoglou
    • HR: Emir Imamagic, Luko Gjenero
    • IE: John Walsh
    • IT: Daniele Cesini, Alessandro Costantini, Vania Boccia, Marco Bencivenni
    • PT: Gonçalo Borges
    • SK: Viera Sipkova, Viet Tran, Jan Astalos
    • UK: John Gordon
  • EGI.eu: Gergely Sipos, Karolis Eigelis, Tiziana Ferrari, Peter Solagna

Resources

VO MPI-Kickstart

The MPI-Kickstart Virtual Organization brings together sites and users interested in improving MPI reliability across EGI.

Useful Links

  • Home Page: https://www.metacentrum.cz/en/VO/MPI/index.html
  • Registration: https://perun.metacentrum.cz/perun-registrar-cert/?vo=mpi (new registration URL since Feb 2013)
  • Mailing List: mpi-kickstart at metacentrum.cz (archive: https://www.metacentrum.cz/mailman/private/mpi-kickstart/)

Note! VO membership needs to be renewed annually. The nearest renewal will be required on the turn of 2013 and 2014.

Environment Settings

VO_MPI_VOMS_SERVERS="'vomss://voms1.egee.cesnet.cz:8443/voms/mpi?/mpi'"
VO_MPI_QUEUES=""
VO_MPI_SW_DIR="$VO_SW_DIR/mpi"
VO_MPI_DEFAULT_SE=""
VO_MPI_STORAGE_DIR=""
VO_MPI_VOMSES="'mpi voms1.egee.cesnet.cz 15030 /DC=cz/DC=cesnet-ca/O=CESNET/CN=voms1.egee.cesnet.cz mpi 24'"
VO_MPI_VOMS_POOL_PATH=""
VO_MPI_VOMS_CA_DN="'/DC=cz/DC=cesnet-ca/O=CESNET CA/CN=CESNET CA 3'"
VO_MPI_WMS_HOSTS="wms1.egee.cesnet.cz wms2.egee.cesnet.cz" 

Progress

Task 1 (DONE): MPI documentation

  • Changed and updated the MPI documentation based on site and MPI VT feedback: MPI User Guide (https://wiki.egi.eu/wiki/MPI_User_Guide) and Admin Guide (https://wiki.egi.eu/wiki/MAN03).

Task 2 (DONE): Nagios probes

  • Created new MPI nagios probe specifications, available at https://wiki.egi.eu/wiki/VT_MPI_within_EGI:Nagios
  • New MPI nagios probes are under development. Verification nagios box: https://test23.egi.cesga.es/nagios/
  • New RT ticket to add an MPI service for CREAM in GOCDB: https://rt.egi.eu/rt/Ticket/Display.html?id=3396
  • New RT ticket to request OTAG integration of the new SAM MPI probes: https://rt.egi.eu/guest/Ticket/Display.html?id=5022

Task 3 (DONE): Information system

  • Checked the current GLUE2 schema to find MPI static values.
    • MaxSlotsPerJob can be used for MPI jobs: the maximum number of slots which could be allocated to a single job. This value is not filled by the current LRMS information providers.
    • Raised a request to EMI to include MaxSlotsPerJob as a new value to be published by the batch system information providers.
      • GGUS ticket to EMI to improve the MPI information providers: https://ggus.eu/ws/ticket_info.php?ticket=82902, linked with Savannah tickets for each batch system:
        • LSF: https://savannah.cern.ch/bugs/index.php?95182
        • SGE: https://savannah.cern.ch/bugs/index.php?95183
        • Torque: https://savannah.cern.ch/bugs/index.php?95184

Task 4 (DONE): Accounting system

  • The EGI accounting system needs improvements to implement parallel job usage records.
    • This requirement was raised to EMI/JRA1: https://rt.egi.eu/guest/Ticket/Display.html?id=3328

Task 5 (DONE): Batch system status

  • The MAUI issue (https://ggus.eu/ws/ticket_info.php?ticket=67870) will be fixed in the next EMI 2 release.
    • The LRMS is a third-party product not updated directly by EMI members.

Task 6 (DONE): Gather information from MPI sites

  • Created the new MPI kickstart VO.
    • CESNET and CESGA are providing resources to test the new VO.
  • Gathered information from NGIs. MPI survey and site status:
    • NGIs Italy and Slovakia: https://www.egi.eu/indico/conferenceDisplay.py?confId=828
    • NGI Bulgaria: https://indico.egi.eu/indico/conferenceDisplay.py?confId=1075