The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

VT MPI within EGI


Revision as of 14:29, 15 December 2011

General Project Information

  • Leader: Alvaro Simon (CESGA, Spain) and Zdenek Sustr (CESNET, Czech Republic)
  • Mailing List: vt-mpi at mailman.egi.eu
  • Status: Active
  • Start Date: 10/Nov/2011
  • End Date:
  • Meetings: 1. 12/12/2011 - MPI VT Management meeting

Motivation

Despite a dedicated SA3 activity to support MPI, there still seem to be significant issues in uptake and satisfaction amongst the user communities. This VT:

  • Works with user communities and projects that use MPI resources (e.g. ITER, MAPPER, A&A, etc) to demonstrate that MPI can work successfully in EGI.
  • Sets up a VO on EGI with sites committed to support MPI jobs.
  • Improves the communication between MPI users and the developers of MPI support within EGI SA3.

Output

The VT is expected to produce the following outputs:

  • Materials (tutorials, white papers, etc.) about successful use cases of MPI on EGI that can help new communities adopt MPI on EGI.
  • An MPI VO that provides:
    • dedicated CPUs for MPI jobs
    • MPI-specific test probes that can run on all sites using the VO monitoring services of Ibergrid (EGI-InSPIRE VO Services group)
    • accounting for MPI jobs
    • user support
  • Improved communication channels with MPI users
  • Feedback to resource centers, user communities and technology providers, based on the above set of resources, on how to improve MPI within EGI.

Tasks

  • Task 1: MPI documentation: This documentation will be reviewed and we will decide what needs updating or extending.
    • Gergely comments:
For users:
https://wiki.egi.eu/wiki/MPI_User_Guide
https://wiki.egi.eu/wiki/MPI_User_manual
These should be merged into a single wiki page.
For site admins:
https://wiki.egi.eu/wiki/MAN03
  • Task 2: Nagios probes: The current Nagios probes should be reviewed to test the EGI MPI infrastructure.
    • New Nagios probe requirements: https://wiki.egi.eu/wiki/Nagios-requirements.html
    • John Walsh comments:
a) A non-critical test that checks MPI scalability above two nodes.
Ideally, I would like to see this test set to ceiling(average number of cores) x 2 + 1.
This should increase the likelihood that the job runs on multiple nodes (for example, with an average of 8 cores per node the probe would request 17 slots, which cannot fit on two 8-core nodes).
This test should only be run maybe once or twice a week, and should allow at least a day for scheduling
(so as to be non-intrusive on valuable site resources).
b) Improve the baseline MPI tests.
We should test basic MPI API functionality (scatter, gather, etc.), rather than the simpler "hello world"; a minimal sketch of such a test is given after item c) below.
I will try to see whether I can assemble a basic test-suite.
c) Following up on https://ggus.eu/ws/ticket_info.php?ticket=76755, I have suggested that we may not be using GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxObtainableCPUTime properly, and that site queues may not be correctly set up for MPI jobs
(i.e. not setting the Torque queue resources_max.cput and resources_max.pcput values). Perhaps we can develop a (BDII?) sanity "warning" check for this?
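As a starting point for b), a minimal sketch of such a baseline probe payload could look as follows. It is written with mpi4py purely for brevity (an assumption; a production probe would more likely ship a small C program compiled with mpicc on the worker node) and uses the usual Nagios exit codes:

  # Minimal MPI functionality check: Scatter + Gather instead of "hello world".
  import sys
  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  size = comm.Get_size()

  # Root prepares one work item per rank; scatter distributes them.
  chunks = [i * i for i in range(size)] if rank == 0 else None
  item = comm.scatter(chunks, root=0)

  # Every rank does a trivial computation; gather collects the results on root.
  results = comm.gather(item + rank, root=0)

  if rank == 0:
      expected = [i * i + i for i in range(size)]
      if results == expected:
          print("OK: scatter/gather worked across %d ranks" % size)
          sys.exit(0)
      print("CRITICAL: unexpected gather result: %s" % results)
      sys.exit(2)

Such a payload would be launched under mpiexec (or the site's usual MPI wrapper) with the slot count from a), e.g. mpiexec -n 17 python mpi_probe.py (the file name is illustrative).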
  • Task 3: Information system: Problems detecting MPI resources.
    • Checking for MPI availability -- mostly determined by looking at the installed applications.
    • Not all sites report their MPI capability correctly.
    • Ivan comments:
For BDII, the metrics portal checks the GlueHostApplicationSoftwareRunTimeEnvironment property for the *MPI* regular expression.
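A rough sketch of what such a check could look like, using the standard ldapsearch client against a top-BDII (the hostname below is a placeholder; port 2170 and the o=grid base are the conventional BDII defaults) and listing the MPI-related tags published under GlueHostApplicationSoftwareRunTimeEnvironment:

  # Hypothetical BDII check: list MPI-related runtime-environment tags.
  import re
  import subprocess

  BDII = "ldap://lcg-bdii.example.org:2170"   # assumption: any top-BDII works here
  BASE = "o=grid"

  out = subprocess.check_output(
      ["ldapsearch", "-x", "-LLL", "-H", BDII, "-b", BASE,
       "(GlueHostApplicationSoftwareRunTimeEnvironment=*MPI*)",
       "GlueHostApplicationSoftwareRunTimeEnvironment"],
      universal_newlines=True)

  mpi_tags = set()
  for line in out.splitlines():
      m = re.match(r"GlueHostApplicationSoftwareRunTimeEnvironment:\s*(.+)", line)
      # Matching entries return all their tag values, so keep only MPI-like ones.
      if m and "MPI" in m.group(1).upper():
          mpi_tags.add(m.group(1).strip())

  print("MPI tags published:", sorted(mpi_tags) if mpi_tags else "none")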
    • John Walsh comments:
GGUS ticket: https://ggus.eu/ws/ticket_info.php?ticket=76755
The problem seems to be related to the Torque settings for pcput and cput on each of the queues.
cput = Maximum amount of CPU time used by all processes in the job.
pcput = Maximum amount of CPU time used by any single process in the job.
walltime = Maximum amount of real time during which the job can be in the running state.
So, for example, on the "medium" queue on deimos.htc.biggrid.nl, the configuration is:
set queue medium resources_max.cput = 24:00:00
set queue medium resources_max.pcput = 24:00:00
set queue medium resources_max.walltime = 36:00:00
This would not be sufficient to allow a 6-core job to run for a full 24 hours: such a job accumulates 6 CPU-hours per hour of wall time, so it would hit the 24-hour cput limit and is likely to be removed after it has run for about 4 hours.
We need to check that these queue settings are sensible for MPI jobs; a sketch of a possible warning check is given below.
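The "warning" check suggested under Task 2 c) could start from something as simple as the sketch below. It takes qmgr-style resources_max values (like the ones quoted above) plus an assumed typical MPI job size, and warns when the cput limit would kill a multi-core job before its walltime; the function names and thresholds are illustrative, not an agreed design:

  # Hypothetical sanity check for Torque queue limits vs. multi-core MPI jobs.

  def hms_to_hours(value):
      """Convert a Torque HH:MM:SS limit to hours."""
      h, m, s = (int(x) for x in value.split(":"))
      return h + m / 60.0 + s / 3600.0

  def check_queue(settings, cores):
      """settings: dict of resources_max.* values; cores: MPI job size."""
      cput = hms_to_hours(settings["resources_max.cput"])
      walltime = hms_to_hours(settings["resources_max.walltime"])
      needed = cores * walltime          # CPU-hours a full-length job accumulates
      if cput < needed:
          killed_after = cput / cores    # wall-clock hours until the cput limit hits
          msg = ("WARNING: cput=%.0fh < %d cores x walltime=%.0fh; "
                 "a full-length job would be killed after ~%.1fh"
                 % (cput, cores, walltime, killed_after))
          print(msg)
      else:
          print("OK: cput limit is compatible with %d-core jobs" % cores)

  # The "medium" queue from deimos.htc.biggrid.nl quoted above, checked for 6 cores.
  # pcput (per-process limit) is kept for completeness but not checked here.
  medium = {
      "resources_max.cput": "24:00:00",
      "resources_max.pcput": "24:00:00",
      "resources_max.walltime": "36:00:00",
  }
  check_queue(medium, cores=6)   # -> WARNING ... killed after ~4.0h

The same arithmetic could probably be applied to the GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime values published in the BDII, which is where a site-independent Nagios warning would most likely have to read them from.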

Members

  • NGIs - confirmed:
    • CZ: Zdenek Sustr (leader)
    • ES/IBERGRID: Alvaro Simon (leader), Enol Fernandez, Iván Díaz, Alvaro Lopez, Pablo Orviz, Isabel Campos.
    • GR: Dimitris Dellis, Marios Chatziangelou, Paschalis Korosoglou
    • HR: Emir Imamagic, Luko Gjenero
    • IE: John Walsh
    • IT: Daniele Cesini, Alessandro Costantini, Marco Bencivenni
    • PT: Gonçalo Borges
    • SK: Viera Sipkova, Viet Tran, Jan Astalos
  • EGI.eu: Gergely Sipos, Karolis Eigelis

Resources

Progress

  • Task 1
  • Task 2
  • ...
  • Task N