
QosCosGrid Platform



Platform Integrator: PSNC

Technology Provider: PSNC 
More information: http://www.qoscosgrid.org/

The QosCosGrid (QCG) platform offers advanced resource reservation and management capabilities, giving users HPC-like performance and scalability. By connecting many local computing resources, QosCosGrid provides advanced monitoring and job execution capabilities for distributed and parallel C, C++, Fortran and Java applications.

Overview

QCG Platform diagram

The QCG platform comprises four key middleware services and several community-facing services that act as the user's main entry points to the platform. It integrates with the EGI Core Infrastructure platform for security, monitoring, accounting and service endpoint discovery.

The QCG Computing system forms the heart of the QCG platform, providing the main job execution capabilities. When configured and extended accordingly, it runs cross-cluster parallel jobs and multi-scale cross-cluster compute jobs, delivering HPC-like performance without being backed by an HPC system. It is typically deployed in front of one or more local compute clusters that are managed through Local Resource Management Systems (LRMSs). QCG supports PBS, PBSPro, SLURM, LL, LSF, (S)GE and TORQUE out of the box. QCG Computing is directly integrated with the EGI Core Infrastructure Platform for security purposes: it accepts user authentication from EGI's X.509v3 PKI system and authorises user access to computing resources by generating and maintaining gridmap files.
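The gridmap-based authorisation step can be pictured with a minimal sketch. It assumes the conventional grid-mapfile layout, in which each line holds a quoted certificate subject DN followed by a local account name (or a comma-separated list of accounts). The function names, sample DNs and account names are hypothetical, and the code is not taken from QCG Computing.

```python
import shlex


def parse_gridmap(text):
    """Return a dict mapping certificate subject DNs to a local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) < 2:
            continue  # ignore malformed entries
        dn, accounts = parts[0], parts[1]
        # When several accounts are listed, this sketch uses the first one.
        mapping[dn] = accounts.split(",")[0]
    return mapping


def authorise(mapping, dn):
    """Return the local account for an authenticated DN, or None if unmapped."""
    return mapping.get(dn)


# Hypothetical sample entries, for illustration only.
SAMPLE = '''
# subject DN                          local account
"/C=PL/O=GRID/O=PSNC/CN=Jane Doe" jdoe
"/C=DE/O=GridGermany/CN=John Roe" jroe,guest
'''

gridmap = parse_gridmap(SAMPLE)
account = authorise(gridmap, "/C=PL/O=GRID/O=PSNC/CN=Jane Doe")
```

A user whose DN has no entry is simply not authorised in this sketch; a real deployment would also have to regenerate the file as community membership changes.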

The QCG Notification system provides asynchronous notification of job progress to any subscribed notification consumer. It is particularly useful (and required) in deployments that run cross-cluster compute jobs. The system supports direct end-user notification, provided the user is properly subscribed as a notification consumer (out of scope for this documentation). However, its typical use case is to help workflow engines and cross-cluster coordinating services (see below) track the progress of individual tasks. The main Notification Producer is the QCG Computing system, which exposes a diverse set of notification topics on Compute Jobs and on the QCG Computing service itself.
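The producer and consumer roles described above follow a topic-based publish/subscribe pattern. The sketch below shows that pattern in memory only; the class name, topic string and message fields are hypothetical and do not reflect the actual QCG Notification interface.

```python
from collections import defaultdict


class NotificationBroker:
    """Minimal in-memory topic-based publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, consumer):
        """Register a callable that receives (topic, message) for a topic."""
        self._subscribers[topic].append(consumer)

    def publish(self, topic, message):
        """Deliver a message to every consumer of the topic; return the count."""
        consumers = self._subscribers.get(topic, [])
        for consumer in consumers:
            consumer(topic, message)
        return len(consumers)


# A consumer (for example a workflow engine) records job state changes.
received = []
notifications = NotificationBroker()
notifications.subscribe("job/state", lambda t, m: received.append((t, m)))

# The producer (here standing in for QCG Computing) reports progress.
delivered = notifications.publish(
    "job/state", {"job_id": "job-1", "state": "RUNNING"}
)
```

A real notification service delivers messages asynchronously over the network; the synchronous callback here only illustrates who subscribes and who publishes.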

The QCG Broker is responsible for finding suitable resources and consigning compute jobs to them, according to the requirements of higher-level services. The resources are those exposed through QCG Computing systems. It does so by monitoring the state of connected QCG Computing services through the corresponding QCG Notification services, and then submitting the job directly to the QCG Computing system that best matches the requirements.
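As a rough illustration of this brokering step, the sketch below keeps the latest reported state per cluster and selects a target for a job. The state fields, the selection rule (fewest queued jobs, then most free cores) and all names are assumptions made for the example; the source does not describe the actual matching policy of the QCG Broker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    name: str
    free_cores: int
    queued_jobs: int


@dataclass(frozen=True)
class JobRequirements:
    cores: int


class Broker:
    """Tracks cluster state from notifications and picks a submission target."""

    def __init__(self):
        self._states = {}

    def on_notification(self, state):
        # The newest report for a cluster replaces the previous one.
        self._states[state.name] = state

    def select(self, requirements):
        """Return the name of the best matching cluster, or None."""
        candidates = [
            s for s in self._states.values()
            if s.free_cores >= requirements.cores
        ]
        if not candidates:
            return None
        best = min(
            candidates,
            key=lambda s: (s.queued_jobs, -s.free_cores, s.name),
        )
        return best.name


broker = Broker()
broker.on_notification(ClusterState("alpha", free_cores=16, queued_jobs=5))
broker.on_notification(ClusterState("beta", free_cores=32, queued_jobs=1))
broker.on_notification(ClusterState("gamma", free_cores=4, queued_jobs=0))
target = broker.select(JobRequirements(cores=8))
```

In this sketch "gamma" is skipped because it lacks enough free cores, and "beta" is preferred over "alpha" because its queue is shorter.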

The QCG platform also includes several Research Community services, represented by its most prominent member, the QCG Science Gateways system. These services typically provide portal services to the consuming end-user communities, but also include a mobile client (QCG Mobile) and a Windows-based visualisation tool (QCG Icon), among others.

The QCG Accounting system is not a user-facing service. It queries the QCG Computing system for accounting information and feeds it as OGF Usage Records into the EGI Core Infrastructure accounting system.

The QCG Platform integrates with the Nagios monitoring system by providing Nagios monitoring plugins (not shown). Although the plugins can be used with individual, independent Nagios instances, EGI deployments of the QCG platform will integrate with the EGI SAM system by including the QCG Nagios plugins in EGI SAM for NGI-wide deployment.

QCG Computing

Principal QCG Computing architecture

The functionality of the QCG Computing system is provided by six components; their deployment depends on the needed functionality (see below).

The middleware components are typically deployed together into a QCG Computing service. QCG Compute, Gridmapfile and LRMS/DRMAA provide the essential Compute Job and Parallel Job capabilities; QCG Computing can process parallel jobs out of the box through OpenMPI. QCG Compute provides the core job processing and management functionality. Gridmapfile interfaces with the EGI Core Infrastructure platform by means of X.509v3 PKI and gridmap files. The LRMS/DRMAA component provides the integration with the Local Resource Management Systems (LRMS) deployed by the Resource Provider. QCG Compute supports the following LRMSs:

  1. PBS
  2. PBSPro
  3. SLURM
  4. LL
  5. LSF
  6. (S)GE
  7. TORQUE

Multiscale & Cross-cluster Parallel Jobs

Advanced QCG Computing capabilities

A speciality of the QCG Computing platform is its ability to coordinate parallel jobs that span multiple computing clusters, and even multiple Resource Providers. Multiscale computing refers to deploying a complex parallel compute job across several compute resources (including HPC, HTC and clusters) that are correlated and run in parallel. The MAPPER project in particular makes heavy use of this capability.
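To make the idea of a job spanning several clusters concrete, the sketch below splits the processes of one parallel job across clusters according to their free cores. The greedy rule (largest cluster first), the function name and the cluster names are assumptions for illustration only; the source does not describe how QCG actually co-allocates resources.

```python
def split_across_clusters(total_processes, free_cores):
    """Greedily place processes on the clusters with the most free cores.

    free_cores maps a cluster name to its number of free cores.
    Returns a dict of cluster name -> process count, or None when the
    clusters together cannot host the whole job.
    """
    remaining = total_processes
    allocation = {}
    # Largest clusters first, so the job touches as few clusters as possible.
    ordered = sorted(free_cores.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cores in ordered:
        if remaining == 0:
            break
        take = min(cores, remaining)
        if take > 0:
            allocation[name] = take
            remaining -= take
    return allocation if remaining == 0 else None


# Hypothetical free capacity reported by three clusters.
capacity = {"cluster-a": 32, "cluster-b": 16, "cluster-c": 8}
plan = split_across_clusters(40, capacity)
```

Here a 40-process job does not fit on any single cluster, so the plan places 32 processes on "cluster-a" and the remaining 8 on "cluster-b". All parts of such a job must start together, which is where the platform's resource reservation capabilities come in.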

The core component of QCG Computing is the QCG Computing service itself, which offers job submission capabilities to the consuming community.

Interfaces & Standards

QCG Computing: Standards based interfaces

About PSNC