
Workload Manager


EGI Workload Manager

Overview

EGI Workload Manager (also known as DIRAC4EGI) is a service provided to the EGI community as

  • a workload management service used to distribute users' computing tasks among the available resources, both HTC and cloud;
  • a service for managing massively distributed data.

Main features

Workload Manager provides a Workload Management Service (WMS) for High Throughput Computing resources based on DIRAC, which improves overall job throughput compared with native management of grid computing resources. Cloud computing resources are managed as well, in a way that is uniform and transparent for users.

  • The Workload Manager configuration allows users to choose appropriate computing and storage resources, maximising usage efficiency for their particular requirements.
  • The Workload Manager File Catalogue includes replica, metadata and provenance functionality, simplifying the development of scientific applications that access data in distributed environments (see the metadata sketch at the end of this section).
  • All Workload Manager functionality is accessible through user-friendly interfaces, including a Web Portal. The service has an open architecture and allows easy extensions for the needs of particular applications.

The DIRAC data and job management systems have proven production scalability, with peaks of more than 100 thousand concurrently running jobs for the LHCb experiment. This is far more than the computing requirements of environmental science over any realistic time horizon.
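
The metadata functionality of the File Catalogue can be used directly from Python. The sketch below is illustrative only: the catalogue directory and metadata keys are placeholders, and module paths and method signatures may differ between DIRAC releases, so it should be checked against the documentation of the deployed version.

  # Minimal sketch: metadata-based data discovery with the DIRAC File Catalogue client.
  # The directory path and metadata keys are placeholders for illustration only.
  from DIRAC.Core.Base.Script import Script
  Script.parseCommandLine()

  from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

  fc = FileCatalogClient()

  # Attach user-defined metadata to a catalogue directory (hypothetical keys)
  fc.setMetadata("/myvo/experiment/run42", {"Campaign": "2018A", "DataType": "RAW"})

  # Find all registered files whose metadata matches the selection
  result = fc.findFilesByMetadata({"Campaign": "2018A", "DataType": "RAW"})
  if result["OK"]:
      for lfn in result["Value"]:
          print(lfn)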

Targeted User Groups

The service suits established Virtual Organization communities, the long tail of users, SMEs and industry:

  • EGI and EGI Federation participants
  • Research communities

This service platform eases scientific computing by overlaying distributed computing resources in a way that is transparent to the end user. For example, WeNMR, a structural biology community, uses DIRAC for a number of community services and reported an improvement from 70% to 99% after moving to DIRAC job submission. The benefits of using this service include, but are not limited to:

  • Maximising usage efficiency by choosing appropriate computing and storage resources in real time
  • A large-scale distributed environment for managing data storage, movement, access and processing
  • Transparent handling of job submission and workload distribution
  • Interoperability: different storage systems are handled, supporting both cloud and grid capacity
  • A user-friendly interface that allows users to choose among the different DIRAC services and manage the complete lifecycle, from data discovery to processing and analysis


Technical Service Architecture

The DIRAC framework offers standard rules for creating DIRAC extensions; a large part of the functionality is implemented as plugins, which allows DIRAC to be customised for a particular application with minimal effort. The framework provides multiple commands usable in a Unix shell that give access to all DIRAC functionality, a RESTful API suitable for use with application portals (e.g. the WS-PGRADE portal is interfaced with DIRAC in this way), and a Python programming interface, which is the basic way to access all DIRAC facilities and create new extensions (a job-submission sketch using the Python interface is shown after the component list below). The main components are:

  • Workload Management System (WMS): its architecture is composed of multiple loosely coupled components working together in a collaborative manner, with the help of a common Configuration Service that ensures reliable service discovery. The modular architecture makes it easy to incorporate new types of computing resources as well as new task-scheduling algorithms in response to evolving user requirements. DIRAC services can run on multiple geographically distributed servers, which increases overall reliability and gives excellent scalability.
  • Data Management System (DMS): its architecture includes plugins for various storage technologies and organises distributed data into logical storage elements that appear uniform from the user's perspective. A centralised File Catalogue service keeps track of all the physical copies of data files, providing functionality similar to a global distributed file system (see the data-management sketch below). Each user community can have its own File Catalogue to minimise undesired interference and to define specific data access policies for its users.
  • Web portal: provides simple and intuitive access to most of the DIRAC functionality, including management of computing tasks and distributed data. It has a modular architecture designed specifically to allow easy extension for the needs of particular applications.
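
As an illustration of the Python programming interface mentioned above, the following is a minimal job-submission sketch. The executable, job name and CPU-time value are placeholders, and the exact initialisation call may vary between DIRAC releases.

  # Minimal sketch: submitting a job through the DIRAC Python API.
  from DIRAC.Core.Base.Script import Script
  Script.parseCommandLine()

  from DIRAC.Interfaces.API.Dirac import Dirac
  from DIRAC.Interfaces.API.Job import Job

  job = Job()
  job.setName("hello-dirac")                  # name shown in the Web portal
  job.setExecutable("/bin/echo", arguments="Hello from the EGI Workload Manager")
  job.setCPUTime(300)                         # requested CPU time in seconds

  dirac = Dirac()
  result = dirac.submitJob(job)               # returns an S_OK/S_ERROR style dictionary
  if result["OK"]:
      print("Job submitted with ID:", result["Value"])
  else:
      print("Submission failed:", result["Message"])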

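A corresponding data-management sketch, using the same Python interface, shows how a file can be uploaded to a storage element and its replicas queried through the File Catalogue. The LFN, local file name and storage-element name below are placeholders and must be replaced with values valid for the user's community.

  # Minimal sketch: basic data management through the DIRAC Python API.
  from DIRAC.Core.Base.Script import Script
  Script.parseCommandLine()

  from DIRAC.Interfaces.API.Dirac import Dirac

  dirac = Dirac()

  # Upload a local file to a storage element and register it in the File Catalogue
  upload = dirac.addFile("/myvo/user/a/auser/sample.txt", "sample.txt", "EXAMPLE-disk")
  print(upload)

  # Ask the catalogue for all registered replicas of the file
  replicas = dirac.getReplicas("/myvo/user/a/auser/sample.txt")
  if replicas["OK"]:
      print(replicas["Value"])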

Getting Started

  • Requests for service:

