Difference between revisions of "UMDQualityCriteria"

From EGIWiki

Revision as of 10:05, 11 May 2010

Generic acceptance criteria

Documentation

Services in UMD must include comprehensive documentation, written in a uniform and clear style, that covers all of the following items:

  • Functional description of the software.
  • User documentation, including complete man pages of the commands and user guides.
  • Administrator documentation that includes the installation procedure; detailed configuration of the service; procedures for starting, stopping and querying the service; ports (or port ranges) used and the connections expected on those ports.
  • List of processes that are expected to run, with the typical load of the service. Description of how state information is managed, plus debugging information (e.g. list of log files, any files or databases containing service information).
  • Notes on the testing procedure and expected test results.

Verification: existence of the documentation with all the required items.

Testability

UMD software must support testing of its features and functionality. The tests must ensure the correctness, completeness, security and scalability of each service. The software provider must define a test plan for each service and provide a test suite and the results of running it against each new release. Interoperability with the rest of the UMD software must be explicitly tested. A global test suite for the UMD distribution will guarantee the correct behavior of the complete set of services in a controlled environment.

Verification: existence of the test plan, test suite and the results of the test plan.

Interoperability

UMD services should be interoperable with one another. Every service should demonstrate correct functionality alongside the rest of the UMD middleware before entering the distribution. Interoperability with other middleware distributions is recommended. Compliance with existing and future standards should be a priority in all the distributed software.

Verification: interoperability test of the service, assuring the correct behavior within the environment.

Portability

The UMD services should be able to run correctly on multiple computer configurations. Portability should be assured across different hardware and different operating systems. At a minimum, the most common operating systems and architectures used in the different NGIs must be supported. Currently, all the services must run on 64-bit Scientific Linux 5 (SL5) machines. The clients of the services should also be supported on common desktop platforms.

Verification: test suite must run correctly on the requested platforms. Requested platforms: 64-bit Scientific Linux 5.

Security

All the UMD software must comply with strict security policies. The authorization and authentication of users must be based on open standards, such as X.509 public key infrastructure. Both coarse-grained (e.g. VO level) and fine-grained (e.g. user level) policies should be allowed. Whenever a service needs to act on behalf of the user, a proper delegation of the user credentials is required. The number of open ports needed by the services should be minimized; a list of used ports and inbound/outbound connectivity requirements must be provided. Services must be able to run using non-privileged accounts and should also perform robust input validation and error handling.

Verification: test suite must include security checks. Clear documentation of any security issues (connectivity, privileges needed to run the component, etc.).

Source Code Quality and Availability

The source code of each component of the UMD middleware should follow a coherent and clear programming style that helps in the readability of the code and eases maintenance, testing, debugging, fixing, modification and portability of the software. Open source components must publicly offer their source code and the license with the binaries.

Verification: for Open Source components, availability of the code and license. Source code quality metrics are desirable.

Availability and Reliability

Services should be available at all times. They should be scalable and able to handle growing amounts of work gracefully. Mechanisms such as replication or load balancing, which ensure high availability and avoid single points of failure or bottlenecks, must be included. All the services should automatically detect possible performance degradation and continue to operate properly if some of their components fail. In order to maintain the quality of the service, they may automatically disable the acceptance of new requests, returning a clear message to the clients stating that fact.

Verification: test suite should include scalability and stress tests that assure proper function under high load situations.
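
The automatic disabling of new requests described above could be sketched as follows. This is only an illustration under stated assumptions: the class name `AdmissionGate`, the `max_active` threshold and the wording of the rejection message are hypothetical and do not come from any UMD component.

```python
class AdmissionGate:
    """Hypothetical admission control: refuse new requests above a load limit."""

    def __init__(self, max_active):
        self.max_active = max_active  # assumed threshold, chosen by the operator
        self.active = 0

    def try_accept(self):
        """Return (accepted, message) for a new incoming request."""
        if self.active >= self.max_active:
            # A clear message tells the client why the request was refused.
            return False, "Service busy: acceptance of new requests is temporarily disabled"
        self.active += 1
        return True, "accepted"

    def finish(self):
        """Mark one active request as completed, freeing a slot."""
        if self.active > 0:
            self.active -= 1


gate = AdmissionGate(max_active=2)
results = [gate.try_accept() for _ in range(3)]
```

A stress test of the kind required by the verification step could drive such a gate past its limit and check that the rejection message is the expected one.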

Accounting and Traceability

All the services must provide a consistent view of user or VO resource utilization. The accounting information should enable the evaluation of resource usage and allow each user's actions on the resource to be tracked. The information gathered by the services must be accessible from the accounting portals deployed by the NGIs. Services must provide clear and coherent error messages that facilitate the traceability of problems. The clients must receive messages that allow a good diagnosis of the problem, and they must react appropriately to the response received.

Verification: Documentation of the accounting and traceability features of the component. Test suite must check common error situations and guarantee that the error messages are the ones expected. Accounting features must also be checked in the test suite.

Remote Management and Monitoring

Not mandatory

The deployed services should include methods for managing and monitoring their status remotely, allowing operators to react in a timely manner to problems in the infrastructure. Ideally, all the services should expose a uniform interface for this functionality and be easily pluggable into existing monitoring systems such as Nagios.

Verification: existence of remote managing and monitoring mechanisms. Test suite must include tests on this functionality.

Extensibility

Not mandatory

Services in UMD should be designed with future growth in mind. Implementing new features while keeping the core function of the services should take an affordable level of effort and minimize the impact on the architecture of the middleware distribution. The software should include hooks and mechanisms for expanding or enhancing the system with new capabilities without having to make major changes to its architecture.

Verification: existence of hooks or mechanisms for expanding or enhancing the component. Test suite must include any extensions needed to operate the middleware in the testbed.

Specific acceptance criteria

This section details the specific acceptance criteria for each of the services that are part of the UMD.

Authentication and Authorization Services

These services provide the security infrastructure used for authentication and authorization by the rest of the UMD distribution. They allow the classification of VO members into groups and roles and provide the other services with consistent authorization decisions for these members.

VO management

The services must allow VO managers to define VO groups, attributes and roles, and to classify the members into these groups and roles.

Verification: Test suite for all the VO management features: definition, modification and deletion of groups, attributes and roles, and definition, modification and deletion of users.

VO proxy management

The services must allow the creation of temporary proxies for the users with complete VO information (groups, attributes, roles).

Verification: Test suite for all the proxy management features of the services.

Delegation

Services that need to act on behalf of the user must follow a clear and unified delegation protocol that securely supports the delegation of the user proxy, the renewal of that delegation and its revocation.

Verification: Complete test suite of the delegation protocol and its features.

Policy management

Services must allow the creation, management and removal of authorization policies. The policies will define which users can perform a certain action on a given resource.

Verification: test suite must verify all the commands related to the definition of policies.
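
As an illustration of what such a policy expresses, the sketch below evaluates a request against an ordered list of rules. It is not the policy language of any UMD service: the `Rule` fields, the `*` wildcard, the first-match-wins ordering and the default-deny outcome are all assumptions made for the example, as are the subject and resource names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject: str    # user identifier; "*" matches any user (assumed convention)
    action: str     # action name; "*" matches any action
    resource: str   # resource name; "*" matches any resource
    permit: bool    # True to permit, False to deny


def _matches(pattern, value):
    return pattern == "*" or pattern == value


def is_permitted(rules, subject, action, resource):
    """First matching rule wins; a request matching no rule is denied."""
    for rule in rules:
        if (_matches(rule.subject, subject)
                and _matches(rule.action, action)
                and _matches(rule.resource, resource)):
            return rule.permit
    return False


# Hypothetical policy: one banned user, everyone else may submit jobs to one CE.
rules = [
    Rule("/vo.example/banned-user", "*", "*", permit=False),
    Rule("*", "submit-job", "ce.example.org", permit=True),
]
```

Creating or removing a policy then amounts to adding or deleting a `Rule`, and a test suite can check each management command by evaluating requests before and after the change.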

Policy enforcement

The policies defined and managed by the policy management components must be enforced at the resources.

Verification: Complete test of policy enforcement at the resources that assures the consistent application of the policies.

Computing Services

The Computing Services provide a generic interface between the grid users and the local processing resources. They offer clients the possibility of starting, monitoring and managing computational jobs. Examples of such services are glite-CREAM, lcg-CE, UNICORE XNJS or the ARC Grid Manager.

Support for LRMS

Computing Services should be independent from the Local Resource Management System (LRMS) used at each resource. All the LRMS currently used in the NGIs must be supported by the Computing Services. Support for the LRMS must include accounting of the resources used by each user and providing accurate information about the status and availability of underlying resources.

Verification: Test suite for each of the LRMS supported. These must include Torque/PBS, SGE and LSF.

Parallel job support

A simple and common interface for jobs requesting more than one process, especially MPI jobs, must be provided. The user should be able to specify the mapping of logical processes to physical resources at the site. Explicit support for multi-core and GPU architectures may be explored.

Verification: Tests that verify the possibility of executing parallel jobs. For advanced functionality (mapping of processes), check that the expected behavior is always met.
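
To make the idea of mapping logical processes to physical resources concrete, the following sketch places process ranks on nodes under two common placement policies. The function name `map_processes`, the policy names and the node names are hypothetical; real Computing Services may express the mapping quite differently.

```python
def map_processes(n_procs, nodes, slots_per_node, policy="block"):
    """Return a list where entry i is the node hosting logical process i."""
    capacity = len(nodes) * slots_per_node
    if n_procs > capacity:
        raise ValueError(f"{n_procs} processes requested but only {capacity} slots available")
    if policy == "block":
        # Fill each node completely before moving on to the next one.
        return [nodes[rank // slots_per_node] for rank in range(n_procs)]
    if policy == "round_robin":
        # Spread consecutive ranks over different nodes.
        return [nodes[rank % len(nodes)] for rank in range(n_procs)]
    raise ValueError(f"unknown placement policy: {policy}")


block = map_processes(4, ["node1", "node2"], slots_per_node=2, policy="block")
spread = map_processes(4, ["node1", "node2"], slots_per_node=2, policy="round_robin")
```

A verification test for the advanced functionality could compare the placement reported by the running job against the placement computed this way.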

Asynchronous notification of events

The Computing Services must be able to notify their clients of events related to their jobs once the clients have registered with an asynchronous event notification service.

Verification: Existence of the asynchronous notification functionality. Complete test suite for this functionality.
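
The register-then-notify pattern behind this criterion can be sketched as below. The example is deliberately simplified: the class `JobEventNotifier` and the event names are hypothetical, and callbacks are invoked in-process, whereas a real service would deliver events to remote clients without blocking on them.

```python
class JobEventNotifier:
    """Hypothetical registry of client callbacks, keyed by job identifier."""

    def __init__(self):
        self._subscribers = {}

    def register(self, job_id, callback):
        """A client registers interest in the events of one job."""
        self._subscribers.setdefault(job_id, []).append(callback)

    def publish(self, job_id, event):
        """Deliver an event to every client registered for that job."""
        for callback in list(self._subscribers.get(job_id, [])):
            callback(job_id, event)


received = []
notifier = JobEventNotifier()
notifier.register("job-1", lambda job_id, event: received.append((job_id, event)))
notifier.publish("job-1", "RUNNING")
notifier.publish("job-2", "DONE")   # nobody registered for job-2, nothing is delivered
notifier.publish("job-1", "DONE")
```

The client is never asked to poll: it only learns about state changes for the jobs it registered for.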

Standards Compliance

Not mandatory

Job submission standards from OGF, such as JSDL, OGSA-BES or DRMAA, should be considered as interfaces for the Computing Services whenever possible.

Verification: Tests of standards-based job submission.

Storage Services

The Storage Services provide a uniform access interface to data storage resources. These services may control simple disk servers, large disk arrays, or tape-based mass storage systems, which may be accessed through different protocols. Storage Services follow the SRM specification. StoRM and glite-DPM are examples of Storage Services.

Independence from the underlying storage system and transfer protocols

UMD Storage Services should support all the commonly available storage systems and transfer protocols in the NGIs. Moreover, they should be extensible so that new storage systems or protocols can be added.

Information Services

Information Services in UMD contain the resource information of the different sites of each NGI. This information is essential for the operation of the infrastructure and middleware, as resources are discovered using this service. The Information Services acceptance criteria are:

Scalability

The Information Services contain information about all the resources available in the infrastructure, which can grow considerably. Therefore, Information Services must scale gracefully with the size of the information they hold and must not create a bottleneck for the rest of the services. Decentralized architectures and load balancing mechanisms should be considered for the implementation.

Interoperability

The services must be able to handle information originating from several kinds of resources. Preferably, the GlueSchema standard should be used for representing the information.

Data Indexing Services

The Data Indexing Services store information about the location of physical files in the grid. They translate user-level logical names into SRM locations that are handled by the Storage Services.
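
The translation from logical names to SRM locations can be pictured as a one-to-many lookup, sketched below. The class `ReplicaCatalogue`, the `lfn:` naming and the host names are assumptions made for illustration and do not describe the interface of any actual indexing service.

```python
class ReplicaCatalogue:
    """Hypothetical catalogue mapping a logical file name to its SRM locations."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, srm_url):
        """Record that a replica of the logical file exists at srm_url."""
        self._replicas.setdefault(logical_name, []).append(srm_url)

    def lookup(self, logical_name):
        """Return all known SRM locations, or an empty list if none is known."""
        return list(self._replicas.get(logical_name, []))


catalogue = ReplicaCatalogue()
catalogue.register("lfn:/grid/vo.example/run1.dat", "srm://se1.example.org/vo.example/run1.dat")
catalogue.register("lfn:/grid/vo.example/run1.dat", "srm://se2.example.org/vo.example/run1.dat")
locations = catalogue.lookup("lfn:/grid/vo.example/run1.dat")
```

The user only ever handles the logical name; choosing among the returned SRM locations is left to the client or to the Workload Management Services.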

Workload Management Services

The Workload Management Services are responsible for the distribution and management of computational jobs across all the resources, in such a way that applications are conveniently, efficiently and effectively executed. Typical services in this category include glite-WMS, CrossBroker and GridWay.

Management of complex types of jobs

The Workload Management Services must support collections, where a single submission creates several jobs, and parametric jobs, where a single submission creates several jobs that explore a user-defined parameter space. Workflows with inter-job dependencies should also be supported.
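
A parametric job can be thought of as a template expanded over every combination of parameter values. The sketch below shows that expansion; the function `expand_parametric`, the command template and the parameter names are hypothetical and are not taken from any job description language.

```python
from itertools import product


def expand_parametric(template, parameters):
    """Create one command line per point of the parameter space."""
    names = sorted(parameters)  # fixed ordering so the expansion is reproducible
    jobs = []
    for values in product(*(parameters[name] for name in names)):
        jobs.append(template.format(**dict(zip(names, values))))
    return jobs


jobs = expand_parametric(
    "simulate --energy {energy} --seed {seed}",
    {"energy": [10, 20], "seed": [1, 2, 3]},
)
```

Here a single submission with two energies and three seeds yields six individual jobs, which the service would then track as one unit.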

Parallel job support

Submission of parallel jobs to the resources must be possible using the Workload Management Services. This support should not be limited to MPI; it should allow submission of any other kind of job that exploits the potential parallelism of the resources.

Interoperability

The different Computing Services available as part of UMD must be supported. Optionally, additional Computing Services may be supported.

Management of several sources of information

Information from resources may not be centralized in a single Information Service; the Workload Management Services must therefore be able to fetch and correctly manage information from several sources.

Standards Compliance

JSDL should be considered for the job description whenever possible. Job submission standards may also be considered.