Difference between revisions of "UMDQualityCriteria"

Revision as of 11:05, 11 May 2010

Generic acceptance criteria

Documentation

Services in UMD must include comprehensive documentation, written in a uniform and clear style, that covers all of the following items:

  • Functional description of the software.
  • User documentation, including complete man pages of the commands and user guides.
  • Administrator documentation that includes the installation procedure; detailed configuration of service; starting, stopping and querying service procedures; ports (or port ranges) used and expected connections to those ports.
  • List of processes that are expected to run, with the typical load of the service. Description of how state information is managed and where debugging information can be found (e.g. list of log files, any files or databases containing service information).
  • Notes on the testing procedure and expected test results.

Verification: existence of the documentation with all the required items.

Testability

UMD software must support testing of its features and functionality. The tests must ensure the correctness, completeness, security and scalability of each service. The software provider must define a test plan for each service and provide a test suite and the results of running it against each new release. Interoperability with the rest of the UMD software must be explicitly tested. A global test suite for the UMD distribution will guarantee the correct behavior of the complete set of services in a controlled environment.

Verification: existence of the test plan, test suite and the results of the test plan.

Configuration

Tools for the automatic or semi-automatic configuration of the services must be provided with the software. These tools should allow the unassisted configuration of the services for the most common use cases while being customizable for advanced use cases. Complete manual configuration must always be possible.

Verification: test suite must include the configuration mechanisms and tools. YAIM is considered the preferred tool.

Interoperability

UMD services should be interoperable with each other. Every service should demonstrate that it functions correctly alongside the rest of the UMD middleware for a given major release before entering the distribution. Ideally, backward compatibility between major releases should be kept. Interoperability with other middleware distributions is recommended, so compliance with existing and future standards should be a priority in all the distributed software.

Verification: interoperability test of the service, assuring the correct behavior within the environment.

Portability

The UMD services should be able to run correctly on multiple computer configurations. Portability should be assured across different hardware and different operating systems. At a minimum, the most common operating systems and architectures used in the different NGIs must be supported. Currently, all the services must run on 64-bit Scientific Linux 5 (SL5) machines. The clients of the services should also be supported on common desktop platforms.

Verification: test suite must run correctly on the requested platforms. An independent test suite may be considered for clients of the services. Requested platforms: 64-bit Scientific Linux 5.

Security

All the UMD software must comply with strict security policies. The authorization and authentication of users must be based on open standards, such as X.509 public key infrastructures, and both coarse-grained (e.g. VO level) and fine-grained (e.g. user level) policies should be allowed. Whenever a service needs to act on behalf of the user, a proper delegation of the user credentials is required. The number of open ports needed by the services should be minimized, and a list of used ports and inbound/outbound connectivity requirements must be provided. Services must be able to run using non-privileged accounts and should also perform robust input validation and error handling.

Verification: test suite must include security checks. Clear documentation of any security issues (connectivity, privileges needed to run the component, etc.)

Source Code Quality and Availability

The source code of each component of the UMD middleware should follow a coherent and clear programming style that aids readability and eases maintenance, testing, debugging, fixing, modification and portability of the software. Open source components must publicly offer their source code and the license with the binaries.

Verification: for Open Source components, availability of the code and license. Source code quality metrics are desirable.

Availability and Reliability

Services should be available on a 24/7 basis. Moreover, they should be able to handle growing amounts of work in a graceful manner. Mechanisms such as replication or load balancing, which ensure high availability and avoid single points of failure or bottlenecks, should be included. All the services should automatically detect possible performance degradation and continue to operate properly in the event of the failure of some of their components. In order to maintain the quality of the service, they may automatically stop accepting new requests, returning a clear message to the clients stating that fact.

Verification: test suite should include scalability and stress tests that assure proper function under high load situations.
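
The behaviour described above, where a service stops accepting new requests and tells the client so, can be illustrated with a minimal sketch. The class name, the capacity limit and the message text are all assumptions made for illustration; UMD does not prescribe any of them.

```python
class AdmissionController:
    """Hypothetical admission control: refuse new requests above a load limit."""

    def __init__(self, max_active):
        self.max_active = max_active  # assumed capacity limit, not defined by UMD
        self.active = 0

    def submit(self, request_id):
        # Reject with an explicit message instead of degrading silently.
        if self.active >= self.max_active:
            message = (
                "Service temporarily not accepting new requests: "
                f"{self.active}/{self.max_active} slots in use"
            )
            return (False, message)
        self.active += 1
        return (True, f"Request {request_id} accepted")

    def finish(self):
        # Release a slot when a request completes.
        if self.active > 0:
            self.active -= 1


controller = AdmissionController(max_active=2)
results = [controller.submit(i) for i in range(3)]
```

In this sketch the third submission is refused with an explanatory message, and a later submission succeeds again once `finish()` has released a slot. A stress test in the spirit of the verification item could assert exactly that sequence.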

Accounting and Traceability

All the services must provide a consistent view of user and VO resource utilization. The accounting information should enable the evaluation of the resource usage and allow tracking of each user's actions on the resource. The information gathered by the services must be accessible from the accounting portals deployed by the NGIs. Services must provide clear and coherent error messages that facilitate the traceability of problems. The clients must receive messages that allow a good diagnosis of the problem, and they must react appropriately to the answer received.

Verification: documentation of the accounting and traceability features of the component. Test suite must check common error situations and guarantee that the error messages are the ones expected. Accounting features must also be checked in the test suite.
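
As a rough illustration of a consistent per-user and per-VO view of utilization, the sketch below aggregates a few usage records. The record fields (`user`, `vo`, `cpu_seconds`) and the sample values are hypothetical and do not correspond to any accounting schema named in this document.

```python
from collections import defaultdict

# Hypothetical usage records; the field names are illustrative only.
records = [
    {"user": "alice", "vo": "vo.example", "cpu_seconds": 120},
    {"user": "bob", "vo": "vo.example", "cpu_seconds": 30},
    {"user": "alice", "vo": "vo.example", "cpu_seconds": 50},
]


def summarize(usage_records, key):
    """Sum CPU seconds grouped by the given record field."""
    totals = defaultdict(int)
    for record in usage_records:
        totals[record[key]] += record["cpu_seconds"]
    return dict(totals)


per_user = summarize(records, "user")
per_vo = summarize(records, "vo")
```

Both views are derived from the same records, so the per-user totals always add up to the per-VO total. That is the kind of consistency a test suite could check.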

Remote Management and Monitoring

Not mandatory

The services deployed should include methods for managing and monitoring their status remotely, so that operators can react to problems in the infrastructure in a timely manner. Ideally, all the services should follow a uniform interface in order to achieve this functionality and be easily pluggable into existing monitoring systems such as Nagios.

Verification: existence of remote managing and monitoring mechanisms. Test suite must include tests on this functionality.

Extensibility

Not mandatory

Services in UMD should be designed with future growth in mind. Implementing new features while keeping the core function of the services should require an affordable level of effort and minimize the impact on the architecture of the middleware distribution. The software should include hooks and mechanisms for expanding or enhancing the system with new capabilities without having to make major changes to its architecture.

Verification: existence of hooks or mechanisms for expanding or enhancing the component. Test suite must include any extensions needed to operate the middleware in the testbed.

Specific acceptance criteria

This section will detail the specific acceptance criteria for each of the services that are part of the UMD. Here a link to each test plan should be included.

Authentication and Authorization Services

These services provide the security infrastructure used for authentication and authorization by the rest of the UMD distribution. They allow the classification of VO members into groups and roles and provide consistent authorization decisions about these members to the other services.

VO management

The services must allow VO managers to define VO groups, attributes and roles, and to classify the VO members into these groups and roles.

Verification: test suite for all the VO management features: definition, modification and deletion of groups, attributes and roles, and definition, modification and deletion of users.

VO proxy management

The services must allow the creation of temporary proxies for the users with complete VO information (groups, attributes, roles).

Verification: test suite for all the proxy management features of the services.

Delegation

Services that need to act on behalf of the user must follow a clear and unified delegation protocol that allows the secure delegation of the user proxy, the renewal of that delegation and its revocation.

Verification: complete test suite of the delegation protocol and its features.

Policy management

Services must allow the creation, management and removal of authorization policies. The policies will define which users can perform a certain action on a given resource.

Verification: test suite must verify all the commands related to the definition of policies.
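
A minimal sketch of creating, removing and evaluating such policies is given below. The `Policy` fields, the `user:<name>` subject convention and the default-deny decision are assumptions chosen for illustration. They are not taken from any UMD authorization service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    subject: str   # a VO name (coarse) or "user:<name>" (fine); made-up convention
    action: str
    resource: str


class PolicyStore:
    """Hypothetical in-memory policy store with a default-deny decision."""

    def __init__(self):
        self._policies = set()

    def add(self, policy):
        self._policies.add(policy)

    def remove(self, policy):
        self._policies.discard(policy)

    def is_permitted(self, user, vo, action, resource):
        # A request is allowed only if a policy matches the user or the user's VO.
        subjects = {vo, f"user:{user}"}
        return any(
            p.subject in subjects and p.action == action and p.resource == resource
            for p in self._policies
        )


store = PolicyStore()
vo_rule = Policy("vo.example", "submit", "ce01")
store.add(vo_rule)
store.add(Policy("user:alice", "cancel", "ce01"))
```

With these two rules, any member of `vo.example` may submit to `ce01`, while only `alice` may cancel. Removing `vo_rule` withdraws the VO-level permission, which is the kind of behaviour a policy management test would exercise.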

Policy enforcement

The policies defined and managed by the policy management components must be enforced in the resources.

Verification: complete test of policy enforcement in the resources that assures the consistent application of the policies.

Computing Services

The Computing Services provide a generic interface between the grid users and the local processing resources. They offer clients the possibility of starting, monitoring and managing computational jobs. Examples of such services are glite-CREAM, lcg-CE, UNICORE XNJS or the ARC Grid Manager.

Support for LRMS

Computing Services should be independent of the Local Resource Management System (LRMS) used at each resource. All the LRMS currently used in the NGIs must be supported by the Computing Services. Support for an LRMS must include submission and management of jobs, accounting of the resources used by each user, and accurate information about the status and availability of the underlying resources.

Verification: test suite for each of the LRMS supported. The tests must explicitly check the correctness of the usage and status information provided. The supported LRMS must include Torque/PBS, SGE and LSF.

Parallel job support

A simple and common interface for jobs requesting more than one process, especially MPI jobs, must be provided. The user should be able to specify the mapping of logical processes to physical resources at the site. Explicit support for multi-core and GPU architectures may be explored.

Verification: tests that verify the possibility of executing parallel jobs. For advanced functionality (not mandatory), complete checking of the features.

Asynchronous notification of events

The Computing Services must be able to notify their clients of events related to their jobs once the clients have registered with an asynchronous event notification service.

Verification: existence of the asynchronous notification functionality. Complete test suite for this functionality.
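
The register-then-notify pattern described above can be sketched as follows. The class and method names are hypothetical, and the dispatch here is a plain in-process callback. A real Computing Service would deliver events over the network, which this sketch does not attempt to model.

```python
class JobNotifier:
    """Hypothetical event notifier: clients register once and are called on events."""

    def __init__(self):
        self._subscribers = {}  # job_id -> list of callbacks

    def register(self, job_id, callback):
        self._subscribers.setdefault(job_id, []).append(callback)

    def publish(self, job_id, event):
        # Deliver the event to every client registered for this job.
        for callback in self._subscribers.get(job_id, []):
            callback(job_id, event)


received = []
notifier = JobNotifier()
notifier.register("job-1", lambda job_id, event: received.append((job_id, event)))
notifier.publish("job-1", "RUNNING")
notifier.publish("job-2", "DONE")  # nobody registered for job-2, so nothing is delivered
notifier.publish("job-1", "DONE")
```

The client never polls for status: it receives only the events of the jobs it registered for, in the order they were published.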

Standards Compliance

Not mandatory

Job submission standards from OGF, such as JSDL and OGSA-BES or DRMAA, should be considered as interfaces for the Computing Services whenever possible.

Verification: tests of standards-based job submission.

Storage Services

The Storage Services provide a uniform access interface to data storage resources. These services may control simple disk servers, large disk arrays, or tape-based mass storage systems, which may be accessed through different protocols. Storage Services follow the SRM specification. StoRM and glite-DPM are examples of Storage Services.

SRM compliance

All the Storage Services must follow the SRM specification.

Verification: complete test of the support of SRM specification.

Transfer protocol support

UMD Storage Services should support all the storage systems and transfer protocols commonly available in the NGIs. Ideally they should be extensible to allow new protocols.

Verification: test of each supported transfer protocol. Minimum required protocols: gridftp and FTS.

Storage system support

Different storage systems should be supported by these services. Accounting of the resources used by each user and accurate information about the status and availability of underlying resources must be provided.

Verification: test of the storage system support that explicitly checks the correctness of the usage and status information provided. Required storage system support: POSIX filesystems and tape-based filesystems.

Information Services

Information Services in UMD contain the resource information of the different sites of each NGI. This information is essential for the operation of the infrastructure and middleware, as resources are discovered using this service.

Scalability

The Information Services contain information about all the resources available in the infrastructure, which can grow considerably. Therefore, Information Services must scale gracefully with the size of the information included and must not create a bottleneck for the rest of the services. Decentralized architectures and load balancing mechanisms should be considered for the implementation.

Verification: scalability and stress test of the services.

Correctness

Information published should be correct and accurate.

Verification: test suite for the correctness of the information.

Interoperability

The services must be able to handle information originating from several kinds of resources. Preferably, the GlueSchema standard should be used for representing the information.

Verification: interoperability tests with the different kinds of services in UMD. Minimum services to test: Computing Services, Storage Services, Data Management Services and Workload Management Services.

Data Management Services

The Data Management Services store information about the location of physical files in the grid. They translate user level logical names into SRM locations that are handled by the Storage Services.
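
The translation from logical names to SRM locations can be pictured as a catalogue lookup. In the sketch below the logical file name, the `lfn:` prefix, the host names and the paths are all invented placeholders. The sketch does not reflect the schema or API of any actual catalogue service.

```python
# Hypothetical catalogue: one logical name may map to several physical replicas.
catalogue = {
    "lfn:/grid/vo.example/data/run1.dat": [
        "srm://se1.example.org/vo.example/run1.dat",
        "srm://se2.example.org/vo.example/run1.dat",
    ],
}


def resolve(logical_name):
    """Return the SRM locations registered for a logical file name."""
    # An unknown name yields an empty list rather than an error.
    return list(catalogue.get(logical_name, []))


replicas = resolve("lfn:/grid/vo.example/data/run1.dat")
missing = resolve("lfn:/grid/vo.example/data/unknown.dat")
```

The user only ever handles the logical name. Choosing among the returned replicas is left to the caller or to the Storage Services.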

Workload Management Services

The Workload Management Services are responsible for the distribution and management of computational jobs across all the resources, in such a way that applications are conveniently, efficiently and effectively executed. Typical services in this category include glite-WMS, CrossBroker and GridWay.

Support for Computing Services

The different Computing Services available as part of UMD must be supported. Optionally, additional Computing Services may be supported.

Verification: tests of job submission and management for each of the Computing Services available in UMD. Tests must also ensure that all the available Computing Services can be matched to jobs.

Management of complex types of jobs

Collections (single submission for several related jobs), Parametric jobs (jobs that explore a user defined parameter space) and Workflows (jobs with inter-jobs dependencies) must be supported.

Verification: tests for each of the supported complex job types that guarantee the correct behavior.
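
To make the difference between parametric jobs and workflows concrete, the sketch below expands a user-defined parameter space into individual jobs and orders a small workflow by its dependencies. The command template, parameter names and job names are invented for illustration. The code does not use the job description language of any UMD service, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product


def expand_parametric(template, parameters):
    """Generate one command per point of the parameter space."""
    names = sorted(parameters)
    value_lists = [parameters[name] for name in names]
    return [
        template.format(**dict(zip(names, values)))
        for values in product(*value_lists)
    ]


# Parametric job: 2 energies x 2 seeds gives 4 jobs from a single submission.
jobs = expand_parametric(
    "simulate --energy {energy} --seed {seed}",
    {"energy": [10, 20], "seed": [1, 2]},
)

# Workflow: each job maps to the set of jobs it depends on.
dependencies = {
    "merge": {"simulate_a", "simulate_b"},
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
}
order = list(TopologicalSorter(dependencies).static_order())
```

Here `jobs` contains four commands, and `order` lists `prepare` first and `merge` last. The relative order of the two simulate jobs is not fixed, since they do not depend on each other.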

Parallel job support

Submission of parallel jobs to the resources must be possible using the Workload Management Services. This support should not be limited to MPI: it should allow the submission of any other kind of job that exploits the potential parallelism in the resources.

Verification: tests for parallel jobs which verify proper support for these jobs.

Management of several sources of information

Information from resources may not be centralized in a single Information Service, so the Workload Management Services must be able to fetch and correctly manage information from several sources.

Verification: test of all functionality using different sources of information.

Standards Compliance

Not mandatory

JSDL should be considered for the job description whenever possible. Job submission standards may also be considered.

Verification: tests of standards-based job submission.