UMDQualityCriteria

From EGIWiki
Revision as of 09:01, 11 May 2010 by Enolfc

Generic acceptance criteria

Documentation

Services in UMD must include comprehensive documentation, written in a uniform and clear style, that covers all of the following items:
- Functional description of the software.
- User documentation, including complete man pages for the commands and user guides.
- Administrator documentation covering the installation procedure; detailed configuration of the service; procedures for starting, stopping and querying the service; and the ports (or port ranges) used, with the connections expected on those ports.
- List of the processes that are expected to run, with the typical load of the service.
- Description of how state information is managed, together with debugging information (e.g. the list of log files and any files or databases containing service information).
- Notes on the testing procedure and the expected test results.

Interoperability

UMD services must be interoperable with each other. Every service should demonstrate that it works correctly with the rest of the UMD middleware before entering the distribution. Interoperability with other middleware distributions is recommended. Compliance with existing and future standards should be a priority for all the distributed software.

Portability

The UMD services should run correctly on multiple computer configurations. Portability should be assured across different hardware and different operating systems. At a minimum, the most common operating systems and architectures used in the different NGIs must be supported. Currently, all the services must run on 64-bit Scientific Linux 5 (SL5) machines. The clients of the services should also be supported on common desktop platforms.

Security

All the UMD software must comply with strict security policies. The authentication and authorization of users must be based on open standards, such as X.509 public key infrastructures. Both coarse-grained (e.g. VO level) and fine-grained (e.g. user level) policies should be allowed. Whenever a service needs to act on behalf of the user, a proper delegation of the user credentials is required. The number of open ports needed by the services should be minimized, and a list of the ports used and of the inbound/outbound connectivity requirements must be provided. Services must be able to run under non-privileged accounts and should perform robust input validation and error handling.
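As an illustration of combining coarse- and fine-grained policies, the following minimal sketch admits users by VO membership and lets a user-level rule override that decision. The class name `AccessPolicy`, its fields and the sample DN are hypothetical; a real service would take the DN and VO attributes from validated X.509 credentials rather than from plain strings.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccessPolicy:
    """Hypothetical policy holder: VO-level allow list plus user-level ban list."""

    allowed_vos: Set[str] = field(default_factory=set)
    banned_users: Set[str] = field(default_factory=set)

    def is_authorized(self, user_dn: str, vo: str) -> bool:
        # Fine-grained (user-level) rule is checked first and wins.
        if user_dn in self.banned_users:
            return False
        # Coarse-grained (VO-level) rule applies to everyone else.
        return vo in self.allowed_vos


policy = AccessPolicy(
    allowed_vos={"biomed"},
    banned_users={"/DC=org/DC=example/CN=Blocked User"},
)
```

Here `policy.is_authorized("/DC=org/DC=example/CN=Some User", "biomed")` is true, while the banned DN is refused even when it presents the same VO.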

Availability and Reliability

Services should be available at all times. They should be scalable and able to handle growing amounts of work gracefully. Mechanisms such as replication or load balancing, which ensure high availability and avoid single points of failure or bottlenecks, must be included. All the services should automatically detect performance degradation and continue to operate properly when some of their components fail. To maintain the quality of the service, they may automatically stop accepting new requests, returning a clear message to the clients that states this fact.
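The last point, refusing new requests with a clear message when the service is saturated, could look roughly like the sketch below. The names `AdmissionGate` and `Response`, the limit and the message text are assumptions made for illustration; they do not come from any UMD component.

```python
from dataclasses import dataclass


@dataclass
class Response:
    accepted: bool
    message: str


class AdmissionGate:
    """Refuses new requests once the number of active ones reaches a limit."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active = 0

    def try_accept(self) -> Response:
        if self.active >= self.max_active:
            # Reject explicitly instead of queueing silently or timing out.
            return Response(False, "Service overloaded: new requests are temporarily disabled")
        self.active += 1
        return Response(True, "accepted")

    def finish(self) -> None:
        # Call when a previously accepted request completes.
        if self.active > 0:
            self.active -= 1
```

A caller would invoke `try_accept()` before starting work and `finish()` afterwards; a rejected client receives the explanatory message rather than an unexplained timeout.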

Extensibility

Services in UMD should be designed with future growth in mind. Implementing new features while keeping the core function of the services should require an affordable level of effort and minimize the impact on the architecture of the middleware distribution. The software should include hooks and mechanisms for expanding or enhancing the system with new capabilities without major changes to its architecture.

Accounting and Traceability

All the services must provide a consistent view of resource utilization per user or VO. The accounting information should enable the evaluation of resource usage and allow each user's actions on the resource to be tracked. The information gathered by the services must be accessible from the accounting portals deployed by the NGIs. Services must provide clear and coherent error messages that make problems traceable. The clients must receive messages that allow a good diagnosis of the problem, and they must react appropriately to the response received.

Testability

UMD software must support testing of its features and functionality. The tests must ensure the correctness, completeness, security and scalability of each service. The software provider must define a test plan for each service and provide a test suite and the results of running it against each new release. Interoperability with the rest of the UMD software must be explicitly tested. A global test suite for the UMD distribution will guarantee the correct behavior of the complete set of services in a controlled environment.

Remote Management and Monitoring

The deployed services should include methods for managing and monitoring their status remotely, so that operators can react promptly to problems in the infrastructure. Ideally, all the services should expose a uniform interface for this functionality and plug easily into existing monitoring systems such as Nagios.
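As one possible shape for such a monitoring hook, the sketch below follows the Nagios plugin convention of reporting status through the exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN) plus a one-line message. The probe itself (a plain TCP connection), the function names and the one-second warning threshold are assumptions chosen for illustration, not a prescribed UMD interface.

```python
import socket
import time

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(latency_s, warn_s=1.0):
    """Map a probe result to a status; None means the connection failed."""
    if latency_s is None:
        return CRITICAL
    return WARNING if latency_s > warn_s else OK


def probe(host, port, timeout=5.0):
    """Return the TCP connect time in seconds, or None if connecting fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def main(argv):
    """Run the check and return the exit code the plugin should use."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("UNKNOWN - usage: check_tcp_service HOST PORT")
        return UNKNOWN
    host, port = argv[1], int(argv[2])
    latency = probe(host, port)
    status = classify(latency)
    detail = "no connection" if latency is None else f"connected in {latency:.3f}s"
    print(f"{LABELS[status]} - {host}:{port} {detail}")
    return status
```

When used as a standalone plugin, the script would end with `sys.exit(main(sys.argv))` so that the monitoring system sees the status as the process exit code.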

Source Code Quality and Availability

The source code of each component of the UMD middleware should follow a coherent and clear programming style that aids readability and eases maintenance, testing, debugging, fixing, modification and porting of the software. Open source components must make their source code and license publicly available alongside the binaries.

Specific acceptance criteria

This section will detail the specific acceptance criteria for each of the services that are part of the UMD.

Authentication and Authorization Services

These services provide the security infrastructure used for authentication and authorization by the rest of the UMD distribution. They allow VO members to be classified into groups and roles, and they provide consistent authorization decisions about these members to the other services.

Computing Services

The Computing Services provide a generic interface between the grid users and the local processing resources. They offer clients the possibility of starting, monitoring and managing computational jobs. Examples of such services are glite-CREAM, lcg-CE, UNICORE XNJS and the ARC Grid Manager. Computing Services in UMD must meet the following criteria:
- Independence from the Local Resource Management System (LRMS). The LRMS most commonly used in the different NGIs (Torque/PBS, SGE, LSF and Condor) must be supported. The services must be extensible so that support for a new LRMS can be added easily.
- Parallel job support. A simple and common interface must be provided for jobs requesting more than one process, especially MPI jobs. The user should be able to specify the mapping of logical processes to physical resources at the site. Explicit support for multi-core and GPU architectures may be explored.
- Standards compliance. Job submission standards from OGF, such as JSDL and OGSA-BES, should be considered as the interface to the Computing Services whenever possible. Other standards, such as DRMAA, may also be considered.
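One way to keep a Computing Service independent of the LRMS is to hide each batch system behind a small backend interface and look backends up by name, so that supporting a new LRMS means adding one class. The sketch below is only an illustration: `LRMSBackend`, `register` and `get_backend` are hypothetical names, and only the submit command of two systems (`qsub` for Torque/PBS, `condor_submit` for Condor) is modelled, without any options.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class LRMSBackend(ABC):
    """Hypothetical interface each batch-system backend implements."""

    @abstractmethod
    def submit_command(self, job_file: str) -> List[str]:
        """Return the command line that submits job_file to this LRMS."""


BACKENDS: Dict[str, Type[LRMSBackend]] = {}


def register(name: str):
    """Class decorator that makes a backend available under a given name."""
    def decorator(cls: Type[LRMSBackend]) -> Type[LRMSBackend]:
        BACKENDS[name] = cls
        return cls
    return decorator


@register("torque")
class TorqueBackend(LRMSBackend):
    def submit_command(self, job_file: str) -> List[str]:
        # job_file is a batch script.
        return ["qsub", job_file]


@register("condor")
class CondorBackend(LRMSBackend):
    def submit_command(self, job_file: str) -> List[str]:
        # job_file is a Condor submit description file.
        return ["condor_submit", job_file]


def get_backend(name: str) -> LRMSBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unsupported LRMS: {name}") from None
```

The service core would call `get_backend(site_lrms).submit_command(path)` and never mention a specific batch system; SGE or LSF backends would be added the same way.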

Storage Services

The Storage Services provide a uniform access interface to data storage resources. These services may control simple disk servers, large disk arrays, or tape-based mass storage systems, which may be accessed through different protocols. Storage Services follow the SRM specification. StoRM and glite-DPM are examples of Storage Services. These services in UMD must comply with the following criterion:
- Independence from the underlying storage system and transfer protocols. UMD Storage Services should support all the storage systems and transfer protocols commonly available in the NGIs. Moreover, they should be extensible so that new storage systems or protocols can be added.

Information Services

Information Services in UMD contain the resource information of the different sites of each NGI. This information is essential for the operation of the infrastructure and middleware, as resources are discovered using this service. The Information Services acceptance criteria are:
- Scalability. The Information Services contain information about all the resources available in the infrastructure, which can grow considerably. Therefore, Information Services must scale gracefully with the size of the information they hold and must not create a bottleneck for the rest of the services. Decentralized architectures and load balancing mechanisms should be considered for the implementation.
- Interoperability. The services must be able to handle information originating from several kinds of resources. Preferably, the GlueSchema standard should be used for representing the information.

Data Indexing Services

The Data Indexing Services store information about the location of physical files in the grid. They translate user-level logical names into the SRM locations that are handled by the Storage Services.
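The translation from logical names to SRM locations can be pictured as a catalogue in which one logical name maps to one or more replica URLs. The sketch below is a minimal in-memory illustration; the class `ReplicaCatalogue`, the logical name and the `example.org` hosts are all hypothetical, and a real service would persist the mapping and enforce access control.

```python
from collections import defaultdict
from typing import Dict, List


class ReplicaCatalogue:
    """Hypothetical in-memory index from logical file names to SRM URLs."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = defaultdict(list)

    def register(self, logical_name: str, srm_url: str) -> None:
        if not srm_url.startswith("srm://"):
            raise ValueError(f"not an SRM location: {srm_url}")
        # Keep each replica only once, in registration order.
        if srm_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(srm_url)

    def lookup(self, logical_name: str) -> List[str]:
        # Return a copy so callers cannot modify the catalogue by accident.
        return list(self._replicas.get(logical_name, []))


catalogue = ReplicaCatalogue()
catalogue.register("/grid/myvo/run42/data.root", "srm://se1.example.org/myvo/f001")
catalogue.register("/grid/myvo/run42/data.root", "srm://se2.example.org/myvo/f001")
```

With these registrations, `catalogue.lookup("/grid/myvo/run42/data.root")` returns both SRM locations, and an unknown logical name returns an empty list.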

Workload Management Services

The Workload Management Services are responsible for the distribution and management of computational jobs across all the resources, in such a way that applications are executed conveniently, efficiently and effectively. Typical services in this category include glite-WMS, CrossBroker and GridWay. The specific acceptance criteria for the Workload Management Services are the following:
- Management of complex types of jobs. The Workload Management Services must support collections, where a single submission creates several jobs, and parametric jobs, where a single submission creates several jobs that explore a user-defined parameter space. Workflows with dependencies between jobs should also be supported.
- Parallel job support. Submission of parallel jobs to the resources must be possible using the Workload Management Services. This support should not be limited to MPI; it should allow submission of any other kind of job that exploits the potential parallelism in the resources.
- Interoperability. The different Computing Services available as part of UMD must be supported. Optionally, additional Computing Services may be supported.
- Management of several sources of information. Information about resources may not be centralized in a single Information Service, so the Workload Management Services must be able to fetch and correctly manage information from several sources.
- Standards compliance. JSDL should be considered for the job description whenever possible. Job submission standards may also be considered.
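To make the notion of a parametric job concrete, the sketch below expands one submission (a command template plus a parameter space) into one concrete job per combination of parameter values. The function name `expand_parametric` and the template syntax are assumptions for illustration; actual services use their own job description languages.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_parametric(template: str, space: Dict[str, Sequence]) -> List[str]:
    """Return one command line per point of the parameter space."""
    names = sorted(space)  # fixed order so the expansion is reproducible
    jobs = []
    for values in product(*(space[name] for name in names)):
        jobs.append(template.format(**dict(zip(names, values))))
    return jobs


jobs = expand_parametric(
    "simulate --seed {seed} --size {size}",
    {"seed": [1, 2], "size": [10, 20]},
)
# jobs == ["simulate --seed 1 --size 10", "simulate --seed 1 --size 20",
#          "simulate --seed 2 --size 10", "simulate --seed 2 --size 20"]
```

A collection differs only in that the user lists the individual jobs explicitly instead of deriving them from a parameter space.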