EGI Quality Criteria Release 2 EMI Review
EMI Review of the EGI QC Documents (Version 2)
Original Document is available at https://twiki.cern.ch/twiki/bin/view/EMI/EmiEgiQcReviewV2
Review of the review
Compute Capabilities Quality Criteria (Marco Cecchi)
- "The test suite must include tests for all the documented functions." Maybe EGI itself should define a minimum set of primitives an interface should provide.
- "Invalid output should throw an exception as documented." It is not clear whether the interface should be accessed directly through the network or via APIs. In either case, the term "exception" may not apply everywhere, so I would speak in generic terms of "error".
- EMI-ES is not listed among the possible interfaces.
- Working on exact API functionality to test.
- EMI-ES added as a possible interface (should be added to the UMD Roadmap too)
- Changed "exception" to "exception or error".
Typo: Non-empy -> Non-empty
- Corrected typo
Testing job cancellation involves different scenarios depending on where the job is in the submission chain at the moment the cancellation is triggered. This test should possibly be broken down into subtests such as: cancel a job when the job is in a given state (submitted, waiting, scheduled, running, already cancelled, aborted, etc.).
- More detail given for the tests, although no specific states are given since not every system has the same ones. In the pass/fail criteria, added that it is especially relevant to test cancellation when the job is running.
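As a hedged sketch of how the cancellation criterion could be parametrised over job states, the following uses an in-memory stand-in client; "CEClient", its methods and the state names are hypothetical, not part of any real middleware API.

```python
# Hypothetical stand-in for a job-execution interface; a real suite would
# call the middleware's own submission/cancellation interface instead.
class CEClient:
    def __init__(self):
        self._jobs = {}

    def submit(self, state):
        job_id = f"job-{len(self._jobs)}"
        self._jobs[job_id] = state
        return job_id

    def cancel(self, job_id):
        state = self._jobs[job_id]
        # Cancelling an already finished/cancelled job must fail cleanly,
        # not crash the service.
        if state in ("done", "cancelled", "aborted"):
            return "refused"
        self._jobs[job_id] = "cancelled"
        return "cancelled"

def run_cancellation_matrix(client, states):
    """Return {state: outcome}; every outcome must be well-defined."""
    results = {}
    for state in states:
        job_id = client.submit(state)
        results[state] = client.cancel(job_id)
    return results

outcomes = run_cancellation_matrix(
    CEClient(),
    ["submitted", "waiting", "scheduled", "running", "cancelled", "aborted"])
# The running case is the most relevant one per the pass/fail criteria.
assert outcomes["running"] == "cancelled"
assert outcomes["cancelled"] == "refused"
```

The point of the matrix is only that each state yields a defined, documented outcome, mirroring the subtest breakdown suggested above.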
I would add: check if the middleware requires outbound connectivity on the WN, and check if the job requires an identity switch. These criteria should not be blocking, but should be well known.
- Added Inbound and Outbound connectivity
- The pass/fail criteria now specifies that the appliance may be accepted if modifications exist but only if they are documented.
A list of supported batch systems would be expected here, as for JOBEXEC_EXECMNGR_2.
- Added list as JOBEXEC_EXECMNGR_2
Providing interactive login to a remote machine also involves specific configuration on the WN, which can only depend on the site administration policies. Of course we are not speaking of interactive root access to a virtual machine here. In any case, I'd relax the original statement into something like "provide interactive, shell-like access to a worker node". That could simply be done, for example, by redirecting stdin/stdout over a socket.
- This Criterion is explicitly intended for interactive login (such as what gsissh provides). The case described by EMI can be considered as part of INTERACTIVE_JOB_4.
INTERACTIVE_JOB_4 encompasses what is described in INTERACTIVE_JOB_3.
- This is intended. There might be middleware that only supports one-way communication (INTERACTIVE_JOB_3) and we want to have a criterion for it.
EMI-ES is missing. In general, most of the above comments (JOBEXEC_IFACE_1, JOBEXEC_JOB_*) apply to JOBSCH_EXEC_* items as well.
- Added EMI-ES as a possible interface.
- Corrected JOBSCH_JOB_3 (Cancel of jobs)
Submission of collections should be limited to a determined maximum number of nodes.
- Not sure what this means: maximum number of nodes in the collection? Maximum number of nodes (CEs) to test?
DAG jobs should work for all the CEs supported by the metascheduler, not only for a subset. Also, the ability to support workflows (jobs with cyclic dependencies whose exit condition is to be evaluated at run-time) should be taken into account.
- Added in the pass/fail description to test against all the supported Job Execution Interfaces.
- More complex workflows are to be covered by the "workflow capability" not yet covered by the criteria.
JOBSCH_WMS_1 & JOBSCH_WMS_2
The meaning of "Input from Technology Provider: Test and for checking resubmission mechanism" is not clear in this context.
The sentence "A test to submit a job and check if it is accepted or rejected, specially for big JDLs" is repeated in JOBSCH_WMS_3; maybe it should instead be something about resubmission, i.e. the sentence in JOBSCH_WMS_1. In fact JOBSCH_WMS_3 correctly reports the other one: "A test to submit a job and check if it is accepted or rejected, specially for big JDLs".
- Copy-and-paste error: there was confusion in the description of the criteria.
Data Quality Criteria (Patrick Fuhrmann)
It might be just the phrasing, but in some cases the section "Input from Technology Providers" refers to "Test to check ..." while in other cases the document reads "Test suite for ...". The technology providers (which translates to PTs in the EMI world, I guess) are not supposed to provide test suites. Testing is done within the framework of the PT and is reported in order to get the packages certified by EMI. PTs are happy to provide information on what is tested, but a test suite would again be a product and as such would undergo the same procedures as a 'normal' product and would have to be negotiated between EGI and EMI. As an example, 2.1.1: "It must include tests for all the documented functions." That is certainly envisioned but rather naive and an enormous effort; the LFC is just an example. We should make this a medium-term goal and focus on the 'most used' functionality first. Please keep in mind that we all have only limited effort, which needs to be used in a focused way.
- As discussed with EMI, if tests are provided then EGI will use those; otherwise EGI will create its own tests.
- A general rephrasing was done in the documents to avoid "complete" API testing. Exceptions should be documented.
1.1 and 1.2
The sentence "Data Access Appliances must implement (at least one of) the OGSA-DAI and WS-DAI realizations and support all the functionality included in the interfaces" is not correct. Not all Data Access Appliances are supposed to provide those interfaces. The EMI Storage Elements are data access appliances but do not and will not provide those interfaces. Please rephrase.
- For EGI, Data Access Appliances are those which implement the *-DAI interfaces. Storage Elements are under another category.
2.2.2 Amga functionality : METADATA_AMGA_FUNC_2
METADATA_AMGA_FUNC_2 mixes creating entries and managing attributes. We would suggest: Test 1: create a new entry; list the entry. Test 2: create a new set of entries; list the entries. Test 3: remove an existing entry; list the entry (this should fail).
- Changed tests as suggested
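The three suggested subtests can be sketched as follows, using an in-memory dictionary in place of a real AMGA catalogue; the method names (add_entry, remove_entry, list_entries) are illustrative only, not the real AMGA client API.

```python
# Hypothetical in-memory catalogue standing in for AMGA.
class FakeCatalogue:
    def __init__(self):
        self.entries = set()

    def add_entry(self, name):
        self.entries.add(name)

    def remove_entry(self, name):
        self.entries.discard(name)

    def list_entries(self):
        return sorted(self.entries)

cat = FakeCatalogue()

# Test 1: create a new entry, then list it.
cat.add_entry("/grid/vo/file1")
assert "/grid/vo/file1" in cat.list_entries()

# Test 2: create a new set of entries, then list them.
for name in ("/grid/vo/file2", "/grid/vo/file3"):
    cat.add_entry(name)
assert len(cat.list_entries()) == 3

# Test 3: remove an existing entry; listing it must now fail.
cat.remove_entry("/grid/vo/file1")
assert "/grid/vo/file1" not in cat.list_entries()
```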
Generic Quality Criteria (Alberto Aimar)
Online help mandatory: currently the online help or the -h/--help command option is not mandatory, but most commands have it.
API documentation mandatory: Currently the API documentation is not a mandatory document of our documentation review. But of course all public APIs are documented.
All OK. Functional Description, Release Notes, User Doc, Admin Doc are all also mandatory in the EMI Documentation Policy.
- Documentation has a special status. Although it is mandatory, products without it may pass verification if they are really needed. This is currently being changed in the QC.
Software Licenses are required and tracked by EMI. But what are the EGI compatible licenses for using them on the EGI Infrastructure? How can the TP know that?
- Already detected issue; we are working on the exact licenses (mostly any OSI-accepted ones).
The part about clear and readable code is a bit generic. But all EMI code is publicly available.
- Changed the Pass/Fails criteria to just check if the source is available. The rest left as description.
A similar requirement exists in EMI. The big question here is the granularity of testing. V2 now mentions this comment.
- The verifier will determine if the new features/bug fixes testing is enough for each release. Right now there is no other alternative to make this a more objective criterion.
This is new compared to Version 1. It should be a ticketing submission channel, not a "bug tracker" where EGI submits bugs. EGI does not submit to the trackers of EMI but submits GGUS tickets, and some are bugs, others are requests for clarification, etc.
- This criterion has changed: already existing TPs should not worry, since they are listed as 3rd line support in GGUS; this will be turned into a guideline for new TPs.
- Moved to MISC category as GENERIC_MISC_2
Service control and status commands. This is not a specified requirement for EMI services at the moment.
Log files. This is not a specified requirement for EMI services at the moment.
No such requirement is explicitly part of EMI policies. Nevertheless, Product Teams are required to perform scalability and performance tests as part of their product certification.
- EGI needs working services so this must be kept in the criteria.
Not part of the current EMI requirements, but it should be added to the EMI requirements.
Information Capabilities Quality Criteria (Laurence Field)
In general these requirements are described in a very simplistic way. I would recommend that they are revised and written in more detail.
- The SA2.2 team is finding people with more experience in the info capabilities to come up with better criteria for them. This will be accomplished in the next release.
The description states that "Information exchanged in the EGI Infrastructure must conform to GlueSchema". What information does this refer to? Any information that is exchanged, or a subset? In "Input from Technology Provider" it states: "Test that the information published by the product conforms to the GlueSchema v1.3 Technology and v2.0 (optionally)". However, in "Pass/Fail Criteria" it states: "Information published must be available in GlueSchema v1.3 and GlueSchema v2". This is a contradiction.
- Rephrased most of the criterion. Now it should be clearer.
The description states that "Information published by the appliance must be available through LDAPv3 protocol". What is an appliance, and what information does this refer to?
- Glossary: EGI_QA_Glossary
- This criterion basically states that we need information discovery appliances to publish using LDAPv3.
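A minimal check of such published information could look like the sketch below. In practice the entry would come from an LDAP query against the appliance; here a literal LDIF snippet stands in for that output, and the specific dn and host name are made up (GlueCEUniqueID and objectClass GlueCE are real Glue 1.3 names).

```python
# Literal LDIF snippet standing in for the output of an LDAP query
# against an information discovery appliance; values are illustrative.
SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-ops,mds-vo-name=resource,o=grid
objectClass: GlueCE
GlueCEUniqueID: ce.example.org:2119/jobmanager-pbs-ops
GlueCEInfoHostName: ce.example.org
"""

def parse_ldif_entry(text):
    """Parse a single simple LDIF entry into an attribute -> values dict."""
    attrs = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            attrs.setdefault(key, []).append(value)
    return attrs

entry = parse_ldif_entry(SAMPLE_LDIF)
# Minimal conformance check: object class and unique ID must be present.
assert "GlueCE" in entry["objectClass"]
assert entry["GlueCEUniqueID"] == ["ce.example.org:2119/jobmanager-pbs-ops"]
```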
INFODISC_AGG_1 & _2
The description states what the software must not do. It should define what the software must do. Filtering out irrelevant information is not really a requirement; providing only relevant information is. In general I think that this requirement does not make sense and needs to be revised.
Comments: These requirements try to capture the following:
- the information system must be able to collect and publish information from several sources (INFODISC_AGG_2) and,
- if the admin wants, it should be possible to define some pieces of information that must not be published (INFODISC_AGG_1), e.g. if a CE on a site is failing for a specific VO, this specific CE for this VO can be removed, however for any other VO it should remain.
Will rephrase to clarify them.
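The INFODISC_AGG_1 behaviour described above, suppressing one (CE, VO) pair while the same CE stays visible to other VOs, can be sketched as follows; the record layout is illustrative, not a real information-system schema.

```python
def filter_published(records, suppressed):
    """Drop records whose (ce, vo) pair the admin has blacklisted."""
    return [r for r in records if (r["ce"], r["vo"]) not in suppressed]

records = [
    {"ce": "ce1.example.org", "vo": "atlas"},
    {"ce": "ce1.example.org", "vo": "cms"},
    {"ce": "ce2.example.org", "vo": "atlas"},
]

# Admin decision: hide ce1 for atlas only.
published = filter_published(records, {("ce1.example.org", "atlas")})

# ce1 is no longer published for atlas but remains published for cms.
assert {"ce": "ce1.example.org", "vo": "atlas"} not in published
assert {"ce": "ce1.example.org", "vo": "cms"} in published
assert len(published) == 2
```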
The description states "Information Discovery appliances must be able to handle big amounts of data". How big is big? How many data sources, how much data, how many changes, in what data format etc. This requirement is too simplistic and needs to be revised.
- Big is as stated in the QC: enough to handle the current information of the whole EGI.eu infrastructure.
The description states that the "information discovery service should be able to handle load under realistic conditions." What are realistic loads? For the Pass/Fail Criteria, where does the value of 20 come from? Is this realistic?
- Will review this value with operation people input.
In the Pass/Fail Criteria, it states that either JMS 1.1 or AMQP must be supported. As far as I am aware, the recommendation from EMI is to use STOMP.
- The selected APIs were taken from the UMD Roadmap. Will consider STOMP for inclusion in the Roadmap.
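For reference, STOMP is a deliberately simple text protocol: a frame is a command line, headers, a blank line, a body, and a NUL terminator. The sketch below builds a minimal SEND frame following that layout from the STOMP specification; the destination name is made up.

```python
def stomp_frame(command, headers, body=""):
    """Serialise a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return "\n".join(lines) + "\n\n" + body + "\x00"

# A minimal SEND frame to an illustrative queue name.
frame = stomp_frame("SEND", {"destination": "/queue/egi.test"}, "hello")
assert frame.startswith("SEND\n")
assert "destination:/queue/egi.test" in frame
assert frame.endswith("hello\x00")
```

The simplicity of the framing is one reason STOMP is easy to support across brokers, compared with the binary AMQP wire format or the Java-only JMS API.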
Operations Capabilities Quality Criteria (Laurence Field)
In general I have a concern about the style of this document. What is being described? Requirements, tests, specifications, etc.? The current style seems to mix these concepts and as such makes it difficult to understand. The descriptions need to be improved to be clearly in line with what is being described. From the phrasing it should be clear whether this is a test or a requirement. Everything should also be made more abstract. For example, it should state that dataset A should be moved to point B by time T, rather than requiring that cron is used to publish accounting information.
- Comments being digested by the people responsible for the document.
- This is a set of requirements and criteria to assess the EGI quality criteria. Some requirements can be tested or checked by a functional test, but not all of them. Maybe this point can be clarified (whether a test is needed or not) in each QC to avoid any confusion.
- We need more abstraction in some points. Remove the cron sentence and write a generic description.
More details are required for the Test Description. What information? Which database? How do we know it is working?
- NCG must generate the /etc/nagios configuration files after its execution. The /etc/nagios/wlcg.d/* files must be generated based on the information gathered from the GOCDB database and the grid information system.
- The nagios service must start without errors after each NCG configuration.
The description does not sound like a requirement. The "NGI has to understand": isn't that an NGI issue? More details are required for the Test Description. What information? Which database? The test contains no details on how to test for redundancy, which seems to be the purpose of the test. How do we know it is working?
- The first phrase can be changed to "NCG must allow failover configurations" to improve nagios service availability.
- Check if nagios is still submitting tests after a shutdown of a redundant service: WMS, VOMS or MyProxy.
Ok, but some phrasing could be improved for clarity.
- Change the phrase to be more clear.
How fast is fast, how soon is soon, how much is too much? Please be more specific.
- No more than 5 seconds, but this is not a fixed value. It depends on the view, the NGI and the kind of request, and may vary during the course of the project.
Ok, but some phrasing could be improved for clarity.
- Change the phrase to be more clear.
"Pass/Fail" Check sentence.
- Changed to "Probes must exist and behave as expected in the probe documentation."
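One objective part of "behave as expected" is the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below checks a probe against that convention; the inline one-liner probe is a trivial stand-in for a real one.

```python
import subprocess
import sys

# Standard Nagios plugin exit-code convention.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_probe(argv):
    """Run a probe and map its exit status to a Nagios state name."""
    result = subprocess.run(argv, capture_output=True, text=True)
    state = NAGIOS_STATES.get(result.returncode)
    if state is None:
        raise ValueError(f"non-conventional exit code {result.returncode}")
    return state, result.stdout

# Stand-in probe: prints a status line and exits 0, as an OK probe should.
state, output = run_probe(
    [sys.executable, "-c",
     "print('OK - service reachable'); raise SystemExit(0)"])
assert state == "OK"
assert output.startswith("OK - ")
```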
ACC_JOBEXEC_1 / ACC_JOBSCH_1
All the actions??? Are you sure?
- Removed the "all", clarified the sentence.
Security Capabilities Quality Criteria (John White)
Is this a "Quality" document describing the quality of software or a set of requirements? It reads more like a set of requirements. The test requirements themselves look reasonable and pretty much mirror the test suites that are used in EMI certification of security components.
- They are both functional and non-functional requirements
- Most of the tests come from EMI's test plans.
One question that comes to mind is whether EGI is expecting a new set of test suites to run in a particular framework, or will use the results of EMI certification. Stating that the document is "OK" does not imply acceptance of re-writing/reformatting our tests for EGI.
- EGI will repeat tests if the verifier considers it needed (mostly for major releases), but does not require re-writing/reformatting.
- Ideally EMI's test framework can be reused.
This follows the previous (EGEE/LCG) policy documents? If so, OK.
- Will check, but it should.
"PDPs must support the XACML interface" This is a bit general. Internally Argus uses XACML but the WHOLE XACML spec is not used.
- The verifier will determine if the coverage of the interface is enough.
- We are working on better specification of the API QC, to explicitly say which parts of the interface are expected and which not.
Storage Capabilities Quality Criteria (Patrick Fuhrmann)
Same remark as with 'DATA': it might be just the phrasing, but in some cases the section "Input from Technology Providers" refers to "Test to check ..." while in other cases the document reads "Test suite for ...". The technology providers (which translates to PTs in the EMI world, I guess) are not supposed to provide test suites. Testing is done within the framework of the PT and is reported in order to get the packages certified by EMI. PTs are happy to provide information on what is tested, but a test suite would again be a product and as such would undergo the same procedures as a 'normal' product and would have to be negotiated between EGI and EMI. Just to avoid misunderstandings: different storage software providers in EMI provide different file access mechanisms at different times within EMI-2 and later (especially FILETRANS_API_3). Consequently, testing a capability must not be applied before the software provider adds this capability to the release notes as being available for production usage.
- Same comment as earlier, test suite no longer required. Complete API not required.
- Applicability for FILETRANS_API_3 is only to those implementations providing WebDAV, if the SE does not have such feature, it will not be tested.
2 File Access (a remark on POSIX)
Typo: the ID FILEACC_API_1 is used twice.
Here as well it is important to only check capabilities which are described as available. Some storage elements may not allow modification of existing data (as it is on tape already). The same is true for 'append', as described in the "Input of the Technology Providers" of FILEACC_API_2 (which is actually the second occurrence of FILEACC_API_1). FILETRANS_API_2: although we expect FTS to support http(s) at some point, this is not yet agreed and must only be tested if officially declared 'available for production usage'. FILETRANS_API_3: same as FILETRANS_API_2.
- We take this into account during verification.
5.1 SRM interface STORAGE_API_1
The bit with SRM is tricky. The sentence "Execute a complete test suite for the SRM v2.2 that covers all the specification" contains two requirements. Complete test suite: there is a group of storage providers covering CASTOR, DPM, BeStMan, StoRM and dCache which makes sure the existing SRM infrastructure doesn't break. This group agreed to use the S2 test suite as a basis for this effort, so I would strongly suggest that EGI uses the official S2 Test Suite provided by that group (please talk to Andrea Sciaba, CERN, for details). All the specification: that's a bit naive. Each SRM provider will offer a set of tests which the corresponding storage element can be tested against. I don't want to discuss this further in this wiki; here again, EGI should contact the SRM Test Group leader (Andrea). The "Related Information" of STORAGE_API_2 already refers to this issue.
- see above, rephrased.
- Recommend S2 for verification
Incomplete sentence in Pass/Fail criteria: "Exceptions to the specification may" (may what?).
- Removed sentence
Ok if you are talking about BDII and Glue1.3/2.0
- The current info model/discovery uses Glue 1.3/2.0 and LDAP.