The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

DCH-RP:PoC 1 Belgium



In Belgium, the first Proof of Concept will involve the following CH institutes:

  1. KIK
  2. KMKG
  3. KB
  4. RA

EGI Resources

EGI, as an e-Infrastructure provider, is supporting the DCH-RP project with resources as follows:

Grid storage Resources

Provider: Vrije Universiteit Brussel (VUB) - EGI site name "begrid-vub-ulb"
Storage: Grid Storage Element (SE), based on DPM, with an SRM interface (to be confirmed)

Cloud Storage

Cloud storage will be handled through the EGI Federated Clouds Task Force. The exact provider needs to be determined.

Proof of Concept scenarios

In PoC 1, the Belgian partners would like to investigate Scenarios 1 and 2 (from DCH-RP Deliverable D3.1) as follows:

For Scenario 1, we would like to check the integrity of (i.e. audit) the data that is archived locally. In effect, this amounts to a risk analysis of the chosen archiving method. We do not have to invent what needs to be done: it is all described in the document "Trustworthy Repositories – Audit and Certification" (http://www.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf). More information is also available at "Risk-analysis for E-depots: DRAMBORA" (http://www.repositoryaudit.eu/).

We would, of course, like to do this together with other partners.

A grid storage solution is not covered by the current scenarios, although I believe it is mentioned in the project in connection with the eCSGW. In Belgium we are very interested in testing the possibilities of such a storage solution: how fast can data be retrieved from grid storage, and how easy is it to search for the data, including for the general public (in fact, for everyone who already uses the existing local archive today)?

PoC 1 Audit and certification on local data

Combining Scenarios 1 and 2, the following PoC outline may be implemented:

KIK-IRPA, one of the Belgian DCH organisations, already has a local preservation system for its data. It has described this system in a "Best practices" document that is accepted throughout the organisation. However, one of its main concerns is maintaining the integrity of its data. Auditing and certification schemes for trustworthy repositories exist; see "Trustworthy Repositories – Audit and Certification" (http://www.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf) and "Risk-analysis for E-depots: DRAMBORA" (http://www.repositoryaudit.eu/). Such an audit also amounts to a risk analysis of the chosen archiving method. However easy this may sound, in practice almost no one completes the whole procedure, so no common "best practices" are available.

Carrying out an audit consistently requires the appropriate tools. In this scenario we want to use existing tools and document the auditing process. We will do this on the local data in the KIK-IRPA preservation scheme and, if possible, on data from other partners. Such an audit is independent of where the data is stored, but it will certainly be a valuable "tool" for the preservation of data stored with e-infrastructures or other storage service providers.
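One small, concrete ingredient of an integrity audit is a fixity check: record a checksum for every file, then later recompute the checksums to detect files that are missing, added or altered. The Python sketch below only illustrates that idea. It assumes that the repository is an ordinary directory tree and uses a hypothetical in-memory manifest format (relative path mapped to a SHA-256 hex digest). It is not taken from TRAC or DRAMBORA, and it is not KIK-IRPA's procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (POSIX-style relative path) to its SHA-256 digest.

    The manifest format is a hypothetical choice made for this sketch.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present.
        "missing": sorted(manifest.keys() - current.keys()),
        # Present now but never recorded.
        "added": sorted(current.keys() - manifest.keys()),
        # Present in both, but the content has changed.
        "changed": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In practice the manifest would be recorded at ingest time and stored alongside the archive (for example as JSON), and `verify_manifest` would be re-run periodically. Any non-empty list in the result is a finding to document in the audit. Because the check only reads files, the same approach could later be applied to a copy retrieved from grid or cloud storage.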

Once this is done, it would be useful to apply the procedure to data preserved on grid and cloud storage. This could be done in the PoC 2 projects, by which time the project will have gained experience with grid and cloud.

Suggested test data: KIK-IRPA will use its existing repository for the audit.


SUGGESTED TEST PROCEDURES

Tools

a. Investigate the audit and certification method described in "Trustworthy Repositories – Audit and Certification" http://www.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf
b. Check the existing tools available at "Risk-analysis for E-depots: DRAMBORA" http://www.repositoryaudit.eu/
c. Check with other partners whether other audit tools are available

Participation in this PoC

The work to be carried out in this scenario should not be underestimated. ICCU has already indicated its willingness to join this PoC. Other partners will be asked to join, either directly via the project partners involved in WP5 or via the discussion forum, where a thread on this topic will be started to gather comments and collaboration from the DCH community.

Execution

• Choice of tools
• Distribution of the work among partners
• Definition of the documentation method for the trial
• Description of the locally preserved data on which the audit is done
• Execution of the audit
• Preparation for an audit of data stored on grid or cloud

Timeline

  To be discussed with the partners
  The report on the first Proof of Concept is due in Month 12, i.e. at the end of October 2013
  End of May 2013: list of contributing partners is available
  May-August: choice of standards, preparation of the procedure
  September-15 October: execution of the tasks
  15-31 October: preparation of the report	

Threats

• The procedure is too long
• Not enough of the procedure can be done electronically
• Not enough partners are found to contribute

PoC 2 Test out download and access of DCH data on grid storage

Many DCH institutions use a local solution to store their data and/or for long-term preservation. A newer option is to store data with (existing) e-infrastructures, for example on the grid or in the cloud. Several questions arise:
- Do we look at "e-infrastructures for research", commercial e-infrastructures, grid storage or cloud storage?
- Do we know and control where our data is located?
- Can our users use the data easily and efficiently (as if it were local)?

In Belgium we have a grid e-infrastructure and the possibility to use grid storage on it. We are interested in testing the storage and access facilities of this grid storage. In other words, we want to measure access times to the grid storage when using an ordinary web interface.

The Italian partners have made the e-Cultural Science Gateway available to the project. This tool was developed for the Indicate project and is being modified to upload data to the grid without using the gateway storage as an intermediate step. In this PoC, the e-Cultural Science Gateway will be used to test grid storage.
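To make "access times" measurable, one option is to time repeated downloads of the same object through whatever web URL the gateway exposes. The Python sketch below uses only the standard library and does not use any eCSG or dCache API. The URL, the number of repetitions and the choice of metrics (time to first chunk, total time, throughput) are assumptions made for illustration. It is intended as a possible starting point for an access measurement tool, not as a defined method.

```python
import statistics
import time
import urllib.request


def time_download(url: str, chunk_size: int = 64 * 1024) -> tuple[float, float, int]:
    """Fetch url once; return (seconds to first chunk, total seconds, bytes read)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        # Time to first chunk includes connection set-up and response headers.
        first = response.read(chunk_size)
        first_chunk_s = time.perf_counter() - start
        received = len(first)
        # Drain the rest of the body in fixed-size chunks.
        while chunk := response.read(chunk_size):
            received += len(chunk)
    total_s = time.perf_counter() - start
    return first_chunk_s, total_s, received


def summarise(url: str, repeats: int = 5) -> dict[str, float]:
    """Download url several times and report median timings and throughput."""
    runs = [time_download(url) for _ in range(repeats)]
    median_total = statistics.median(total for _, total, _ in runs)
    size = runs[0][2]
    return {
        "median_first_chunk_s": statistics.median(first for first, _, _ in runs),
        "median_total_s": median_total,
        "bytes": float(size),
        "median_throughput_bytes_per_s": (
            size / median_total if median_total > 0 else float("inf")
        ),
    }
```

For example, running `summarise(url, repeats=10)` against the same object stored in Catania and on BEgrid would yield directly comparable figures. The median is reported instead of the mean so that a single slow run does not dominate the result.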

Suggested test data: Belgian DCH institutions will upload data to the grid

SUGGESTED TEST PROCEDURES

Tools

    e-Cultural Science Gateway

Participation in this PoC

    Partners still need to be found to carry out this PoC.

Possible scenarios concerning the use of eCSG

1. Use the eCSG installed in Catania with storage in Catania (this solution has the advantage of a working eCSG with the possibility of uploading data)

2. Use the eCSG installed in Catania with storage on BEgrid (this scenario requires adapting the eCSG and dCache, the storage management system used in BEgrid)

3. Install the eCSG on BEgrid and use it with BEgrid storage

Execution

• E1: Investigate whether the eCSG can work with dCache
• E2: Depending on the outcome, choose scenario 1 or 2
• E3: Put data on the grid storage
• E4: Define the access measurement tools
• E5: Define the user-friendliness measurement tool
• E6: Investigate the technical requirements for installing the eCSG on BEgrid
• E7: Depending on the outcome of E6, install the eCSG
• E8: Repeat E3-E5 on the BEgrid eCSG
• E9: Distribution of the work among partners
• E10: Definition of the documentation method for the trial

Challenges

Several challenges lie ahead for this Proof of Concept:
- The eCSG fails to upload (large quantities of) data
- The eCSG fails to access the data efficiently
- The need to belong to an identity federation may be a major drawback to using the eCSG beyond a trial
- It may be impossible to install the eCSG on grid infrastructures other than the Italian grid

Timeline

       To be discussed with the partners
       The report on the first Proof of Concept is due in Month 12, i.e. at the end of October 2013
       End of May 2013: list of contributing partners is available
       May-August: choice of standards, preparation of the procedure
       September-15 October: execution of the tasks
       15-31 October: preparation of the report	

Extra information that can be obtained during the tests:
- Definition of the technical requirements for a user interface to access the grid storage
- Documentation of DCH data management possibilities on the grid

Extension in PoC 2:
- Set up a similar test using cloud storage