Canadian Advanced Network for Astronomical Research
Community Short Name if any
The Canadian Advanced Network for Astronomical Research (CANFAR) is a computing infrastructure for astronomers. CANFAR aims to provide to its users easy access to very large resources for both storage and processing, using a cloud based framework. CANFAR allows astronomers to run processing jobs on a set of computing clusters, and to store data at a set of data centres. (From http://www.canfar.net/about)
The main objectives of the community include:
- Manage large astronomical and astrophysical data sets,
- Allow users to share the data sets between European and Canadian infrastructures,
- Provide means for data set querying using FITS metadata,
- Enable running computations on large data sets.
Main Contact Institutions
Istituto Nazionale di Astrofisica (INAF), Via G.B. Tiepolo, 11 I-34143 Trieste, Italy - Tel. +39 040 3199 111 - email@example.com
Giuliano Taffoni <firstname.lastname@example.org>
The main problem in the CANFAR case study is that the European A&A community has only storage infrastructure, whereas computing resources are available in the Canadian A&A cloud. Typical observation files are very large and therefore very expensive to transfer to computational sites. After the data is made public (typically after 1 year), it should be replicated between European and Canadian cloud storage.
The main objectives of this community with respect to Open Data Cloud include:
- Establish close collaboration between European and Canadian astronomy and astrophysics (A&A) communities,
- Enable sharing of large volumes of astronomical observation and simulation data according to agreed policies (e.g. data should become public 1 year after creation),
- Enable replication of data between Canadian and European Cloud storage infrastructures.
UC1: User data is made public automatically after 1 year. Actors:
- Principal Investigator who created the original data.
- The Open Data Platform automatically enables public access to the data 1 year after its creation
- Data is replicated between CANFAR and EGI infrastructures
- Data is available through EGI Open Data Platform portal
- Data transfers are initiated manually
UC2: User wants to find publicly available data set. Actors:
- Community user interested in accessing particular observation data set.
- The user enters a query in the community portal, specifying selected FITS metadata key/value pairs. Matching data sets are located and filtered according to the privacy ACLs set on the data (all public data sets matching the query are returned to any user).
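The query flow in UC2 can be sketched as follows. This is a minimal illustration only, assuming the FITS key/value pairs are indexed in a relational table. The table names (`dataset`, `fits_meta`), the `is_public` flag and the function names are hypothetical and do not describe the actual CANFAR schema or portal API.

```python
import sqlite3


def build_index(conn):
    """Create a hypothetical two-table index: data sets and their FITS key/value pairs."""
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, is_public INTEGER);
        CREATE TABLE fits_meta (dataset_id INTEGER, key TEXT, value TEXT);
        """
    )


def add_dataset(conn, name, is_public, header):
    """Register a data set and index every key/value pair of its FITS header."""
    cur = conn.execute(
        "INSERT INTO dataset (name, is_public) VALUES (?, ?)", (name, int(is_public))
    )
    dataset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO fits_meta VALUES (?, ?, ?)",
        [(dataset_id, key, str(value)) for key, value in header.items()],
    )
    return dataset_id


def find_public(conn, criteria):
    """Return names of public data sets that match every requested key/value pair."""
    sql = "SELECT d.name FROM dataset d WHERE d.is_public = 1"
    params = []
    for key, value in criteria.items():
        # One EXISTS clause per requested pair, so all pairs must match.
        sql += (
            " AND EXISTS (SELECT 1 FROM fits_meta m"
            " WHERE m.dataset_id = d.id AND m.key = ? AND m.value = ?)"
        )
        params += [key, str(value)]
    sql += " ORDER BY d.name"
    return [row[0] for row in conn.execute(sql, params)]
```

Here the ACL filter is reduced to a single public flag. A fuller version would also return private data sets that the authenticated user is allowed to read.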
Problems to be solved:
- Enable automatic ACL modification based on time of data creation
- Enable public access over specified protocols (e.g. HTTP, FTP) of the public files
- Astronomical and astrophysical observation raw data (FITS format, includes ASCII header and binary CCD data)
- Astronomical and astrophysical observation pre-processed data (e.g. optimized volume)
- Astronomical and astrophysical simulation data
~1TB (one night observation)
Data collection size
FITS (Flexible Image Transport System)
Metadata is located in the headers of the FITS files and is also indexed in an external SQL database for lookup
Standards in use
Italian sites and Canadian sites
Data management plan
Data is typically owned by the Principal Investigator (PI) for 1 year, after which it should be made public. The PI can also process the data, e.g. pre-process it to reduce its volume.
For 1 year after creation the policy is defined by the Principal Investigator, i.e. she can decide who can access the data. After 1 year the data should be publicly available.
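The time-based policy above can be expressed as a small access check. The sketch below is illustrative only: the 365-day approximation of "1 year", the function names and the idea of a per-data-set list of users chosen by the PI are assumptions, not part of any existing CANFAR or EGI service.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 year" is approximated as 365 days.
EMBARGO = timedelta(days=365)


def is_public(created, now, embargo=EMBARGO):
    """A data set becomes public once the embargo period has elapsed."""
    return now - created >= embargo


def can_read(user, owner, allowed_users, created, now):
    """During the embargo only the PI and the users the PI selected may read."""
    if is_public(created, now):
        return True
    return user == owner or user in allowed_users
```

A periodic job could call `is_public` for each data set and update the stored ACL once it returns `True`, instead of evaluating the rule on every request.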
Metadata for FITS files is stored in an ASCII header of each file in a simple list of key/value pairs with optional comments
Metadata is stored in key/value pairs; metadata identifiers are simple abbreviated strings, e.g. ORIGIN, LPKTTIME, NAXIS, etc.
Small in comparison to the actual data, typically up to 100 key/value pairs per file
ASCII text header at the beginning of each FITS file, consisting of key/value pairs with optional comments.
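To make the header layout concrete, the following sketch parses FITS header cards with the Python standard library only. It relies on the FITS convention of fixed 80-character cards with the keyword in the first 8 characters, followed by `= `, the value, and an optional comment after `/`. It is a simplified illustration that ignores continuation cards and other special cases. In practice an established FITS library would normally be used, and the function names here are hypothetical.

```python
CARD_LENGTH = 80  # every FITS header card is 80 ASCII characters long


def convert(raw):
    """Convert an unquoted value to bool, int or float where possible."""
    if raw == "T":
        return True
    if raw == "F":
        return False
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_card(card):
    """Split one header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, END, blank) carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:].lstrip()
    if body.startswith("'"):
        # Quoted string: a doubled quote ('') stands for one literal quote.
        chars, i = [], 1
        while i < len(body):
            if body[i] == "'":
                if body[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        comment = body[i + 1:].partition("/")[2].strip()
        return keyword, "".join(chars).rstrip(), comment
    raw, _, comment = body.partition("/")
    return keyword, convert(raw.strip()), comment.strip()


def parse_header(text):
    """Collect key/value pairs from consecutive cards until the END card."""
    header = {}
    for start in range(0, len(text), CARD_LENGTH):
        keyword, value, _ = parse_card(text[start:start + CARD_LENGTH])
        if keyword == "END":
            break
        if value is not None:
            header[keyword] = value
    return header
```

The resulting dictionary is the kind of key/value list that could then be loaded into an external index for data discovery.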
Standards in use
Metadata is located in the header of each FITS file, as well as indexed in relational databases for data discovery.
CANFAR (Canadian Advanced Network for Astronomical Research) is composed of:
- Canadian National Research Network (CANARIE)
- Cloud processing and storage (Cloud Canada)
- Canadian Astronomy Data Centre (CADC)
Together they provide a platform for the distribution, processing and storage of astronomical and astrophysical data sets. The cloud infrastructure is based on OpenStack technology. The main services provided to users include:
- VOSpace – Virtual Observatory user storage,
- VMOD – Virtual Machines on Demand,
- GMS – Batch processing and group management.
All services are based on RESTful protocols maintained by CADC. VOSpace provides a web-based user interface for finding data sets based on FITS metadata queries. The metadata from FITS file headers is indexed in a relational database. User information is stored in an LDAP catalogue.
Community data access protocols
- REST or SOAP for data management control
- HTTP, FTP for data transfers
Data management technology
The CANFAR data management system is based on VOSpace, which is an implementation of the Virtual Observatory specification draft (http://www.ivoa.net/documents/VOSpace/20150601/VOSpace.pdf). Data management control is available through a RESTful interface.
Data access control
The GMS service acts as the Policy Information Point during authorization requests, returning information about users' groups, capabilities and capacities. VOSpace permissions are similar to POSIX-based rights.
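A group-based decision of this kind might look like the sketch below. It assumes that the caller's group memberships have already been obtained from a Policy Information Point such as GMS. The `Node` fields and the `authorize` function are hypothetical names chosen for illustration and do not reproduce the VOSpace or GMS interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node with POSIX-like owner/group/public rights."""
    owner: str
    is_public: bool = False
    read_groups: set = field(default_factory=set)
    write_groups: set = field(default_factory=set)


def authorize(user, user_groups, node, action):
    """Decide on 'read' or 'write' using the group memberships supplied by the caller."""
    if user is not None and user == node.owner:
        return True
    if action == "read":
        return node.is_public or bool(user_groups & node.read_groups)
    if action == "write":
        return bool(user_groups & node.write_groups)
    return False  # unknown actions are denied
```

An anonymous caller can be modelled as `user=None` with an empty group set, in which case only public nodes are readable.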
|Performance Requirements||Requirement levels||Description|
|Accessibility||High||The public data should be easily accessible to all users.|
|Throughput||High||Data transfers should use all available bandwidth whenever possible. This can be achieved by striping data into blocks and serving them simultaneously from several nodes in the cluster.|
|Response time||Medium||Metadata queries should respond quickly enough for a typical interactive user experience.|
|Security||High||Only data which is publicly available should be accessible by non-authorized users.|
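The striping approach mentioned under Throughput can be illustrated with a short sketch. It assumes each storage node is represented by a callable that returns the bytes for a given range. This is a hypothetical stand-in for a ranged HTTP or FTP request, and the round-robin assignment of blocks to nodes is only one possible policy.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(size, block_size):
    """Split [0, size) into consecutive (start, end) ranges of at most block_size bytes."""
    return [
        (start, min(start + block_size, size))
        for start in range(0, size, block_size)
    ]


def striped_read(replicas, size, block_size):
    """Fetch blocks in parallel, assigning them to nodes round-robin.

    replicas: list of callables read(start, end) -> bytes, one per node.
    """
    def fetch(item):
        index, (start, end) = item
        return replicas[index % len(replicas)](start, end)

    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # pool.map preserves input order, so blocks are reassembled correctly.
        blocks = list(pool.map(fetch, enumerate(block_ranges(size, block_size))))
    return b"".join(blocks)
```

With three replicas and a 1 TB file split into fixed-size blocks, each node would serve roughly one third of the blocks concurrently.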
e-Infrastructure in use
The CANFAR infrastructure is based on the OpenStack cloud platform.
Requirements for EGI Testbed Establishments
Does the case include preferences on specific tools and technologies to use?
- Automatic provision of public data sets to users (based on predefined policies, e.g. after 1 year since creation)
- Due to the large size of the data sets, transferring data from the storage site to the computation site can be very expensive. Either the computation should be moved close to the data or, if that is not possible, the remote storage should be mounted locally on the computational nodes
Approximately how much compute and storage capacity and for how long time is needed?
The current data size is over 1 PB.
Does the user (or those he/she represents) have access to a Certification Authority?
This will be resolved as part of the EGI FedCloud project. Authentication will be based on X.509 certificates and, in the future, possibly on the eduGAIN service.
Meetings and minutes
- CANFAR meeting page: https://indico.egi.eu/indico/categoryDisplay.py?categId=167
- CANFAR VO space: http://www.ivoa.net/deployers/intro_to_vo_concepts.html
- CADC & CANFAR presentation given by Severin Gaudet .ppt
- Requirements notes from 1 Jul meeting: google doc
- EGI Engage Deliverable D4.1 CANFAR Integration Roadmap TOC: .doc
- Requirement extraction using template .doc