LoFAR



Community Information

Community Name

The Low Frequency Array

Community Short Name

LOFAR

Community Website

http://www.lofar.org/

Community Description

LoFAR will be the first large radio telescope system in which a huge number of small sensors, rather than a small number of big dishes, is used to achieve its sensitivity. For the astronomy application, LOFAR is an aperture synthesis array composed of phased array stations. The antennas in each station form a phased array, producing one or many station beams on the sky. Multi-beaming is a major advantage of the phased array concept: it not only increases observational efficiency, but may also be vital for calibration purposes. The phased array stations are combined into an aperture synthesis array. The Remote Stations are distributed over a large area, with a maximum baseline of 100 km within the Netherlands and 1500 km within Europe.

Community Objectives

LOFAR started as a new and innovative effort to force a breakthrough in sensitivity for astronomical observations at radio frequencies below 250 MHz.

Main Contact Institutions

ASTRON, CSIC, BSC, IAA

Main Contact

  • Michael Wise (ASTRON, wise@astron.nl)
  • R.F. Pizzo (ASTRON, pizzo@astron.nl)
  • Susana Sanchez-Exposito (CSIC, sse@iaa.es)
  • Daniele Lezzi (BSC, daniele.lezzi@bsc.es)
  • Jose Sabater Montes (IAA, jsm@iaa.es)

Prior EGI Collaboration

EGI and LoFAR have been collaborating since October 2014 to integrate calibration, analysis and modelling pipelines for radio-astronomy data into a cloud infrastructure. The work is carried out jointly by users of the LOFAR radio telescope (http://www.lofar.org/) and members of the AMIGA4GAS project. See https://wiki.egi.eu/wiki/FedCloudLOFAR

Science Viewpoint

Scientific challenges

  • In most cases, ingest jobs run by the Radio Observatory need to be monitored closely to verify that all files are ingested and to recover manually after a failure. This inconveniences some users, who have to wait several days to get their data.
  • Instability of the ingest system can cause long ingest queues and, inevitably, can fill up CEP2. In extreme cases, the observing schedule needs to be rearranged because there is not enough disk space available on CEP2 to store more data until important ingest jobs are completed and the corresponding data can be removed from the cluster. This limits the observing efficiency.
  • Staging needs to support a larger number and size of files.
  • Fully exploit the processing resources offered by the LOFAR Long-Term Archive.

Objectives

  • Efficient user data retrieval.
    • Optimise data staging, e.g. by using pre-staging technology to move data from tape to computing facilities, so that users wait less for staging when they request data.
    • Allow users to process large amounts of data and retrieve only the results, so that the data need not be downloaded to their local computers.
  • Elastic disk storage space, so that data ingest jobs execute smoothly when handling bursts of computation.

User Stories

LOFAR requests a future system that efficiently supports access to large volumes of data and burst data access, in particular the following two scenarios:

  • A user wants to retrieve large volumes of data from the LTA. The user finds the desired datasets with the search facilities provided by the LoFAR data portal. An optimisation mechanism is in place that accelerates the data staging process. The user also starts a data processing/analysis service or application that runs on the LTA HPC/Cloud resources, near the datasets. The dataset is fed into the processing/analysis service or application, which produces the results. The user examines the results with a visualisation service and downloads them to a local PC.
  • The LTA encounters a burst of access from users, and the existing disk space is too small to handle the requests. Since the LTA is federated with the EGI Cloud, additional resources are immediately assigned to the LOFAR LTA to handle the burst. Afterwards, the additional resources from EGI FedCloud are released.

Information Viewpoint

Data

Data Object Types

  • Imaging data
  • Pulsar data

Data Size

Observational data are produced at rates of up to 60 Gbps (roughly 650 TB per day); the processed data are then kept for a longer time.
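The daily volume quoted above can be checked from the data rate with a short calculation. The sketch below assumes decimal units (1 Gb = 10^9 bits, 1 TB = 10^12 bytes) and a rate sustained over a full 24 hours; the function name is illustrative only.

```python
# Convert a sustained data rate in gigabits per second to terabytes per day.
# Assumes decimal units: 1 Gb = 10**9 bits, 1 TB = 10**12 bytes.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def gbps_to_tb_per_day(rate_gbps: float) -> float:
    """Return the daily data volume in TB for a sustained rate in Gbps."""
    bytes_per_second = rate_gbps * 1e9 / 8  # bits -> bytes
    return bytes_per_second * SECONDS_PER_DAY / 1e12


daily_tb = gbps_to_tb_per_day(60)
print(f"{daily_tb:.0f} TB per day")  # 648 TB per day
```

A sustained 60 Gbps therefore corresponds to 648 TB per day, which matches the rounded figure of 650 TB.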

Data Collection Size

  • Exceeded 19 PB of data in the Long-Term Archive (LTA)
  • Current growth: 3PB per year

Data Format

Datacubes (3D data): two Fourier spatial coordinate axes plus a spectral axis. A datacube can reach several TB. The LOFAR telescope allows up to 488 subbands, each of which can reach several GB. Each subband is processed independently.
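Because each subband is processed independently, the work can in principle be distributed as a simple parallel map. The sketch below only illustrates that pattern: `process_subband` is a hypothetical placeholder, not part of the LOFAR pipeline software, and a thread pool stands in for whatever cluster or cloud scheduler would be used in practice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_SUBBANDS = 488  # upper limit quoted for the LOFAR telescope


def process_subband(index: int) -> str:
    """Hypothetical placeholder for calibrating or imaging one subband."""
    return f"subband {index:03d} done"


def process_observation(n_subbands: int, workers: int = 8) -> List[str]:
    """Process every subband independently; results keep subband order."""
    if not 0 < n_subbands <= MAX_SUBBANDS:
        raise ValueError(f"expected 1..{MAX_SUBBANDS} subbands, got {n_subbands}")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No subband depends on another, so a plain map is sufficient.
        return list(pool.map(process_subband, range(n_subbands)))


results = process_observation(MAX_SUBBANDS)
print(len(results), results[0], results[-1])
```

Since no state is shared between subbands, the same structure would carry over to separate processes or separate nodes located near the data.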

Data Locations

Currently involves sites in the Netherlands and Germany

Privacy policy

  • LoFAR data made public as of March 2nd 2015.
  • Data that has passed the proprietary period becomes public and can be retrieved by anyone.
  • Currently, data are still mainly retrieved by project PIs and collaborators.


Data Lifecycle

LOFAR Data Flow

The LOFAR Archive stores data on magnetic tapes. Data cannot be downloaded right away, but has to be copied from tape to disk first. This process is called 'staging'.

Current limitations:

  • A single request can stage no more than 10 TB and no more than 20,000 files.
  • Staging data from tape to disk might take time, since tape drives are shared with all users (including non-LOFAR users) and requests are queued.
  • Staging space is limited and shared between all LOFAR users, so the system might temporarily run low on disk space.
  • The data copy remains on disk for 2 weeks.
  • Maintenance and small outages are experienced regularly.

Efficient solutions for data retrieval are required.
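One consequence of the size and file-count limits is that a large retrieval has to be split into several staging requests. The following is a minimal sketch of such a split, assuming decimal terabytes and a simple greedy grouping; `ArchiveFile` and `plan_staging_batches` are hypothetical names and do not correspond to any LTA interface.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_BATCH_BYTES = 10 * 10**12  # 10 TB per staging request (decimal TB assumed)
MAX_BATCH_FILES = 20_000       # 20,000 files per staging request


@dataclass(frozen=True)
class ArchiveFile:
    name: str        # hypothetical file identifier
    size_bytes: int


def plan_staging_batches(files: Iterable[ArchiveFile]) -> List[List[ArchiveFile]]:
    """Greedily group files into batches that respect both staging limits."""
    batches: List[List[ArchiveFile]] = []
    current: List[ArchiveFile] = []
    current_bytes = 0
    for f in files:
        if f.size_bytes > MAX_BATCH_BYTES:
            raise ValueError(f"{f.name} alone exceeds the per-request size limit")
        too_big = current_bytes + f.size_bytes > MAX_BATCH_BYTES
        too_many = len(current) + 1 > MAX_BATCH_FILES
        if current and (too_big or too_many):
            # Close the current batch and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(f)
        current_bytes += f.size_bytes
    if current:
        batches.append(current)
    return batches


# Five 4 TB files: at most two fit under the 10 TB limit per request.
demo = [ArchiveFile(f"file_{i}", 4 * 10**12) for i in range(5)]
print([len(batch) for batch in plan_staging_batches(demo)])  # [2, 2, 1]
```

The greedy grouping keeps files in their original order and does not try to minimise the number of requests; a pre-staging service could submit the resulting batches one after another as staging space becomes available.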

Technology Viewpoint

System Architecture

LOFAR LTA Architecture

Architecture of LoFAR Long-Term Archive (LTA) and Web based download server

  • Distributed information system created to store and process the large data volumes generated by the LoFAR radio telescope
  • Currently involves sites in the Netherlands and Germany
  • Each site involved in the LTA provides storage capacity and optionally processing capabilities.

Community data access protocols

GridFTP

  • Requires grid user certificate
  • More robust, superior performance
  • Requires grid client installation

Public data access protocol

Web based download server

  • ‘LTA enabled’ ASTRON/LOFAR account
  • Low threshold
  • Primarily for few files & smaller volumes

Public authentication mechanism

Requires grid user certificate

Network

Network consisting of light-path connections (utilizing 10 GbE technology) that are shared with LOFAR station connections and with the European eVLBI network.

e-Infrastructure

Grid is in use

Client

Web interface

Other aspects

Interface to query the LTA database and retrieve data to own compute facilities

Non-functional requirements

Performance Requirement | Requirement Level | Description
Availability            | Normal            | Not essential at this moment
Accessibility           | Normal            | Not essential at this moment
Throughput              | Normal            | Not essential at this moment
Response time           | High              | Request to reduce staging time for large datasets; support of burst access
Security                | Normal            | Not essential since LOFAR data are open
Utility                 | Middle            | LOFAR data shall be used by more users; at the moment most access comes from PIs
Reliability             | High              | The ingest system is unstable, which can cause long ingest queues
Scalability             | High              | Request to reduce staging time for large datasets; support of burst access
Efficiency              | High              | LOFAR data shall be easily and efficiently accessed
Disaster recovery       | Normal            | Not essential at this moment
Flexibility             | High              | The LOFAR pipeline framework is not flexible
Decentralisation        | High              | LOFAR LTAs are decentralised and thus need decentralised solutions

Software and applications in use

Standardized LOFAR pipeline software integration with catalog & user interfaces

e-Infrastructure in use

Grid clusters in use

  • SARA
  • NIKHEF
  • RUG
  • FZ-Jülich

Service to LoFAR users

  • Standardized pipelines
  • Integration with catalog & user interfaces
  • Processing where the data is high complexity & inhomogeneity

Expert users can

  • Run custom software
  • Use native protocols
  • Build on integration with catalog
    • Queries
    • Ingest output including data lineage

LOFAR e-Infrastructure Architecture

Requirements for EGI Testbed Establishments

Preferences on specific resource providers | Netherlands, Germany
Does the user (or those he/she represents) have access to a Certification Authority? | Yes, requires grid user certificate
Does the user need access to an existing allocation, or does he/she need a new allocation? | Do not need a new VO
Which NGIs are interested in supporting this case? | SARA (Amsterdam)

EGI Contacts

  • Tiziana Ferrari, tiziana.ferrari@egi.eu
  • Diego Scardaci, diego.scardaci@egi.eu
  • Enol Fernández, enol.fernandez@egi.eu
  • Yin Chen, yin.chen@egi.eu

Meeting and minutes

EGI-LOFAR Meetings: https://indico.egi.eu/indico/categoryDisplay.py?categId=173

References

  • LOFAR LTA and Processing Overview, R.F. Pizzo, given at the EGI-ASTRON meeting, 9 Jul 2015 (.pdf)
  • EGI FedCloud LOFAR Use Case, Susana Sanchez-Exposito, given at the EGI-ASTRON meeting, 9 Jul 2015 (.ppt)
  • EGI-LOFAR FedCloud: https://wiki.egi.eu/wiki/FedCloudLOFAR