LoFAR
Community Information
Community Name
The Low Frequency Array
Community Short Name
LOFAR
Community Website
Community Description
LoFAR will be the first large radio telescope system in which a large number of small sensors, rather than a small number of big dishes, is used to achieve its sensitivity. For the astronomy application, LOFAR is an aperture synthesis array composed of phased array stations. The antennas in each station form a phased array, producing one or many station beams on the sky. Multi-beaming is a major advantage of the phased array concept: it not only increases observational efficiency, but may also be vital for calibration purposes. The phased array stations are combined into an aperture synthesis array. The Remote Stations are distributed over a large area with a maximum baseline of 100 km within the Netherlands and 1500 km within Europe.
Community Objectives
LOFAR started as a new and innovative effort to force a breakthrough in sensitivity for astronomical observations at radio frequencies below 250 MHz.
Main Contact Institutions
ASTRON, CSIC, BSC, IAA
Main Contact
- Michael Wise (ASTRON, wise@astron.nl)
- R.F. Pizzo (ASTRON, pizzo@astron.nl)
- Susana Sanchez-Exposito (CSIC, sse@iaa.es)
- Daniele Lezzi (BSC, daniele.lezzi@bsc.es)
- Jose Sabater Montes (IAA, jsm@iaa.es)
Prior EGI Collaboration
EGI and LoFAR have been collaborating since October 2014 to integrate calibration, analysis and modelling pipelines for radio-astronomy data into a cloud infrastructure. This work is developed jointly by users of the LOFAR radio telescope (www.lofar.org) and members of the AMIGA4GAS project. https://wiki.egi.eu/wiki/FedCloudLOFAR
Science Viewpoint
Scientific challenges
- In most cases, ingest jobs run by the Radio Observatory need to be monitored closely to verify that all files are ingested and to recover manually after a failure. This inconveniences some users, who have to wait several days to get their data.
- Instability of the ingest system can cause long ingest queues and, inevitably, fill up CEP2. In extreme cases, the observing schedule has to be rearranged because CEP2 does not have enough free disk space to store more data until important ingest jobs are completed and the corresponding data can be removed from the cluster. This limits the observing efficiency.
- Staging needs to support a larger number of files and larger file sizes
- Fully exploit processing resources offered by the LOFAR Long-Term Archive
Objectives
- Efficient user data retrieval.
- Optimise data staging, e.g. by using pre-staging technology to move data from tape to the computing facilities, so that users wait less when they request data
- Allow users to process large amounts of data and retrieve only the results, so that the data need not be downloaded to their local computer.
- Provide elastic disk storage space so that data ingest jobs run smoothly during burst computation
User Stories
LOFAR requests a future system that efficiently supports access to large volumes of data and burst data access, in particular the following two scenarios:
- A user wants to retrieve large volumes of data from the LTA. The user finds the desired datasets with the search facilities provided by the LoFAR data portal. An optimisation mechanism is in place that accelerates the data staging process. The user also starts a data processing/analysis service or application on the LTA HPC/Cloud resources, which are close to the datasets. The dataset is fed into that service or application, which produces the results. The user examines the results with a visualisation service and downloads them to a local PC.
- The LTA encounters a burst of access requests from users, and the existing disk space is too small to handle them. Since the LTA is federated with the EGI Cloud, additional resources are immediately assigned to the LOFAR LTA to handle the burst. Afterwards, the additional resources from EGI FedCloud are released.
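The burst scenario amounts to a simple elasticity rule: borrow only the shortfall, and give it back once the burst has passed. The sketch below is a toy model of that rule. The class and method names are hypothetical, all quantities are in TB, and no real EGI FedCloud or LTA interface is called.

```python
from dataclasses import dataclass


@dataclass
class StagingPool:
    """Toy model of LTA disk space plus borrowed cloud capacity (in TB)."""

    local_free_tb: float
    borrowed_tb: float = 0.0

    def handle_burst(self, requested_tb: float) -> float:
        """Borrow only the shortfall between the request and local free space."""
        shortfall = max(0.0, requested_tb - self.local_free_tb)
        self.borrowed_tb += shortfall
        return shortfall

    def release(self) -> float:
        """Give back all borrowed capacity once the burst is over."""
        released, self.borrowed_tb = self.borrowed_tb, 0.0
        return released
```

In a real federation the two methods would wrap calls to the cloud provider; here they only track the numbers, which is enough to show that nothing is borrowed when local disk suffices.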
Information Viewpoint
Data
Data Object Types
- Imaging data
- Pulsar data
Data Size
Observational data at rates up to 60 Gbps (650 TB per day), once processed, the amount of data to be kept for a longer time
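The daily volume follows from the peak rate. As a quick check, assuming decimal units (1 Gbit = 10^9 bits, 1 TB = 10^12 bytes) and a rate sustained for a full 24 hours:

```python
# Convert a sustained data rate in gigabits per second to terabytes per day.
# Assumes decimal units: 1 Gbit = 1e9 bits, 1 TB = 1e12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60


def gbps_to_tb_per_day(rate_gbps: float) -> float:
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    print(f"{gbps_to_tb_per_day(60):.0f} TB per day")
```

For 60 Gbps this gives 648 TB per day, consistent with the rounded figure of 650 TB.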
Data Collection Size
- Exceeded 19 PB of data in the Long-Term Archive (LTA)
- Current growth: 3 PB per year
Data Format
Datacubes (3D data): two Fourier spatial coordinate axes plus a spectral axis. A datacube can reach several TB. The LOFAR telescope allows up to 488 subbands, each of which can reach several GB. Each subband is processed independently.
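Because each subband is processed independently, the work parallelises naturally. The sketch below illustrates this with a hypothetical `process_subband` function that stands in for a real calibration or imaging step. It is not part of any LOFAR software, and a thread pool is used only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SUBBANDS = 488  # upper limit on the number of subbands


def process_subband(index: int) -> str:
    """Hypothetical placeholder for a per-subband pipeline step."""
    return f"subband {index:03d} done"


def process_observation(n_subbands: int, workers: int = 8) -> list[str]:
    if not 1 <= n_subbands <= MAX_SUBBANDS:
        raise ValueError(f"expected 1..{MAX_SUBBANDS} subbands")
    # No subband depends on another, so they can run concurrently;
    # Executor.map still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_subband, range(n_subbands)))
```

On a grid or cloud infrastructure the same pattern would more likely map each subband to a separate job on a separate node rather than to a thread.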
Data Locations
Currently involves sites in the Netherlands and Germany
Privacy policy
- LoFAR data were made public as of March 2nd 2015.
- Data that has passed the proprietary period becomes public and can be retrieved by anyone.
- Currently, data are still mainly retrieved by project PIs and collaborators
Data Lifecycle
The LOFAR Archive stores data on magnetic tapes. Data cannot be downloaded right away, but have to be copied from tape to disk first. This process is called 'staging'. Current limitations:
- Stage no more than 10 TB at a time and no more than 20,000 files
- Staging data from tape to disk might take time, because the tape drives are shared with all users (including non-LOFAR users) and requests are queued
- Staging space is limited and shared between all LOFAR users – system might temporarily run low on disk space
- Data copy remains on disk for 2 weeks
- Maintenance and small outages experienced regularly
Efficient solutions for data retrieval are required.
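The size and file-count limits listed above mean that a large retrieval has to be split into several staging requests. The sketch below shows one way to do the split. It assumes that the limits apply per staging request, that TB is decimal, and that the caller supplies (file name, size in bytes) pairs; the function name and input format are illustrative and not part of any LTA tool.

```python
from typing import Iterable, Iterator

MAX_BYTES = 10 * 10**12  # 10 TB per staging request (decimal TB assumed)
MAX_FILES = 20_000       # files per staging request


def staging_batches(
    files: Iterable[tuple[str, int]],
    max_bytes: int = MAX_BYTES,
    max_files: int = MAX_FILES,
) -> Iterator[list[str]]:
    """Yield lists of file names, each within both staging limits."""
    batch: list[str] = []
    batch_bytes = 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the per-request size limit")
        # Start a new batch if adding this file would break either limit.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_files):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size
    if batch:
        yield batch
```

Each batch can then be submitted as a separate staging request. The greedy, in-order split keeps neighbouring files together but does not try to minimise the number of requests.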
Technology Viewpoint
System Architecture
Architecture of LoFAR Long-Term Archive (LTA) and Web based download server
- Distributed information system created to store and process the large data volumes generated by the LoFAR radio telescope
- Currently involves sites in the Netherlands and Germany
- Each site involved in the LTA provides storage capacity and optionally processing capabilities.
Community data access protocols
GridFTP
- Requires grid user certificate
- More robust, superior performance
- Requires grid client installation
Public data access protocol
Web based download server
- ‘LTA enabled’ ASTRON/LOFAR account
- Low threshold
- Primarily for few files & smaller volumes
Public authentication mechanism
Requires grid user certificate
Network
Network consisting of light-path connections (utilizing 10 GbE technology) that are shared with LOFAR station connections and with the European eVLBI network.
e-Infrastructure
Grid is in use
Client
Web interface
Other aspects
Interface to query the LTA database and retrieve data to own compute facilities
Non-functional requirements
Performance Requirements | Requirement Levels | Description |
---|---|---|
Availability | Normal | Not essential at this moment |
Accessibility | Normal | Not essential at this moment |
Throughput | Normal | Not essential at this moment |
Response time | High | Request to reduce staging time for large datasets and to support burst access |
Security | Normal | Not essential since LOFAR data are open |
Utility | Middle | LOFAR data should be used by more users; at the moment most accesses are from PIs. |
Reliability | High | The ingest system is unstable, which can cause long ingest queues. |
Scalability | High | Request to reduce staging time for large datasets and to support burst access |
Efficiency | High | LOFAR data shall be easily and efficiently accessed |
Disaster recovery | Normal | Not essential at this moment |
Flexibility | High | The LOFAR pipeline framework is not flexible |
Decentralisation | High | LOFAR LTAs are decentralised and thus need decentralised solutions |
Software and applications in use
Standardized LOFAR pipeline software integration with catalog & user interfaces
e-Infrastructure in use
Grid clusters in use
- SARA
- NIKHEF
- RUG
- FZ-Jülich
Service to LoFAR users
- Standardized pipelines
- Integration with catalog & user interfaces
- Processing where the data is (high complexity & inhomogeneity)
Expert users can
- Run custom software
- Use native protocols
- Build on integration with catalog
- Queries
- Ingest output including data lineage
Requirements for EGI Testbed Establishments
Preferences on specific resource providers | Netherlands, Germany |
Does the user (or those he/she represents) have access to a Certification Authority? | Yes, requires grid user certificate |
Does the user need access to an existing allocation, or does he/she need a new allocation? | Do not need a new VO |
Which NGIs are interested in supporting this case? | Amsterdam Sara |