From EGIWiki
(Redirected from SA1.5-QR11)

1. Task Meetings

Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome
28/11/2012 | | FedCloud TF Face to Face Meeting | Usage and use case review. Agreement on EGI Community Forum demo.

2. Main Achievements

The NIKHEF site has migrated its publishing and is now sending job records to the new APEL server.

The Cloud Accounting Usage Record has been revised so that cloud accounting data can be summarised more efficiently. A corresponding cloud message format has also been implemented and, along with the latest version of SSM (2.0), has been tested with two of the Federated Cloud Task Force sites (CESGA and CESNET). We have now also successfully sent cloud records on to the Accounting Portal at CESGA, so work on the visualisation of cloud accounting data can begin.
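To illustrate what summarising cloud accounting data can look like, here is a minimal Python sketch. The field names (`site`, `vo`, `month`, `wall_seconds`), the grouping key and the sample values are assumptions made for illustration only; they are not taken from the actual Cloud Accounting Usage Record or message format.

```python
from collections import defaultdict


def summarise(records):
    """Group per-VM records by (site, vo, month) and total their usage.

    The record layout is hypothetical, not the real cloud usage record.
    """
    totals = defaultdict(lambda: {"vm_count": 0, "wall_seconds": 0})
    for rec in records:
        key = (rec["site"], rec["vo"], rec["month"])
        totals[key]["vm_count"] += 1
        totals[key]["wall_seconds"] += rec["wall_seconds"]
    return dict(totals)


# Made-up sample records with placeholder site and VO names.
records = [
    {"site": "SITE-A", "vo": "vo.example", "month": "2012-11", "wall_seconds": 3600},
    {"site": "SITE-A", "vo": "vo.example", "month": "2012-11", "wall_seconds": 1800},
    {"site": "SITE-B", "vo": "vo.example", "month": "2012-11", "wall_seconds": 600},
]
summary = summarise(records)
```

Sending one summary row per group instead of one row per virtual machine is what keeps the volume passed on to a portal small.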

A test storage accounting database is in place, along with the new version of SSM, ready to receive test StAR records from storage clients.

Participated in OGF UR WG fortnightly phone conferences.

The "Fomalhaut" version of the Accounting Portal was released, with improvements to InterNGI usage, custom VOs and local job filtering, along with many fixes. Further improvements followed, such as the automatic normalization of UserDNs to a common format and remedial actions for some UserDN processing (see Issues below).
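As an illustration of normalizing UserDNs to a common format, the following Python sketch converts comma-separated DNs to a slash-separated form. The choice of the slash-separated form as the target, and the function name `normalise_dn`, are assumptions; the report does not say which format the portal uses. The sketch also does not handle escaped commas or slashes inside attribute values.

```python
def normalise_dn(dn: str) -> str:
    """Return a DN in slash-separated form, e.g. /C=NL/O=Example/CN=Jane Doe.

    Hypothetical helper: the target format is an assumption.
    """
    dn = dn.strip()
    if dn.startswith("/"):
        # Already slash-separated: only tidy whitespace around components.
        parts = [p.strip() for p in dn.split("/") if p.strip()]
    else:
        # Comma-separated DNs list the most specific component first,
        # so reverse the order to match the slash-separated convention.
        parts = [p.strip() for p in dn.split(",") if p.strip()]
        parts.reverse()
    return "/" + "/".join(parts)


# Both spellings of the same (made-up) identity map to one string.
a = normalise_dn("CN=Jane Doe,O=Example,C=NL")
b = normalise_dn(" /C=NL/O=Example/CN=Jane Doe ")
```

Mapping every spelling of an identity to one string lets usage for the same user be grouped under a single key.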

3. Issues and Mitigation

Issue Description | Mitigation Description
There were two major power outages in November, resulting in the APEL systems being down for a day each time (7th/8th November and 20th/21st November). | Systems were rebooted when power was restored and database tables checked. There was no loss of data on either occasion.
The republishing of UserDNs proved too much for the existing consumer: the volume of data republished interrupted the daily processing. | It was agreed that sites would ensure they were publishing UserDNs going forward.
A failure in the summarization of site SiGNET caused a 300%+ increase in the size of the UserCPU table, slowing down the portal and triggering failures in the UserDN decrypting process (the process took more than 24h before failing). | The decrypting process was made to exclude the SiGNET site (cutting down the size of the table), and some fine tuning and optimization of the Java process reduced the decryption time to under 1h. We now check the size of the table regularly to avoid downtime.
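For scale, a 300% increase means the table grew to four times its previous size. A regular size check of the kind described could be sketched in Python as follows. The row counts and the 50% alert threshold are hypothetical, and this is not the check actually used on the portal.

```python
def percent_increase(baseline_rows: int, current_rows: int) -> float:
    """Percentage growth of a table relative to a baseline row count."""
    if baseline_rows <= 0:
        raise ValueError("baseline_rows must be positive")
    return (current_rows - baseline_rows) / baseline_rows * 100.0


def needs_attention(baseline_rows: int, current_rows: int,
                    threshold_percent: float = 50.0) -> bool:
    """Flag growth above a threshold (the 50% default is an assumption)."""
    return percent_increase(baseline_rows, current_rows) > threshold_percent


# Made-up figures: a table that quadruples has grown by 300%.
growth = percent_increase(1_000_000, 4_000_000)
alert = needs_attention(1_000_000, 4_000_000)
```

Comparing against a recorded baseline rather than an absolute limit means the check still works as the table grows normally over time, provided the baseline is refreshed.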

4. Plans for the next period

Test Regional APEL server with external sites.

Create Summary Cloud Accounting record format to be sent on to the Accounting Portal for visualisation.

Migrate IN2P3 sites to new APEL server.