
APEL/Gaps-March2015

Revision as of 23:01, 23 April 2015 by Gordon (talk | contribs)

Missing Accounting Data – Need to Republish

There have been a couple of recent incidents in which APEL accounting data went missing between the SSM client and the APEL repository. We are working with the message broker team to find the cause and prevent a recurrence, but it looks as if a couple of days of published data in March and April have been lost.

This did not affect all sites. Sites that publish summaries are not affected, as they repeatedly publish revised summaries for the whole of the current month. Some sites may not have published at all during the period in question, so there was nothing to go astray. Some sites may have noticed the problem themselves and republished already. The sites at risk are those publishing job records via SSM, i.e. the APEL client, ARC-JURA, and anyone who wrote their own client which uses SSM.

The signature of the March data loss is that your site has ERROR status in the APEL_Sync test for March 2015 (e.g. http://bit.ly/1PiuQfS) and CRITICAL status in the APEL_Sync Nagios test starting on 3rd April (e.g. http://bit.ly/1OiEfaa). If this is the case for your site, please follow the instructions below to republish. This note is not concerned with sites that appear CRITICAL before 3rd April.

We have not yet identified a signature for any lost data in April as the client and repository cannot be guaranteed to be in step until after the end of the month. If you need to do a republish for March then follow it by republishing accounting data for April while the steps are fresh in your mind.

Instructions to Republish with the APEL Client for March 

1. Edit client.cfg and change the section

interval = latest
##only used if interval = gap
#gap_start = 2012-01-01
#gap_end = 2012-01-31

to

interval = gap
##only used if interval = gap
gap_start = 2015-03-11
gap_end = 2015-03-16

2. Run the apel-client once and then reset the client.cfg to its original values.

3. Wait until the next day to check the sync test.

4. If March is still showing an ERROR then update the GGUS ticket if one was assigned to you. If you have not received a ticket, please open one in GGUS and assign it to the APEL Team.

Instructions to Republish with the ARC CE+JURA Client for March 

1. Follow the instructions on the NorduGrid wiki (http://wiki.nordugrid.org/wiki/Accounting#Most_important_problems_and_solutions), selecting jobs for the time window 2015-03-11 to 2015-03-16.

2. Wait until the next day to check the sync test.

3. If March is still showing an ERROR then open a GGUS ticket for the APEL Team.

Instructions to Republish with other publishers

1. If you are extracting jobs from your own database and sending job records (not summaries) to APEL, run a custom database query covering the period 2015-03-11 to 2015-03-16.

2. Wait until the next day to check the sync test.

3. If March is still showing an ERROR then open a GGUS ticket for the APEL Team.
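For sites with their own publisher, the custom query in step 1 might look like the sketch below. Everything about the schema is an assumption made for illustration: a hypothetical jobs table with job_id and end_time columns, end times stored as ISO 8601 text, and selection by job end time. It also assumes both end dates of the window are meant to be included, so it uses a half-open range that stops at midnight on the day after the end date. Adapt the names and the selection field to your own database.

```python
import sqlite3
from datetime import date, timedelta


def jobs_in_window(conn: sqlite3.Connection, start: date, end: date):
    """Return (job_id, end_time) rows for jobs ending from start to end inclusive.

    Hypothetical schema: table 'jobs' with an ISO 8601 text column 'end_time'.
    """
    # Half-open range [start, end + 1 day) so that jobs finishing late on the
    # last day of the window are not dropped.
    upper = end + timedelta(days=1)
    cur = conn.execute(
        "SELECT job_id, end_time FROM jobs "
        "WHERE end_time >= ? AND end_time < ? "
        "ORDER BY end_time",
        (start.isoformat(), upper.isoformat()),
    )
    return cur.fetchall()
```

Calling jobs_in_window(conn, date(2015, 3, 11), date(2015, 3, 16)) would then return the records to resend for March under these assumptions.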

April

We suspect possible data losses around 18th April, so you may wish to republish this period proactively while you are dealing with March.

1. Repeat the appropriate method as outlined above, but for the period 2015-04-15 to 2015-04-20.