APEL/Gaps-April2015

From EGIWiki
Revision as of 10:12, 4 May 2015 by Gordon (talk | contribs) (Duplicate of March2015 edited for April)

Missing Accounting Data (April 2015) – Need to Republish

There have been a couple of incidents recently where APEL accounting data has gone missing between the SSM client and the APEL repository. We are working with the message broker team to find the cause and prevent recurrences, but it looks as if a couple of days' publishing in March and April has been lost.

Not all sites are affected. The problem affects sites that publish job records via SSM, i.e. the APEL client, ARC-JURA, and anyone who wrote their own client which uses SSM. Sites that publish summaries are not affected, as they repeatedly publish revised summaries for the whole of the current month. Some sites may not have published at all during the period in question, so there was nothing to go astray. Some sites may have noticed the problem themselves and republished already.
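The reasoning above can be written down as a small check. This is only an illustrative sketch: the function and parameter names are hypothetical, and the sync test described in the next paragraph remains the real indicator of lost data.

```python
def may_have_lost_records(sends_job_records_via_ssm,
                          published_during_window,
                          already_republished):
    """Return True if a site could be missing data in the repository.

    All three arguments are hypothetical yes/no answers a site admin
    would supply; none of them is read from APEL itself.
    """
    # Summary publishers resend the whole current month, so they are safe.
    if not sends_job_records_via_ssm:
        return False
    # Nothing was sent during the period, so nothing could go astray.
    if not published_during_window:
        return False
    # A site that has already republished has filled the gap itself.
    return not already_republished
```

For example, a site running the APEL client that published during the period and has not yet republished returns True and should check its sync test.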

The signature of the March data loss is that your site has ERROR status for the APEL_Sync test for April 2015 (e.g. http://bit.ly/1PiuQfS) and CRITICAL for the APEL_Sync Nagios test starting on 3rd May (e.g. http://bit.ly/1OiEfaa). If this is the case for your site then please follow the instructions below to republish. This note is not concerned with sites that appear CRITICAL before 3rd May. If you had problems in March you will already have had a GGUS ticket assigned to you.

We have not yet identified a signature for any lost data in April as the client and repository cannot be guaranteed to be in step until after the end of the month. If you need to do a republish for March then follow it by republishing accounting data for April while the steps are fresh in your mind.

Instructions to Republish with the APEL Client for April 

1. Edit /etc/apel/client.cfg and change the section

interval = latest
##only used if interval = gap
#gap_start = 2012-01-01
#gap_end = 2012-01-31

to

interval = gap
##only used if interval = gap
gap_start = 2015-04-15
gap_end = 2015-04-20

2. Run /usr/bin/apelclient once, then restore client.cfg to its original values (interval = latest, with the gap_start and gap_end lines commented out).

3. Wait until the next day to check the sync test.

4. If April is still showing an ERROR then update the GGUS ticket if one was assigned to you. If you have not received a ticket, please open one in GGUS and assign it to the APEL Team.
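The edit in step 1 can also be scripted. The following is a minimal sketch, not part of the APEL client: set_gap_interval is a hypothetical helper that rewrites the three settings in the text of client.cfg and leaves every other line, including the "##only used if interval = gap" comment, untouched. It assumes the settings appear one per line as shown in step 1.

```python
import re
from datetime import date

# Excerpt of client.cfg as shown in step 1, used here as sample input.
SAMPLE = (
    "interval = latest\n"
    "##only used if interval = gap\n"
    "#gap_start = 2012-01-01\n"
    "#gap_end = 2012-01-31\n"
)

# Matches the three settings, whether or not they start with a single '#'.
SETTING = re.compile(r"^\s*#?\s*(interval|gap_start|gap_end)\s*=")


def set_gap_interval(cfg_text, gap_start, gap_end):
    """Return cfg_text with interval = gap and the given gap dates."""
    # Validate the dates (YYYY-MM-DD) before touching the config text.
    if date.fromisoformat(gap_start) > date.fromisoformat(gap_end):
        raise ValueError("gap_start must not be after gap_end")
    values = {"interval": "gap", "gap_start": gap_start, "gap_end": gap_end}
    out = []
    for line in cfg_text.splitlines():
        match = SETTING.match(line)
        if match:
            key = match.group(1)
            out.append(f"{key} = {values[key]}")
        else:
            out.append(line)  # unrelated lines and '##' comments are kept
    return "\n".join(out) + "\n"


print(set_gap_interval(SAMPLE, "2015-04-15", "2015-04-20"))
```

In practice you would read /etc/apel/client.cfg, keep an unmodified copy, write the result back, run /usr/bin/apelclient once as in step 2, and then restore the copy.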

Instructions to Republish with the ARC CE+JURA Client for April

1. Follow the instructions on the NorduGrid wiki at http://wiki.nordugrid.org/wiki/Accounting#Most_important_problems_and_solutions, selecting jobs for the time window 2015-04-15 to 2015-04-20.

2. Wait until the next day to check the sync test.

3. If April is still showing an ERROR then update the GGUS ticket if one was assigned to you. If you have not received a ticket, please open one in GGUS and assign it to the APEL Team.

Instructions to Republish with other publishers

1. If you are extracting jobs from your own database and sending job records (not summaries) to APEL, then do a custom database query to cover the period 2015-04-15 to 2015-04-20.

2. Wait until the next day to check the sync test.

3. If April is still showing an ERROR then update the GGUS ticket if one was assigned to you. If you have not received a ticket, please open one in GGUS and assign it to the APEL Team.
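As an illustration of the custom query in step 1, here is a minimal sketch using an in-memory SQLite database. The jobs table, its job_id and end_time columns, and the choice to select on job end time are all assumptions made for the example; adapt them to your own schema. The window is treated as inclusive of the whole of 2015-04-20, which is also an assumption.

```python
import sqlite3
from datetime import date, timedelta


def jobs_in_window(conn, start, end):
    """Return (job_id, end_time) rows that ended on start..end inclusive."""
    # Use an exclusive upper bound of end + 1 day so that jobs finishing
    # late on the last day of the window are not missed.
    upper = end + timedelta(days=1)
    cur = conn.execute(
        "SELECT job_id, end_time FROM jobs "
        "WHERE end_time >= ? AND end_time < ? ORDER BY end_time",
        (start.isoformat(), upper.isoformat()),
    )
    return cur.fetchall()


# Hypothetical schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, end_time TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [
        ("a", "2015-04-14 23:59:59"),  # before the window
        ("b", "2015-04-15 00:00:00"),  # first instant of the window
        ("c", "2015-04-20 23:30:00"),  # late on the last day
        ("d", "2015-04-21 00:00:00"),  # after the window
    ],
)

rows = jobs_in_window(conn, date(2015, 4, 15), date(2015, 4, 20))
print([job_id for job_id, _ in rows])
```

The rows returned are the jobs to convert into job records and resend through your SSM-based client.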


Category:Accounting