Difference between revisions of "APEL/MessageFormat"
Latest revision as of 13:31, 2 July 2014
APEL Message Format
Please note: we have changed the formats of the job record message and the summary record message, to bring them in line with the EMI Compute Accounting Record. You can find a definition of the EMI CAR here: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccountingRecord. The document includes the definition of the Aggregated Usage Record (AUR) which is equivalent to our summary records.
This describes a new message format for getting data between the APEL clients and the server.
It also describes the mapping between its fields and those of the EMI Compute Accounting Record (CAR), described here: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting.
Terminology:
- A message is one file which is sent and received by the SSM. Usually a message will contain a number of records (e.g. 1000).
- A record corresponds to one row in the database. It contains a number of key-value pairs as specified by the tables below.
- The header in each message tells the server which type of records are in that message. You need one header per message, so one header per file.
- Epoch time, also known as a Unix timestamp, is an integer number of seconds since 00:00:00 UTC on 1st January 1970. For example, the epoch time at the time of writing was 1311675474. The command `date +%s` will give you the current epoch time on Linux.
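As a rough Python equivalent of `date +%s`, the following minimal sketch obtains the current epoch time and converts the example value above back to a UTC date. It uses only the standard library; the variable names are illustrative.

```python
import time
from datetime import datetime, timezone

# Current epoch time as an integer number of seconds (like `date +%s`).
now_epoch = int(time.time())

# Convert the example epoch value from the text back to a UTC datetime.
example = datetime.fromtimestamp(1311675474, tz=timezone.utc)

print(now_epoch)
print(example.isoformat())  # 2011-07-26T10:17:54+00:00
```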
Optional fields:
- If you do not have a value for a field in a record, use "null" or "none" (not case-sensitive) as the value for this attribute OR leave the field out of the record completely.
- This applies to all the optional fields in JobRecords and Summary records.
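A reader of these messages might treat the placeholder values described above as missing. The sketch below shows one way to do that in Python; the function name `normalise_optional` is hypothetical and not part of any APEL software.

```python
def normalise_optional(value):
    """Return None for a missing optional value, else the stripped string.

    A field counts as missing if it was left out of the record (None here)
    or if its value is "null" or "none" in any mix of upper and lower case.
    """
    if value is None:
        return None
    text = value.strip()
    if text.lower() in ("null", "none"):
        return None
    return text


print(normalise_optional("NULL"))     # None
print(normalise_optional(" atlas "))  # atlas
```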
Job Records
A message is one file. It can contain multiple records. Different records must be separated by the end of record marker (%%).
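To illustrate the layout, here is a minimal Python sketch that splits a message into its header and records. It assumes one `Key: Value` pair per line, the header on the first non-empty line, and `%%` on a line of its own; the function name `parse_message` is hypothetical and this is not the parser used by the APEL server.

```python
def parse_message(text):
    """Split a message into (header, list of record dicts)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty message")
    header = lines[0]
    records = []
    current = {}
    for line in lines[1:]:
        if line == "%%":
            # End of record marker: store the record collected so far.
            if current:
                records.append(current)
            current = {}
            continue
        # Split on the first colon only, because values such as
        # SubmitHost contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a key-value line: %r" % line)
        current[key.strip()] = value.strip()
    if current:
        # Assumption: tolerate a missing final end of record marker.
        records.append(current)
    return header, records


message = """APEL-individual-job-message: v0.3
Site: RAL-LCG2
SubmitHost: ce01.ncg.ingrid.pt:2119/jobmanager-lcgsge-atlasgrid
%%
Site: RAL-LCG2
%%
"""
header, records = parse_message(message)
print(header)
print(len(records))
```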
Description
- See APEL/MessageFormatV01 for version 0.1 of the message formats.
Header APEL-individual-job-message: v0.3
The header only appears once at the top of each message (that is once at the top of each file). It defines the type of record and the schema version.
The table shows the equivalent field in the CAR, under the container element `urf:UsageRecord`. If not specified, it refers to the text value of `urf:Key`, where the element is a direct child of `urf:UsageRecord`.
Key | Value | Description | Mandatory | CAR equivalent (if different) |
---|---|---|---|---|
Site | String | GOCDB sitename | Yes | |
SubmitHost | String | The CE-ID (see example) | Yes | |
MachineName | String | LRMS hostname | | |
Queue | String | Batch system queue | | |
LocalJobId | String | Batch System Job ID | Yes | urf:JobIdentity/urf:LocalJobId |
LocalUserId | String | Local username | | urf:UserIdentity/urf:LocalUserId |
GlobalUserName | String | User's X509 DN | | urf:UserIdentity/urf:GlobalUserName |
FQAN | String | User's VOMS attributes | | urf:UserIdentity/urf:GroupAttribute[@type="FQAN"] |
WallDuration | int | Wallclock time for the job (seconds) | Yes | CAR has ISO 8601 time duration |
CpuDuration | int | CPU time for the job (seconds) | Yes | CAR has ISO 8601 time duration |
Processors | int | Number of processors | | urf:Processors[@metric="max"] |
NodeCount | int | Number of nodes | | |
StartTime | int | Start time of the job (epoch time) | Yes | CAR has ISO 8601 datetime |
EndTime | int | Stop time of the job (epoch time) | Yes | CAR has ISO 8601 datetime |
InfrastructureDescription | String | <accounting client>-<CE type>-<batch system type> eg. "APEL-CREAM-PBS" | | |
InfrastructureType | String | grid OR local | | |
MemoryReal | int | Memory consumed by job (kbytes) | | urf:Memory[@metric="max" and @type="Physical" and @storageUnit="KB"] |
MemoryVirtual | int | Virtual memory consumed by job (kbytes) | | urf:Memory[@metric="max" and @type="Shared" and @storageUnit="KB"] |
ServiceLevelType | String | Si2k OR HEPSPEC | Yes | urf:ServiceLevel[@type] |
ServiceLevel | double | Value of either HepSpec06 or SpecInt2000 | Yes | urf:ServiceLevel |
End of record: %%
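The table notes that the CAR uses ISO 8601 datetimes and durations where this format uses epoch times and integer seconds. The following Python sketch shows one possible conversion. The function names are hypothetical, and the exact lexical form a CAR consumer expects (for example `PT2345S` rather than a form broken into hours and minutes) is an assumption, not something this page specifies.

```python
from datetime import datetime, timezone


def epoch_to_iso8601(epoch_seconds):
    """Format an epoch time as an ISO 8601 datetime in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def seconds_to_iso8601_duration(seconds):
    """Format a whole number of seconds as an ISO 8601 duration."""
    return "PT%dS" % seconds


print(epoch_to_iso8601(1234567890))       # 2009-02-13T23:31:30Z
print(seconds_to_iso8601_duration(2345))  # PT2345S
```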
Changes since version 0.2
- Added the InfrastructureType field (optional)
- Added the InfrastructureDescription field (optional)
- Added the SubmitHostType field (optional)
Changes from version 0.1 to version 0.2
- LocalJobID has changed to LocalJobId
- LocalUserID has changed to LocalUserId
- UserFQAN has changed to FQAN
- ScalingFactorUnit has changed to ServiceLevelType
- The possible values of ScalingFactorType have changed from ["SpecInt2000", "HepSpec06", "custom"] to ["Si2k", "HEPSPEC"]
- ScalingFactor has changed to ServiceLevel
Notes
If GlobalUserName or FQAN is not published, the server sets the value of that field to 'None'.
Jobs are assumed to be grid jobs. To specify local jobs:
- InfrastructureType: local
- SubmitHostType: LRMS
- SubmitHost: <LRMS-hostname>
The Group value specified for local jobs must be different from that of the equivalent grid jobs, or you will not be able to differentiate them in the accounting portal. Suggestion:
- Group: atlas - grid job
- Group: local-atlas - local job
This advice may change as we get more sites publishing local jobs.
Example Message
    APEL-individual-job-message: v0.2
    Site: RAL-LCG2
    SubmitHost: ce01.ncg.ingrid.pt:2119/jobmanager-lcgsge-atlasgrid
    LocalJobId: 31564872
    LocalUserId: atlasprd019
    GlobalUserName: /C=whatever/D=someDN
    FQAN: /voname/Role=NULL/Capability=NULL
    WallDuration: 234256
    CpuDuration: 2345
    Processors: 2
    NodeCount: 2
    StartTime: 1234567890
    EndTime: 1234567899
    MemoryReal: 1000
    MemoryVirtual: 2000
    ServiceLevelType: Si2k
    ServiceLevel: 1000
    %%
    ...another job record...
    %%
    ...
    %%
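A client producing such a message could be sketched in Python as follows. This is an illustration only: the function `format_job_message` and the constant `MANDATORY_JOB_FIELDS` are hypothetical names, the mandatory list is copied from the job record table above, and the one-pair-per-line layout is an assumption.

```python
MANDATORY_JOB_FIELDS = (
    "Site", "SubmitHost", "LocalJobId", "WallDuration", "CpuDuration",
    "StartTime", "EndTime", "ServiceLevelType", "ServiceLevel",
)


def format_job_message(records, version="v0.3"):
    """Serialise job records (dicts) into one message with a single header."""
    lines = ["APEL-individual-job-message: %s" % version]
    for record in records:
        missing = [k for k in MANDATORY_JOB_FIELDS if record.get(k) is None]
        if missing:
            raise ValueError("missing mandatory fields: %s" % ", ".join(missing))
        for key, value in record.items():
            if value is None:
                continue  # optional field without a value: leave it out
            lines.append("%s: %s" % (key, value))
        lines.append("%%")  # end of record marker
    return "\n".join(lines) + "\n"


job = {
    "Site": "RAL-LCG2",
    "SubmitHost": "ce01.ncg.ingrid.pt:2119/jobmanager-lcgsge-atlasgrid",
    "LocalJobId": "31564872",
    "Queue": None,
    "WallDuration": 234256,
    "CpuDuration": 2345,
    "StartTime": 1234567890,
    "EndTime": 1234567899,
    "ServiceLevelType": "Si2k",
    "ServiceLevel": 1000,
}
print(format_job_message([job]))
```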
Summary Job Records
Description
Header: APEL-summary-job-message: v0.3
The header only appears once at the top of each message. It defines the type of record and the schema version.
The table shows the equivalent field in the AUR, under the container element `aur:SummaryRecord`. If not specified, it refers to the text value of `urf:Key`, where the element is a direct child of `aur:SummaryRecord`.
Key | Value | Description | Mandatory | AUR equivalent |
---|---|---|---|---|
Site | String | GOCDB sitename | Yes | |
Month | int | Month of summary (see notes) | Yes | |
Year | int | Year of summary (see notes) | Yes | |
GlobalUserName | String | User's X509 DN | | aur:UserIdentity/urf:GlobalUserName |
VO | String | User's VO | | aur:UserIdentity/urf:Group |
VOGroup | String | User's VOMS group | | aur:UserIdentity/urf:GroupAttribute[@type="vo-group"] |
VORole | String | User's VOMS role | | aur:UserIdentity/urf:GroupAttribute[@type="vo-role"] |
SubmitHost | String | The CE-ID or LRMS hostname | | |
Infrastructure | String | grid OR local | | |
Processors | int | Number of processors | | |
NodeCount | int | Number of nodes | | |
EarliestEndTime | int | End time of the first job in the month (epoch time) | | AUR has dates in ISO 8601 format |
LatestEndTime | int | End time of the last job in the month (epoch time) | | AUR has dates in ISO 8601 format |
WallDuration | int | Sum of wall clock times for all jobs in the month (in seconds) | Yes | AUR has durations in ISO 8601 format |
CpuDuration | int | Sum of CPU time for all jobs in the month (in seconds) | Yes | AUR has durations in ISO 8601 format |
NormalisedWallDuration | int | Sum of normalised wall clock time for all jobs (in seconds; normalised by HEPSPEC06) | Yes | AUR has durations in ISO 8601 format |
NormalisedCpuDuration | int | Sum of normalised CPU times for all jobs (in seconds; normalised by HEPSPEC06) | Yes | AUR has durations in ISO 8601 format |
NumberOfJobs | int | Total number of jobs | Yes | |
End of record: %%
Notes:
If GlobalUserName, VO, VOGroup or VORole is not published, the server sets the value of that field to 'None'.
The job records are included in months according to the month and year of their EndTime. The month and year should be in UTC. Only completed jobs are accounted for by APEL.
All durations are in seconds, as specified in the table above. Normalised durations are the raw durations multiplied by the HEPSPEC06 value. All figures should be rounded to the nearest integer.
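The month assignment and normalisation described in these notes can be sketched in Python as below. This is a simplified illustration, not APEL's summarising code: it groups by year and month only (a real summary would also group by Site, VO and the other key fields), it takes durations in seconds as the table specifies, and it assumes each job's `ServiceLevel` is already a HEPSPEC06 value. The function name `summarise_jobs` is hypothetical.

```python
from datetime import datetime, timezone

SUM_FIELDS = ("WallDuration", "CpuDuration",
              "NormalisedWallDuration", "NormalisedCpuDuration")


def summarise_jobs(jobs):
    """Sum job figures per (year, month) of each job's EndTime in UTC."""
    totals = {}
    for job in jobs:
        end = datetime.fromtimestamp(job["EndTime"], tz=timezone.utc)
        summary = totals.setdefault(
            (end.year, end.month),
            dict.fromkeys(SUM_FIELDS + ("NumberOfJobs",), 0),
        )
        summary["WallDuration"] += job["WallDuration"]
        summary["CpuDuration"] += job["CpuDuration"]
        # Normalised duration = raw duration multiplied by HEPSPEC06.
        summary["NormalisedWallDuration"] += job["WallDuration"] * job["ServiceLevel"]
        summary["NormalisedCpuDuration"] += job["CpuDuration"] * job["ServiceLevel"]
        summary["NumberOfJobs"] += 1
    for summary in totals.values():
        for field in SUM_FIELDS:
            summary[field] = int(round(summary[field]))
    return totals


jobs = [
    {"EndTime": 1267527463, "WallDuration": 100, "CpuDuration": 50, "ServiceLevel": 10.0},
    {"EndTime": 1269773863, "WallDuration": 200, "CpuDuration": 150, "ServiceLevel": 10.0},
    {"EndTime": 1270080100, "WallDuration": 30, "CpuDuration": 20, "ServiceLevel": 8.0},
]
for (year, month), summary in sorted(summarise_jobs(jobs).items()):
    print(year, month, summary)
```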
Example Message
    APEL-summary-job-message: v0.3
    Site: RAL-LCG2
    Month: 3
    Year: 2010
    GlobalUserName: /C=whatever/D=someDN
    VO: atlas
    VOGroup: /atlas
    VORole: Role=production
    SubmitHost: test06.ral.ac.uk:8443/cream-pbs-GRID_ops
    Infrastructure: grid
    Processors: 1
    NodeCount: 1
    EarliestEndTime: 1267527463
    LatestEndTime: 1269773863
    WallDuration: 23425
    CpuDuration: 2345
    NormalisedWallDuration: 244435
    NormalisedCpuDuration: 2500
    NumberOfJobs: 100
    %%
    ...another summary job record...
    %%
    ...
    %%
Description
- See APEL/MessageFormatV01 for version 0.1 of the message formats.
- See APEL/MessageFormatV02 for version 0.2 of the message formats.
Summary Sync Records
Summary sync records are used to create the apel-sync Nagios test. They are a mechanism for the central APEL server to know the number of records that each site is storing locally. In general, they are only used by sites which publish via the standard APEL client.
Description
Header: APEL-sync-message: v0.1
The header only appears once at the top of each message. It defines the type of record and the schema version.
Key | Value | Description | Mandatory |
---|---|---|---|
Site | String | GOCDB sitename | Yes |
SubmitHost | String | CE ID | Yes |
NumberOfJobs | int | Total number of jobs for that month | Yes |
Month | int | Month | Yes |
Year | int | Year | Yes |
End of record: %%
Notes:
Each record indicates the number of jobs run on the site per month. This data is used to create the Nagios apel-sync test.
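As an illustration, the Python sketch below counts jobs per SubmitHost and month and writes them out as a sync message. It assumes that jobs are assigned to months by the UTC month of their end time, as for summary records; this page does not state that rule for sync records. The function name `build_sync_message` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def build_sync_message(site, jobs):
    """Build a sync message from (submit_host, end_time_epoch) pairs."""
    counts = Counter()
    for submit_host, end_time in jobs:
        end = datetime.fromtimestamp(end_time, tz=timezone.utc)
        counts[(submit_host, end.year, end.month)] += 1
    lines = ["APEL-sync-message: v0.1"]
    for (submit_host, year, month), number in sorted(counts.items()):
        lines += [
            "Site: %s" % site,
            "SubmitHost: %s" % submit_host,
            "NumberOfJobs: %d" % number,
            "Month: %d" % month,
            "Year: %d" % year,
            "%%",  # end of record marker
        ]
    return "\n".join(lines) + "\n"


host = "raltest.rl.ac.uk:8443/cream-pbs-demo"
sync_jobs = [(host, 1267527463), (host, 1269773863), (host, 1270080100)]
print(build_sync_message("RAL-LCG2", sync_jobs))
```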
Example Message
    APEL-sync-message: v0.1
    Site: RAL-LCG2
    SubmitHost: raltest.rl.ac.uk:8443/cream-pbs-demo
    NumberOfJobs: 3479
    Month: 1
    Year: 2010
    %%
    ...another sync record...
    %%
    ...
    %%