Difference between revisions of "APEL/MessageFormat"

From EGIWiki

Latest revision as of 13:31, 2 July 2014

APEL Message Format

Please note: we have changed the formats of the job record message and the summary record message, to bring them in line with the EMI Compute Accounting Record. You can find a definition of the EMI CAR here: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccountingRecord. The document includes the definition of the Aggregated Usage Record (AUR) which is equivalent to our summary records.

This describes a new message format for getting data between the APEL clients and the server.

It also describes the mapping between its fields and those of the EMI Compute Accounting Record (CAR), described here: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting.

Terminology:

  • A message is one file which is sent and received by the SSM. Usually a message will contain a number of records (e.g. 1000).
  • A record corresponds to one row in the database. It contains a number of key-value pairs as specified by the tables below
  • The header in each message tells the server which type of records are in that message. You need one header per message, so one header per file.
  • Epoch time, also known as a Unix timestamp, is an integer number of seconds since 1st January 1970 (UTC). For example, the epoch time at the time of writing was 1311675474. The command date +%s will give you the current epoch time on Linux.
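
To illustrate the epoch time definition above, here is a minimal Python sketch that converts between epoch seconds and UTC datetimes. The function names are hypothetical and are not part of any APEL tool.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds: int) -> datetime:
    """Convert an integer epoch time to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def utc_to_epoch(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to integer epoch seconds."""
    return int(moment.timestamp())


# The example value quoted in the terminology list.
print(epoch_to_utc(1311675474).isoformat())  # 2011-07-26T10:17:54+00:00
```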

Optional fields:

  • If you do not have a value for a field in a record, use "null" or "none" (not case-sensitive) as the value for this attribute OR leave the field out of the record completely.
  • This applies to all the optional fields in JobRecords and Summary records.
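
A reader of these messages might treat the "null" and "none" markers as missing values along the following lines. This is only a sketch: the helper name and the choice to strip surrounding whitespace are assumptions, not documented server behaviour.

```python
from typing import Optional

# Markers that stand for "no value", compared case-insensitively.
NULL_MARKERS = {"null", "none"}


def clean_optional(value: Optional[str]) -> Optional[str]:
    """Return None for an absent field or a null marker, else the trimmed value."""
    if value is None:  # field left out of the record completely
        return None
    stripped = value.strip()
    if stripped.lower() in NULL_MARKERS:
        return None
    return stripped
```

With this helper, clean_optional("NULL"), clean_optional("None") and a field that is absent from the record all give the same result.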


Job Records

A message is one file. It can contain multiple records. Different records must be separated by the end of record marker (%%).
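
The message layout described here (one header line, then key-value records separated by %%) could be read with a small parser like the one below. It is an illustrative sketch, not the parser used by the APEL server, and the name parse_message is hypothetical. Note that only the first colon on a line separates key from value, because values such as SubmitHost contain colons themselves.

```python
from typing import Dict, List, Tuple

END_OF_RECORD = "%%"


def parse_message(text: str) -> Tuple[str, List[Dict[str, str]]]:
    """Split a message into its header line and a list of records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty message")
    header, body = lines[0], lines[1:]

    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in body:
        if line == END_OF_RECORD:
            if current:
                records.append(current)
            current = {}
            continue
        # Split on the first colon only; the value may contain more colons.
        key, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a key-value pair: {line!r}")
        current[key.strip()] = value.strip()
    if current:  # tolerate a final record without a closing marker
        records.append(current)
    return header, records
```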

Description

Header: APEL-individual-job-message: v0.3

The header only appears once at the top of each message (that is once at the top of each file). It defines the type of record and the schema version.

The table shows the equivalent field in the CAR, under the container element urf:UsageRecord. If not specified, it refers to the text value of urf:Key, where the element is a direct child of urf:UsageRecord.

Key | Value | Description | Mandatory | CAR equivalent (if different)
Site | String | GOCDB sitename | Yes |
SubmitHost | String | The CE-ID (see example) | Yes |
MachineName | String | LRMS hostname | No |
Queue | String | Batch system queue | No |
LocalJobId | String | Batch System Job ID | Yes | urf:JobIdentity/urf:LocalJobId
LocalUserId | String | Local username | No | urf:UserIdentity/urf:LocalUserId
GlobalUserName | String | User's X509 DN | No | urf:UserIdentity/urf:GlobalUserName
FQAN | String | User's VOMS attributes | No | urf:UserIdentity/urf:GroupAttribute[@type="FQAN"]
WallDuration | int | Wallclock time for the job (seconds) | Yes | CAR has ISO 8601 time duration
CpuDuration | int | CPU time for the job (seconds) | Yes | CAR has ISO 8601 time duration
Processors | int | Number of processors | No | urf:Processors[@metric="max"]
NodeCount | int | Number of nodes | No |
StartTime | int | Start time of the job (epoch time) | Yes | CAR has ISO 8601 datetime
EndTime | int | Stop time of the job (epoch time) | Yes | CAR has ISO 8601 datetime
InfrastructureDescription | String | <accounting client>-<CE type>-<batch system type>, e.g. "APEL-CREAM-PBS" | No |
InfrastructureType | String | grid OR local | No |
MemoryReal | int | Memory consumed by job (kbytes) | No | urf:Memory[@metric="max" and @type="Physical" and @storageUnit="KB"]
MemoryVirtual | int | Virtual memory consumed by job (kbytes) | No | urf:Memory[@metric="max" and @type="Shared" and @storageUnit="KB"]
ServiceLevelType | String | Si2k OR HEPSPEC | Yes | urf:ServiceLevel[@type]
ServiceLevel | double | Value of either HepSpec06 or SpecInt2000 | Yes | urf:ServiceLevel


End of record:  %%
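
The table notes that the CAR uses ISO 8601 durations and datetimes where APEL uses integer seconds and epoch times. The sketch below shows one way such a conversion might look; the function names are hypothetical and the exact ISO 8601 profile expected by a CAR consumer is an assumption.

```python
from datetime import datetime, timezone


def seconds_to_iso8601_duration(seconds: int) -> str:
    """Render a non-negative number of seconds as an ISO 8601 duration."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{amount}{unit}"
        for amount, unit in ((hours, "H"), (minutes, "M"), (secs, "S"))
        if amount
    )
    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")


def epoch_to_iso8601(epoch_seconds: int) -> str:
    """Render an epoch time as an ISO 8601 UTC datetime."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(seconds_to_iso8601_duration(234256))  # P2DT17H4M16S
print(epoch_to_iso8601(1234567890))         # 2009-02-13T23:31:30Z
```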

Changes since version 0.2

  • InfrastructureType field (optional)
  • InfrastructureDescription field (optional)
  • SubmitHostType field (optional)

Changes from version 0.1 to version 0.2

  • LocalJobID has changed to LocalJobId
  • LocalUserID has changed to LocalUserId
  • UserFQAN has changed to FQAN
  • ScalingFactorUnit has changed to ServiceLevelType
  • The possible values of ScalingFactorType have changed from ["SpecInt2000", "HepSpec06", "custom"] to ["Si2k"], ["HEPSPEC"]
  • ScalingFactor has changed to ServiceLevel


Notes

If GlobalUserName or FQAN (called UserFQAN in version 0.1) is not published, the value for these fields on the server will be set to 'None'.

Jobs are assumed to be grid jobs. To specify local jobs:

  • InfrastructureType: local
  • SubmitHostType: LRMS
  • SubmitHost: <LRMS-hostname>

The Group value specified for local jobs must be different to equivalent grid jobs, or you will not be able to differentiate them in the accounting portal. Suggestion:

  • Group: atlas - grid job
  • Group: local-atlas - local job

This advice may change as we get more sites publishing local jobs.
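
As an illustration of the advice above, a publisher could mark a record as a local job with a helper like this. The function name as_local_job and the automatic "local-" prefix are assumptions that follow the suggestion in these notes; they are not part of any APEL client.

```python
from typing import Dict


def as_local_job(record: Dict[str, str], lrms_hostname: str) -> Dict[str, str]:
    """Return a copy of the record with the local-job fields filled in."""
    local = dict(record)
    local["InfrastructureType"] = "local"
    local["SubmitHostType"] = "LRMS"
    local["SubmitHost"] = lrms_hostname
    # Keep local usage distinguishable from grid usage in the portal.
    group = local.get("Group")
    if group and not group.startswith("local-"):
        local["Group"] = "local-" + group
    return local
```

The input record is left unchanged, so the same job data can still be inspected in its original form.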

Example Message

APEL-individual-job-message: v0.3
Site: RAL-LCG2
SubmitHost: ce01.ncg.ingrid.pt:2119/jobmanager-lcgsge-atlasgrid
LocalJobId: 31564872
LocalUserId: atlasprd019
GlobalUserName: /C=whatever/D=someDN
FQAN: /voname/Role=NULL/Capability=NULL
WallDuration: 234256
CpuDuration: 2345
Processors: 2
NodeCount: 2
StartTime: 1234567890
EndTime: 1234567899
MemoryReal: 1000
MemoryVirtual: 2000
ServiceLevelType: Si2k
ServiceLevel: 1000
%%
...another job record...
%%
...
%%
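
A message like the example above could be produced with a small writer such as the following. It is a sketch under stated assumptions: the function names are hypothetical, the mandatory field list is copied from the job record table, and fields whose value is None are simply left out, as the optional-field rules allow.

```python
from typing import Dict, Iterable, List, Optional

JOB_HEADER = "APEL-individual-job-message: v0.3"
END_OF_RECORD = "%%"

# Fields marked "Yes" in the Mandatory column of the job record table.
MANDATORY_JOB_FIELDS = (
    "Site", "SubmitHost", "LocalJobId", "WallDuration", "CpuDuration",
    "StartTime", "EndTime", "ServiceLevelType", "ServiceLevel",
)


def missing_mandatory(record: Dict[str, Optional[object]]) -> List[str]:
    """List the mandatory keys that are absent or None in a record."""
    return [key for key in MANDATORY_JOB_FIELDS if record.get(key) is None]


def format_job_message(records: Iterable[Dict[str, Optional[object]]]) -> str:
    """Write one header, then each record followed by the end marker."""
    lines = [JOB_HEADER]
    for record in records:
        missing = missing_mandatory(record)
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")
        for key, value in record.items():
            if value is not None:  # optional fields without a value are omitted
                lines.append(f"{key}: {value}")
        lines.append(END_OF_RECORD)
    return "\n".join(lines) + "\n"
```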

Summary Job Records

Description

Header: APEL-summary-job-message: v0.3

The header only appears once at the top of each message. It defines the type of record and the schema version.

The table shows the equivalent field in the AUR, under the container element aur:SummaryRecord. If not specified, it refers to the text value of urf:Key, where the element is a direct child of aur:SummaryRecord.

Key | Value | Description | Mandatory | AUR equivalent
Site | String | GOCDB sitename | Yes |
Month | int | Month of summary (see notes) | Yes |
Year | int | Year of summary (see notes) | Yes |
GlobalUserName | String | User's X509 DN | No | aur:UserIdentity/urf:GlobalUserName
VO | String | User's VO | No | aur:UserIdentity/urf:Group
VOGroup | String | User's VOMS group | No | aur:UserIdentity/urf:GroupAttribute[@type="vo-group"]
VORole | String | User's VOMS role | No | aur:UserIdentity/urf:GroupAttribute[@type="vo-role"]
SubmitHost | String | The CE-ID or LRMS hostname | No |
Infrastructure | String | grid OR local | No |
Processors | int | Number of processors | No |
NodeCount | int | Number of nodes | No |
EarliestEndTime | int | End time of the first job in the month (epoch time) | No | AUR has dates in ISO 8601 format
LatestEndTime | int | End time of the last job in the month (epoch time) | No | AUR has dates in ISO 8601 format
WallDuration | int | Sum of wall clock times for all jobs in the month (in seconds) | Yes | AUR has durations in ISO 8601 format
CpuDuration | int | Sum of CPU time for all jobs in the month (in seconds) | Yes | AUR has durations in ISO 8601 format
NormalisedWallDuration | int | Sum of normalised wall clock time for all jobs (in seconds; normalised by HEPSPEC06) | Yes | AUR has durations in ISO 8601 format
NormalisedCpuDuration | int | Sum of normalised CPU times for all jobs (in seconds; normalised by HEPSPEC06) | Yes | AUR has durations in ISO 8601 format
NumberOfJobs | int | Total number of jobs | Yes |

End of record: %%

Notes:

If GlobalUserName, VO, VOGroup or VORole is not published, the value for these fields on the server will be set to 'None'.

The job records are included in months according to the month and year of their EndTime. The month and year should be in UTC. Only completed jobs are accounted for by APEL.

All durations are in seconds, as stated in the table above. Normalised durations are the durations multiplied by the HEPSPEC06 value. All figures should be rounded to the nearest integer.
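
The notes above can be sketched as a small aggregation over job records. This is a simplified illustration, not the APEL summariser: it groups only by Site, Month and Year (a real summary would also separate users, VOs, SubmitHost and so on), it uses seconds as stated in the summary table, and it assumes every ServiceLevel is already a HEPSPEC06 value. The function name summarise is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def summarise(jobs: Iterable[Dict]) -> List[Dict]:
    """Aggregate job records into one summary per (Site, Month, Year)."""
    groups: Dict[Tuple[str, int, int], List[Dict]] = defaultdict(list)
    for job in jobs:
        # Month and year come from EndTime, interpreted in UTC.
        end = datetime.fromtimestamp(job["EndTime"], tz=timezone.utc)
        groups[(job["Site"], end.month, end.year)].append(job)

    summaries = []
    for (site, month, year), members in sorted(groups.items()):
        end_times = [job["EndTime"] for job in members]
        summaries.append({
            "Site": site,
            "Month": month,
            "Year": year,
            "EarliestEndTime": min(end_times),
            "LatestEndTime": max(end_times),
            "WallDuration": sum(job["WallDuration"] for job in members),
            "CpuDuration": sum(job["CpuDuration"] for job in members),
            # Normalised figures: duration multiplied by HEPSPEC06, then rounded.
            "NormalisedWallDuration": round(
                sum(job["WallDuration"] * job["ServiceLevel"] for job in members)
            ),
            "NormalisedCpuDuration": round(
                sum(job["CpuDuration"] * job["ServiceLevel"] for job in members)
            ),
            "NumberOfJobs": len(members),
        })
    return summaries
```

Because each job is placed in exactly one group, no job record is counted in more than one summary record.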

Example Message

APEL-summary-job-message: v0.3
Site: RAL-LCG2
Month: 3
Year: 2010
GlobalUserName: /C=whatever/D=someDN
VO: atlas
VOGroup: /atlas
VORole: Role=production
SubmitHost:  test06.ral.ac.uk:8443/cream-pbs-GRID_ops
Infrastructure: grid
Processors: 1
NodeCount: 1
EarliestEndTime: 1267527463
LatestEndTime: 1269773863
WallDuration: 23425
CpuDuration: 2345
NormalisedWallDuration: 244435
NormalisedCpuDuration: 2500
NumberOfJobs: 100
%%
...another summary job record...
%%
...
%%

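Description

  • See APEL/MessageFormatV01 for version 0.1 of the message formats.
  • See APEL/MessageFormatV02 for version 0.2 of the message formats.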

Summary Sync Records

Summary sync records are used to create the apel-sync Nagios test. They tell the central APEL server how many records each site is storing locally. In general they are only used by sites which publish via the standard APEL client.

Description

Header: APEL-sync-message: v0.1

The header only appears once at the top of each message. It defines the type of record and the schema version.

Key | Value | Description | Mandatory
Site | String | GOCDB sitename | Yes
SubmitHost | String | CE ID | Yes
NumberOfJobs | int | Total number of jobs for that month | Yes
Month | int | Month | Yes
Year | int | Year | Yes

End of record: %%

Notes:

Each record indicates the number of jobs run on the site per month. This data is used to create the Nagios apel-sync test.
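
A site could derive its sync records by counting its locally stored jobs per month, roughly as sketched below. The function name sync_records is hypothetical, and assigning each job to a month by its EndTime in UTC is an assumption carried over from the summary record notes.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def sync_records(jobs: Iterable[Dict]) -> List[Dict]:
    """Count jobs per (Site, SubmitHost, Month, Year)."""
    counts: Counter = Counter()
    for job in jobs:
        end = datetime.fromtimestamp(int(job["EndTime"]), tz=timezone.utc)
        counts[(job["Site"], job["SubmitHost"], end.month, end.year)] += 1
    return [
        {
            "Site": site,
            "SubmitHost": host,
            "NumberOfJobs": number,
            "Month": month,
            "Year": year,
        }
        for (site, host, month, year), number in sorted(counts.items())
    ]
```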

Example Message

APEL-sync-message: v0.1
Site: RAL-LCG2
SubmitHost: raltest.rl.ac.uk:8443/cream-pbs-demo
NumberOfJobs: 3479
Month: 1
Year: 2010
%%
...another sync record...
%%
...
%%