APEL/MessageFormat

From EGIWiki
Revision as of 17:14, 29 November 2011 by Ap (talk | contribs) (Description)

APEL Message Format

This page describes a new message format for transferring data between the APEL clients and the server.

It also describes the mapping between its fields and those of the EMI Compute Accounting Record (CAR), described here: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting.

Terminology:

  • A message is one file which is sent and received by the SSM. Usually a message will contain a number of records (e.g. 1000).
  • A record corresponds to one row in the database. It contains a number of key-value pairs as specified by the tables below
  • The header in each message tells the server which type of records are in that message. You need one header per message, so one header per file.
  • Epoch time, also known as a Unix timestamp, is an integer number of seconds since 00:00:00 UTC on 1st January 1970. For example, the epoch time 1311675474 corresponds to 10:17:54 UTC on 26th July 2011. The command date +%s will give you the current epoch time on Linux.
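
As a small illustration of the epoch time definition above, the following Python sketch converts between epoch seconds and a UTC date. The helper names epoch_to_utc and utc_to_epoch are made up for this example and are not part of APEL.

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds):
    """Return a timezone-aware UTC datetime for an integer epoch time."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def utc_to_epoch(moment):
    """Return the integer epoch time for a timezone-aware datetime."""
    return int(moment.timestamp())

example = epoch_to_utc(1311675474)
print(example.isoformat())  # 2011-07-26T10:17:54+00:00
```

Using an explicit UTC timezone avoids results that depend on the local timezone of the machine running the code.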

Job Records

A message is one file. It can contain multiple records. Different records must be separated by the end of record marker (%%).
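
The message layout described above (one header line, then records closed by the %% marker) can be sketched in Python as follows. The function name split_message is hypothetical, and treating an unterminated final record as an error is an assumption made for this sketch, not something the format definition states.

```python
def split_message(text):
    """Split a message into its header line and a list of raw record blocks.

    Each record block is returned as a list of 'Key: value' lines.
    """
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if not lines:
        raise ValueError("empty message")

    header, body = lines[0], lines[1:]
    records, current = [], []
    for line in body:
        if line == "%%":
            # End-of-record marker: close the record collected so far.
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)

    if current:
        # Assumption: every record must be closed by %%.
        raise ValueError("last record is not terminated by %%")
    return header, records
```

The header is kept as a plain string so that the caller can decide which record type and schema version to expect.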

Description

Header: APEL-individual-job-message: v0.1

The header only appears once at the top of each message (that is once at the top of each file). It defines the type of record and the schema version.

| Key | Value | Description | Mandatory | CAR equivalent (if different) |
|---|---|---|---|---|
| Site | String | GOCDB sitename | Yes | SiteName |
| SubmitHost | String | The CE-ID (see example) | Yes | |
| LocalJobID | String | Batch System Job ID | Yes | |
| LocalUserID | String | Local username | | |
| GlobalUserName | String | User's X509 DN | | |
| UserFQAN | String | User's VOMS attributes | | FQAN |
| WallDuration | int | Wallclock time for the job (seconds) | Yes | |
| CpuDuration | int | CPU time for the job (seconds) | Yes | |
| Processors | int | Number of processors | | |
| NodeCount | int | Number of nodes | | |
| StartTime | int | Start time of the job (epoch time) | Yes | |
| EndTime | int | Stop time of the job (epoch time) | Yes | |
| MemoryReal | int | Memory consumed by job (kbytes) | | |
| MemoryVirtual | int | Virtual memory consumed by job (kbytes) | | |
| ScalingFactorUnit | String | HepSpec or SpecInt | Yes | ServiceLevelType |
| ScalingFactor | double | Value of either HepSpec or SpecInt | Yes | ServiceLevel |


End of record:  %%

Notes: If ScalingFactorUnit and ScalingFactor are not available, they should be set to:

ScalingFactorUnit = 'custom'
ScalingFactor = 1

If GlobalUserName or UserFQAN is not published, the value for these fields on the server will be set to 'None'.
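
The two rules in these notes can be sketched in Python as below. The function name apply_job_defaults is hypothetical. For illustration it combines both rules in one helper, although according to the notes the scaling factor default is set by the sender and the 'None' values are set on the server. Treating the scaling factor as unavailable when either of its two keys is missing is an assumption.

```python
def apply_job_defaults(record):
    """Return a copy of a job record dict with the documented defaults filled in."""
    out = dict(record)

    # Assumption: if either scaling key is missing, both fall back to the default.
    if "ScalingFactorUnit" not in out or "ScalingFactor" not in out:
        out["ScalingFactorUnit"] = "custom"
        out["ScalingFactor"] = 1

    # Unpublished user identity fields are stored as the string 'None'.
    for key in ("GlobalUserName", "UserFQAN"):
        out.setdefault(key, "None")
    return out
```

The input dictionary is copied rather than modified, so the record as received stays available for logging or debugging.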

Example Message

APEL-individual-job-message: v0.1
Site: RAL-LCG2
SubmitHost: ce01.ncg.ingrid.pt:2119/jobmanager-lcgsge-atlasgrid
LocalJobID: 31564872
LocalUserID: atlasprd019
GlobalUserName: /C=whatever/D=someDN
UserFQAN: /voname/Role=NULL/Capability=NULL
WallDuration: 234256
CpuDuration: 2345
Processors: 2
NodeCount: 2
StartTime: 1234567890
EndTime: 1234567899
MemoryReal: 1000
MemoryVirtual: 2000
ScalingFactorUnit: SpecInt2000
ScalingFactor: 1000
%%
...another job record...
%%
...
%%
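
As a sketch of how one record from the example above could be read, the following Python code turns the 'Key: value' lines of a single job record into a dictionary, converts the numeric fields and checks the mandatory keys from the table. The name parse_job_record is hypothetical, and this is not the APEL server's own parser. Note that the line is split at the first colon only, because values such as SubmitHost contain colons themselves.

```python
INT_FIELDS = {
    "WallDuration", "CpuDuration", "Processors", "NodeCount",
    "StartTime", "EndTime", "MemoryReal", "MemoryVirtual",
}
FLOAT_FIELDS = {"ScalingFactor"}
MANDATORY = {
    "Site", "SubmitHost", "LocalJobID", "WallDuration", "CpuDuration",
    "StartTime", "EndTime", "ScalingFactorUnit", "ScalingFactor",
}

def parse_job_record(lines):
    """Parse the lines of one job record (without the %% marker) into a dict."""
    record = {}
    for line in lines:
        # Split at the first colon only: values may contain further colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a key-value line: " + line)
        key, value = key.strip(), value.strip()
        if key in INT_FIELDS:
            value = int(value)
        elif key in FLOAT_FIELDS:
            value = float(value)
        record[key] = value

    missing = MANDATORY - set(record)
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(sorted(missing)))
    return record
```

LocalJobID is left as a string because the table declares it as String, even when the batch system uses numeric job IDs.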

Summary Job Records

Description

Header: APEL-summary-job-message: v0.1

The header only appears once at the top of each message. It defines the type of record and the schema version.

| Key | Value | Description | Mandatory |
|---|---|---|---|
| Site | String | GOCDB sitename | Yes |
| Month | int | Month of summary (see notes) | Yes |
| Year | int | Year of summary (see notes) | Yes |
| GlobalUserName | String | User's X509 DN | |
| VO | String | User's VO | |
| Group | String | User's VOMS group | |
| Role | String | User's VOMS role | |
| EarliestEndTime | int | End time of the first job in the month (epoch time) | |
| LatestEndTime | int | End time of the last job in the month (epoch time) | |
| WallDuration | int | Sum of wall clock times for all jobs in the month (in hours) | Yes |
| CpuDuration | int | Sum of CPU time for all jobs in the month (in hours) | Yes |
| NormalisedWallDuration | int | Sum of normalised wall clock time for all jobs (in hours; normalised by HEPSPEC06) | Yes |
| NormalisedCpuDuration | int | Sum of normalised CPU times for all jobs (in hours; normalised by HEPSPEC06) | Yes |
| NumberOfJobs | int | Total number of jobs | Yes |

End of record: %%

Notes:

If GlobalUserName, VO, Role or Group are not published, the value for these fields on the server will be set to 'None'.

Each job record must be included in exactly one summary record, to avoid duplication of data. Job records are assigned to months according to the month and year of their EndTime. Only completed jobs are accounted for by APEL.
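
The assignment of job records to months can be sketched in Python as follows. This is a simplified illustration, not the APEL client's summarising code. The function name summarise_jobs is hypothetical. Evaluating the month in UTC and rounding totals down to whole hours are assumptions. The sketch groups by site and month only, whereas a real summary record also carries GlobalUserName, VO, Group and Role. The normalised durations are left out because this page does not give the normalisation formula.

```python
from collections import defaultdict
from datetime import datetime, timezone

def summarise_jobs(jobs):
    """Group job records by site and by the month and year of their EndTime."""
    groups = defaultdict(list)
    for job in jobs:
        # Assumption: the month boundary is evaluated in UTC.
        end = datetime.fromtimestamp(job["EndTime"], tz=timezone.utc)
        # Each job lands in exactly one group, so it is counted only once.
        groups[(job["Site"], end.year, end.month)].append(job)

    summaries = []
    for site, year, month in sorted(groups):
        members = groups[(site, year, month)]
        end_times = [job["EndTime"] for job in members]
        summaries.append({
            "Site": site,
            "Month": month,
            "Year": year,
            "EarliestEndTime": min(end_times),
            "LatestEndTime": max(end_times),
            # Job records carry seconds, summary records carry hours.
            # Assumption: totals are rounded down to whole hours.
            "WallDuration": sum(j["WallDuration"] for j in members) // 3600,
            "CpuDuration": sum(j["CpuDuration"] for j in members) // 3600,
            "NumberOfJobs": len(members),
        })
    return summaries
```

Summing the seconds first and converting to hours once per group keeps the rounding loss below one hour per summary record.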

Example Message

APEL-summary-job-message: v0.1
Site: RAL-LCG2
Month: 3
Year: 2010
GlobalUserName: /C=whatever/D=someDN
VO: atlas
Group: /atlas
Role: Role=production
EarliestEndTime: 1267527463
LatestEndTime: 1269773863
WallDuration: 23425
CpuDuration: 2345
NormalisedWallDuration: 244435
NormalisedCpuDuration: 2500
NumberOfJobs: 100
%%
...another summary job record...
%%
...
%%

Summary Sync Records

Summary sync records are used to create the apel-sync Nagios test. They are a mechanism for the central APEL server to know the number of records that each site is storing locally. In general, they are only used by sites which publish via the standard APEL client.

Description

Header: APEL-sync-message: v0.1

The header only appears once at the top of each message. It defines the type of record and the schema version.

| Key | Value | Description | Mandatory |
|---|---|---|---|
| Site | String | GOCDB sitename | Yes |
| NumberOfJobs | int | Total number of jobs for that month | Yes |
| Month | int | Month | Yes |
| Year | int | Year | Yes |

End of record: %%

Notes:

Each record indicates the number of jobs run on the site per month. This data is used to create the Nagios apel-sync test.

Example Message

APEL-sync-message: v0.1
Site: RAL-LCG2
NumberOfJobs: 3479
Month: 1
Year: 2010
%%
...another sync record...
%%
...
%%
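
A message like the example above could be produced with a short Python function such as the one below. The name format_sync_message is hypothetical and the sketch only covers writing the text of the message, not sending it through the SSM.

```python
SYNC_HEADER = "APEL-sync-message: v0.1"
SYNC_KEYS = ("Site", "NumberOfJobs", "Month", "Year")

def format_sync_message(records):
    """Serialise a list of sync record dicts into the text of one message."""
    lines = [SYNC_HEADER]  # the header appears once, at the top
    for record in records:
        for key in SYNC_KEYS:
            lines.append(f"{key}: {record[key]}")
        lines.append("%%")  # end-of-record marker
    return "\n".join(lines) + "\n"

message = format_sync_message([
    {"Site": "RAL-LCG2", "NumberOfJobs": 3479, "Month": 1, "Year": 2010},
])
print(message)
```

A missing key raises KeyError, which is convenient here because all four fields of a sync record are mandatory.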