Revision as of 12:04, 12 October 2011

This page will contain information about the UNKNOWN status issue.

Present situation

Availability and Reliability calculation formulas:

Availability = Uptime / (Total time - Time_status_was_UNKNOWN)
Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)

How to read these formulas in the context of UNKNOWN status:

A period in which a site is in UNKNOWN status is excluded from the calculation.
During this period EGI doesn’t know what is happening with the infrastructure.
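
As a rough illustration of the two formulas, the following Python sketch computes both metrics for one reporting period. The function names, the use of hours as the unit, the sample numbers and the choice to return None when no known time remains are all assumptions made for this example, not part of any EGI tool.

```python
from typing import Optional


def availability(uptime: float, total: float, unknown: float) -> Optional[float]:
    """Availability = Uptime / (Total time - Time_status_was_UNKNOWN)."""
    denominator = total - unknown
    if denominator <= 0:
        # Assumption: with no known time left the metric is undefined.
        return None
    return uptime / denominator


def reliability(uptime: float, total: float, scheduled_downtime: float,
                unknown: float) -> Optional[float]:
    """Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)."""
    denominator = total - scheduled_downtime - unknown
    if denominator <= 0:
        # Assumption: same convention as in availability().
        return None
    return uptime / denominator


# Hypothetical 30-day period (720 h): 600 h up, 24 h scheduled downtime, 72 h UNKNOWN.
a = availability(600.0, 720.0, 72.0)        # 600 / 648
r = reliability(600.0, 720.0, 24.0, 72.0)   # 600 / 624
print(f"availability = {a:.4f}, reliability = {r:.4f}")
```

In this example, excluding the 72 h of UNKNOWN shrinks the availability denominator from 720 h to 648 h, so the same 600 h of uptime gives about 92.6% instead of about 83.3%.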

Problems

There is no policy telling test developers when a test should return UNKNOWN status. What does UNKNOWN status mean?
Some NGIs reach ~0% UNKNOWN for all their sites, while others reach as much as ~40%; sometimes there are disproportions even within a single NGI. What is the reason for such high values and disproportions, and where does it lie?

Solution proposals

Strict policy for developers on how to use UNKNOWN status

Advantage: we can be sure that all problems are properly reported as ERROR, not UNKNOWN
Disadvantage: someone has to write the policy and check whether it is respected

Alarms for UNKNOWN status should be created when UNKNOWN status lasts longer than 4h

Advantage: we will be notified if the UNKNOWN status lasts too long
Disadvantage: it means extra work for ROD, which will have to look after not only ERRORs but also UNKNOWNs
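
The 4h rule could be checked along the following lines. This is only a sketch: the input format (a time-ordered list of status changes), the function name and the sample data are hypothetical, and each status is assumed to hold until the next entry.

```python
from datetime import datetime, timedelta

ALARM_AFTER = timedelta(hours=4)  # the 4h limit named in the proposal


def long_unknown_periods(samples, now, limit=ALARM_AFTER):
    """Return (start, end) pairs where the status stayed UNKNOWN longer than limit.

    samples: list of (datetime, status) entries sorted by time (assumed format).
    The last status is assumed to hold until `now`.
    """
    periods = []
    start = None  # start of the UNKNOWN run currently being tracked
    for when, status in samples:
        if status == "UNKNOWN":
            if start is None:
                start = when
        else:
            if start is not None and when - start > limit:
                periods.append((start, when))
            start = None
    # An UNKNOWN run that is still open at `now` also counts.
    if start is not None and now - start > limit:
        periods.append((start, now))
    return periods


# Hypothetical status history for one site.
t0 = datetime(2011, 10, 12, 0, 0)
samples = [
    (t0, "OK"),
    (t0 + timedelta(hours=2), "UNKNOWN"),  # short run: 1 h, no alarm
    (t0 + timedelta(hours=3), "OK"),
    (t0 + timedelta(hours=5), "UNKNOWN"),  # still UNKNOWN at `now`: 7 h
    (t0 + timedelta(hours=6), "UNKNOWN"),
]
now = t0 + timedelta(hours=12)
alarms = long_unknown_periods(samples, now)
print(alarms)
```

Only the second UNKNOWN run exceeds the limit here, so a single alarm would be raised for it.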

Threshold for UNKNOWN status

Advantage: it is easy and fast to implement and automate
Disadvantage: there is a possibility of overlooking an important problem
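
The proposal does not say what the threshold value is or what should happen when it is exceeded. One possible reading is to flag any site whose share of UNKNOWN time in a period is above a fixed limit; the sketch below assumes that reading, and the 10% limit, the function names and the site data are hypothetical.

```python
UNKNOWN_THRESHOLD = 0.10  # hypothetical value; the proposal does not specify one


def unknown_fraction(unknown_hours: float, total_hours: float) -> float:
    """Share of the period during which the status was UNKNOWN."""
    if total_hours <= 0:
        raise ValueError("total time must be positive")
    return unknown_hours / total_hours


def exceeds_threshold(unknown_hours: float, total_hours: float,
                      threshold: float = UNKNOWN_THRESHOLD) -> bool:
    """True if the UNKNOWN share is strictly above the threshold."""
    return unknown_fraction(unknown_hours, total_hours) > threshold


# Hypothetical UNKNOWN hours per site in a 720 h period (~0% and ~40%).
sites = {"site-a": 0.0, "site-b": 288.0}
flagged = sorted(name for name, hours in sites.items()
                 if exceeds_threshold(hours, 720.0))
print(flagged)
```

A check like this is simple to automate, but it reports only that the limit was crossed, not why, which is the risk of overlooking an important problem.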


Difference between revisions of "Unknown issue"
