Difference between revisions of "Unknown issue"
Revision as of 12:04, 12 October 2011
This page will contain information about the UNKNOWN status issue.
Present situation
Availability and Reliability calculation formulas:
Availability = Uptime / (Total time - Time_status_was_UNKNOWN)
Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)
How to read these formulas in the context of the UNKNOWN status:
- Any period in which a site is in UNKNOWN status is excluded from the calculation.
- During such a period EGI does not know what is happening with the infrastructure.
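The effect of excluding UNKNOWN time can be illustrated with a minimal Python sketch of the two formulas above. The function names, the use of hours as the unit, and the example figures are assumptions made for illustration only; they are not taken from any EGI tool.

```python
def availability(uptime, total_time, unknown_time):
    """Availability = Uptime / (Total time - Time_status_was_UNKNOWN)."""
    known_time = total_time - unknown_time
    if known_time <= 0:
        # The status was UNKNOWN for the whole period: nothing to compute.
        return None
    return uptime / known_time


def reliability(uptime, total_time, scheduled_downtime, unknown_time):
    """Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)."""
    counted_time = total_time - scheduled_downtime - unknown_time
    if counted_time <= 0:
        return None
    return uptime / counted_time


if __name__ == "__main__":
    # Hypothetical 720-hour month: 400 h up, 20 h scheduled downtime,
    # 288 h (40%) in UNKNOWN status.
    print(availability(400, 720, 288))
    print(reliability(400, 720, 20, 288))
    # The same uptime with no UNKNOWN time excluded, for comparison.
    print(availability(400, 720, 0))
```

With these hypothetical figures, availability is about 0.93 when the UNKNOWN hours are excluded, but about 0.56 if the same hours counted against the site, which shows how strongly a large UNKNOWN share can influence the reported value.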
Problems
- There is no policy telling test developers when a test should return UNKNOWN status. What does UNKNOWN status mean?
- Some NGIs reach ~0% UNKNOWN for all their sites while others reach as much as ~40%; sometimes there are disproportions even within one NGI. What is the reason for such high values and disproportions, and where does it lie?
Solution proposals
Strict policy for developers on how to use UNKNOWN status
Advantage: we can be sure that every problem is properly reported as ERROR rather than UNKNOWN
Disadvantages: someone has to write the policy and check whether it is respected
Alarms for UNKNOWN status should be created when UNKNOWN status lasts longer than 4h
Advantage: we will be notified if the UNKNOWN status lasts too long
Disadvantages: it means extra work for ROD, which will have to look after not only ERRORs but also UNKNOWNs
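The 4h rule could be checked along the following lines. This is only a sketch under stated assumptions: the list of (timestamp, status) samples, the function name `unknown_alarm_due` and the constant `UNKNOWN_ALARM_AFTER` are hypothetical and do not describe how the monitoring or dashboard tools actually store or evaluate results.

```python
from datetime import datetime, timedelta

# Assumed limit, taken from the proposal: alarm after 4 hours of UNKNOWN.
UNKNOWN_ALARM_AFTER = timedelta(hours=4)


def unknown_alarm_due(samples, now, limit=UNKNOWN_ALARM_AFTER):
    """Return True if the latest uninterrupted UNKNOWN streak exceeds `limit`.

    `samples` is a list of (timestamp, status) pairs sorted by time,
    oldest first (a hypothetical input format).
    """
    streak_start = None
    for timestamp, status in samples:
        if status == "UNKNOWN":
            if streak_start is None:
                streak_start = timestamp  # first UNKNOWN of a new streak
        else:
            streak_start = None  # any other status ends the streak
    return streak_start is not None and now - streak_start > limit


if __name__ == "__main__":
    t0 = datetime(2011, 10, 12, 8, 0)
    history = [
        (t0, "OK"),
        (t0 + timedelta(hours=1), "UNKNOWN"),
        (t0 + timedelta(hours=3), "UNKNOWN"),
    ]
    # UNKNOWN since 09:00; at 14:00 the streak is 5 h long.
    print(unknown_alarm_due(history, t0 + timedelta(hours=6)))
```

Only the most recent streak is considered, so a site that returned to another status in the meantime would not raise an alarm; whether that is the desired behaviour would have to be settled in the policy.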
Threshold for UNKNOWN status
Advantage: it is easy and fast to implement and automate
Disadvantages: there is a possibility of overlooking an important problem