Difference between revisions of "Unknown issue"
Revision as of 15:03, 10 November 2011
This page collects information about the UNKNOWN status issue.
Present situation
Availability and Reliability calculation formulas:
Availability = Uptime / (Total time - Time_status_was_UNKNOWN)
Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)
How to read the formulas in the context of UNKNOWN status:
- A period in which a site is in UNKNOWN status is excluded from the calculation.
- During such a period EGI does not know what is happening with the infrastructure.
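The effect of excluding UNKNOWN time can be illustrated with a small sketch. The function names and the sample hours below are made up for illustration; only the two formulas come from this page.

```python
def availability(uptime, total_time, unknown_time):
    """Availability = Uptime / (Total time - Time_status_was_UNKNOWN)."""
    known_time = total_time - unknown_time
    if known_time <= 0:
        return None  # status was UNKNOWN for the whole period: nothing to compute
    return uptime / known_time


def reliability(uptime, total_time, scheduled_downtime, unknown_time):
    """Reliability = Uptime / (Total time - Scheduled Downtime - Time_status_was_UNKNOWN)."""
    expected_time = total_time - scheduled_downtime - unknown_time
    if expected_time <= 0:
        return None
    return uptime / expected_time


# Hypothetical site: 100 h period, 54 h up, 40 h UNKNOWN, 6 h scheduled downtime.
a = availability(54, 100, 40)      # 54 / 60
r = reliability(54, 100, 6, 40)    # 54 / 54
```

With these made-up numbers the site is known to be up for only 54 of 100 hours, yet the 40 UNKNOWN hours drop out of the denominator, so the reported availability is 54/60 and the reported reliability is 54/54.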
Problems & Questions
- There is no policy telling test developers when a test should return UNKNOWN status. What does UNKNOWN status mean?
- Some NGIs show ~0% UNKNOWN for all their sites while others reach ~40%, and sometimes there are disproportions even within one NGI. What causes such high values and such disproportions?
What can cause UNKNOWN status?
tbd
When can a test return UNKNOWN status?
UNKNOWN status is documented in the Nagios plugins developer guidelines (http://nagiosplug.sourceforge.net/developer-guidelines.html): "Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states.".
There is no plugin review process, so we cannot be sure that plugin developers actually follow the guidelines.
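A minimal sketch of how a plugin could follow the quoted guideline is shown below. The exit codes 0 to 3 are the standard Nagios plugin return codes. The `run_check` wrapper and the choice to map higher-level errors to CRITICAL are assumptions made for illustration, since the guideline only says such errors should generally not be UNKNOWN.

```python
import socket

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Higher-level errors that, per the guideline, should generally NOT be UNKNOWN.
# Mapping them to CRITICAL is this sketch's assumption.
HIGH_LEVEL_ERRORS = (socket.gaierror, socket.timeout, ConnectionError)


def run_check(probe):
    """Run a probe callable (hypothetical) and return (exit_code, message)."""
    try:
        probe()
    except HIGH_LEVEL_ERRORS as exc:
        # Name resolution failure, timeout, refused or reset connection.
        return CRITICAL, f"CRITICAL: {exc}"
    except Exception as exc:
        # Anything else is treated as a failure internal to the plugin.
        return UNKNOWN, f"UNKNOWN: {exc}"
    return OK, "OK"
```

A real plugin would print the message and call `sys.exit` with the returned code. Invalid command line arguments would be rejected before the probe runs and reported as UNKNOWN.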
What can cause UNKNOWN status disproportions between sites within one NGI?
tbd
Solution proposals
Strict policy for developers on how to use UNKNOWN status
Advantage: we can be sure that problems are properly reported as ERROR rather than UNKNOWN
Disadvantages: someone has to write the policy and check that it is respected
Alarms for UNKNOWN status should be created when the UNKNOWN status lasts longer than 4h
Advantage: we will be notified if the UNKNOWN status lasts too long
Disadvantages: it means extra work for ROD, who will have to look after UNKNOWNs as well as ERRORs
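The 4h rule could be checked along the following lines. This is only a sketch: the input format (time-ordered pairs of timestamp and status string) and the function name are assumptions, not part of any existing monitoring tool.

```python
from datetime import datetime, timedelta


def unknown_alarms(samples, limit=timedelta(hours=4)):
    """Return (start, end) pairs for UNKNOWN periods longer than `limit`.

    `samples` is a time-ordered list of (datetime, status) test results.
    A period starts at the first UNKNOWN result and ends at the next
    result with any other status, or at the last UNKNOWN result seen.
    """
    alarms = []
    start = None
    last_unknown = None
    for when, status in samples:
        if status == "UNKNOWN":
            if start is None:
                start = when
            last_unknown = when
        else:
            if start is not None and when - start > limit:
                alarms.append((start, when))
            start = None
    # Handle a period that is still UNKNOWN at the end of the data.
    if start is not None and last_unknown - start > limit:
        alarms.append((start, last_unknown))
    return alarms
```

For example, hourly results that stay UNKNOWN from 01:00 until an OK result at 07:00 would produce one alarm covering that period, while a single UNKNOWN result between two OK results would produce none.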
Threshold for UNKNOWN status
Advantage: it is easy and fast to implement and automate
Disadvantages: there is a possibility of overlooking an important problem
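One possible reading of the threshold proposal is to flag every site whose share of UNKNOWN time in a period exceeds a fixed limit. The sketch below assumes that reading. The 10% limit, the site names, and the input format are hypothetical.

```python
def sites_over_threshold(unknown_hours, total_hours, threshold=0.10):
    """Return the sites whose UNKNOWN share of the period exceeds `threshold`.

    `unknown_hours` maps a site name to its hours in UNKNOWN status;
    `total_hours` is the length of the reporting period in hours.
    """
    return sorted(
        site
        for site, hours in unknown_hours.items()
        if hours / total_hours > threshold
    )


# Hypothetical 720 h month: SITE-B spent 40% of it in UNKNOWN status.
flagged = sites_over_threshold({"SITE-A": 0, "SITE-B": 288, "SITE-C": 36}, 720)
```

Such a check is cheap to automate, which matches the advantage listed above, but a site just under the limit is never flagged, which is the disadvantage.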
Revision history
Version | Authors | Date | Comments |
---|---|---|---|
1.0 | Malgorzata Krakowian | 2011-10-12 | First draft |