The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations @ egi.eu.

FAQ Regional Operator on Duty

From EGIWiki
Revision as of 09:57, 31 May 2011




Handling issues during weekends and public holidays

Because weekends and public holidays are not working days, ROD teams have no responsibilities on these days. RODs should nevertheless ensure that no tickets expire and no alarms age beyond 72 hours over these days.

Alarms for a node that is not in production but belongs to a production site

Sites with multiple tickets open

Site/node in downtime

Handling tickets for site/node in downtime

When a ticket has been raised against a site that subsequently enters downtime, the expiry date on the ticket can be extended.

Monitoring remains switched on for sites in downtime, so they may appear to be failing tests, but no alarms are raised against them on the Operations Portal. When opening tickets, RODs must take care not to open tickets against sites in downtime.

Handling alarms for site/node in downtime

It often happens that a failure generates a large number of alarms and the site manager then puts the site in downtime. Getting these alarms back to OK may take more than 72 hours, in which case they are escalated to COD.
RODs should not create tickets for sites/nodes in downtime and are not obliged to deal with such alarms, but it is recommended to close these alarms to avoid their being escalated to COD. When closing a NON-OK alarm in this case, the ROD should give a link to the downtime in GOC DB as the reason.

Site in downtime for more than a month

If a site is in DOWNTIME for more than a month, it is advised that the site be moved to the uncertified state.

Accounting issue

Security matters