The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "MAN05 top-BDII and site-BDII High Availability"


Revision as of 12:51, 2 June 2011

Introduction

  • recommended hardware setup;
  • why DNS round robin is a good technique to adopt for top-BDII load balancing;
  • what to check to verify the availability of a top-BDII instance.

Method

Implementation examples

Our setup is also based on DNS round robin (for load balancing). We use Nagios to check each top-BDII instance and to update the DNS records: a Nagios event handler runs a script that adds or deletes the "A" record using nsupdate.
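
The event handler described above could be sketched roughly as follows. This is only an illustration, not the script actually in use: the DNS server, zone, alias, TTL and the function names are hypothetical, and the sketch assumes that Nagios passes the $SERVICESTATE$ and $SERVICESTATETYPE$ macros plus the address of the checked instance as arguments.

```python
import subprocess

# Hypothetical values, for illustration only.
DNS_SERVER = "ns1.example.org"
ZONE = "example.org"
ALIAS = "top-bdii.example.org"
TTL = 60  # seconds; an assumed value, kept short so changes propagate quickly


def decide_action(state, state_type):
    """Map a Nagios service state to a DNS action ("add", "delete" or None)."""
    if state_type != "HARD":
        return None  # ignore SOFT states: the check is still being retried
    if state == "OK":
        return "add"
    if state == "CRITICAL":
        return "delete"
    return None  # WARNING / UNKNOWN: leave the DNS untouched


def build_nsupdate_input(action, address):
    """Build the command text that is fed to nsupdate on standard input."""
    if action == "add":
        update = f"update add {ALIAS} {TTL} A {address}"
    elif action == "delete":
        # Naming the address removes only this instance's "A" record.
        update = f"update delete {ALIAS} A {address}"
    else:
        raise ValueError(f"unknown action: {action!r}")
    return "\n".join([f"server {DNS_SERVER}", f"zone {ZONE}", update, "send", ""])


def handle_event(state, state_type, address):
    """Entry point for the event handler; returns the nsupdate exit code."""
    action = decide_action(state, state_type)
    if action is None:
        return 0
    # A real setup would normally authenticate the update,
    # for example with a TSIG key passed via "nsupdate -k".
    commands = build_nsupdate_input(action, address)
    return subprocess.run(["nsupdate"], input=commands, text=True).returncode
```

For example, handle_event("CRITICAL", "HARD", "192.0.2.10") would ask the primary DNS to drop that address from the alias, and the matching recovery event would add it back.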

The primary DNS and Nagios are clearly single points of failure, but we prefer to keep the setup very simple. This avoids, for example, inconsistent DNS information when more than one primary DNS is used (as you reported), and incoherent results when more than one server checks the top-BDII instances. To mitigate these single points of failure, a second Nagios instance checks both the DNS server and the Nagios instance that updates the DNS records, and an SMS notification is sent to the people on duty for 24-hour support if a problem occurs.
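
As an illustration of the kind of check the second Nagios instance could run against the DNS, here is a minimal sketch of a plugin-style test. It is an assumption, not our actual check: the function names and the list of expected addresses are hypothetical, and the lookup goes through the local resolver rather than querying the primary DNS directly. It follows the usual Nagios plugin convention for return codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import socket

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin return codes


def resolve_a_records(name):
    """Return the sorted IPv4 addresses that the alias currently resolves to."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the name does not resolve at all
    return sorted({info[4][0] for info in infos})


def evaluate(addresses, expected):
    """Compare the published addresses with the known top-BDII instances."""
    published = set(addresses)
    if not published:
        return CRITICAL, "alias resolves to no address"
    unexpected = published - set(expected)
    if unexpected:
        return WARNING, "unexpected addresses: " + ", ".join(sorted(unexpected))
    # Missing addresses are not an error here: the event handler removes
    # the record of an instance that is down.
    return OK, f"{len(published)} address(es) published"


def check_alias(name, expected):
    """Resolve the alias and evaluate the result in one step."""
    return evaluate(resolve_a_records(name), expected)
```

A wrapper script would print the message and exit with the returned code, so that Nagios can raise the SMS notification when the result is CRITICAL.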

As for a best practice document, I think it should explain:

   * the recommended hardware setup;
   * why DNS round robin is a good technique to adopt for top-BDII load balancing;
   * what to check to verify the availability of a top-BDII instance.

Other issues, such as the use of virtual machines, how to configure the DNS, how to check the top-BDII instances (using Nagios or a cron job, for example), and how to update the DNS, are implementation details: they depend heavily on the configuration, experience, and policies of each resource center and NGI. Of course, the best practice documentation could be complemented with some use cases.
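
As one possible example of such an implementation detail, the sketch below shows a very basic reachability probe that could be run from Nagios or from a cron job. It is an assumption rather than a recommended check: it only verifies that a TCP connection can be opened, taking 2170 as the port on which a BDII conventionally serves LDAP, and the function name is hypothetical. A real availability check would also query the LDAP content and verify that it is fresh.

```python
import socket


def port_is_open(host, port=2170, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Each resource center can then decide what to do with the result, for example feeding it into the mechanism that updates the DNS records.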


Authors