Revision as of 12:51, 2 June 2011

Introduction

   * recommended hardware setup;
   * why DNS round robin is a good technique to adopt for top-BDII load balancing;
   * what to check to verify availability of a top-BDII instance;

Method

Implementation examples

Our setup is also based on DNS round robin for load balancing. We use Nagios to check each top-BDII instance and to update the DNS records: a Nagios event handler runs a script that adds or deletes the "A" record using nsupdate.
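As an illustration only, the event-handler script described above could look roughly like the following sketch. It is not the script in use at any site: the function names, the alias `top-bdii.example.org.`, the address `192.0.2.10`, the TTL and the key file path are all hypothetical. It assumes the DNS zone accepts dynamic updates signed with a key, and that the handler should re-add the record on OK and remove it on CRITICAL.

```python
import subprocess


def build_nsupdate_input(state, alias, ip, ttl=60, server=None):
    """Return the text to feed to nsupdate for a given Nagios service state.

    state  -- service state as reported by Nagios ("OK", "CRITICAL", ...)
    alias  -- round-robin name shared by all instances (hypothetical)
    ip     -- address of the instance that was checked (hypothetical)
    Returns None when the DNS should be left untouched.
    """
    if state == "OK":
        # Instance is healthy: make sure its A record is in the pool.
        action = f"update add {alias} {ttl} A {ip}"
    elif state == "CRITICAL":
        # Instance is failing: take only its A record out of the pool.
        action = f"update delete {alias} A {ip}"
    else:
        # WARNING / UNKNOWN: assumption, do nothing rather than flap.
        return None

    lines = []
    if server:
        lines.append(f"server {server}")  # primary DNS to send the update to
    lines += [action, "send"]
    return "\n".join(lines) + "\n"


def apply_update(text, keyfile):
    """Pipe the commands to nsupdate, authenticating with a key file."""
    subprocess.run(["nsupdate", "-k", keyfile], input=text, text=True, check=True)
```

A Nagios event handler command would typically pass the `$SERVICESTATE$` macro as the first argument; a site might also pass `$SERVICESTATETYPE$` and act only on HARD states, which this sketch leaves out. Adding a record that already exists, or deleting one that is already gone, is harmless, so the handler can be run repeatedly.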

The primary DNS and Nagios are clearly single points of failure, but we prefer to keep the setup very simple. This avoids, for example, inconsistent DNS information when more than one primary DNS is used (as you reported), or incoherent results when more than one server checks the top-BDII instances. To mitigate these single points of failure, a second Nagios instance checks both the DNS server and the Nagios instance used to update the DNS records, and an SMS notification is sent to the people on duty for round-the-clock (H24) support if a problem occurs.

As for a best practice document, I think it should explain:

   * recommended hardware setup;
   * why DNS round robin is a good technique to adopt for top-BDII load balancing;
   * what to check to verify availability of a top-BDII instance.

Other issues, such as the use of virtual machines, how to configure the DNS, how to check the top-BDII instances (using Nagios or a cron job, for example) and how to update the DNS, are implementation details: they depend heavily on the configuration, experience and policies of each resource center and NGI. Of course, the best practice documentation could be complemented with some use cases.
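To make the "how to check" point concrete, here is a minimal sketch of an availability probe that could be run from Nagios or from cron. It only tests that the LDAP port of an instance accepts TCP connections; the function names are hypothetical, and port 2170 is assumed to be the BDII LDAP port. A real check would most likely also query the LDAP content and verify that it is fresh, which is not shown here.

```python
import socket

# Nagios plugin exit codes
OK, CRITICAL = 0, 2


def port_is_open(host, port=2170, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, timeout, DNS failure...
        return False


def check_instance(host, port=2170, timeout=5.0):
    """Return a (nagios_exit_code, message) pair for one top-BDII instance."""
    if port_is_open(host, port, timeout):
        return OK, f"OK - {host}:{port} accepts connections"
    return CRITICAL, f"CRITICAL - {host}:{port} is not reachable"
```

The check should target each instance by its own host name or address, not by the round-robin alias, so that a failure can be attributed to one specific instance.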

Difference between revisions of "MAN05 top-BDII and site-BDII High Availability"
