
WMS best practices

From EGIWiki
Revision as of 17:56, 7 December 2011 by Crsndr

Objective

The objective of this document is to provide guidelines on how to improve the availability and load balancing of the WMS by addressing four main areas:

  • Requirements to deploy a WMSMonitor service
  • Best practices from a client perspective
  • Best practices to implement a High Availability WMS service
  • WMS maintenance


Hardware

Hardware (WMSMonitor)

  • dual core CPU
  • 20GB of hard disk space
  • 4 GB RAM


Hardware (DNS if used only for this purpose)

  • single core CPU (it can also be installed as a virtual machine)
  • 10GB of hard disk space
  • 1 GB RAM


Hardware (WMSs)

Each WMS should be installed on dedicated hardware (i.e. not on a virtual machine). The number of cores, the amount of RAM and disk space, and the number of machines should be proportional to the number of VOs supported and the number of jobs submitted to the WMSs, but each machine should have at least:

  • quad core CPU
  • 150 GB of hard disk space
  • 8 GB RAM

A machine of this kind is able to process about 10,000 jobs/day (sustained for several days in a single month), with peaks of 15,000 jobs/day. Better hardware does not seem to give a linear improvement in the job rate: doubling the number of cores and the amount of RAM improves performance, but this is not reflected proportionally in the number of jobs processed per day. We tested only systems with two mirrored SATA or ATA disks holding the entire system. Better disks or SSDs might improve performance and allow a higher job rate. The directories most used by the system are:

/var/lib/mysql
/var/glite/SandboxDir

Storing these two directories on different physical disks might improve performance.
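As a rough sizing aid, the sustained throughput figure above can be turned into an estimate of how many WMS machines a given daily job volume needs. This is a minimal sketch, not a tested sizing rule: it assumes that throughput scales linearly with the number of machines (the text above only says the number of machines should be proportional to the load), and the function name and the `spare` parameter are hypothetical.

```python
import math

# Sustained rate per machine quoted above for the minimum hardware.
SUSTAINED_JOBS_PER_DAY = 10_000


def wms_machines_needed(expected_jobs_per_day, spare=1):
    """Estimate the number of WMS machines for a daily job volume.

    `spare` is an assumed number of extra machines, added so that a
    failed or overloaded WMS can be taken out of the pool without
    overloading the remaining ones.
    """
    if expected_jobs_per_day < 0 or spare < 0:
        raise ValueError("job volume and spare count must be non-negative")
    base = math.ceil(expected_jobs_per_day / SUSTAINED_JOBS_PER_DAY)
    # Always keep at least one machine, even for a very small load.
    return max(base, 1) + spare


if __name__ == "__main__":
    print(wms_machines_needed(25_000))
```

The estimate uses the sustained figure rather than the 15,000 jobs/day peak, so that short bursts are absorbed by the headroom instead of being part of the plan.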

Physical vs Virtual Machines

Given the minimum hardware requirements, there should not be any difference between using a physical and a virtual machine for the WMSMonitor and the DNS. However, in the case of the WMSMonitor, the use of a database and the frequent disk access could be a limiting factor on a virtual machine. For a small number of clients this should not be an issue. Using virtio can improve performance.


DNS round robin load balancing

Load balancing is a technique to distribute workload evenly across two or more resources. Round robin DNS is a load balancing method that does not necessarily require a dedicated software or hardware node.

We cannot assume that all the jobs submitted to the WMS require the same amount of resources and thus generate the same load: this depends on the job request, on whether errors in the submission force the job to be resubmitted, on how many times it must be resubmitted, and so on. The load also depends on the type of hardware on which the WMS is installed. For effective load balancing, the pool of available WMSs should be updated regularly, and the WMSs with a higher load should be removed from it. All the WMSs in the pool should be used in a round robin fashion based on DNS name resolution.

With the help of the sensors installed on each WMS, the load balancing adds WMSs to the pool and removes them from it by updating the DNS records that map to the same hostname. The result is a hostname that maps to multiple IP addresses under the configured DNS zone. As an example, in dns.top.domain, add multiple A records that map the same hostname to multiple IP addresses:

Zone wms.zone.domain
name.wms.zone.domain IN A x.x.x.x
name.wms.zone.domain IN A y.y.y.y
name.wms.zone.domain IN A z.z.z.z

All three records are always returned in the answer, but their order rotates with each DNS query. If the metrics from the tests performed on one of those WMSs report problems, that WMS is removed from the pool by removing the corresponding entry from the DNS. This mechanism provides fault tolerance for the WMS service.
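The rotation of the records can be illustrated with a small simulation. This is only a sketch of the idea: the function name is hypothetical, and the actual order in which a real DNS server returns the records depends on its implementation and configuration.

```python
from collections import deque


def rotated_answers(records, queries):
    """Simulate a round robin DNS server answering `queries` lookups.

    Every answer contains all the records; only their order changes,
    shifting by one position after each query.
    """
    ring = deque(records)
    answers = []
    for _ in range(queries):
        answers.append(list(ring))
        ring.rotate(-1)  # move the first record to the end
    return answers


if __name__ == "__main__":
    for answer in rotated_answers(["x.x.x.x", "y.y.y.y", "z.z.z.z"], 4):
        print(answer)
```

Since most clients use the first address in the answer, successive queries are spread over the WMSs in the pool.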

In a similar way, a configurable number of the most heavily loaded WMSs is kept out of the pool, so that new jobs are submitted only to the WMSs with less load.
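The pool selection described above can be sketched as follows. This is an illustrative outline, not the WMSMonitor implementation: the `WmsStatus` type, the single `load` number, the function names, and the choice never to empty the pool completely are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WmsStatus:
    ip: str        # address published in the A record
    healthy: bool  # result of the tests run by the sensors (assumed)
    load: float    # aggregated load metric, higher means busier (assumed)


def select_pool(statuses, keep_out=1):
    """Return the IP addresses that should stay in the DNS pool.

    WMSs reporting problems are dropped first; then the `keep_out`
    most loaded healthy WMSs are left out as well.
    """
    healthy = sorted((s for s in statuses if s.healthy), key=lambda s: s.load)
    if not healthy:
        return []
    # Assumption: keep at least one WMS so the hostname still resolves.
    n_keep = max(len(healthy) - keep_out, 1)
    return [s.ip for s in healthy[:n_keep]]


def a_records(hostname, ips):
    """Format the A records for the selected pool."""
    return [f"{hostname} IN A {ip}" for ip in ips]


if __name__ == "__main__":
    # Addresses from the documentation range 192.0.2.0/24.
    statuses = [
        WmsStatus("192.0.2.1", True, 0.2),
        WmsStatus("192.0.2.2", False, 0.1),
        WmsStatus("192.0.2.3", True, 0.9),
        WmsStatus("192.0.2.4", True, 0.5),
    ]
    for line in a_records("name.wms.zone.domain", select_pool(statuses)):
        print(line)
```

In this example the WMS that failed its tests and the most loaded healthy WMS are both left out, so only two A records are published.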


Implementation

The metrics measured by the WMSMonitor rely on sensors installed on each WMS. The detailed procedures for installing the WMS sensors and the WMSMonitor server are available at this address:

https://twiki.cnaf.infn.it/twiki/bin/view/WMSMonitor/InstallationProcedureV2_1

An updated version of the sensors and server, for the upcoming release of the WMS in EMI, is now in Pre-View testing. Documentation is available at this address:

https://twiki.cnaf.infn.it/twiki/bin/view/WMSMonitor/WebDownload

and the packages are distributed on request.


WMS maintenance

Even if a single WMS instance is working correctly, it may require regular maintenance. In particular, the MySQL database keeps growing as new jobs are submitted and does not shrink when they are removed. For this reason, one of the following two operations should be performed when the free space falls below 20%:

  • (WMS does not need to be drained) Before the WMS is installed and the databases are created, the following configuration should be added, if not already present, to /etc/my.cnf:
innodb_file_per_table
default-storage-engine=InnoDB

in the [mysqld] section. This makes it possible to use the following command:

mysqlcheck --optimize lbproxy -u root -p

which optimizes the MySQL tables and reduces their size.

  • (WMS needs to be drained) After the WMS is drained, the services can be stopped:
service gLite stop
service mysqld stop

It is then possible to remove the MySQL files:

rm -rf /var/lib/mysql

At this point the WMS must be reconfigured. This recreates the database and the table structure.
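The 20% free-space threshold mentioned above can be checked with a short script before deciding whether maintenance is due. This is a minimal sketch: it assumes that the relevant file system is the one holding /var/lib/mysql, and the function names are hypothetical.

```python
import shutil


def below_threshold(free_bytes, total_bytes, threshold=0.20):
    """Return True if the free fraction is below `threshold`."""
    if total_bytes <= 0:
        raise ValueError("total size must be positive")
    return free_bytes / total_bytes < threshold


def needs_maintenance(path="/var/lib/mysql", threshold=0.20):
    """Check the file system holding `path` against the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return below_threshold(usage.free, usage.total, threshold)


if __name__ == "__main__":
    if needs_maintenance():
        print("Free space below 20%: schedule MySQL maintenance")
    else:
        print("Free space is sufficient")
```

A check like this could be run periodically, for example from cron, so that the maintenance is planned before the disk fills up.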