Monday 25 April 2011

Building a RHEL 6/Centos 6 HA Cluster for LAN Services (part 4)

Clustered DNS Service
Named (bind) has a supplied Resource Agent in RH6; sadly it has, at present, a number of limitations. Firstly (this might be fixed by the time you read this) it doesn't ship with the required helper file named-parse-config.pl (bz#648897), so it doesn't work at all until you put that in place. Even with the helper in place the RA is still annoying: it runs named as root, which I'm not keen on, and although it changes the listen address to match the IP address passed to the service, zone transfers etc. still come from the main IP of the node running the service, which could do with not being the case (bz#680748).

So reluctantly I decided to use the script RA again.

I'll gloss over the details of the steps that are pretty much identical to the dhcpd service, so:

(on one node)

/sbin/lvcreate --size 1G --name lv00named cluvg00
mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:named /dev/cluvg00/lv00named

(on both nodes)

Update /etc/fstab
mount /data/named
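
For reference, the fstab entry for this one looks something like the following on my setup (mount options to taste; the cups and httpd filesystems later on follow the same pattern):

/dev/cluvg00/lv00named  /data/named  gfs2  defaults,acl  0 0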

/sbin/chkconfig named off
/etc/init.d/named stop

On one node I created my service mini-root again:

cd /
tar cpvf - etc/named* var/named | (cd /data/named; tar xpvf -)

Now we need to edit cluster.conf on one node and add in the resources:

<ip address="10.1.10.26" monitor_link="1"/>

<clusterfs device="/dev/cluvg00/lv00named" fstype="gfs2" mountpoint="/data/named" name="namedfs" options="acl"/>

and a new service section:


<service autostart="1" domain="bldg1ux01A" exclusive="0" name="named" recovery="relocate">
<clusterfs ref="namedfs"/>
<ip ref="10.1.10.26"/>
<script file="/etc/init.d/named" name="named"/>
</service>
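
For orientation, these fragments all live inside the <rm> section of cluster.conf, alongside the failover domains and the dhcpd entries from the earlier parts, so the layout ends up roughly like this (dhcpd bits and failover domain definitions omitted):

<rm>
<failoverdomains>
...
</failoverdomains>
<resources>
<ip address="10.1.10.26" monitor_link="1"/>
<clusterfs device="/dev/cluvg00/lv00named" fstype="gfs2" mountpoint="/data/named" name="namedfs" options="acl"/>
...
</resources>
<service autostart="1" domain="bldg1ux01A" exclusive="0" name="named" recovery="relocate">
<clusterfs ref="namedfs"/>
<ip ref="10.1.10.26"/>
<script file="/etc/init.d/named" name="named"/>
</service>
...
</rm>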


Bump the version number as in the dhcpd example.

We now need to edit the /data/named/etc/named.conf file to direct things to the cluster's shared data areas rather than the local machine's, and to make named listen on the cluster service IP rather than the node's own. So in the options section I have:


options {
        directory "/data/named/var/named";
        dump-file "/data/named/var/named/data/cache_dump.db";
        listen-on { 10.1.10.26; };
        statistics-file "/data/named/var/named/data/named_stats.txt";
        memstatistics-file "/data/named/var/named/data/named_mem_stats.txt";

Basically, anywhere you see /var/named or /etc/named in this file it should probably now read /data/named/var/named or /data/named/etc respectively. Also note that the service IP appears here in listen-on.
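
For example, assuming your named.conf still carries the stock RHEL include line, it becomes:

# was: include "/etc/named.rfc1912.zones";
include "/data/named/etc/named.rfc1912.zones";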

This setup also suffers from the same problem as the RA: zone transfers etc. originate from the node's IP rather than the service IP. Either open up both nodes' IPs on the upstream boxes so zone transfers are allowed from those too, or look at named.conf options like "transfer-source", "query-source" and "notify-source"; there may be others (as per bz#680748).
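I haven't gone down that route myself, but for reference the relevant entries would sit in the options block and look roughly like this:

transfer-source 10.1.10.26;
notify-source 10.1.10.26;
query-source address 10.1.10.26;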

One last thing: point named at the new cluster config file via /etc/sysconfig/named:

OPTIONS="-c /data/named/etc/named.conf"

Test the cluster.conf, propagate and start if necessary (as per the dhcpd example). Check /var/log/messages for issues.
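
As per the dhcpd example, that boils down to something like this on the node where you edited cluster.conf:

ccs_config_validate    # check the edited cluster.conf is valid
cman_tool version -r   # push the new config version to the other node
clusvcadm -e named     # start the service if it didn't autostart
clustat                # see where it ended up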

Add the service IP to both nodes' hosts files and to the local DNS:

10.1.10.26 bldg1cludns bldg1cludns.lan

If it all works you may want to point the nodes themselves at this shared IP in their resolv.conf (this seems to work fine for me), and in the dhcp server setup uncomment the line that directs the client machines to use the clustered DNS service.
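
Mine ends up along these lines (the search domain is just how my LAN happens to be set up):

search lan
nameserver 10.1.10.26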

Testing and Fence Testing
You should probably, at this point (while things aren't too complex yet), test that the two services will run on either node. "clusvcadm -r named" will relocate the named service to the other node.
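
Something like the following, assuming the dhcp service from the earlier part is simply called "dhcpd":

clustat                # see where everything is running now
clusvcadm -r named     # relocate named to the other node
clusvcadm -r dhcpd     # and the same for the dhcp service
clustat                # both should show as started on their new homes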

To test fencing I used "halt -f" on each node to check that the other node would fence it. I sat next to the machine and watched the power switch turn the dead machine off and on. To test the backup fence method I just swapped the order of the fence methods in the node config; I used this to test my DRAC backup fencing.

Clustered CUPS
CUPS actually has a built-in clustering ability; I don't use it, however. Personally I disliked the idea of two different cups instances hitting the printers, and of having to maintain two cups installs. I also like to be able to hit a cups server and know which machine my print job went through. So I preferred to use Cluster Suite to manage cups.

There is no RA for cups (I'm guessing they assume you'll use the built-in stuff).

Similar to all the other services I've looked at so far:

On each node: 

/sbin/chkconfig cups off
mkdir /data/cups 
/etc/init.d/cups stop

On one node:

/sbin/lvcreate --size 2G --name lv00cups cluvg00
mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:cups /dev/cluvg00/lv00cups

On each node update /etc/fstab and

mount /data/cups

On one node: 

cd /
tar cpvf - etc/cups var/spool/cups var/cache/cups | (cd /data/cups; tar xpvf -)

This service isn't so easy to direct to a new config file, as there's no way to pass one in via a sysconfig file. So instead of hacking the init.d script, what I did on both nodes was:

cd /etc/cups/
mv cupsd.conf cupsd.conf.org
ln -s /data/cups/etc/cups/cupsd.conf cupsd.conf



I then edited /data/cups/etc/cups/cupsd.conf; the main parameters to change for the cluster in my case were:


Listen 10.1.10.27:631
ServerRoot /data/cups/etc/cups
RequestRoot /data/cups/var/spool/cups
TempDir /data/cups/var/spool/cups/tmp
CacheDir /data/cups/var/cache

I edited cluster.conf again, adding the new resources:

<ip address="10.1.10.27" monitor_link="1"/>
<clusterfs device="/dev/cluvg00/lv00cups" fstype="gfs2" mountpoint="/data/cups" name="cupsfs" options="acl"/>

Then a new service:

<service autostart="1" domain="bldg1ux01B" exclusive="0" name="cups" recovery="relocate">
<script file="/etc/init.d/cups" name="cups"/>
<ip ref="10.1.10.27"/>
<clusterfs ref="cupsfs"/>
</service>

I decided to put this one on node 2 by default.

Add this service to both nodes' hosts files and to the local DNS:

10.1.10.27 bldg1clucups bldg1clucups.lan

One other minor wrinkle is that each node needs to be able to print even if it isn't currently running cups. This is because we'll soon have clustered Samba running on here and it will need to print from both nodes. So we must direct the cups client apps to the service IP (otherwise they'll look for a local domain socket). To do this add a line to /etc/cups/client.conf on each node:

ServerName bldg1clucups

This behaves slightly differently from the standard domain socket normally used for this communication, in that it doesn't display printers discovered via browsing. Personally I find this an advantage when we come to Samba, as it won't share out non-local printers.

Bump the ver, verify the config and push it. Start the service if necessary. Examine logs if it doesn't start. Add a printer and test it prints from either node (test with lpstat first if you like). Shift the service to the other node and test it still works.
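
From either node a quick check looks something like this ("laser1" is just a stand-in for whatever queue you added):

lpstat -t                                    # scheduler, queues and their state via the clustered server
echo "test from $(hostname)" | lp -d laser1  # push a small job through a queue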

Clustered HTTP
Now a very common case: a clustered HTTP service, and we finally get to use a Resource Agent.

First though, I found a bug in the default httpd install that makes it search all the users' home dirs. If you use the automounter this is obviously very slow, so my first step is:

yum remove mod_dnssd gnome-user-share

and this fixes it.

Also the apache RA didn't work originally, so make sure you are up to date.

Then the usual deal:

On both nodes:

/sbin/chkconfig httpd off
mkdir /data/httpd

On one node, 
/sbin/lvcreate --size 2G --name lv00httpd cluvg00
/sbin/mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:httpd /dev/cluvg00/lv00httpd
Then on both nodes update /etc/fstab and mount /data/httpd

On one node
cd /
tar cvpf - etc/httpd var/www | (cd /data/httpd; tar xpvf - )

I needed to fix some relative symlinks under /data/httpd/var/www/icons and /data/httpd/etc/httpd, and just made them absolute. E.g. under /data/httpd/var/www/icons I changed:

poweredby.png -> /usr/share/pixmaps/poweredby.png
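
In other words, something along these lines:

cd /data/httpd/var/www/icons
ln -sfn /usr/share/pixmaps/poweredby.png poweredby.png   # swap the relative link for an absolute one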

Resource section again:
<ip address="10.1.10.28" monitor_link="1"/>
<clusterfs device="/dev/cluvg00/lv00httpd" fstype="gfs2" mountpoint="/data/httpd" name="httpdfs" options="acl"/>

and a service

<service autostart="1" domain="bldg1ux01B" exclusive="0" name="httpd" recovery="relocate">
<clusterfs ref="httpdfs"/>
<ip ref="10.1.10.28"/>
<apache config_file="conf/httpd.conf" name="httpd" server_root="/data/httpd/etc/httpd" shutdown_wait="10"/>
</service>

Bump the ver, verify the config and push it. Start the service if necessary. Examine logs if it doesn't start.
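
A quick sanity check from either node (or anything that can reach the service IP):

curl -I http://10.1.10.28/    # should come back with headers from the clustered apache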

Add to hosts files and DNS:
10.1.10.28 bldg1cluhttp bldg1cluhttp.lan


