Friday, 16 December 2011

A RHEL 6/CentOS 6 HA Cluster for LAN Services switching to EXT4 (Part 1)

Introduction
Sadly I have had to revisit my cluster configuration to remove GFS2 from my setup. This is for several reasons. Firstly, I have recently discovered that you cannot share a GFS2 filesystem out over NFS and simultaneously use it for local access (e.g. Samba or backups). Red Hat do not support doing this and it can lead to kernel panics and filesystem corruption. This is down to a bad interaction between the three locking systems in use (see my notes under the Clustered Samba setup in my original cluster setup articles). Sadly this is exactly what I want to do, so that users can share files between Linux and Windows. The RH recommendation of sharing the NFS export through Samba seems possible, but the Samba people don't recommend it. Sharing other filesystems (ext3/4) over NFS alongside local access can cause file corruption if you rely on file locking between NFS and local access, but none of our applications or users expect this to work. So I can live with the potential danger of individual file corruption, but not with filesystem corruption.

Secondly, I have seen a certain amount of instability on GFS2 even when not sharing over NFS (just pure local access, used for CUPS). Thirdly, the performance I have seen from GFS2 has been an issue for the sort of workloads I have (home directories and shared project directories). This has been especially true for anything that requires directory traversals (e.g. backups). So perhaps GFS2 isn't ideal for my sort of workload. I guess I can also save the license for "resilient storage" on RHEL6.

So I have decided to re-implement my cluster using a non-clustered filesystem (ext4) and have the mounts fail over to whichever node is running the service.

Initial Setup
The hardware setup, OS install, NTP setup, partitioning, DRBD, Clustered Logical Volume Manager and Fence setup (on the APC device and DRAC) are identical to my original setup (so please refer back to my original postings).

As I'm switching to a non-clustered filesystem, Linbit would recommend using a primary/secondary DRBD. I haven't done this for two reasons. Firstly, I like to manage all my DRBD storage on one DRBD device for simplicity (all my cluster storage in one place). Secondly, I'd like the option of switching back to GFS2 in the future (when my issues with it are repaired).

The ext4 Filesystem and Clustering
I have chosen to use ext4 in a failover setup. This means I can't mount it on both nodes simultaneously. So basically what will happen is that when a service comes up, the cluster will mount its filesystem on demand on that node and then start the service.

There are issues with this. Firstly, there is no protection against double mounting the filesystem on both nodes, and doing so will very likely corrupt it. The cluster will not let this happen while it is managing things. If a service is moved, the cluster will attempt to umount the filesystem from the first node before mounting it on the second node. If it fails to umount it, the service will be marked "failed". The administrator can then look at why it failed to umount. The admin can then disable the service (all you can do with a failed service) and start it again, but this time the cluster will not check that the filesystem has been umounted, so you need to be really sure it is umounted on the previous node before starting the service elsewhere.
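The "be really sure it is umounted" step can be checked rather than eyeballed. Below is a minimal sketch, assuming passwordless ssh between the nodes; the function names are my own and the node and service names in the usage note are placeholders, not taken from my actual configuration.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any cluster package.

# mounted_at MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is the second field of any line in MOUNTS_FILE,
# which defaults to /proc/mounts. Pass "-" to read the table from stdin.
mounted_at() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# peer_has_mounted NODE MOUNTPOINT
# Exit status: 0 = mounted on NODE, 1 = not mounted, 2 = could not ask NODE.
# Treat 2 as unsafe: an unreachable node tells us nothing about its mounts.
peer_has_mounted() {
    table=$(ssh "$1" cat /proc/mounts) || return 2
    printf '%s\n' "$table" | mounted_at "$2" -
}
```

For example, with a hypothetical failed service called dhcpd that last ran on node1, I would only go on to `clusvcadm -d dhcpd` followed by `clusvcadm -e dhcpd -m node2` if `peer_has_mounted node1 /data/dhcpd` returned exactly 1. Any other status means stop and investigate.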

There was a resource agent that could prevent this nasty double mounting by exclusively locking the Logical Volume. Sadly it hasn't made it into the RH supported or distributed resource agents. It is available here, however, and has been reported to work well (though I haven't tried it):

https://www.redhat.com/archives/cluster-devel/2009-June/msg00065.html

So after all this information and these caveats, creating the service filesystems just needs the usual command (as for the GFS2 cluster), e.g.

/sbin/mkfs.ext4 /dev/cluvg00/lv00dhcpd

I would *NOT* add this to fstab. That makes double mounting inevitable if you were daft enough to make it mount at boot time, and very likely otherwise, as the entry makes mounting too easy. Having to manually type in the paths for mounting these filesystems hopefully gives some extra thinking time to check that it isn't mounted on the other node.
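A quick way to confirm that no cluster volume has crept into fstab on a node is sketched below. The function name is my own invention, and it only matches on the text of the device path, so an entry written as UUID= or LABEL= would slip past it.

```shell
#!/bin/sh
# Sketch only: fstab_mentions is an illustrative helper, not a standard tool.

# fstab_mentions PATTERN [FSTAB_FILE]
# Succeeds if any non-comment line of FSTAB_FILE (default /etc/fstab)
# contains PATTERN as a fixed string.
fstab_mentions() {
    grep -v '^[[:space:]]*#' "${2:-/etc/fstab}" | grep -F -q -- "$1"
}
```

Running `fstab_mentions cluvg00` on each node should fail (non-zero exit status). If it succeeds, there is an entry for the cluster volume group that wants removing.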

Then, as in the GFS version of this cluster setup, the mount point is created and the filesystem mounted:


mkdir /data
mkdir /data/dhcpd
mount /dev/mapper/cluvg00-lv00dhcpd /data/dhcpd


Then set up the file tree under here for this service (as in my GFS cluster blog).

Next: replacing clustered Samba with a failover Samba setup...