Monday, 25 April 2011

Building a RHEL 6/Centos 6 HA Cluster for LAN Services (part 5)

Clustered NFS Server
I'm now going to add clustered NFS services. I'm going to have a shared projects area and an area for homedirs. To provide rudimentary load balancing I'm going to serve these from separate nodes by default. Also, as these are going to be using the lion's share of my storage, I'm going to make them quite large.

I don't know for certain, but I suspect that for a large data area where performance is an issue it's probably not a good thing to grow these volumes in small chunks. I'd assume this might lead to LVM fragmentation, which may hurt performance.

First step, let's bring up the storage as ever (on one node):


/sbin/lvcreate --size 200G --name lv00home cluvg00


/sbin/lvcreate --size 500G --name lv00projects cluvg00


/sbin/mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:home /dev/cluvg00/lv00home


/sbin/mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:projects /dev/cluvg00/lv00projects

Update fstab and mount these up on both nodes.
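
As a rough sketch of the fstab step, the helper below prints one fstab line per volume so it can be reviewed before being appended on each node. The helper name gfs2_fstab_line is made up for illustration, and the mount points (/data/home, /data/projects) and the "acl" option are assumptions taken from the clusterfs resources used later in this post; adjust them to your own layout.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for a GFS2 volume in cluvg00.
# Assumption: "acl" mirrors the options on the clusterfs resources in cluster.conf.
gfs2_fstab_line() {
    lv="$1"
    mountpoint="$2"
    printf '/dev/cluvg00/%s %s gfs2 acl 0 0\n' "$lv" "$mountpoint"
}

# Print the two lines for review
gfs2_fstab_line lv00home /data/home
gfs2_fstab_line lv00projects /data/projects

# On each node, once the output looks right:
#   gfs2_fstab_line lv00home /data/home >> /etc/fstab
#   mkdir -p /data/home && mount /data/home
```

The same two commented steps would be repeated for the projects volume.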

Also we need to add a parameter to make statd cluster and failover aware (or at least help it with this, or so I'm told). So on both nodes add to /etc/sysconfig/nfs:

STATD_HA_CALLOUT="/usr/sbin/clunfslock"


In the cluster.conf file we need to add a single nfsexport resource (this ensures the NFS daemons are working) and an nfsclient resource for each thing we are exporting.

<nfsexport name="bldg1cluexports"/>
<ip address="10.1.10.29" monitor_link="1"/>
<clusterfs device="/dev/cluvg00/lv00projects" fstype="gfs2" mountpoint="/data/projects" name="projectsfs" options="acl"/>
<nfsclient name="nfsdprojects" options="rw" target="10.0.0.0/8"/>
<ip address="10.1.10.30" monitor_link="1"/>
<clusterfs device="/dev/cluvg00/lv00home" fstype="gfs2" mountpoint="/data/home" name="homefs" options="acl"/>
<nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>

And then a service definition for each of these:


<service autostart="1" domain="bldg1ux01A" exclusive="0" name="nfsdprojects" recovery="relocate">
        <ip ref="10.1.10.29"/>
        <clusterfs ref="projectsfs">
                <nfsexport ref="bldg1cluexports">
                        <nfsclient ref="nfsdprojects"/>
                </nfsexport>
        </clusterfs>
</service>
<service autostart="1" domain="bldg1ux01B" exclusive="0" name="nfsdhome" recovery="relocate">
        <ip ref="10.1.10.30"/>
        <clusterfs ref="homefs">
                <nfsexport ref="bldg1cluexports">
                        <nfsclient ref="nfsdhome"/>
                </nfsexport>
        </clusterfs>
</service>


Notice that they are in different failover domains, which directs them to be served by different nodes by default (unless one fails, that is).

There is an issue with this and NFSv3 clients. By default rpcbind, the portmapper replacement in RHEL 6, replies to incoming requests using the node IP rather than the service IP. This confuses client firewalls, so the clients fail to mount. The only real workaround just now is to fully open up both node IPs on the client firewalls, e.g. at the bottom of the client machine's /etc/sysconfig/iptables on, say, RHEL 5:


-A RH-Firewall-1-INPUT -s 10.1.10.20 -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.1.10.21 -j ACCEPT


Not great (bz#689589). Sadly the rpcbind flags for multi-homing, which should help with this, don't seem to work when the IP isn't up at the time rpcbind is started.

Again, add these service IPs to DNS and hosts:

10.1.10.29   bldg1clunfsprojects bldg1clunfsprojects.lan
10.1.10.30   bldg1clunfshome bldg1clunfshome.lan

Bump the cluster.conf version number, verify the file and propagate it, and you should be able to mount the exports, either with a manual mount or via the automounter on a client machine, e.g.

mount bldg1clunfshome:/data/home /mnt
mount bldg1clunfsprojects:/data/projects /mnt

or

ls /net/bldg1clunfshome/data/home
ls /net/bldg1clunfsprojects/data/projects
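
The "bump, verify and propagate" step mentioned above can be sketched roughly as follows. The function name bump_config_version is hypothetical; it simply increments the config_version attribute in the file it is given. The follow-up commands in the comments assume the stock RHEL 6 cluster tools (ccs_config_validate and cman_tool) and the usual /etc/cluster/cluster.conf location.

```shell
#!/bin/sh
# Hypothetical helper: increment config_version="N" in a cluster.conf-style file
# and print the new version number.
bump_config_version() {
    conf="$1"
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$current" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi
    next=$((current + 1))
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then overwrite the original in place
    # so its ownership and permissions are kept.
    if sed "s/config_version=\"$current\"/config_version=\"$next\"/" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        rm -f "$tmp"
        echo "$next"
    else
        rm -f "$tmp"
        return 1
    fi
}

# On one cluster node (assumed standard tools and path):
#   bump_config_version /etc/cluster/cluster.conf
#   ccs_config_validate
#   cman_tool version -r
```

Keeping the edit in a function makes it easy to try against a copy of the file before touching the live one.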

Clustered Samba


NOTE: I have recently discovered that Red Hat do not allow you to share a GFS2 filesystem between local access (e.g. Samba) and NFS. This is due to a lack of interoperability between local file access locking and NFS file locking (flocks vs plocks). On other filesystems this may lead to file-level corruption; on GFS2, however, it may lead to FILESYSTEM-level corruption and/or kernel panics (flocks, plocks and glocks not getting along, and I'm not making this up!).


So the notes below are fine if the filesystem you share out on Samba isn't also shared out on NFS (not as below, but a new and different filesystem). This is very unfortunate, as if you are anything like me it's exactly what you want to do (share files between Windows and Linux). Alternatively you can share out an NFS mount on Samba (which is what RH say to do), but this is not recommended by the Samba people. I'm also pretty sure my backup software won't like backing up an NFS mount!


My solution to this, which I will document here, was radical surgery: reimplement my cluster using ext4 failover mounts. I can easily live with not relying on locks working between Samba and NFS (I never expected this to work); I can't live with filesystem corruption and kernel panics.


The original documented case here may eventually work once bug #580863 is resolved. If you want to see the rather murky world of Linux file locking, this article is great. I have left the original text unchanged below in the hope that this bug gets resolved.




There are two ways of clustering samba. One is to use a failover method in the cluster.conf file. The newer way is to use samba's relatively new built-in clustering (ctdb). This provides load balancing (whereas failover samba only provides HA). This method is outside the standard cluster.conf, but RH ship it, so let's use it.

You need to ensure the ctdb package is installed on both nodes.

I'm going to first create a common locking area, which clustered samba uses to hold its shared recovery lock. For my own purposes, outside samba, I also use it to hold locks for cronjobs etc. that can run on either node. I'm also going to site my click-to-print printer drivers for Windows in here.

So for this it's the usual deal (with some added setup for the samba areas):


/sbin/lvcreate --size 2G --name lv00lclu cluvg00
/sbin/mkfs -t gfs2 -p lock_dlm -j 2 -t bldg1ux01clu:lclu /dev/cluvg00/lv00lclu 


mkdir /data/lclu/samba
mkdir /data/lclu/samba/ctdb
mkdir /data/lclu/samba/drivers
chmod 775 /data/lclu/samba/drivers
chown root:itstaff /data/lclu/samba/drivers


The "itstaff" group above are the people allowed to add drivers to the server.

I now edit /etc/sysconfig/ctdb on both nodes, the options I have are:



CTDB_RECOVERY_LOCK="/data/lclu/samba/ctdb/.ctdb.lock"
CTDB_PUBLIC_INTERFACE=bond0
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_SAMBA_CHECK_PORTS="445"
CTDB_MANAGES_WINBIND=no
CTDB_MANAGES_VSFTPD=no
CTDB_MANAGES_HTTPD=no
CTDB_NODES=/etc/ctdb/nodes
CTDB_DEBUGLEVEL=ERR

In /etc/sysconfig/samba I have (both nodes):
# Options to smbd
SMBDOPTIONS="-D -p 445"
# Options to nmbd
NMBDOPTIONS="-D"
# Options for winbindd
WINBINDOPTIONS=""

I prefer using port 445, which should be lighter weight than port 139 (with its NetBIOS wrapping).

Now I put the IP addresses of my private network into /etc/ctdb/nodes (clustered samba will use these for internal comms), on both nodes:
192.168.1.1
192.168.1.2

Then in /etc/ctdb/public_addresses I put the public IPs that I want this service to run on, with netmask, again on both nodes:
10.1.10.32/24
10.1.10.33/24
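
Since a missing mask in this file is an easy slip, a small sanity check along these lines might help. This is only a sketch: the function name check_public_addresses is invented here, and it assumes each line is an IPv4 address with a /prefix, optionally followed by an interface name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if every non-blank line of the given file
# looks like "a.b.c.d/NN" with an optional interface name after it.
check_public_addresses() {
    [ -r "$1" ] || return 1
    # The inner pipeline succeeds when a badly formed line exists, so negate it.
    ! grep -v '^[[:space:]]*$' "$1" |
        grep -Evq '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}([[:space:]]+[[:alnum:]_.:-]+)?[[:space:]]*$'
}

# Usage on each node:
#   check_public_addresses /etc/ctdb/public_addresses || echo "check the masks"
```

It only checks the shape of each line, not whether the addresses are sensible for your network.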

Then just my modified /etc/samba/smb.conf:

[global]
        workgroup = MYDOMAIN
        clustering = yes
        netbios name = bldg1clusmb
        max log size = 100
        preserve case = yes
        short preserve case = yes
        security = ADS
        realm = MYDOMAIN.LAN
        password server = dclan.lan
        encrypt passwords = yes
        load printers = yes
        local master = no
        client use spnego = yes
        log level = 1
        printing = cups
        printcap = cups
        cups options = "raw"
        use client driver = no
        printer admin = @itstaff
        map to guest = Bad User
        guest account = guest

[printers]
        printable = yes
        path = /var/spool/samba
        browseable = no
        public = yes
        guest ok = yes
        writable = no
        default devmode = yes

[print$]
        comment = Windows Printer Driver Download Area
        path = /data/lclu/samba/drivers
        browseable = no
        guest ok = yes
        read only = yes
        write list = @itstaff
        force group = +itstaff
        force create mode = 0775
        force directory mode = 0775
                                                                                                                                                      
; Local disk configurations

[projects]
        guest ok = no
        writeable = yes
        path = /data/projects
        force create mode = 0664
        force directory mode = 0775

[user]
        guest ok = no
        writeable = yes
        path = /data/home

The first thing to note is that I don't have a homes share. This is because I use the automounter: any homedirs I mount from the cluster arrive via NFS (as they refer to a service IP, by the name bldg1clunfshome), and samba shares of NFS mounts tend not to work very well (due to locking issues) and will be slower. So I have created a "user" share through which people can get straight to their homedirs.

I add the public addresses to DNS and hosts. 
10.1.10.32   bldg1clusmbA bldg1clusmbA.lan
10.1.10.33   bldg1clusmbB bldg1clusmbB.lan

BUT this time we also want to add to DNS the name bldg1clusmb with two A records, one for each service IP address that I'm using for samba, e.g.

# nslookup bldg1clusmb
Server:         10.1.10.26
Address:        10.1.10.26#53

Name:   bldg1clusmb.lan
Address: 10.1.10.32
Name:   bldg1clusmb.lan
Address: 10.1.10.33

I also have a "netbios name = bldg1clusmb" parameter because, as I'm using AD, I need to join the Samba to AD with the name the clients will refer to it by. But you'll need to start it before joining.

Stop any samba instances running on the nodes and chkconfig them off:

/etc/init.d/smb stop
/sbin/chkconfig smb off

And on both nodes start the ctdb service and chkconfig on:

/etc/init.d/ctdb start
/sbin/chkconfig ctdb on

You can check the status of the ctdb with "ctdb status". It will take a little while to settle but eventually it should return:

Number of nodes:2
pnn:0 192.168.1.1    OK (THIS NODE)
pnn:1 192.168.1.2    OK
Generation:369849912
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
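
If you would rather wait for this state in a script than re-run the command by hand, something like the following could work. It is a sketch that assumes the "pnn:" line layout shown above (node number, address, state); the function name ctdb_all_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ctdb status" output on stdin and succeed only if
# there is at least one "pnn:" line and every such line reports plain OK.
ctdb_all_ok() {
    awk '
        /^pnn:/ { n++; if ($3 != "OK") bad++ }
        END { if (n > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Usage on a cluster node:
#   until ctdb status | ctdb_all_ok; do sleep 5; done
```

Reading from stdin keeps the check easy to try against saved output before using it on a live node.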

Then you can join to AD or whatever and connect from Windows clients using \\bldg1clusmb, and DNS should now round-robin the various clients between the two nodes.

There is a bit of an issue with printing. Sadly the registry information about printers doesn't get copied between the nodes as yet. My hack around this is to stop ctdb on one of the nodes, then install the printers or printer drivers on the server that's still up (just using the cluster name \\bldg1clusmb). When finished, copy the files ntdrivers.tdb, ntforms.tdb and ntprinters.tdb and the printing directory (and its contents), all from /var/lib/samba, to /var/lib/samba on the other (down) node. Then restart ctdb on the down node. As the nodes share the driver directory, both should now be able to perform click-to-print automatic client driver installs. Just remember to repeat this procedure every time you make printer driver changes (or change any default settings on these printers for Windows). A bit of a hassle, but it seems to work.
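
The copy step described above could be wrapped up roughly like this. It is a sketch, not a tested procedure: the function name copy_printer_state is invented, and staging the files through a directory on the shared /data/lclu filesystem (rather than copying node to node directly) is my assumption about a convenient transport.

```shell
#!/bin/sh
# Hypothetical helper: copy samba's printer state (three tdb files plus the
# printing directory) from one directory to another.
copy_printer_state() {
    src="$1"
    dest="$2"
    for f in ntdrivers.tdb ntforms.tdb ntprinters.tdb printing; do
        if [ ! -e "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dest" || return 1
    # Remove any stale copy of the printing directory so old entries don't linger
    rm -rf "$dest/printing"
    cp -a "$src/ntdrivers.tdb" "$src/ntforms.tdb" "$src/ntprinters.tdb" \
        "$src/printing" "$dest/"
}

# On the node that stayed up (staging directory name is an assumption):
#   copy_printer_state /var/lib/samba /data/lclu/samba/printer-state
# On the down node, with ctdb still stopped:
#   copy_printer_state /data/lclu/samba/printer-state /var/lib/samba
```

Only restart ctdb on the down node once the second copy has finished.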

