That's mostly it. I'd recommend a monthly cron job to check that the DRBD blocks on the two nodes are fully in sync. I originally thought something like this would work:
# Check DRBD integrity on the third Sunday of each month
38 6 15-21 * 7 /sbin/drbdadm verify all
However, when both the day-of-month and day-of-week fields are restricted, cron ORs them, so this would run every Sunday and also on every day from the 15th to the 21st. So instead I have:
38 6 15-21 * * /usr/local/sbin/drbdverifysun >/dev/null 2>&1
and then this script (drbdverifysun) has:
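#!/bin/bash
# Only run the verify on a Sunday; cron is already restricting the date range.
# LC_ALL=C keeps the day abbreviation in English whatever the system locale is.
if [ "$(LC_ALL=C date +%a)" = "Sun" ] ; then
    /sbin/drbdadm verify all
fi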
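For anyone who would rather keep the whole "third Sunday" test in one place, here is a minimal sketch of the same idea as a shell function. It assumes GNU date (for the -d option), and is_third_sunday is just a name I made up for illustration. The third Sunday always falls on the 15th to the 21st, so checking both conditions in the script gives the AND that cron won't.

```shell
#!/bin/bash
# Sketch: succeed only when the given date is the third Sunday of its month.
# Assumes GNU date (-d option); is_third_sunday is a hypothetical helper name.
is_third_sunday() {
    local when="${1:-today}"
    local dow dom
    dow=$(date -d "$when" +%u)   # ISO weekday: 1 = Monday ... 7 = Sunday
    dom=$(date -d "$when" +%d)   # day of month, zero padded (01..31)
    dom=$((10#$dom))             # force base 10 so "08" and "09" are not read as octal
    # Both conditions must hold: a Sunday AND a day in the 15-21 range
    [ "$dow" -eq 7 ] && [ "$dom" -ge 15 ] && [ "$dom" -le 21 ]
}
```

With that in a script, the cron entry could run every day and the script would end with something like: is_third_sunday && /sbin/drbdadm verify all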
It would probably also be a good idea to have an occasional cron job check that the fence device is still pingable from the nodes.
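I haven't written that check up here, but a rough sketch might look like the following. It assumes the fence device answers ICMP pings and that the iputils ping options -c and -W are available; bldg1ux01fd is the fence device name from my /etc/hosts, while check_fence_device is a made-up function name. It prints a warning only on failure, so cron will only send mail when something is wrong.

```shell
#!/bin/bash
# Sketch of a reachability check for the fence device (assumes it answers pings).
# check_fence_device is a hypothetical helper name.
check_fence_device() {
    local host="$1"
    # Three probes, two-second timeout each; stay silent on success
    if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
        return 0
    fi
    # Any output from a cron job gets mailed, so only speak up on failure
    echo "WARNING: fence device $host is not answering pings from $(uname -n)"
    return 1
}
```

Called as check_fence_device bldg1ux01fd at the end of a small script, this could be run from cron on both nodes, say once a day.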
I'd also make sure that you can run all the services on either node. You don't want to discover that this doesn't work when a node fails! You can move them around with clusvcadm or in the luci web console, which I haven't needed or used up to now, but it is useful for monitoring services or moving them around.
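As a rough sketch of such a failover drill, the function below walks through the services in my cluster.conf and relocates each one to a given node with clusvcadm -r ... -m .... By default it only prints the commands; RUN=1 and the relocate_all name are my own inventions for this example, so adjust to taste.

```shell
#!/bin/bash
# Sketch: print (or, with RUN=1, execute) the relocations for a failover drill.
# Service names are the ones from cluster.conf; relocate_all and RUN are
# hypothetical names used only for this example.
SERVICES="dhcpd named cups httpd nfsdprojects nfsdhome"

relocate_all() {
    local target="$1" svc
    for svc in $SERVICES; do
        if [ "${RUN:-0}" = "1" ]; then
            clusvcadm -r "$svc" -m "$target"          # really relocate the service
        else
            echo "clusvcadm -r $svc -m $target"       # dry run: just show the command
        fi
    done
}
```

Running relocate_all bldg1ux01n2i shows what would be moved to node 2. After a real run, clustat will show where each service ended up, and the same drill in the other direction (bldg1ux01n1i) covers the second node.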
If you want to use iptables on the nodes (and I do), I'd make life easy for yourself and fully open up the bond1 interface (the back-to-back connection) to ACCEPT on both nodes. There is a lot of multicasting etc. going on, and you'll just make work for yourself trying to work out what needs to be opened up. I'd just tie down the services allowed on the main network interface.
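To give an idea of what I mean, here is a minimal sketch rather than my actual ruleset. bond1 is the back-to-back interface as above; bond0 as the name of the main interface, the IPT variable and the two function names are assumptions for the example. The rules are inserted at the top of INPUT so that they land before any closing REJECT or DROP rule.

```shell
#!/bin/bash
# Sketch only: open the cluster interconnect fully, allow named services elsewhere.
# IPT, open_cluster_link and allow_service are hypothetical names for this example.
IPT="${IPT:-/sbin/iptables}"

# Accept everything arriving on the back-to-back link (cluster and DRBD traffic)
open_cluster_link() {
    $IPT -I INPUT -i "$1" -j ACCEPT
}

# Accept one service on the main interface: arguments are interface, protocol, port
allow_service() {
    $IPT -I INPUT -i "$1" -p "$2" --dport "$3" -j ACCEPT
}
```

For example, open_cluster_link bond1 on both nodes, then allow_service bond0 udp 53 and allow_service bond0 tcp 80 for named and httpd, assuming bond0 is the main interface. The other services (dhcpd, cups, NFS) each need their own ports opened in the same way, and the rules still have to be saved (service iptables save on this kind of system) to survive a reboot.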
One thing missing from my cluster setup so far is kerberized NFSv4. I will hopefully get a chance to revisit that.
Here are the final key files. First, my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="48" name="bldg1ux01clu">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="bldg1ux01n1i" nodeid="1" votes="1">
      <fence>
        <method name="apc7920-dual">
          <device action="off" name="apc7920" port="1"/>
          <device action="off" name="apc7920" port="2"/>
          <device action="on" name="apc7920" port="1"/>
          <device action="on" name="apc7920" port="2"/>
        </method>
        <method name="bldg1ux01n1drac">
          <device name="bldg1ux01n1drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bldg1ux01n2i" nodeid="2" votes="1">
      <fence>
        <method name="apc7920-dual">
          <device action="off" name="apc7920" port="3"/>
          <device action="off" name="apc7920" port="4"/>
          <device action="on" name="apc7920" port="3"/>
          <device action="on" name="apc7920" port="4"/>
        </method>
        <method name="bldg1ux01n2drac">
          <device name="bldg1ux01n2drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="bldg1ux01A" ordered="1" restricted="1">
        <failoverdomainnode name="bldg1ux01n1i" priority="1"/>
        <failoverdomainnode name="bldg1ux01n2i" priority="2"/>
      </failoverdomain>
      <failoverdomain name="bldg1ux01B" ordered="1" restricted="1">
        <failoverdomainnode name="bldg1ux01n1i" priority="2"/>
        <failoverdomainnode name="bldg1ux01n2i" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <nfsexport name="bldg1cluexports"/>
      <ip address="10.1.10.25" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00dhcpd" fstype="gfs2" mountpoint="/data/dhcpd" name="dhcpdfs" options="acl"/>
      <ip address="10.1.10.26" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00named" fstype="gfs2" mountpoint="/data/named" name="namedfs" options="acl"/>
      <ip address="10.1.10.27" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00cups" fstype="gfs2" mountpoint="/data/cups" name="cupsfs" options="acl"/>
      <ip address="10.1.10.28" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00httpd" fstype="gfs2" mountpoint="/data/httpd" name="httpdfs" options="acl"/>
      <ip address="10.1.10.29" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00projects" fstype="gfs2" mountpoint="/data/projects" name="projectsfs" options="acl"/>
      <nfsclient name="nfsdprojects" options="rw" target="10.0.0.0/8"/>
      <ip address="10.1.10.30" monitor_link="1"/>
      <clusterfs device="/dev/cluvg00/lv00home" fstype="gfs2" mountpoint="/data/home" name="homefs" options="acl"/>
      <nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>
    </resources>
    <service autostart="1" domain="bldg1ux01A" exclusive="0" name="dhcpd" recovery="relocate">
      <script file="/etc/init.d/dhcpd" name="dhcpd"/>
      <ip ref="10.1.10.25"/>
      <clusterfs ref="dhcpdfs"/>
    </service>
    <service autostart="1" domain="bldg1ux01A" exclusive="0" name="named" recovery="relocate">
      <clusterfs ref="namedfs"/>
      <ip ref="10.1.10.26"/>
      <script file="/etc/init.d/named" name="named"/>
    </service>
    <service autostart="1" domain="bldg1ux01B" exclusive="0" name="cups" recovery="relocate">
      <script file="/etc/init.d/cups" name="cups"/>
      <ip ref="10.1.10.27"/>
      <clusterfs ref="cupsfs"/>
    </service>
    <service autostart="1" domain="bldg1ux01B" exclusive="0" name="httpd" recovery="relocate">
      <clusterfs ref="httpdfs"/>
      <clusterfs ref="projectsfs"/>
      <ip ref="10.1.10.28"/>
      <apache config_file="conf/httpd.conf" name="httpd" server_root="/data/httpd/etc/httpd" shutdown_wait="10"/>
    </service>
    <service autostart="1" domain="bldg1ux01A" exclusive="0" name="nfsdprojects" recovery="relocate">
      <ip ref="10.1.10.29"/>
      <clusterfs ref="projectsfs">
        <nfsexport ref="bldg1cluexports">
          <nfsclient ref="nfsdprojects"/>
        </nfsexport>
      </clusterfs>
    </service>
    <service autostart="1" domain="bldg1ux01B" exclusive="0" name="nfsdhome" recovery="relocate">
      <ip ref="10.1.10.30"/>
      <clusterfs ref="homefs">
        <nfsexport ref="bldg1cluexports">
          <nfsclient ref="nfsdhome"/>
        </nfsexport>
      </clusterfs>
    </service>
  </rm>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.2.3" login="apc" name="apc7920" passwd="securepassword"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.1.10.22" login="fence" name="bldg1ux01n1drac" passwd="securepassword"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.1.10.23" login="fence" name="bldg1ux01n2drac" passwd="securepassword"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
</cluster>
My fstab file:
#
# /etc/fstab
# Created by anaconda on Thu Jan 20 17:37:26 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=2ca89192-0dfa-45ab-972d-9fd15e5c6414 / ext4 defaults 1 1
UUID=7ab69be7-52fd-4f08-b08b-f9aea7c7ef70 swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/cluvg00/lv00dhcpd /data/dhcpd gfs2 acl 0 0
/dev/cluvg00/lv00named /data/named gfs2 acl 0 0
/dev/cluvg00/lv00cups /data/cups gfs2 acl 0 0
/dev/cluvg00/lv00httpd /data/httpd gfs2 acl 0 0
/dev/cluvg00/lv00projects /data/projects gfs2 acl 0 0
/dev/cluvg00/lv00home /data/home gfs2 acl 0 0
/dev/cluvg00/lv00lclu /data/lclu gfs2 acl 0 0
And /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.10.20 bldg1ux01n1.lan bldg1ux01n1
10.1.10.21 bldg1ux01n2.lan bldg1ux01n2
192.168.1.1 bldg1ux01n1i
192.168.1.2 bldg1ux01n2i
192.168.2.1 bldg1ux01n1f
192.168.2.2 bldg1ux01n2f
192.168.2.3 bldg1ux01fd
10.1.10.22 bldg1ux01n1drac bldg1ux01n1drac.lan.
10.1.10.23 bldg1ux01n2drac bldg1ux01n2drac.lan.
10.1.10.25 bldg1cludhcp bldg1cludhcp.lan.
10.1.10.26 bldg1cludns bldg1cludns.lan.
10.1.10.27 bldg1clucups bldg1clucups.lan.
10.1.10.28 bldg1cluhttp bldg1cluhttp.lan.
10.1.10.29 bldg1clunfsprojects bldg1clunfsprojects.lan.
10.1.10.30 bldg1clunfshome bldg1clunfshome.lan.
10.1.10.32 bldg1clusmbA bldg1clusmbA.lan.
10.1.10.33 bldg1clusmbB bldg1clusmbB.lan.
Well that should be it. I just wanted to write this as I found no single resource online to get all this going. Hopefully this will spare someone out there from having to grub around looking for information the way I had to.
More Power to your Penguins