My motivation for this cluster was plain and simple: I no longer wanted to be reliant on single boxes for LAN services. I wanted to be able to sit at home on a Sunday evening when a box fails and not have to head into the office. Admittedly I've spent plenty of late nights getting this going, so I've failed that test to some extent, but hopefully I'll now get some payback.
Sadly most information on the Internet about RH clustering tends to focus on web services, which is probably the common case (especially if a company is dependent on its website for revenue). However I think the cost of hardware is now such that HA clustering is opening up to the masses, as RAID did a decade or two ago. I wanted HA for intranet applications providing LAN services to desktop systems. The services I want to provide are:
- File Services (NFS)
- Printing Services (CUPS)
- DNS Server (named)
- Samba (File and Print)
- Intranet Web Service (HTTP)
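To give an idea of where this is heading, services like these end up defined as resources in /etc/cluster/cluster.conf for rgmanager to fail over. A minimal sketch of one of them (the names and IP address are placeholders, not my actual config):

```xml
<!-- Fragment of /etc/cluster/cluster.conf; illustrative only -->
<rm>
  <resources>
    <!-- floating service IP that moves with the service -->
    <ip address="192.168.1.50" monitor_link="1"/>
    <script name="named" file="/etc/init.d/named"/>
  </resources>
  <service name="dns" autostart="1">
    <ip ref="192.168.1.50"/>
    <script ref="named"/>
  </service>
</rm>
```

Each service gets a floating IP plus whatever resources it needs, and rgmanager starts the whole tree on whichever node currently owns the service.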
As this is effectively a department-level server, it can't be too costly. So I don't want to use a SAN box, especially as the lower-end models start having single points of failure, which I'm trying to avoid. I want to use commodity hardware (but decent boxes), and I want to use a supported platform (in my case RHEL 6; let's have the latest and greatest). Everything here should also be relevant to CentOS 6 (which at the time of writing wasn't out yet) or SL6 (though I haven't checked whether they rebuild the RH cluster packages).
As I want a supported platform I'm going to use Red Hat Cluster Suite and not something like Pacemaker, which RH won't support. I'm going to assume a certain familiarity with RH clustering throughout.
For clustered storage I'm going to use DRBD, which is often described as "RAID 1 over a network": both nodes keep the data on their local disks and it's constantly replicated. This wasn't originally supported by RH with Cluster Suite (or, as we should now say, the RH High Availability and Resilient Storage Add-Ons), but fortunately RH have now come to an agreement with LINBIT whereby they will support you using DRBD (and escalate DRBD issues back to LINBIT). Great! See http://www.linbit.com/en/news/single-news/art/60/31/
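As a taste of what the DRBD side looks like, each replicated device gets a resource definition under /etc/drbd.d/. A sketch, assuming the back-to-back link uses a private subnet (the hostnames, addresses and backing device are placeholders, not my final config):

```
# /etc/drbd.d/r0.res -- illustrative sketch only
resource r0 {
  protocol C;                # synchronous replication: writes complete
                             # only when on both nodes' disks
  device    /dev/drbd0;
  disk      /dev/sdb1;       # local backing device (placeholder)
  meta-disk internal;
  on node1 {
    address 10.0.0.1:7789;   # address on the back-to-back link
  }
  on node2 {
    address 10.0.0.2:7789;
  }
}
```

Protocol C is the usual choice for HA setups, since a failover must never lose an acknowledged write.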
But if you want a really low-cost solution, all these components are available at no charge via CentOS (soon) and the free DRBD. If you have a LINBIT support contract for DRBD, then apart from getting the manual as a PDF you also get the recently made available kernel-version-independent DRBD. This relies on the kernel ABIs used being guaranteed not to change during the lifetime of an RH release, which means you can update the kernel without needing a matching DRBD kernel module. It's one less hassle, but not essential: you just need to make sure you have the correct DRBD build when updating the kernel. The kernel-version-independent module may even be available for free; I haven't checked.
For my cluster I'm using 2 Dell PowerEdge R610s. I've pushed the boat out a bit on hardware config. I have loaded each machine with 5 x 600GB 10,000 RPM SAS drives behind a PERC controller. On each machine I put 4 of these drives into a RAID 10 and set the last one as a hot spare.
Other than that, I have dual PSUs, 32 GB of RAM and some fast CPUs. I also added a 10Gb Ethernet card to act as my storage interconnect.
I also purchased an APC Ethernet power switch (AP7920) and connected my first machine's PSUs to its first two ports and the second machine's to the next two. I also purchased a small dumb switch to connect this device's network port to the machines.
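In cluster.conf the AP7920 appears as a fence_apc device, and each node is fenced via the two outlets its PSUs are plugged into. A sketch with placeholder names, address and credentials; with dual PSUs the usual pattern is to turn both ports off before turning both back on, so the node genuinely loses power:

```xml
<!-- Fragments of /etc/cluster/cluster.conf; all values are placeholders -->
<fencedevices>
  <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.1.10"
               login="apc" passwd="secret"/>
</fencedevices>

<clusternode name="node1" nodeid="1">
  <fence>
    <method name="power">
      <!-- both outlets off, then both on -->
      <device name="apc1" port="1" action="off"/>
      <device name="apc1" port="2" action="off"/>
      <device name="apc1" port="1" action="on"/>
      <device name="apc1" port="2" action="on"/>
    </method>
  </fence>
</clusternode>
```

Power fencing like this is what makes DRBD safe here: a node that's been fenced can't keep writing to its copy of the data.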
Now the server hardware I'm using seems pretty meaty, however clustered filesystems have a certain amount of overhead and so I'm trying to compensate for this to some extent (with RAID 10 using fast drives and 10 Gb interconnects). You may not require the same level of performance as I do, so you could easily drop this down.
The R610 comes with 4 Gigabit Ethernet ports (plus my added 10Gb port), so I'm going to use them all. I colour-coded my connections so I wouldn't confuse them. The network connections are as follows on both machines:
- Ethernet 1 (eth0) and 10Gb (eth4): back-to-back connections for DRBD and cluster comms
- Ethernet 2 (eth1): connected to the local dumb switch, along with the fence device
- Ethernet 3 (eth2) and 4 (eth3) I'm connecting to the main network.
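The back-to-back links just need static addressing on a private subnet. For example, the 10Gb interface on the first node might look like this (the addresses are my illustration, not something you have to copy):

```
# /etc/sysconfig/network-scripts/ifcfg-eth4 -- node 1 (sketch)
DEVICE=eth4
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.0.0.1
NETMASK=255.255.255.0
MTU=9000        # jumbo frames can help DRBD throughput; optional
```

The second node gets the matching address (10.0.0.2 in this sketch), and eth0 gets its own separate private subnet the same way.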
I also wired these systems' DRACs to the main network switches.
I'm going to use eth2 and eth3 as a bonded interface. If you want HA, I'd recommend connecting these to two different switches in your stack (or to different cards, if using a chassis switch).
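On RHEL 6 the bond is just a few ifcfg files. A minimal active-backup sketch (the address is a placeholder, and you should pick whichever bonding mode your switches support):

```
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.21
NETMASK=255.255.255.0
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth2 (and likewise for eth3)
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
```

Active-backup (mode 1) works with any switches; if your switches support it you could use 802.3ad (mode 4) instead for aggregated bandwidth as well as failover.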