Tuesday, July 27, 2010

Setting Up A Highly Available NFS Server

In this tutorial I will describe how to set up a highly available NFS server that can be used as a storage solution for other high-availability services such as a cluster of load-balanced web servers. If you have a web server cluster with two or more nodes that serve the same web site(s), then these nodes must access the same pool of data so that every node serves the same data, no matter whether the load balancer directs the user to node 1 or node n. This can be achieved with an NFS share on an NFS server that all web server nodes (the NFS clients) can access.
As we do not want the NFS server to become another "Single Point of Failure", we have to make it highly available. In fact, in this tutorial I will create two NFS servers that mirror their data to each other in realtime using DRBD and that monitor each other using heartbeat, and if one NFS server fails, the other takes over silently. To the outside (e.g. the web server nodes) these two NFS servers will appear as a single NFS server.
In this setup I will use Debian Sarge (3.1) for the two NFS servers as well as for the NFS client (which represents a node of the web server cluster).
I want to say first that this is not the only way of setting up such a system. There are many ways of achieving this goal but this is the way I take. I do not issue any guarantee that this will work for you!

1 My Setup

In this document I use the following systems:
  • NFS server 1: server1.example.com, IP address:; I will refer to this one as server1.
  • NFS server 2: server2.example.com, IP address:; I will refer to this one as server2.
  • Virtual IP address: I use 192.168.0.174 as the virtual IP address that represents the NFS cluster to the outside.
  • NFS client (e.g. a node from the web server cluster): client.example.com, IP address:; I will refer to the NFS client as client.
  • The /data directory will be mirrored by DRBD between server1 and server2. It will contain the NFS share /data/export.

2 Basic Installation Of server1 and server2

First we set up two basic Debian systems for server1 and server2. You can do it as outlined on the first two pages of this tutorial: http://www.howtoforge.com/perfect_setup_debian_sarge. As hostname, you enter server1 and server2 respectively, and as domain you enter example.com.
Regarding the partitioning, I use the following partition scheme:
/dev/sda1 -- 100 MB /boot (primary, ext3, Bootable flag: on)
/dev/sda5 -- 5000 MB / (logical, ext3)
/dev/sda6 -- 1000 MB swap (logical)
/dev/sda7 -- 150 MB unmounted (logical, ext3)
(will contain DRBD's meta data)
/dev/sda8 -- 26 GB unmounted (logical, ext3) 
(will contain the /data directory)

You can vary the sizes of the partitions depending on your hard disk size, and the names of your partitions might also vary, depending on your hardware (e.g. you might have /dev/hda1 instead of /dev/sda1 and so on). However, it is important that /dev/sda7 has a little more than 128 MB because we will use this partition for DRBD's meta data which uses 128 MB. Also, make sure /dev/sda7 as well as /dev/sda8 are identical in size on server1 and server2, and please do not mount them (when the installer asks you:
No mount point is assigned for the ext3 file system in partition #7 of SCSI1 (0,0,0) (sda).
Do you want to return to the partitioning menu?

please answer No)! /dev/sda8 is going to be our data partition (i.e., our NFS share).
After the basic installation make sure that you give server1 and server2 static IP addresses (server1:, as described at the beginning of http://www.howtoforge.com/perfect_setup_debian_sarge_p3).
Afterwards, you should check /etc/fstab on both systems. Mine looks like this on both systems:
# /etc/fstab: static file system information.
   proc            /proc           proc    defaults        0       0
   /dev/sda5       /               ext3    defaults,errors=remount-ro 0       1
   /dev/sda1       /boot           ext3    defaults        0       2
   /dev/sda6       none            swap    sw              0       0
   /dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
   /dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
If you find that yours looks like this, for example:
# /etc/fstab: static file system information.
   proc            /proc           proc    defaults        0       0
   /dev/hda5       /               ext3    defaults,errors=remount-ro 0       1
   /dev/hda1       /boot           ext3    defaults        0       2
   /dev/hda6       none            swap    sw              0       0
   /dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
   /dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
then please make sure you use /dev/hda instead of /dev/sda in the following configuration files. Also make sure that /dev/sda7 (or /dev/hda7) and /dev/sda8 (or /dev/hda8...) are not listed in /etc/fstab!
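
A quick way to double-check that last point is to scan the first column of /etc/fstab for the two DRBD partitions. The following is only a minimal sketch and not part of the original procedure: the helper name listed_in_fstab is made up for illustration, and it runs against a scratch copy of the fstab shown above rather than the real file.

```shell
# Print every given device that appears in the first column of an fstab file.
# listed_in_fstab is a hypothetical helper name, not a standard tool.
listed_in_fstab() {
    fstab_file=$1
    shift
    for dev in "$@"; do
        awk -v d="$dev" '$1 == d { print d }' "$fstab_file"
    done
}

# Scratch copy of the fstab from this section (not the real /etc/fstab).
sample_fstab=$(mktemp)
cat > "$sample_fstab" <<'EOF'
proc            /proc           proc    defaults        0       0
/dev/sda5       /               ext3    defaults,errors=remount-ro 0       1
/dev/sda1       /boot           ext3    defaults        0       2
/dev/sda6       none            swap    sw              0       0
EOF

# Empty output means neither DRBD partition is listed.
listed_in_fstab "$sample_fstab" /dev/sda7 /dev/sda8
```

To check a real system, pass /etc/fstab as the first argument (and /dev/hda7 /dev/hda8 if your disks are named that way).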

3 Synchronize System Time

It's important that both server1 and server2 have the same system time. Therefore we install an NTP client on both:
apt-get install ntp ntpdate
Afterwards you can check that both have the same time by running
date
on each of them.
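
To compare the two clocks more precisely than by eye, each server can print its time as seconds since the epoch, and the two numbers can then be subtracted. This is a minimal sketch under that assumption: clock_skew is a made-up helper name, and the two readings in the example are invented values, not measurements from this setup.

```shell
# Absolute difference in seconds between two epoch timestamps.
# clock_skew is a hypothetical helper name.
clock_skew() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# This host's clock as seconds since the epoch.
date +%s

# Two made-up readings, as if taken on server1 and server2 at the same moment.
clock_skew 1280217600 1280217603
```

With NTP running on both machines the difference should stay within a second or so.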

4 Install NFS Server

Next we install the NFS server on both server1 and server2:
apt-get install nfs-kernel-server
Then we remove the system bootup links for NFS because NFS will be started and controlled by heartbeat in our setup:
update-rc.d -f nfs-kernel-server remove
update-rc.d -f nfs-common remove

We want to export the directory /data/export (i.e., this will be our NFS share that our web server cluster nodes will use to serve web content), so we edit /etc/exports on server1 and server2. It should contain only the following line:
This means that /data/export will be accessible by all systems from the 192.168.0.x subnet. You can limit access to a single system by using 192.168.0.100/ instead of 192.168.0.0/, for example. See
man 5 exports
to learn more about this.
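
For orientation, an entry in /etc/exports consists of a directory followed by a client specification with options in parentheses. The line below only illustrates that syntax: the netmask and the options rw, sync and no_root_squash are assumptions, not values taken from this tutorial, so adjust them to your needs. The sketch writes to a scratch file, not to /etc/exports.

```shell
# Write an illustrative exports entry to a scratch file (not /etc/exports).
# Netmask and options are assumptions chosen for the example.
exports_file=$(mktemp)
cat > "$exports_file" <<'EOF'
/data/export 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)
EOF

# Show which directory and which client specification the entry names.
awk '{ split($2, parts, "("); print "share:", $1; print "clients:", parts[1] }' "$exports_file"
```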
Later in this tutorial we will create /data/export on our empty (and still unmounted!) partition /dev/sda8.

5 Install DRBD

Next we install DRBD on both server1 and server2:
apt-get install kernel-headers-2.6.8-2-386 drbd0.7-module-source drbd0.7-utils
cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make install

Then edit /etc/drbd.conf on server1 and server2. It must be identical on both systems and looks like this:
resource r0 {
   protocol C;
   incon-degr-cmd "halt -f";
   startup {
      degr-wfc-timeout 120;    # 2 minutes.
   }
   disk {
      on-io-error   detach;
   }
   net {
   }
   syncer {
      rate 10M;
      group 1;
      al-extents 257;
   }
   on server1 {                # ** EDIT ** the hostname of server 1 (uname -n)
      device     /dev/drbd0;        #
      disk       /dev/sda8;         # ** EDIT ** data partition on server 1
      address; # ** EDIT ** IP address on server 1
      meta-disk  /dev/sda7[0];      # ** EDIT ** 128MB partition for DRBD on server 1
   }
   on server2 {                # ** EDIT ** the hostname of server 2 (uname -n)
      device    /dev/drbd0;         #
      disk      /dev/sda8;          # ** EDIT ** data partition on server 2
      address;  # ** EDIT ** IP address on server 2
      meta-disk /dev/sda7[0];       # ** EDIT ** 128MB partition for DRBD on server 2
   }
}
As resource name you can use whatever you like. Here it's r0. Please make sure you put the correct hostnames of server1 and server2 into /etc/drbd.conf. DRBD expects the hostnames as they are shown by the command

uname -n
If you have set server1 and server2 respectively as hostnames during the basic Debian installation, then the output of uname -n should be server1 and server2.
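
Because a mismatch between the names in the "on" sections and the output of uname -n is easy to overlook, it can help to compare them mechanically. The following is a minimal sketch, assuming the "on <host> {" layout used above: drbd_hosts is a made-up helper name, and the sample file contains only the section headers, not a complete configuration.

```shell
# Print the host names used in "on <host> {" sections of a DRBD config file.
# drbd_hosts is a hypothetical helper name.
drbd_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

# Scratch file holding just the section headers from the config above.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
resource r0 {
   on server1 {
   }
   on server2 {
   }
}
EOF

drbd_hosts "$sample_conf"

# Report whether this machine's own name is among them.
if drbd_hosts "$sample_conf" | grep -qxF "$(uname -n)"; then
    echo "this host is listed"
else
    echo "this host is NOT listed"
fi
```

On the servers themselves you would point drbd_hosts at /etc/drbd.conf instead of the scratch file.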
Also make sure you replace the IP addresses and the disks appropriately. If you use /dev/hda instead of /dev/sda, please put /dev/hda8 instead of /dev/sda8 into /etc/drbd.conf (the same goes for the meta-disk where DRBD stores its meta data). /dev/sda8 (or /dev/hda8...) will be used as our NFS share later on.

6 Configure DRBD

Now we load the DRBD kernel module on both server1 and server2. We need to do this only now because afterwards it will be loaded by the DRBD init script.
modprobe drbd
Let's configure DRBD:
drbdadm up all
cat /proc/drbd

The last command should show something like this (on both server1 and server2):
version: 0.7.10 (api:77/proto:74)
   SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
   0: cs:Connected st:Secondary/Secondary ld:Inconsistent
   ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
   1: cs:Unconfigured
You see that both NFS servers say that they are secondary and that the data is inconsistent. This is because no initial sync has been made yet.
I want to make server1 the primary NFS server and server2 the "hot-standby". If server1 fails, server2 takes over, and if server1 comes back then all data that has changed in the meantime is mirrored back from server2 to server1 so that data is always consistent.
This next step has to be done only on server1!
drbdadm -- --do-what-I-say primary all
Now we start the initial sync between server1 and server2 so that the data on both servers becomes consistent. On server1, we do this:
drbdadm -- connect all
The initial sync is going to take a few hours (depending on the size of /dev/sda8 (/dev/hda8...)) so please be patient.
You can see the progress of the initial sync like this on server1 or server2:

cat /proc/drbd
The output should look like this:
version: 0.7.10 (api:77/proto:74)
   SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
   0: cs:SyncSource st:Primary/Secondary ld:Consistent
   ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
   [==========>.........] sync'ed: 53.1% (11606/24733)M
   finish: 1:14:16 speed: 2,644 (2,204) K/sec
   1: cs:Unconfigured
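
If you would rather have a single number than read the progress bar, the percentage can be cut out of the status text. This is a minimal sketch, assuming the "sync'ed: NN.N%" format shown in the sample output above: sync_percent is a made-up helper name, and the example feeds it a line copied from that output instead of the live /proc/drbd.

```shell
# Extract the percentage from a "sync'ed: NN.N%" line on stdin.
# sync_percent is a hypothetical helper name.
sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p"
}

# Progress line copied from the sample output above.
sample="   [==========>.........] sync'ed: 53.1% (11606/24733)M"
printf '%s\n' "$sample" | sync_percent
```

On a live system, sync_percent < /proc/drbd would print the current value while a sync is running and nothing once it has finished.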
When the initial sync is finished, the output should look like this:
version: 0.7.10 (api:77/proto:74)
   SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
   0: cs:Connected st:Primary/Secondary ld:Consistent
   ns:37139 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
   1: cs:Unconfigured
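
The three fields that matter in this output are cs (connection state), st (local/remote role) and ld (state of the local data). The sketch below shows one way to pull them out of the status line for device 0. It assumes the 0.7 output format shown above: drbd_field is a made-up helper name, and the example reads a copied status line rather than /proc/drbd.

```shell
# Print one field (cs, st or ld) of DRBD device 0 from status text on stdin.
# drbd_field is a hypothetical helper name.
drbd_field() {
    sed -n "s/^ *0: .*$1:\([^ ]*\).*/\1/p"
}

# Status line copied from the "sync finished" output above.
status='   0: cs:Connected st:Primary/Secondary ld:Consistent'

cs=$(printf '%s\n' "$status" | drbd_field cs)
st=$(printf '%s\n' "$status" | drbd_field st)
ld=$(printf '%s\n' "$status" | drbd_field ld)
echo "connection=$cs roles=$st data=$ld"

# The mirror is healthy when both sides are connected and local data is consistent.
if [ "$cs" = "Connected" ] && [ "$ld" = "Consistent" ]; then
    echo "mirror looks healthy"
fi
```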

7 Some Further NFS Configuration

NFS stores some important information (e.g. information about file locks, etc.) in /var/lib/nfs. Now what happens if server1 goes down? server2 takes over, but its information in /var/lib/nfs will be different from the information in server1's /var/lib/nfs directory. Therefore we do some tweaking so that these details will be stored on our /data partition (/dev/sda8 or /dev/hda8...) which is mirrored by DRBD between server1 and server2. So if server1 goes down, server2 can use the NFS details of server1.
On both server1 and server2:
mkdir /data
On server1 only (the current DRBD primary):
mount -t ext3 /dev/drbd0 /data
mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data

On server2 only:
rm -fr /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs
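
After these steps /var/lib/nfs should be a symbolic link on both machines. A small check like the following can confirm that. It is only a sketch: points_to is a made-up helper name, and the example builds a throwaway directory tree so that it can be tried without touching the real /var/lib/nfs.

```shell
# Succeed if $1 is a symbolic link whose target is exactly $2.
# points_to is a hypothetical helper name.
points_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Throwaway tree that mimics the /data/nfs layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/data/nfs"
ln -s "$scratch/data/nfs" "$scratch/nfs"

if points_to "$scratch/nfs" "$scratch/data/nfs"; then
    echo "link OK"
else
    echo "link missing or wrong"
fi
```

On the servers the call would be points_to /var/lib/nfs /data/nfs/ (with the trailing slash, because that is how the link was created above).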

8 Install And Configure heartbeat

heartbeat is the control instance of this whole setup. It is going to be installed on server1 and server2, and it monitors the other server. For example, if server1 goes down, heartbeat on server2 detects this and makes server2 take over. heartbeat also starts and stops the NFS server on both server1 and server2. It also provides NFS as a virtual service via the IP address 192.168.0.174 so that the web server cluster nodes see only one NFS server.
First we install heartbeat:
apt-get install heartbeat
Now we have to create three configuration files for heartbeat. They must be identical on server1 and server2!
/etc/heartbeat/ha.cf:
logfacility     local0
   keepalive 2
   #deadtime 30 # USE THIS!!!
   deadtime 10
   bcast   eth0
   node server1 server2
As node names we must use the output of uname -n on server1 and server2.
/etc/heartbeat/haresources:
server1  IPaddr:: drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server
The first word is the output of uname -n on server1, no matter if you create the file on server1 or server2! After IPaddr we put our virtual IP address, and after drbddisk we use the resource name of our DRBD resource which is r0 here (remember, that is the resource name we use in /etc/drbd.conf - if you use another one, you must use it here, too).
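
The order of the entries on that line matters: heartbeat brings the resources up from left to right and takes them down from right to left, so the DRBD device becomes primary before the file system is mounted, and NFS starts last. The sketch below just splits such a line to make that order visible. It is an illustration only and does not talk to heartbeat. The IPaddr entry is left out of the sample because its address is not spelled out above.

```shell
# A haresources-style line (IPaddr entry omitted, see the note above).
line='server1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server'

# Split into the preferred node and the resource list.
set -- $line
node=$1
shift

start_order=""
stop_order=""
for res in "$@"; do
    name=${res%%::*}                  # script name: the part before the first "::"
    start_order="$start_order $name"  # append for start order
    stop_order="$name $stop_order"    # prepend for stop order
done
start_order=${start_order# }
stop_order=${stop_order% }

echo "preferred node: $node"
echo "start order: $start_order"
echo "stop order:  $stop_order"
```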
/etc/heartbeat/authkeys:
auth 3
   3 md5 somerandomstring
somerandomstring is a password which the two heartbeat daemons on server1 and server2 use to authenticate against each other. Use your own string here. You have the choice between three authentication mechanisms: crc, md5 and sha1. I use md5 here; sha1 is generally considered the stronger of the two hashes.
/etc/heartbeat/authkeys should be readable by root only, therefore we do this:
chmod 600 /etc/heartbeat/authkeys
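
If you do not want to invent the shared string yourself, it can be generated from /dev/urandom. This is a minimal sketch, assuming head, od and tr are available: it writes to a scratch file rather than to /etc/heartbeat/authkeys, and the 16-byte length is an arbitrary choice for the example.

```shell
# Generate 16 random bytes and print them as 32 hexadecimal characters.
secret=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Write an authkeys-style file to a scratch location with restrictive permissions.
keyfile=$(mktemp)
printf 'auth 3\n3 md5 %s\n' "$secret" > "$keyfile"
chmod 600 "$keyfile"

cat "$keyfile"
```

Whatever string you end up with, the file must be identical on server1 and server2.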
Finally we start DRBD and heartbeat on server1 and server2:
/etc/init.d/drbd start
/etc/init.d/heartbeat start

9 First Tests

Now we can do our first tests. On server1, run
ifconfig
In the output, the virtual IP address should show up:
eth0      Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
   inet addr:  Bcast:  Mask:
   inet6 addr: fe80::20c:29ff:fea1:c59b/64 Scope:Link
   RX packets:18992 errors:0 dropped:0 overruns:0 frame:0
   TX packets:24816 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:2735887 (2.6 MiB)  TX bytes:28119087 (26.8 MiB)
   Interrupt:177 Base address:0x1400
   eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
   inet addr:  Bcast:  Mask:
   Interrupt:177 Base address:0x1400
   lo        Link encap:Local Loopback
   inet addr:  Mask:
   inet6 addr: ::1/128 Scope:Host
   UP LOOPBACK RUNNING  MTU:16436  Metric:1
   RX packets:71 errors:0 dropped:0 overruns:0 frame:0
   TX packets:71 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:0
   RX bytes:5178 (5.0 KiB)  TX bytes:5178 (5.0 KiB)
Also, run
df -h
on server1. You should see /data listed there now:
Filesystem            Size  Used Avail Use% Mounted on
   /dev/sda5             4.6G  430M  4.0G  10% /
   tmpfs                 126M     0  126M   0% /dev/shm
   /dev/sda1              89M   11M   74M  13% /boot
   /dev/drbd0             24G   33M   23G   1% /data
If you run the same commands
ifconfig
df -h

on server2, you shouldn't see the virtual IP address or /data there.
Now we create a test file in /data/export on server1 and then simulate a server failure of server1 (by stopping heartbeat):
touch /data/export/test1
/etc/init.d/heartbeat stop

If you run ifconfig and df -h on server2 now, you should see the virtual IP address and the /data partition, and
ls -l /data/export
should list the file test1 which you created on server1 before. So it has been mirrored to server2!
Now we create another test file on server2 and see if it gets mirrored to server1 when it comes up again.
On server2:
touch /data/export/test2
On server1:
/etc/init.d/heartbeat start
(Wait a few seconds.)
ifconfig
df -h
ls -l /data/export

You should see the virtual IP address and /data again on server1 which means it has taken over again (because we defined it as primary), and you should also see the file /data/export/test2!
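
During these tests it is handy to ask each server directly whether it currently holds the virtual IP address. The following sketch does that by searching ifconfig-style text for the address. It assumes the "inet addr:" output format shown above (newer systems print it differently): has_addr is a made-up helper name, and the Bcast and Mask values in the sample are invented for the example.

```shell
# Succeed if the ifconfig-style text on stdin lists the given IPv4 address.
# has_addr is a hypothetical helper name.
has_addr() {
    grep -qF "inet addr:$1 "
}

# Made-up sample of an alias interface carrying the virtual IP address.
sample='eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:A1:C5:9B
          inet addr:192.168.0.174  Bcast:192.168.0.255  Mask:255.255.255.0'

if printf '%s\n' "$sample" | has_addr 192.168.0.174; then
    echo "virtual IP is active here"
else
    echo "virtual IP is not active here"
fi
```

On a server you would pipe the real output in, for example ifconfig | has_addr 192.168.0.174.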

10 Configure The NFS Client

Now we install NFS on our client (client.example.com):
apt-get install nfs-common
Next we create the /data directory and mount our NFS share into it:
mkdir /data
mount 192.168.0.174:/data/export /data
192.168.0.174 is the virtual IP address we configured before. You must make sure that the forward and the reverse DNS record for client.example.com match each other, otherwise you get a "Permission denied" error on the client, and on the server you'll find this in /var/log/syslog:
#Mar  2 04:19:09 localhost rpc.mountd: Fake hostname localhost for - forward lookup doesn't match reverse
If you do not have proper DNS records (or do not have a DNS server for your local network) you must change this now, otherwise you cannot mount the NFS share!
If it works you can now create further test files in /data on the client and then simulate failures of server1 and server2 (but not both at a time!) and check if the test files are replicated. On the client you shouldn't notice at all if server1 or server2 fails - the data in the /data directory should always be available (unless server1 and server2 fail at the same time...).
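
One way to watch a failover from the client's side is to keep writing small files into the share while you stop heartbeat on the active server. The sketch below does this in a loop and counts failed writes. It is an illustration under stated assumptions: probe_writes is a made-up helper name, and the example runs against a temporary directory so it can be tried anywhere. Note that with a default (hard) NFS mount a write usually blocks until the share is reachable again instead of failing, so a short pause in the output is the more likely symptom.

```shell
# Write a timestamp file into directory $1, $2 times, pausing $3 seconds
# between attempts; print the number of failed writes.
# probe_writes is a hypothetical helper name.
probe_writes() {
    dir=$1
    count=$2
    pause=$3
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! date +%s > "$dir/probe.$i" 2>/dev/null; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "$failures"
}

# Dry run against a scratch directory: three writes, no pause.
probe_writes "$(mktemp -d)" 3 0
```

On the client you would call it as probe_writes /data 60 1, for example, and trigger the failover while it runs.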
To unmount the /data directory, run
umount /data
If you want to automatically mount the NFS share at boot time, put the following line into /etc/fstab:
192.168.0.174:/data/export  /data    nfs          rw            0    0