Realtime data mirroring under Linux

ArticleCategory: [Choose a category for your article]

System Administration

TranslationInfo:[Author and translation history]

original in en Atif Ghaffar 

AboutTheAuthor:[A small biography about the author]

Atif is a chameleon. He changes his roles, from System Administrator, to programmer, to teacher, to project manager, to whatever is required to get the job done.
Atif thinks that he owes a lot to Linux and to the open-source community and its projects for being his teachers.
More about him can be found at his homepage

Abstract:[Here you write a little summary]

In this article we will explore real-time data replication under Linux without using an expensive SAN (Storage Area Network, e.g. with GFS) or other network block devices.
We will use FAM and IMON for our replication system.
FAM (File Alteration Monitor) and IMON (Inode Monitor) were originally developed by SGI for IRIX.

The folks at SGI have been very cool to port them to Linux and make them open source.

When cost is not an issue, I would go for a GFS (Global File System) and SAN based solution, but when cost is a factor and data sharing is necessary, I am not left with a lot of choices.
I have a few options to pick from. In this article we will discuss them and look at their advantages and disadvantages.

ArticleBody:[The article body]

Why replicate instead of sharing?

Aren't fileservers supposed to make data available to clients?
Yes they are.

If we use a file server that shares files over NFS, SMB, etc., then we have a bottleneck and a Single Point Of Failure.

If we share data over GFS with shared storage (SAN or multi-channel SCSI), we have the storage box as the Single Point Of Failure, and it is very expensive to set up a system with that configuration.

We can use NBD (Network Block Devices) to set up a network mirror, but I am not very comfortable with that. NBDs have their limitations, are difficult to set up and manage, and are just too much bother when all you need is to replicate some web data across a few webservers.

Keeping it simple

OK, let's try replicating.
Here is one scenario:
You have 2 webservers, one as the main server and the other as the backup.
You make all changes on the master machine and rsync the changes to the second machine.
Simple?
But how do you automate it? Your users will FTP to the master machine multiple times a day. What will happen if there is a failure on the master server and the backup server takes over?
Easy. I have the answer to that. They will not see the changes they made, and will be pretty angry. :)
Well, you can run "rsync -av --delete source destination" in a loop every 5 seconds (cron itself cannot schedule jobs more often than once a minute), but then your machine will not be really useful for anything else, will it?


Here is another scenario:
You have one FTP server to upload the data and
six webservers that respond in a round-robin fashion.
So the data on each machine should be the same. You can get away with NFS for some time if you are lucky, but you won't for long.

Now, what should be done?
I think the answer is "copy the data to the webservers only if there is a change to the files", and if there is no change to the data, don't do anything.

This is exactly what we will do using "fam".

Keeping it smart

So how do we know there is a change on the files?
Here is one answer that I would expect from a M$ Windows developer.
We can search the directory we are monitoring every few seconds and compare its timestamps and size with the version we had in cache.
Yeah, right.

Polling, that is, checking file timestamps and sizes and comparing them with the older version, is expensive.
Imagine your box running "ls -lR /somedirectory" every 5 seconds on your webserver :)
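
To make the cost concrete, here is a minimal sketch of that polling approach in shell. The function names snapshot, has_changed and poll_loop are made up for this illustration and are not part of any tool mentioned in this article.

```shell
#!/bin/sh
# Illustration only: these helper names are hypothetical.

# snapshot DIR: list sizes and modification times of everything under DIR.
snapshot() {
    ls -lR "$1"
}

# has_changed DIR FILE: succeed if DIR no longer matches the listing saved in FILE.
has_changed() {
    [ "$(snapshot "$1")" != "$(cat "$2")" ]
}

# poll_loop DIR: walk the whole tree every 5 seconds, whether or not anything changed.
poll_loop() {
    dir=$1
    saved=$(mktemp)
    snapshot "$dir" > "$saved"
    while sleep 5; do
        if has_changed "$dir" "$saved"; then
            echo "change detected in $dir"
            snapshot "$dir" > "$saved"
        fi
    done
}

# Example (runs until interrupted):
#   poll_loop /somedirectory
```

Note that every pass reads the complete directory tree, even when nothing has changed, and a change can still go unnoticed for up to 5 seconds.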

The elegant way would be for the file to tell us when it has changed, so we can act on it.
This is exactly what "IMON" will do for us.

What is FAM?

source: http://oss.sgi.com/projects/fam/faq.html
fam, the File Alteration Monitor, provides an API which applications can use to be notified when specific files or directories are changed.
FAM comes in two parts: fam, the daemon which listens for requests and delivers notification, and libfam, a library which client applications can use to communicate with FAM.
If the monitored files are mounted from a remote host, the local fam will attempt to contact fam on the remote host, and will pass the requests on to the remote fam.
fam can also notify its clients when a file starts and stops execution. (The IRIX Interactive Desktop uses this to change a program's icon while it's running, for example.)
fam was originally written for IRIX in 1989 by Bruce Karsh, and was rewritten in 1995 by Bob Miller. This open-source release of fam builds and runs on both Linux and IRIX, and is the same fam that will be included with IRIX 6.5.8.

What is IMON?

source: http://oss.sgi.com/projects/fam/faq.html
imon, the Inode Monitor, is the part of the kernel that tells fam when files have changed. When applications tell fam they're interested in files or directories, fam passes that interest on to imon. When file operations are performed on files monitored by imon, the kernel tells imon; imon tells fam, and fam notifies the applications which are interested in the files.
imon was originally written for the IRIX kernel in 1989 by Wiltse Carpenter; the Linux port was done by Roger Chickering. The Linux implementation in the imon kernel patch is similar to the IRIX implementation in most ways, but it hooks into the kernel filesystem code differently.

Installing FAM and IMON

FAM and IMON are both available from SGI's website. See Resources below.
IMON is a kernel patch that adds inode monitoring support to your kernel.
To patch the kernel, cd to your kernel source directory and apply the patch:
cd /usr/src/linux
patch -p1 < patchfile

Then run make config or make menuconfig and, in the Filesystems section, enable
Inode Monitor (imon) support (EXPERIMENTAL)
Compile the kernel as usual and reboot (sorry).
Compiling FAM itself is pretty simple.
cd to the fam source directory and run
./configure && make all install
Voilà, it's installed.

Next we will install a Perl module called SGI::FAM, so we can write our event handler in perl.

Installing SGI::FAM Perl module

You didn't really think I would ask you to code in C/C++, did you?
Well, I don't know about you, but I am too lazy and impatient, so I will write my replication handler in Perl.

Download and install SGI::FAM by Jesse N. Glick.
To install the module, simply start the CPAN shell
perl -MCPAN -e shell
and run
install SGI::FAM
This should install SGI::FAM and all prerequisite modules.

Replicating with fam_mirror

fam_mirror is a script that I wrote to automate the replication.
You can view or download it here.
You can edit it and
change @replicaHosts to match your hosts,
change $rsh to whatever command you use to run commands from one machine on another,
and do the same with $rsync.
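
The idea behind such a handler can be sketched in a few lines of shell. This is not the fam_mirror script itself, only an illustration of the decision it has to make for each event. The names REPLICA_HOSTS, RSH, RSYNC and plan_event, as well as the event names create, change and delete, are assumptions chosen for this sketch. It only prints the commands it would run.

```shell
#!/bin/sh
# Illustration only: hypothetical settings, loosely modelled on the
# replica host list, $rsh and $rsync described above.
REPLICA_HOSTS="web2"
RSH="ssh"
RSYNC="rsync -a"

# plan_event EVENT PATH: print the command that would bring each replica
# up to date. Nothing is executed. Paths containing spaces would need
# extra quoting before these commands could be run for real.
plan_event() {
    event=$1
    path=$2
    for host in $REPLICA_HOSTS; do
        case $event in
            create|change)
                # Copy the file or directory into the same parent directory.
                echo "$RSYNC $path $host:$(dirname "$path")/" ;;
            delete)
                echo "$RSH $host rm -rf $path" ;;
            *)
                ;;  # any other event: nothing to replicate
        esac
    done
}

# Example:
#   plan_event change /var/www/index.html
#   plan_event delete /var/www/old.html
```

The point of the design is that work is done per event and per file, so an idle document root costs nothing.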

So, back to scenario 1:
2 machines run as webservers (web1, web2), one of them as master (web1) and the other as slave (web2).
The primary FTP server is web1.
web2 does not run an FTP service at all (otherwise users might try to write to files even when the system is in backup mode).

The web document root on both machines is /var/www.
Set up rsh or ssh on both machines. web2 should allow web1 to run remote commands without a password. I usually add my ssh key to the authorized_keys of the replica hosts.
rsync all data from web1 to web2:
rsync -avz /var/www/ web2:/var/www/
Edit fam_mirror and change @replicaHosts to
@replicaHosts=qw(web2);
Run fam_mirror on web1:
fam_mirror /var/www &
Then make changes to files on web1. All changes will also be written to web2.
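
If replication does not seem to work, a common cause is that the master still gets a password prompt from a replica. The following sketch checks this. The function name check_replicas and the RSH variable are made up for the example. It assumes ssh, whose BatchMode option makes it fail instead of asking for a password.

```shell
#!/bin/sh
# Illustration only: RSH and check_replicas are hypothetical names.
# BatchMode=yes tells ssh to fail rather than prompt for a password.
RSH=${RSH:-"ssh -o BatchMode=yes"}

# check_replicas HOST...: try to run a trivial command on every host.
# Returns non-zero if at least one host refused.
check_replicas() {
    failed=0
    for host in "$@"; do
        if $RSH "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: cannot run commands without a password"
            failed=1
        fi
    done
    return $failed
}

# Example, run on web1:
#   check_replicas web2
```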

Now to scenario 2 (a farm of webservers):
Hosts "linuxweb1", "linuxweb2", "linuxweb3" and "linuxweb4" run as webservers.
Host "linuxftp1" runs as the FTP server (main fileserver).
The web hosts do not offer FTP to users.
Install fam, imon, SGI::FAM and fam_mirror on host "linuxftp1".
Set up rsh or ssh between the machines.
Hosts linuxweb[1-4] should allow linuxftp1 to run remote commands without prompting for a password.
Edit fam_mirror and set @replicaHosts to
@replicaHosts=qw(linuxweb1 linuxweb2 linuxweb3 linuxweb4);
Change $rsh and $rsync if necessary. This assumes that the web document root is /var/www on all machines.
Run on linuxftp1:
INIT_MIRROR=1 fam_mirror /var/www &

Now all changes on linuxftp1 should be visible on linuxweb[1-4].
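
One way to confirm that a replica really matches the master is an rsync dry run: with -n nothing is transferred, and with -i rsync lists every item it would change, so empty output means the two trees agree. The sketch below wraps this in a small function. The names in_sync and RSYNC are made up for the example.

```shell
#!/bin/sh
# Illustration only: RSYNC and in_sync are hypothetical names.
RSYNC=${RSYNC:-rsync}

# in_sync SRC DEST: succeed if a dry run reports nothing to transfer or delete.
# Returns 1 if the trees differ and 2 if rsync itself failed.
in_sync() {
    src=$1
    dest=$2
    # -n: dry run, -i: itemize what would change, --delete: also report extra files
    diff_out=$($RSYNC -a -n -i --delete "$src" "$dest") || return 2
    [ -z "$diff_out" ]
}

# Example, run on linuxftp1:
#   for h in linuxweb1 linuxweb2 linuxweb3 linuxweb4; do
#       in_sync /var/www/ "$h:/var/www/" || echo "$h is out of sync"
#   done
```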

Resources

Known Problems

I found that the solution I have presented here has a little problem: it does not actually work with large directories (a directory with 4-5 thousand subdirectories). The kernel complains about kmalloc etc.
I am trying to get this sorted out. Once I have, I will add the information to the article.
Let me know if you are already aware of a solution to this problem.