High Availability systems under Linux
ArticleCategory:
System Administration
TranslationInfo:
original in en Atif Ghaffar
AboutTheAuthor:
Atif is a chameleon. He changes his roles, from System
Administrator, to programmer, to teacher, to project manager, to
whatever is required to get the job done.
Occasionally you can find him programming, or writing documentation
on his laptop while sitting on the toilet seat.
Atif thinks that he owes a lot to Linux and to the open-source
community and its projects for being his teachers.
More about him can be found at his homepage
Abstract:
When designing a mission-critical system, whether on paper as a
flowchart or physically with boxes, cables, etc., one
has to ask the following questions:
- How important are the services that will run on these machines to you?
- How many other services are dependent on the service you are
going to run on these machines (think NIS/NFS/DB/LDAP server)?
- What happens if a part of the machine fails (power supply,
network cable, hard disk, etc.)?
- What happens when the machine fails completely?
When I ask myself these questions, I get the same answer most of
the time:
I will get fired :)
On the other hand, when I ask myself the question "Will the
operating system fail?" I always get this answer:
No. You are not running 32 bit extensions for a 16
bit patch to an 8 bit operating system originally coded
for a 4 bit microprocessor, written by a 2 bit
company, that can't stand 1 bit of competition.
(I got this from a .sig.)
Now for some serious discussion.
ArticleBody:
Why HA?
Even though I trust Linux blindly, I don't trust the companies
that make the machines, power supplies, network cards, motherboards,
etc., and I am always afraid that if one of these fails, my system will
be unusable. The service will then be unavailable and, furthermore, I
will be taking down other company services even though they are
not directly related to me. For example:
- Some service that I don't even know exists, and that I have nothing
to do with at all, may start misbehaving because it cannot resolve
billingSys106.company.com. Hmmm, let me think what the reason could
be. Oh, I was responsible for DNS and decided, against company
regulations, to run it on Linux. :)
- Or someone cannot use the SAP system because my LDAP server is
down. Oh wait a sec, didn't I fight for 3 months to move the SAP
authentication to LDAP?
- Or no weenies can log into their Win workstations. Hey, we just
have a Unix box down, why should your NT setup be disturbed by
that? Oh! Last time, when nobody was watching, I moved the NT domain
controller to Linux+Samba with authentication against LDAP.
The same can of course happen on a Windows server, but there won't be a
lot of hoo-ha about it because the dummies are used to it. But I
warn you: if this happens on a Linux box, there will be a lot
of "you just cannot trust Linux", etc., etc. from the management.
- In one of the companies I worked for, the NFS server was
feeding data to a corporate web server, an intranet server, a database
server, and many other services whose failure would bring the company
to a halt.
Of course using NFS was a bad choice, but let's just go with it for
the sake of an example.
This server was made HA using Sun's cluster solution, which would
cost you both your arms and legs.
Another very important service was the intranet, used by
more than 1500 people.
Now let's discuss this concept in a little depth.
What is HA?
High Availability is what it says it is.
Something that is Highly Available.
Some service that is really important to keep your company
functional.
Example:
- intranet site
- File server
- Mail Service
- DNS service
These services can fail due to two factors.
- Software misbehavior
- Hardware misbehavior
Management usually takes a lot of care over hardware misbehavior
when ordering hardware: for example, every machine will
have redundant power supplies, RAID 5, etc.
What is often overlooked is software misbehavior.
Believe it or not, I have seen Linux boxes hang because of a
sudden problem with a network card, overheating of the CPU, etc.
The big boss is not really interested in whether the power supply
went down or the system halted due to a faulty network card.
The only thing your boss, employees and customers are interested
in is that the "service" should be available.
Note that I have highlighted the term service.
Of course the service runs on a machine, and redirecting the service
and its requests to another healthy machine is the art of High
Availability.
Example implementations of HA
In this example we will theoretically create an Active/Passive
cluster running an Apache server, serving the intranet.
To create this small cluster, we will use one good machine with
lots of RAM and many CPUs, and another one with just enough RAM/CPU
to run the service.
The first machine will be the master node, while the second will be
the backup node.
The job of the backup node is to take over the services from the
master node if it thinks that the master node is not
responding.
How will this work?
Let's just think about how our users access the intranet.
They type http://intranet/ in their browser and the DNS server
resolves the name to 10.0.0.100 (example IP).
How about if we set up two servers running this intranet service with
different IP addresses, and just ask the DNS server to point to
the second one if the master node goes down?
Sure, that's one possibility, but there are issues with DNS
caching on the clients, etc., and perhaps you want to run the DNS
server on an HA cluster itself.
Another possibility: if the master node fails, the slave node
takes over its IP address and starts serving the requests.
This method is called IP takeover, and is the method that we will
be using in our examples. Now all browsers will still be accessing
http://intranet/, which will resolve to 10.0.0.100 even if the
master node fails, without any changes to the DNS.
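To make the idea concrete, here is a minimal sketch of what a backup node has to do for IP takeover, written as a dry run that only prints the commands instead of executing them. The /24 prefix, the eth0 interface, and the use of iproute2's ip and iputils' arping are assumptions for illustration; heartbeat ships its own scripts for this job, and they may well work differently.

```shell
#!/bin/sh
# Dry-run sketch of IP takeover: print the commands a backup node
# would run to claim the service address. Nothing is executed.
# Assumptions (not from the article): a /24 netmask, and iproute2
# and iputils arping being installed. The function name is made up.

takeover_cmds() {
    ip_addr=$1      # service address, e.g. 10.0.0.100
    prefix=$2       # netmask length, e.g. 24
    dev=$3          # interface facing the users, e.g. eth0

    # 1. Add the service address as an extra address on the interface.
    echo "ip addr add ${ip_addr}/${prefix} dev ${dev}"
    # 2. Send unsolicited ARP so that neighbours update their caches
    #    and start sending traffic for the address to this node.
    echo "arping -U -c 3 -I ${dev} ${ip_addr}"
}

takeover_cmds 10.0.0.100 24 eth0
```

The second step matters because the clients and routers on the LAN still remember the hardware address of the failed master for 10.0.0.100 until told otherwise.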
How do clusters talk?
How would the master/slave know that the other node in the
cluster has failed?
They will talk to each other over a serial cable and over a
cross-link Ethernet cable (for redundancy, since either the serial
cable or the Ethernet cable may fail) and check each other's heartbeat
(yes, like the heartbeat you have). If your heartbeat stops, then you
are probably dead.
The program to monitor the heartbeats of the cluster nodes is
called... guess... heartbeat.
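The decision each node makes about its peer can be sketched in a few lines. This is only an illustration of the idea, not heartbeat's actual code: the function name and the notion of passing timestamps as arguments are hypothetical, and the 10 second limit is just an example value.

```shell
#!/bin/sh
# Sketch of the decision a node makes about its peer.
# Usage: peer_state NOW LAST_SEEN DEADTIME   (all in seconds)
# Prints "dead" if no heartbeat has arrived for more than DEADTIME
# seconds, otherwise "alive". All names here are hypothetical.

peer_state() {
    now=$1
    last_seen=$2
    deadtime=$3
    silence=$((now - last_seen))
    if [ "$silence" -gt "$deadtime" ]; then
        echo dead
    else
        echo alive
    fi
}

# Example: last heartbeat seen at t=100, limit of 10 seconds.
peer_state 105 100 10   # 5 seconds of silence: alive
peer_state 111 100 10   # 11 seconds of silence: dead
```

Running both links (serial and Ethernet) means the peer is only declared dead when the heartbeat stops on every path, which is why the redundancy is worth the extra cable.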
heartbeat is available at
http://www.linux-ha.org/download/
The program for IP address takeover is called fake and is
integrated in heartbeat.
If you do not have an extra network card to put in the two machines, you
may run heartbeat over a serial cable (null modem) only.
On the other hand, network cards are cheap, so add another one for
redundancy.
Preparing the Cluster nodes
As previously mentioned, we will use one cool machine and
another not so cool machine.
Both machines will be equipped with 2 network cards each and at
least one serial port.
We will need one cross-link Cat 5 RJ45 (Ethernet) cable and a null
modem (cross-link serial) cable.
We will use the first network card on both machines for their
Internet IP addresses (eth0).
We will use the second network card on both machines for a private
network to talk UDP heartbeat (eth1).
We will give both machines their Internet IP addresses and
names.
For example, for eth0 on both nodes:
clustnode1 with ip address 10.0.0.1
clustnode2 with ip address 10.0.0.2
Now we will reserve a floating IP address (this is the service IP
address that I highlighted earlier):
10.0.0.100 (intranet). We don't need to assign it to any machine at
the moment.
Next we configure the second network card on each machine and
give it an IP address from a range that is not otherwise used.
For example, for eth1 on both nodes, with netmask
255.255.255.0:
clustnode1 ip address 192.168.1.1
clustnode2 ip address 192.168.1.2
Next we connect the serial cable to serial port 1 or 2 of the
machines and make sure that they are working and talking to each
other.
(Make sure that you connect to the same port on each machine; it's
easier that way.)
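One simple way to check the serial link by hand is to read from the port on one node while writing to it on the other. The helpers below are a sketch; the function names are made up, and /dev/ttyS0 is an assumption that you should replace with the port you actually connected.

```shell
#!/bin/sh
# Hypothetical helpers for checking a null modem link by hand.
# The device name is an assumption; use the port you connected.

serial_listen() {
    # Run on the first node: print whatever arrives on the port.
    cat < "$1"
}

serial_send() {
    # Run on the second node: write one line of text to the port.
    echo "$2" > "$1"
}

# On clustnode1:  serial_listen /dev/ttyS0
# On clustnode2:  serial_send /dev/ttyS0 "hello from clustnode2"
```

If the text shows up on the listening node, repeat the test in the other direction before moving on.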
See
http://www.linux-ha.org/download/GettingStarted.html
Installing heartbeat
Installing the software is straightforward; heartbeat is
available as rpm and tar.gz, in both binary and source packages.
If you have problems installing the software, then you probably
should not be taking the responsibility of installing an HA system (it
won't be HA, perhaps it will be NA).
There is an excellent Getting
Started with Linux-HA guide, so I won't replicate the information
here.
Configuring the cluster
Configure heartbeat.
For example, if the heartbeat configuration files are in /etc/ha.d,
edit the file /etc/ha.d/authkeys with your favourite editor:
#/etc/ha.d/authkeys
auth 1
1 crc
#end /etc/ha.d/authkeys
You can later move to md5 or sha1 when you are more comfortable; for
the first test, leave the authentication mechanism at 1 (crc).
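When you do move on from crc, the file keeps the same two-line shape, with a shared key added after the method name. The sketch below writes such a file; the function name and the key text are placeholders, and it assumes that heartbeat expects the file to be readable by root only (mode 600).

```shell
#!/bin/sh
# Sketch: write an authkeys file that uses sha1 with a shared key.
# The key text and the function name are placeholders.

write_authkeys() {
    file=$1
    key=$2
    # Create the file with private permissions from the start.
    ( umask 077
      {
          echo "auth 1"
          echo "1 sha1 ${key}"
      } > "$file" )
    # Also tighten permissions in case the file already existed.
    chmod 600 "$file"
}

# Example (run as root on both nodes, with the same key):
# write_authkeys /etc/ha.d/authkeys "SomeSharedSecret"
```

Both nodes must end up with identical files, otherwise they will reject each other's heartbeats.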
Edit the file /etc/ha.d/ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
deadtime 10
serial /dev/ttyS3 #change this to appropriate port and remove this comment
udp eth1 #remove this line if you are not using a second network card.
node clustnode1
node clustnode2
Edit the file /etc/ha.d/haresources:
#masternode ip-address service-name
clustnode1 10.0.0.100 httpd
This defines clustnode1 as the master node: when
clustnode1 goes down, clustnode2 will take over the service,
but when clustnode1 comes back up again, it will reclaim its service.
That is why we are using a good and a not so good machine (clustnode1
is the good machine).
The second item defines the IP address that should be taken over
with the service, and the third item defines the name of the
service.
When a node takes over the service, it will try to execute
/etc/ha.d/httpd start
If it does not find the file, then it will try
/etc/rc.d/init.d/httpd start
The same is true when giving up a service: if clustnode2 is giving
up a service, it will try
/etc/ha.d/httpd stop
If it does not find the file, then it will try
/etc/rc.d/init.d/httpd stop
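The lookup order just described can be sketched as a small shell function. This is an illustration of the search order only, not heartbeat's own code; the function name is hypothetical, and the two directories default to the ones named above but can be overridden.

```shell
#!/bin/sh
# Sketch of the lookup order described above: first the heartbeat
# directory, then the init script directory. The function name is
# hypothetical; the directories can be overridden for testing.

find_service_script() {
    service=$1
    ha_dir=${2:-/etc/ha.d}
    init_dir=${3:-/etc/rc.d/init.d}
    for dir in "$ha_dir" "$init_dir"; do
        if [ -x "$dir/$service" ]; then
            echo "$dir/$service"
            return 0
        fi
    done
    echo "no script found for $service" >&2
    return 1
}

# Example: start the service with whichever script is found first.
# script=$(find_service_script httpd) && "$script" start
```

Because the heartbeat directory is searched first, a script placed there overrides the distribution's init script of the same name.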
When you are finished with the configuration on clustnode1, you can
copy the files to clustnode2.
In the directory /etc/ha.d/rc.d you will find scripts such as
ip-request, which do the job of assigning the IP address, etc.
Now start /etc/rc.d/init.d/heartbeat on both machines.
Install a different index page on each machine, to be served by the
http server.
For example:
on clustnode1
echo hello world from clustnode1 >/yourWwwDocRoot/index.html
and on clustnode2
echo hello world from clustnode2 >/yourWwwDocRoot/index.html
Make sure that on both nodes the httpd service does not start
automatically on boot: remove the links from the rcN directories or,
even better, move the startup script "httpd" or "apache" from
/etc/rc.d/init.d/ to /etc/ha.d/rc.d/ on both machines.
If everything is set up correctly and heartbeat is running and
communicating, then clustnode1 will have the IP address 10.0.0.100
and will be replying to the http requests.
Try it a couple of times and make sure that it is replying. If
everything seems OK, then shut down clustnode1, and within 10
seconds clustnode2 will take over the service and the IP address.
Your maximum downtime will be about 10 seconds (the deadtime set in ha.cf).
What about data integrity issues?
When the httpd service moves from node1 to node2, it does not see the
same data. I lose all the files that I was creating with my httpd
CGIs.
Two answers:
1. You should never write to files from your CGIs (use a network
database instead; MySQL is pretty good).
2. You can attach the two nodes to a central external SCSI storage
and make sure that only one is talking to it at any one time. Also
make sure that you change the SCSI ID of the host card on machine A
to 6 and leave it at 7 on machine B, or vice versa.
I have tried this with Adaptec 2940 SCSI cards, and they let me
change the SCSI id. Most cheap cards will not let you do that.
Some RAID controllers are sold as cluster-aware controllers, but
make sure that the vendor will allow you to change the host ID of
the card without buying the Microsoft cluster kit.
I had two NetRaid adapters from HP and they definitely do not
support Linux. I had to break them to have a good feeling about the
money spent.
The next step will be to buy Fibre Channel cards, a Fibre Channel hub and
Fibre Channel storage to create a small SAN. They are definitely
more costly than shared SCSI, but they are a good
investment.
You can run GFS (Global File System, see Resources below) over FC, which
allows you to have transparent access to the storage from all machines
as if it were local storage.
We are using GFS in a production environment on 8 machines, 2
of which are in an HA configuration similar to the one I have described
above.
What about an active/active cluster?
You can easily build an Active/Active cluster if you have a good
storage system that allows concurrent access, for example GFS over
Fibre Channel.
If you are content with network filesystems such as NFS, you may
use that, but I would not suggest it.
Anyway, you can map serviceA to clustnode1 and serviceB to
clustnode2. Here is an example from my haresources file:
clustnode2 172.23.2.13 mysql
clustnode1 172.23.2.14 ldap
clustnode2 172.23.2.15 cyrus
I use GFS for storage so I don't have a problem with concurrent
access to data and can run as many services as is manageable by
these machines.
Here clustnode2 is the master for mysql and cyrus, while clustnode1
is the master for ldap.
If clustnode2 goes down, then clustnode1 takes over all the IP
addresses and the services.
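With several services spread over the nodes, it helps to be able to see at a glance who is the preferred master of what. The following sketch reads a haresources-style file with awk; the function names are made up, and it assumes the simple three-column layout used in this article (node, IP address, service name).

```shell
#!/bin/sh
# Sketch: summarise a haresources-style file with three columns
# (node, ip address, service name). Lines whose first field starts
# with '#' and lines with fewer than three fields are skipped.

# Print one line per service with its preferred master and address.
list_masters() {
    awk '$1 !~ /^#/ && NF >= 3 { printf "%s runs on %s (%s)\n", $3, $1, $2 }' "$1"
}

# Print the services whose preferred master is the given node.
services_of() {
    awk -v node="$2" '$1 !~ /^#/ && NF >= 3 && $1 == node { print $3 }' "$1"
}

# Example:
# list_masters /etc/ha.d/haresources
# services_of /etc/ha.d/haresources clustnode2
```

The output of services_of for a node is also the list of services the other node must be able to carry in addition to its own when that node fails.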
Resources
- Linux-HA.org
- The home page of Linux HA
- Kimberlite clustering technology
- A Kimberlite Cluster provides support for two server nodes
connected to a shared SCSI or Fibre Channel storage subsystem, in
an active-active failover environment. The software provides the
ability to detect when either node leaves the cluster, and will
automatically trigger recovery scripts which perform the procedures
necessary to restart applications on the remaining node. When the
node rejoins the cluster, applications can be moved back to it,
manually or automatically, if required. Sample recovery scripts are
provided. Kimberlite is designed to deliver the highest levels of
data integrity and be extremely robust. It is suitable for
deployment in any environment that requires high availability for
un-modified Linux applications.
- Ultra Monkey
- Ultra Monkey is a project to create load balanced and
highly available services on a local area network using Open Source
components on the Linux Operating System. At this stage the focus
is on producing a scalable, highly available web farm, though the
technology is easily expandable to other services such as email and
FTP.
- Linux Virtual Server
- The Linux Virtual Server is a highly scalable and highly
available server built on a cluster of real servers, with the load
balancer running on the Linux operating system. The architecture of
the cluster is transparent to end users. End users only see a
single virtual server.
- 4U cluster / 4U SAN (Shameful plug)
- 4U cluster and 4U SAN are the HA cluster and SAN implementations by
our company, 4Unet.
If you are an ISP, carrier, or telecom company and require High
Availability solutions to be designed and implemented, then 4Unet
will be the right place to ask.
Note: 4Unet is an integrator; they do not sell clusters or SANs,
they implement them for their customers. All technologies used for
these clusters/SANs are open source.
4Unet's target customers are only ISPs, carriers, and telecom
companies.
- Global File System
- The Global File System (GFS) is a shared disk cluster file
system for Linux. GFS supports journaling and recovery from client
failures. GFS cluster nodes physically share the same storage by
means of Fibre Channel or shared SCSI devices. The file system
appears to be local on each node and GFS synchronizes file access
across the cluster. GFS is fully symmetric, that is, all nodes are
equal and there is no server which may be a bottleneck or single
point of failure. GFS uses read and write caching while maintaining
full UNIX file system semantics.