Make Sure Your High Availability System Isn’t Just False Bravado

This is something of an aggregated re-post of a series of articles that have been published over at Simple Talk.

I’m paranoid by nature. Growing up, while other boys daydreamed about giant mechanical spiders eating the city, played baseball and wrecked their BMX bikes, I daydreamed about building a house in a mountain that was completely self-sustaining, hurricane-proof and earthquake-immune, with a helipad and a rocket-propelled evacuation capsule. In fact, I still dream about my hoped-for mountain home’s solar field, grey water recycling plant, greenhouses and indoor/outdoor pool complete with a waterfall and mutant laser dolphins.

This paranoia has served me quite well in my professional life. I like to team NICs, use multipaths, replicate data, version documents, archive files, etc. and etc. I also like clustering and other forms of high availability. However, while reading a vendor white paper (yes, I read those once in a while) I began to ponder just what was being protected by most HA solutions. After some time given over to thinking, researching and writing about the topic, I was shocked at how little protection most if not all high availability systems provide.

Aside from a technical misunderstanding of HA, there are also some career-based misunderstandings that surround high availability. Some people tend to think that HA can make their career more secure or their bosses appreciate them more or their userbase happier. HA will not, by itself (or even in part, as the case may be), make any of those things happen.

After my study of the topic was over and the dust had settled, I had hacked out four articles that follow one another in sequence to clarify the situation. I list them all here as a quick reference to the entire series:

  1. 7 reasons why High Availability will help you fail in even more spectacular ways than ever!
  2. 7 Career Pitfalls that High Availability Systems Will Not Help a SysAdmin Avoid
  3. 7 Things that High Availability is Not
  4. The One Way That High Availability Will Help You
  5. High Availability or High Recoverability? (This article was added in August 2011 as a result of some great comments that were added to this blog post)

If you decide to read any or all of those articles, let me know what you think of my treatment of the topic. I once was a lover of high availability solutions, but after closer inspection, I see the concept as much less of a savior than I once did. It’s a very tightly scoped thing that has less potential to save your bacon than you may first think. In fact, in some scenarios it can be a time and money waster that distracts you from where you should really be spending your time to make a system more resilient.

Do you agree? Do you disagree? Does your experience dictate otherwise or is it in lockstep with my ideas? Let me know in the comments below.


  1. ITHedgeHog

    July 12, 2011 at 1:35 pm

    HA and other techniques fulfill a fairly narrow requirement: the failover of systems which simply cannot go down. Generally, though, they’re systems which cannot go down in one location; most HA implementations I’ve encountered have been clustered machines in a single location.

    In my area of the market we have very few ‘redundant’ systems. We have a single cluster running which is used to host our Hyper-V servers so that they are able to fail over from node A or B on to the backup node C.

    For most of the Hyper-V systems on that cluster and for other standalone servers we have no redundancy built in because of the cost of such solutions – the cost of them being down is not worth the cost of implementing any form of HA. It’s far cheaper for the company to pay the Technology Services team to redeploy a server and its workload than it is to run a second instance.

    HA shouldn’t be treated as some kind of pill that cures your IT ills. Instead it should almost be seen as an IT shotgun – yes, if deployed and maintained correctly it can save your life when you need it, but if not….

    … Say goodbye to your own foot.


    • Wesley David

      July 12, 2011 at 1:40 pm

      You bring up a very good point. The idea of an HA system costing more than the downtime itself is one that I don’t think I gave enough time. One really has to weigh the cost of downtime and then accurately estimate how much downtime will likely be sustained. Once you have some ballpark figures, it could very well be that the downtime will cost less than the five, six or even seven figures that HA systems will drain you for.

      In many cases it seems that the better option is to simply make the recovery process streamlined so that it takes less time to recover.


      • ITHedgeHog

        July 12, 2011 at 1:46 pm

        That’s exactly right. Some rough figures for us –

        We have an IIS Server, we’ll call it production server 1. It’s running 25 web sites with a variety of web application pools pre-configured.

        Provided we have the hardware to hand (and we do!) we can redeploy the entire system in approximately 2 hours using a variety of techniques including:

        * Windows Imaging to deploy a preconfigured image
        * SCCM to deploy the required patches
        * Web Deployment Tool to redeploy pre-built websites and app pool configuration

        Whilst there is a significant investment in the systems already, the actual cost in staff time is about £300 when you factor in how many people would be involved.

        Implementing HA? Ballpark, that is going to cost us approximately £5K to £10K to purchase the necessary equipment, build the system, test it and finally deploy it.
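
Those figures lend themselves to a rough break-even check. The Python sketch below is only an illustration: the function name and the optional `downtime_cost_per_hour` parameter are assumptions made for this example and do not come from the comment. With the default of zero, only the quoted staff time is counted, which understates the real cost of an outage.

```python
def outages_to_break_even(ha_cost, recovery_cost, outage_hours=0.0,
                          downtime_cost_per_hour=0.0):
    """Return how many outages it takes before recovering by hand
    has cost as much as the HA build would have.

    downtime_cost_per_hour is a hypothetical figure for lost business
    while the system is down; it defaults to zero (staff time only).
    """
    cost_per_outage = recovery_cost + outage_hours * downtime_cost_per_hour
    if cost_per_outage <= 0:
        raise ValueError("cost per outage must be positive")
    return ha_cost / cost_per_outage


# Figures quoted in the comment, in GBP.
STAFF_COST_PER_REDEPLOY = 300.0
REDEPLOY_HOURS = 2.0
HA_COST_LOW = 5_000.0
HA_COST_HIGH = 10_000.0

low = outages_to_break_even(HA_COST_LOW, STAFF_COST_PER_REDEPLOY)
high = outages_to_break_even(HA_COST_HIGH, STAFF_COST_PER_REDEPLOY)
print(f"Break-even after {low:.1f} to {high:.1f} full redeployments")
```

On staff time alone, the HA build would need to prevent roughly 17 to 33 full redeployments before it paid for itself. Passing `REDEPLOY_HOURS` and a real hourly cost of downtime lowers that number.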


      • tsykoduk

        July 12, 2011 at 1:52 pm

        We used to call it “buying 9s”, as in 9s of uptime. It should come down to a simple fiscal choice in most cases. How much is an hour of outage in the middle of the day going to cost you? A day? Don’t spend 10x that on some fancy whiz-bang new tech to mitigate an issue that might never happen!

        Now, I have worked in environments where outages were not measured in money, but rather in lives. Then you spend what you can, in a reasoned, planned manner, getting the most bang for your buck first, and advancing up the chain until your budget is gone.
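
For reference, the arithmetic behind “buying 9s” is easy to sketch. The Python below assumes a 365-day year and treats N nines as an availability of 1 minus 10 to the power of minus N; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(nines):
    """Yearly downtime budget, in hours, for a given number of nines.

    Two nines means 99% availability, three nines 99.9%, and so on.
    """
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


for n in range(2, 6):
    hours = allowed_downtime_hours(n)
    print(f"{n} nines: {hours:.4f} hours/year ({hours * 60:.1f} minutes)")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why the fiscal question in the comment matters: the cost of an hour of outage sets a ceiling on what the next nine is worth.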


        • Wesley David

          July 12, 2011 at 2:05 pm

          Really, most SMBs would do better to spend money on backup and recovery measures, like Steve mentioned above. You get greater ROI in that you can protect many different systems, not just that one important thing that you’re considering HA for. I may turn this into another article. Hmmm…

          Also, when lives are at stake, that’s quite a different thing. I’m glad that nothing I’ve worked on so far has had such consequences in the balance.


      • josephkern

        July 12, 2011 at 2:13 pm

        I thought you implied it with this statement:

        “High availability will not save the company any money. HA costs money, and a lot of it. The only time it (hopefully) saves anything is in the case of a disaster, which really isn’t true savings. Yes, it prevents the active loss of money should a worst case scenario happen, but that’s not true financial savings. Thus, you shouldn’t present it to your superiors (or even yourself, for that matter) as a cost-saver. Present it like it really is: A butt saver.”


        • Wesley David

          July 12, 2011 at 2:31 pm

          True, that portion hopefully toppled (or at least put a huge stress fracture in) the idea that HA is a money saver. Strictly speaking, it’s not. I fear for the next poor sales rep that presents an HA solution to me framed as a cost saver. =)


  2. bdmorrison

    July 12, 2011 at 6:23 pm

    All valid and interesting points in your articles and I think anyone who has managed HA services/clusters has thought of these things.

    I have questioned the purpose of clustering certain services. If a service shits the bed because of the clustering, why am I clustering it? If I have 2 years on a single node, never failing over (and when it does, see above — it shits the bed), why am I clustering it?

    Another point you bring up: a part of HA (in my opinion) IS DR. It’s something we’re slowly working towards at work, moving a node of each cluster to the DR site for failover, but it’s not as simple as just moving a node to the DR site. Spanning VLANs, redundancy in networking/internet, DNS, storage, etc. etc. takes MORE money than just taking a node and throwing it in a cage 100 miles away.

    And you can question HA: is it really HA if my node-pair is in the same rack or in racks next to each other? I have networking off each head to both server cores. I have two power leads off each head to two different PDUs/sources of power. I have redundant links for heartbeat between the heads… So I can stand to lose a head, but what if the impact is more global? With both heads within a foot of each other, how good is that? BUT what does dark fiber cost to run a metro cluster for a NetApp??

    Look at the recent Amazon outages and all the “HA” that exists with the “cloud”. I’ve told a number of people recently that 100% uptime does not exist. Sure, you can get close to it, but it’s expensive.

    We had a discussion at work the other day regarding certain services (internet connectivity for end users) being always available no matter where in the world they might be. Think about all the ways to get internet, then think about what happens if their device fails: what other device do they need to get internet? Then the training for each device… It snowballs to a HUGE dollar amount.

    Hopefully my comments made some sense and were not total and complete ramblings.


    • Wesley David

      July 12, 2011 at 6:47 pm

      AHA! Excellent point that clustering can itself cause availability issues. Oh the irony.

      Yes, clustering two servers that are humming along millimeters away from each other in the same rack really causes you to inspect every link in the dependency chain. The NICs, the top-of-rack switches, any fabrics, carrier equipment, carriers… oh the bloodshed!! D=

