Don’t Laugh at People Who Place Battery Backups in Their Colocation Racks

There was a time when I was unsure whether I should place battery backup devices in colocation racks. My first thought was that “you can never be too careful.” Then I became complacent. My reasoning was that the colocation provider would be vastly more capable of protecting the power system than I am. If there is a power outage that they can’t stop, then certainly there’s nothing I can do to stop the damage.

Do you see the error in that thinking? Certainly a colocation provider has vast resources, in both money and experience, to maintain a top-notch power system. Of course, I’m only speaking of colocation environments that are top-notch themselves, not Mom-N-Pop’s Huntsmans’ Mercantile and Datacenter Solutions. You must first choose a capable datacenter in order to be reasonably justified in placing faith in its infrastructure. However, major catastrophes can and do happen. Errors in engineering will never cease. Datacenters can and do lose power to their floor.

Rimuhosting, a New Zealand hosting provider of no mean reputation, recently suffered a total power outage at its Dallas datacenter. The colocation provider there, Colo4, released the following information concerning the outage:

What Happened: On Wednesday, August 10, 2011 at 11:01AM CDT, the Colo4 facility at 3000 Irving Boulevard experienced an equipment failure with one of the automatic transfer switches (ATS) at service entrance #2, which supports some of our long-term customers. The ATS device was damaged and did not allow either commercial or generator power automatically — or through bypass mode. Thus, to restore the power connection, a temporary replacement ATS was required to be put into service.

Colo4’s standard redundant power offering has commercial power backed up by diesel generator and UPS.  Each of our six ATSs reports to its own generator and service entrance. The five other ATSs and service entrances at the facility were unaffected.

The ATS failure at service entrance #2 affected customers who had single circuit connectivity (one power supply). For customers who had redundant circuits (or A/B dual power supplies), they access two ATS switches, so the B circuit automatically handled the load. (A few customers with A/B power experienced initial downtime due to a separate switch that was connected to two PDUs and the same service entrance. Power was quickly restored.)


Assessment: As part of our after-action assessment, the Colo4 management team has debriefed with all on-site technical team and electrical contractors as well as the equipment manufacturer, UPS contractors and general contractors to provide assessments on the ATS failure. While an ATS failure is rare, it is even rarer for an ATS to fail and not allow it to go into bypass mode.

While the ATS could be repaired, we made the decision to order a new replacement ATS. This is certainly a more expensive option, but it is the option that provides the best solution for the long-term stability for our customers.

The Takeaway

Bad things happen in this world. Be prepared.

This does not mean that you should protect yourself from 67 hours of lost power, however. That would be… costly. In the event of a large power outage, you’re likely going to experience some network loss as well, so your priority will probably not be keeping your customers’ systems completely free from disruption. The goal is to make the recovery smoother. The sudden loss of power to your racks will likely result in more corruption than a Chicago city council meeting. Only you can determine how long you should be able to sustain a power loss at your colocation, but thirty minutes or less seems like a reasonable amount of time to decide whether you need to shut down your systems or not.

As a result of this incident (which I was not directly affected by), my mindset toward customer-provided, battery-backed power in a datacenter has changed. Once I was cautious about it, and then I slacked off and ignored it. Now, I’m more a proponent of it than ever. Sure, you will likely not be able to remotely control your servers to perform graceful shutdowns if the outage affects the datacenter’s network equipment. In that case, hopefully you’ll be given physical access, provided the building has its physical security on a battery backup (which reminds me, one needs to ask about those kinds of things before choosing a colo).

If it’s a worst-case scenario where you have no remote or physical access, make sure that you have proper shutdown procedures and scripts in place to gracefully shut down all of your systems once the batteries drain to a certain level of remaining charge. There’s no need to add data corruption issues to the problems of missed SLAs, business downtime, and angry users/customers.
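
To make that concrete, here is a minimal sketch of the kind of script I mean, assuming a NUT-managed UPS queried with upsc and key-based SSH from a monitoring box in the rack. The UPS name, host list, and threshold are placeholders, and daemons like NUT’s upsmon or apcupsd can handle most of this natively if you’d rather not roll your own.

```python
#!/usr/bin/env python3
"""Shut the rack down gracefully once the UPS batteries drain too far.

Minimal sketch only: assumes a NUT-managed UPS queried via upsc, key-based
SSH from the monitoring box, and placeholder host names and thresholds.
Error handling (unreachable UPS, failed SSH) is deliberately omitted.
"""

import subprocess
import time

UPS = "rackups@localhost"        # placeholder NUT UPS name
CHARGE_FLOOR = 50                # percent remaining that triggers shutdown
SHUTDOWN_ORDER = [               # hypothetical hosts, least critical first
    "app01.example.com",
    "app02.example.com",
    "db01.example.com",          # database host last
]


def ups_var(name: str) -> str:
    """Fetch a single variable (e.g. battery.charge) from the UPS via upsc."""
    result = subprocess.run(["upsc", UPS, name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def shutdown_rack() -> None:
    """Walk the rack in order, issuing a graceful shutdown over SSH."""
    for host in SHUTDOWN_ORDER:
        subprocess.run(["ssh", host, "sudo", "shutdown", "-h", "now"])


if __name__ == "__main__":
    while True:
        on_battery = "OB" in ups_var("ups.status").split()
        charge = float(ups_var("battery.charge"))
        if on_battery and charge <= CHARGE_FLOOR:
            shutdown_rack()
            break
        time.sleep(30)
```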

What do you do in your colocation space? Do you provide your own battery-backed power, or do you trust the colocation provider not to let you down?


  1. voretaq7

    August 22, 2011 at 8:13 am

    Two small problems with BBUs in the rack.

    First one is that they don’t protect the CoLo facility’s core: Your rack is up, running and happy, but the switches and routers that get you to the internet are busy being dark and quiet. This is fine if you’re OK with that (as you pointed out, the BBUs will keep you from having to deal with ungraceful shutdowns, data corruption/recovery, etc.).

    Second one is thermal load (HEAT) — Batteries get warm, as do the power conditioners that go with them. Adding a bank of batteries can substantially increase the heat load of a room, which means you may be asked to pay more to offset the extra BTUs you’re producing.

    (A third problem, weight, is not usually as big an issue as the other two: Most rooms can handle the weight of a rack full of batteries these days, particularly as today’s sealed batteries are a bit lighter.)


    • Wesley David

      August 22, 2011 at 9:48 am

      Point 1 causes the decision on “how much battery-backed uptime do I need?” to be answered in small numbers, I think. If you have more than a few minutes of an outage at any reasonably equipped DC, that usually means something very bad has happened and you’re not likely to be back up for an uncomfortable number of hours. You want whatever length of time it takes to get things shut down nice and proper so you’re not doing 12 hours of fscks when the lights come back on. 5 minutes? 10? I can think of a few cases where that might not be enough time, though.

      Point 2 would seem to be lessened as a result. 4U (maybe 6U) worth of BBUs per rack, stacked at the bottom where the coldest air usually is (assuming raised floors), would, in my estimation, minimize the heat concerns. Hopefully the Datacenter Mall Cop doesn’t walk the floor with a thermal imaging camera. =)


      • tsykoduk

        August 22, 2011 at 9:58 am

        Heck, 2U of battery should be more than enough. You only really want a few moments of run time…


  2. tsykoduk

    August 22, 2011 at 9:57 am

    The only concern that I have ever had with adding batteries to a colo-d rack was that it throws off the colo’s heat and power calculations.

    That being said, I think that argument is rubbish.

    Really, you gain two advantages from batteries in your rack.

    First off, if there is a catastrophic failure, and you have stuff set up right, your servers will gracefully spin down. Nice if you are running a database.

    Secondly, you gain some measure of protection against that ‘one second bump’ when some poor sod touches the hot wire…


    • Wesley David

      August 22, 2011 at 10:44 am

      That’s an interesting point about the colo’s calculations. Wouldn’t the draw be exactly the same with or without a BBU, though? What power calculation are you referring to?

      Ah yes, the “bump” – in my case it would probably be caused by one of the Southwest’s four-ton cockroaches making a meal out of a mains line. =)


  3. quux

    August 22, 2011 at 6:29 pm

    RE: the idea of having 30 minutes of BBU so that you can essentially go there and make the decision yourself.

    I don’t see that as being realistic in most situations. Why? Even with the best monitoring, it’s probably going to be ~5 minutes before the outage notice reaches you via SMS. Another ~5-10 minutes from the nearest computer, trying to understand the outage. Then there’s getting dressed, finding the car keys, backing out of the driveway. Assume this takes only 5 minutes, and you now have 15 minutes to get to the datacenter, gain entrance, and work your magic.

    In short – it’s probably not realistic. If I were designing a UPS solution for (my systems within) a colo, I’d automate the whole thing around the decision ‘has the colo power been down for more than 5 minutes? If yes, gracefully shut down everything connected, in the proper order.’

    If feasible I would automate startup too. ‘Colo power back on and no manual override? Start it back up, in proper order.’

    And be done with it (not forgetting battery maintenance every 6 months). Emergency response stuff needs to be simple and reliable.


    • Wesley David

      August 22, 2011 at 7:04 pm

      All good points. I think I was being optimistic in that I’m not too far from a major DC in my city, plus I’m unlikely to spend 5–10 minutes trying to understand the outage; I would instead instantly jump into the Batmobile and roar down the road to the DC in a frothing panic. Nonetheless, even if I got there in time with 10 minutes of battery to spare… what’s the advantage? Not much of one that I could see. Automation is the way to go.
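
      For the curious, here is a minimal sketch of quux’s five-minute rule, again assuming a NUT-managed UPS queried with upsc and SSH access to the rack. The UPS name, hosts, and grace period are placeholders:

```python
#!/usr/bin/env python3
"""Sketch of the 'down more than five minutes? shut it all down' policy.

Assumes a NUT-managed UPS queried via upsc; host names are placeholders.
"""

import subprocess
import time

UPS = "rackups@localhost"                    # placeholder NUT UPS name
GRACE_SECONDS = 5 * 60                       # how long to ride out a blip
SHUTDOWN_ORDER = ["app01", "app02", "db01"]  # hypothetical hosts, least critical first


def on_battery() -> bool:
    """True when the UPS reports it is running on battery ('OB' status flag)."""
    status = subprocess.run(["upsc", UPS, "ups.status"],
                            capture_output=True, text=True, check=True)
    return "OB" in status.stdout.split()


outage_started = None
while True:
    if on_battery():
        outage_started = outage_started or time.monotonic()
        if time.monotonic() - outage_started >= GRACE_SECONDS:
            for host in SHUTDOWN_ORDER:
                subprocess.run(["ssh", host, "sudo", "shutdown", "-h", "now"])
            break
    else:
        outage_started = None                # power is back; reset the timer
    time.sleep(15)
```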


  4. scumola

    August 25, 2011 at 10:58 am

    I was told by a datacenter NOC technician that it’s illegal to place a UPS inside a rack or cage in a datacenter for fire-suppression reasons.

    There’s a big red switch that a NOC technician can press in case of an emergency that will shut off power to the entire room. If a UPS is in the room keeping a rack of machines powered, this becomes a problem for fire personnel; the firefighters won’t even go into the room if there’s still power to devices, or something like that.

    Just having a UPS in a rack in a datacenter makes me question whether or not that issue is even known to the datacenter’s NOC personnel.


    • Wesley David

      August 25, 2011 at 3:40 pm

      Whoa… whoa… illegal? I’m going to have to bring that up with my DC rep. I’ve never heard of that or even considered it. Thanks for the heads-up. =/


    • wfaulk

      August 26, 2011 at 10:26 am

      According to the facilities engineer at the DC I work at, there is no law or firefighter-based reason for disallowing UPSes, and that sounded as ridiculous to him as it sounded to me. Firefighters are not going to go into a building with 10MW of power still flowing, but a handful of 2kW UPSes is not really a concern.

      However, he was very quickly adamant about not having UPSes in the facility because of a concern that customer UPSes could negatively affect the DC UPSes in the event of a mains outage. It was honestly not clear to me why, but I’m not a facilities engineer. He also added that if we found one in a customer cabinet, we would demand that they remove it, and that if they didn’t do so in a timely fashion, we would remove it ourselves.

      The clear universal takeaway is: if you’re thinking about putting your own UPS in a datacenter, clear it with the datacenter before you do so.


  5. Glen Turner

    August 28, 2011 at 11:16 am

    Colos don’t tend to disallow UPSs in racks for safety reasons — it is well understood how to connect the Emergency Power Off of a facility to downstream UPSs. I don’t know about the USA, but this is required in Australia, and moreover a remote EPO also has to be present at the firefighter panel.

    They have two other good reasons.

    Firstly, a downstream battery roughly doubles the maximum current to the rack. That is, after a fault the gear in the rack is still running and the UPS is recharging. That not only means that the colo has to double its own UPS and generator capacity, but also that the maximum current draw is significantly above the average, and this occurs at the worst possible time (when another fault is most likely).

    Secondly, batteries present a significant risk to other occupants of the colo. I don’t know if you’ve ever seen a big UPS fail, but gas venting, fire and shrapnel are common. That’s why the colo itself doesn’t keep its batteries on the facility floor, but in their own room. A fair percentage of colo customers won’t have the same testing and replacement discipline as the site (UPS batteries last about three years, with failures almost certain after six years; i.e., just one missed maintenance cycle from a customer trying to save money).

    If you need robust availability of services, then the way to do that is to spread your service around multiple sites and have automatic and seamless failover. Then you don’t even particularly care if one of the colos burns down/floods/discovers asbestos/etc.


    • Wesley David

      August 29, 2011 at 12:20 pm

      Thanks for the thoughts. I’m still mulling over this whole thing. I’ve been talking with quite a few people about it as I’m designing a rack for a project I hope to get involved with soon. I think another post is in order to address some of these ideas.

      Batteries, if allowed by the DC and local laws, and if the fuses can handle them, and if you’re disciplined enough to cycle them appropriately, still seem like a good idea… but that’s a lot of ifs. It might be adding more potential points of failure than it’s worth. Still cogitating…


  6. CT

    August 28, 2011 at 11:17 am

    I’ve been in many facilities that don’t allow batteries in the racks. Most quote fire code, though I doubt it’s actually illegal anywhere. It’s probably just company policy, and one I agree with.

    As for the Colo4 incident you quoted, multiple Colo4 customers had bought Colo4’s redundant “A/B” power. Yet they had no power in their racks (I was there; my employer’s gear was moderately affected). Apparently some power circuits were mislabeled.

    Though it’s also worth noting that I’ve never heard of that happening in a “larger”, better-known facility.


    • Wesley David

      August 29, 2011 at 12:17 pm

      That’s part of my question about the situation. How well operated was Colo4? It seemed major enough to have known better and been operated more carefully.

      I think it’s worth keeping my business with the cream-of-the-crop DCs for this reason. I’ve been shocked at how small and slipshod some DC operations are.


  7. Peter

    August 29, 2011 at 7:34 am

    In my experience, a UPS draws a disproportionate amount of current when it starts up and has to feed the servers as well as recharge the batteries. It means you can put less equipment into your rack because you have to stay well below the fuse ratings. Otherwise the fuses will blow when the power comes back.

    I have seen that happen on more than one occasion.

    Regarding firemen: we did tell the fire department about the battery room and that we use our own extinguishing system. They noted that and will keep it in mind if, heaven forbid, things get out of hand.


    • Wesley David

      August 29, 2011 at 12:15 pm

      Okay, good information to know. I talked to one person who works in a local DC yesterday, and he didn’t know of any battery restrictions. I’ll talk to a few other DCs in the area about it soon.

