Is anyone else as paranoid about versioning their backups as I am? For me, it’s not enough to simply back up a file or system. It’s not enough to know the backup was “successful” (whatever that means… which is a whole ‘nuther post). Imagine this simple scenario:
You back up a folder. You get an email alert saying whether the backup was successful or not. Every day you receive an email saying things went peachy. A user calls and asks for a file to be restored from that folder. You check the backup and can’t find the file. They deleted it three days ago. You back up once every 24 hours. The user’s deletion has been set in stone. “Game over, man!”
Of course, this probably wouldn’t happen to anyone because of a thing that most backup programs create called “backup sets”. But what about the things you don’t back up with a big, shiny, overpriced software package? What about the things you script out with rsync, robocopy or just good ol’ cp?
Here’s my latest real-world example. I use a private Zoho wiki to document everything IT-related for one of the places I do work for. Don’t worry, passwords are the only exception to that rule. I wanted to back that wiki up lest a datacenter error or a Zoho hiccup cause that very valuable information to disappear.
There are two ways to back up the wiki. The first is by manually clicking a download link in the wiki’s settings page. The second is a unique URL that a script can connect to directly. The wiki is bundled into a zip file and coughed up to the script that accesses that URL.
Since automation is the way for a SysAdmin to go, I started making a simple PowerShell script using the BitsTransfer module. No problems! Now I can schedule this to back up once a day! Oh but wait…
What if I end up needing some information that I deleted a few days back (Hint: Delete nothing! Put the page or text in a junk bin somewhere on the wiki)? What if there was a Zoho-side error that nuked some info and I didn’t notice it until later (that happened once)? My 24-hour window doesn’t seem big enough to let me sleep well.
Okay, so let’s put some logic in the script. I’ll check to see what day it is and make a new backup file for each day of the week. That should give me enough time to notice almost any error and recover from it. And if not, I can always expand it to check the day of the month, thereby going up to 31 days in the past. Or even 365 if I wanted to make a giant If / Else statement pagoda.
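A minimal sketch of that rotation, written in Python rather than the PowerShell script described here. The function name, the "wiki-backup" prefix and the monthly switch are made up for illustration; the point is only that the slot name can be computed from the date, so no If / Else pagoda is needed.

```python
from datetime import date
from pathlib import Path

# Fixed English abbreviations so file names do not depend on the system locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def rotating_backup_path(backup_dir, today=None, monthly=False):
    """Return the backup file path for today's slot (hypothetical naming scheme).

    With monthly=False the same seven file names are reused every week.
    With monthly=True the day of the month is used, giving up to 31 slots.
    """
    today = today or date.today()
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    slot = f"{today.day:02d}" if monthly else WEEKDAYS[today.weekday()]
    return Path(backup_dir) / f"wiki-backup-{slot}.zip"
```

Saving each download to the returned path overwrites the file from the same slot one week (or one month) earlier, which is what keeps the number of versions fixed.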
Done! Now I have 7 days of unique, full backups for my documentation wiki! Oh but wait…
What if the backup works but pulls down an empty zip file (happened once)? What happens when I leave this job site and move on to other jobs (which will happen soon) and I’m not looking at the wiki every day to notice changes? How do I know that what I’m downloading is meaningful?
Let’s add some more logic to the script. Let’s check the file size after it’s downloaded and send an email if it’s under a certain size. The current size is 1.2MB so let’s check to see if it falls below 1MB.
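Here is a rough Python sketch of that check, again not the original PowerShell. The function name is hypothetical, and treating 1MB as 1,000,000 bytes is an assumption. It only builds the warning text; actually sending the email is left to whatever mail setup is on hand.

```python
import os

# Assumed floor: "1MB" taken as 1,000,000 bytes.
MIN_BYTES = 1_000_000


def size_warning(path, min_bytes=MIN_BYTES):
    """Return a warning string if the backup is missing or too small, else None."""
    if not os.path.exists(path):
        return f"Backup {path} is missing"
    size = os.path.getsize(path)
    if size < min_bytes:
        return f"Backup {path} is only {size} bytes (expected at least {min_bytes})"
    return None
```

A scheduled job could call size_warning on the freshly downloaded zip and send an alert only when the result is not None.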
Super! We’re one degree removed from clinical OCD. Of course, there’s no guarantee that the 1MB of data in the zip file is actually useful. Maybe I just downloaded 1MB of random hex characters. However, I’ll accept this as reasonable enough to be comfortable with. Oh but wait…
That wiki will grow over time. From 1.2 MB to 2.4 MB to… who knows how big? I like to include screenshots in my documentation so one PNG or JPG will bump it up a hundred kilobytes or so. That hard-coded 1MB threshold will be outdated soon. I’d like to base my warning on a statistical deviation of sorts.
Let’s add some more logic to the script. Let’s check the last backup’s size, compare it to the latest backup’s size and then send a warning if the newer backup is more than 100KB smaller than its predecessor. Sweet! I think I’ll go straighten the tassels on my throw rug now.
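The comparison itself is tiny. A hedged Python sketch follows (not the PowerShell original; the names are invented, and 100KB is assumed to mean 100 * 1024 bytes):

```python
# Assumed allowance: "100KB" taken as 100 * 1024 bytes.
MAX_SHRINK_BYTES = 100 * 1024


def shrink_warning(previous_size, latest_size, max_shrink=MAX_SHRINK_BYTES):
    """Return a warning string if the latest backup shrank by more than max_shrink bytes.

    Growth, or a shrink within the allowance, returns None.
    """
    shrink = previous_size - latest_size
    if shrink > max_shrink:
        return (
            f"Latest backup is {shrink} bytes smaller than the previous one "
            f"({latest_size} vs {previous_size})"
        )
    return None
```

Because the threshold is relative to yesterday’s file rather than a fixed floor, it keeps working as the wiki grows. The two sizes would come from the file system, for example the sizes of yesterday’s and today’s zip files.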
Now I have a week’s worth of backup versions as well as logic and alerting to warn me of unexpected decreases in the backup’s size. If anyone is interested in seeing the PowerShell script, let me know and I’ll post it. It’s the forecasted winner of the Ugliest Script in the World 2010 pageant.
What say ye? Do you get paranoid about versioning your backups? Do small windows of restoration keep you up at night? Do you know that replication is not a backup? What’s your story about slim restoration windows (both happy and unhappy endings are welcomed)? Guest post slots are available.
P.S. While writing this, Matt Simmons’ post at simple-talk.com called “Unteachable Disaster Recovery Techniques” came to mind. In short, it helped solidify my understanding that “Replication is not a backup”.
Also applicable is the painful story of how Ma.gnolia went down because the only backup that was being done was a database replication. Ouch. Learn from others’ pain.