Lessons Learned from Cascading Failure and Face Punching a House

In regard to my post “Cascading Failure, Technical Debt, and Punching a House with my Face”, I was asked about my conclusions and how I dug myself out of that hole.

Before I go any further, let me confess that the blog post itself was another failure in that saga. I began writing it the day I discovered that a forgotten boot DVD in the optical drive was the cause of the server not coming back up, and I continued adding to the post over the next several days. Because I didn’t give a contiguous block of time to the writing, I left out some details. Furthermore, I posted it too soon: I was absent-minded as I was saving the post and accidentally published it instead of saving it. I quickly unpublished it, but it was too late; an email notification had already gone out to my subscribers, so the unfinished article was read by a few people. Then when they clicked to go to the blog, they were met with a 404 error. Finally, I continued to write the post, scheduled it to be published the next day, and then completely forgot to polish it off because I staggered away from my computer to hit the treadmill, shower, and then collapse into bed. I woke up the next day, my computer still on, my desk still in “work mode” with items scattered all over it, and the WordPress control panel still open with the post being edited. The failure train was still going full steam.

If you’ve read the previous blog post, you might want to read it again because it’s now a little better presented with some key points added that I had left out.

The conclusions that I came to as a result of the Circus of Calamity that happened over Mother’s Day weekend are nothing new to me, and very likely nothing new to you. However, I think they bear enough fruit to be written down. I’d like to codify my thoughts on technical debt in the system administration world and how to avoid it or deal with it if you’re currently in over your head. Mayhaps this will be the first effort in a larger work.

Without giving too much thought to the order in which the following tenets should be ranked, here are some of the lessons that stand out to me.

Lessons learned:

Devote contiguous time to projects. In this case, my client gives me an hour cap. I can work on their systems for a certain number of hours per month. That naturally lends itself to noncontiguous blocks of time as I hit my maximum and then wait for the month to roll over. However, even with that limit, it’s best to spend it all in a largely uninterrupted segment of time. Whether you are salaried, a full-time contractor, or an independent hired gun who works for whoever needs you whenever they can pay you, the concept remains the same.

If you have something to do, devote as much contiguous time to performing that task as you can. In my case I tend to carve my day up thusly: four or five hours on one client’s systems, then another four or five hours on another client, and then three or four hours of tasks that I need to do on my own systems, bookkeeping, and general business busywork. This leads to long days of frequent context shifting. In my last post’s situation, that led to errors like leaving a DVD in the optical drive of a server as well as not copying over the client’s password store on a schedule. The last post of mine was even a victim; because I context shifted too frequently, I forgot that I still had some polishing to do on it.

Don’t break your thoughts up. Think on a specific task, task set, and/or client for as long as possible with as few interruptions and context changes as possible.

Fix problems as they come. “I’ll get to that later” is death. The problem with the ILO needed to be addressed immediately. The problem with the BIOS clock resetting after power state changes needed to be addressed immediately. The problem with the hard drive controller failing needed to be addressed immediately. These were emergency-level things that got sidelined by the peculiarities of being an independent consultant and working for a place that has very small project budgets. Or perhaps that had nothing to do with it. Perhaps I could have pushed harder and taken more initiative. I know this office well and could have ordered the BIOS battery and contacted a local contractor to walk in and replace it. Act first, bill later! After all the years working with this group, I’m fairly certain that such an emergency action would be accepted and paid with no question.

Regardless of my specific situation, the idea is that, with few exceptions, one needs to address problems when they crop up, and not later. This is a sister concept to devoting contiguous time to a project. Keep on solving a problem and its directly related troubles for as long as possible without interruption. That can mean hours, or days, or longer if possible. This can lead to a rabbit-hole scenario where one simple change leads to a huge infrastructure change. However, simply glossing over an issue adds another mound of debt to the overall systems debt. Don’t know why DNS queries are taking five seconds to resolve? Eh, it’s just a few seconds to wait and we’ve got bigger issues to solve with the payroll application. However, when a problem crops up as a result of DNS not being as smooth as one would expect, now you’ve got to face the DNS problem with another problem on your back. That pressure may lead to greater problems by encouraging you to implement half-measure solutions for the DNS resolution delay, which then causes another problem, which then causes another, and another, and so on.

This is rather hard for consultants like me, however, so the bad tendency to defer problems is strengthened. Consultants are paid by the hour in most cases, so the pressure to deliver without drawing out billable time is great. If a flat-rate project quote is made, you’re always one step away from a minefield if you go too deeply into ancillary systems. I’ve lost my shirt as a result of a flat-rate project quote that ended up with scope creep. While once in a while chasing down each problem to its root turns out great because you get extra business to fix those systems, most of the time it causes much more pain and suffering in the form of unpaid invoices, broken systems, and angry business owners.

This also seems rather tough for any IT person in general. At any given moment we’ve got large numbers of projects and rooms full of executives all competing for our time. Each one thinks they’re the most important person and project. Each one wants to be completed yesterday. There comes a point when being driven by the tyranny of the urgent has to stop, one way or another. The balance of when to chase down a problem to completion is a fine one, but I think we should collectively assume that a problem should be fixed immediately and require strong evidence to the contrary before abandoning the pursuit.

Get rest and stay healthy. I’ve been grinding hard for far too long. Starting a business is no joke. Keeping the business alive with paying clients all the while sharpening your skills and keeping potential clients engaged just in case existing business leaves is even less funny. I’ve been working so many hours in a week for three straight years that I’m aging myself prematurely. I’d love it if I could take a vacation or relax more, but the truth of the matter is that the clients I’ve picked up haven’t been the most lucrative and there have been some billing and invoicing… issues. I don’t have the time or the money to do much else with life except clatter away in front of a computer.

I’ve been departing in the evenings to get back to a hobby that has always interested me: weight lifting. That’s helped, and I’m contentedly regaining strength and getting re-bitten by the lifting bug and the addiction to “the pump”, but this is a fairly new recommitment. The plain facts are that I’m tired and sapped of mental energy. I’ve been grinding and it shows. The dumb mistakes I made for the client in the last post are in some part a result of mental exhaustion. I’ve lost some of the love for information technology that I used to have. I’ve visibly aged ten years in just three and I don’t have much material gain to show for it. I’m tired, and I did a disservice to my best client because of it. Shame on me.

(That said, if anyone knows of a business known for paying market hourly rates, and on time, that needs an independent system administrator with my skills based in Phoenix, Arizona, I’ve now got some time that I can book for a new client. Please, no employment positions at this point. Only consultant / contractor.)

Kanban can help! Kanban – I lurves it. I’m not about to suggest that it is the solution to everyone’s problems, but for those of us who are more tactile and visual, this kind of project and task management system can really make a difference. Kanban, in the simplest explanation that I’ve come up with so far, is a means of visualizing work and encouraging a limit to concurrent work.

In the cases of carving out contiguous time, fixing problems at the earliest possible moment after discovery, and even staying rested, kanban can be very useful since it forces you to be constantly aware of what work is currently being performed and what work is waiting to be performed. I use it to break larger projects into smaller chunks. Typically if a task would take more than four hours then it needs to be broken down into more than one ticket. However, in some cases I simply write “Work on X project, 4 hours” and that’s enough. But I’m digressing into the specifics of kanban when that’s not the point here.

Kanban as a means of staying on target and being ever aware of what context you’re currently in can be a huge boon to the hamstrung IT person. It’s especially helpful if you work with others and keep the kanban board highly visible. That way people always know what you’re working on. It’s very helpful to have management buy-in to that kind of system. Why? Imagine you’re in an environment where everyone thinks their projects are of utmost importance. If you limit your concurrent working projects to one or two like a good kanban system suggests, you can point to the board, specifically the area that is dedicated to tasks that you are working on at this very moment, and force a choice. When someone complains about their stuff not getting done, then, with the blessing of leadership, you can require that the person who wants their project given top priority contact the owner of the project currently being worked on and explain to them why it has to be shelved and how long it will take for you to get back to it.

This can really help. Everyone understands sticky notes on a white board. If Gilles in accounting thinks that his project is the most important thing, tell him he can move James’s ticket from the currently worked-on project over to the holding tank. You know James. Six feet eight inches of security guard whose beard has more muscle than your entire body? Good luck, Gilles! But seriously, putting things into perspective for various project owners can seriously help you block off contiguous amounts of time and stay on track.

Sadly, in my case as an independent consultant, I can’t make Client A call Client B and explain why Client B needs to let up on using my time. It doesn’t work that way. However, I can still use kanban to aid my own self-discipline in keeping track of what I’m working on and when.
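
For the more code-minded, here is a rough sketch of the two kanban habits described above: breaking any task longer than four hours into several tickets, and refusing to start new work once the limit on concurrent work is reached. This is only an illustration in Python, not a tool I actually use; the names `Ticket`, `KanbanBoard`, `split_into_tickets`, and `wip_limit` are all made up for the example, and the four-hour cap and the limit of two are simply the numbers mentioned in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    title: str
    hours: float


def split_into_tickets(title, total_hours, max_hours=4.0):
    """Break a large task into tickets of at most max_hours each."""
    tickets = []
    remaining = total_hours
    part = 1
    while remaining > 0:
        chunk = min(max_hours, remaining)
        tickets.append(Ticket(f"{title} (part {part})", chunk))
        remaining -= chunk
        part += 1
    return tickets


@dataclass
class KanbanBoard:
    # Hypothetical board with three columns and a cap on concurrent work.
    wip_limit: int = 2
    backlog: list = field(default_factory=list)
    in_progress: list = field(default_factory=list)
    done: list = field(default_factory=list)

    def add(self, ticket):
        """Put a new ticket in the backlog (the holding tank)."""
        self.backlog.append(ticket)

    def start(self, ticket):
        """Move a ticket to in-progress, but only if the limit allows it."""
        if len(self.in_progress) >= self.wip_limit:
            raise RuntimeError(
                "WIP limit reached: finish or shelve something first"
            )
        self.backlog.remove(ticket)
        self.in_progress.append(ticket)

    def finish(self, ticket):
        """Move a ticket from in-progress to done."""
        self.in_progress.remove(ticket)
        self.done.append(ticket)
```

With this sketch, a ten-hour task such as `split_into_tickets("Server migration", 10)` becomes three tickets, and calling `start` on a board that is already at its limit raises an error instead of quietly letting a third context creep in. That error is the software version of pointing at the board and forcing a choice.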

Digging out of the Hole

Some people have asked how I’ve dug out of that hole with the client. I’m still working on it. Meanwhile, in addition to that client, I’ve got a few other projects that I’m trying to sew up, so it’s a delicate balance of time and guarding contiguous working hours on each project. I can tell you that when I do get out of the hole of technical debt, it will be in large part due to kanban and a good set of targeted goals.

I need to first create a large vantage point for all my major task spheres. Currently I have three clients that I’m working with. One is a bit of a deadbeat. One is low priority. One is high priority (the one with the crashing server). In my personal life, I’ve got a few large goals to be concerned with as well. I’ll make those large circles of potential task lists and then figure out what is most important to me at this point in my life.

I can tell you that the high priority client (who also pays on time and is in good standing) will be at the top of the list of work projects. My low priority client will be a close second because the project is smaller and close to completion. It will be a great relief to get them finished. The client who is late paying is down on the list of priorities. Even them paying up their overdue invoices won’t get them past third spot until I see a history of on-time payments and not wanting cut-rate hourly rates.

Within each task sphere will be a list of tasks ordered by importance. Of highest importance for the client with the failing server will be to get new equipment in and start the migration. That much is obvious. I’ll need to address each quirk and hiccup along the way, such as ILOs dropping off the network and the like. Furthermore, I’ll need to guard contiguous time. Instead of dividing a day into three parts (four or five hours for one client, four or five hours for a second client, and then a handful of hours for business management), I prefer to block off multiple days in a row for each client and task. That looks like this: Monday and Tuesday for one client, Wednesday for another, Thursday for business management, Friday and Saturday for another client. (Yes, I work six days a week. It ain’t easy being self-employed.)

This should facilitate a steady march towards normalcy and healthy systems.

Parting Thoughts

  1. Don’t assume something isn’t important. Assume it’s important and require a lot of evidence to prove that it’s not. Give a serious effort at fixing any problem at the earliest possible moment.
  2. Dedicate contiguous time to completing a task and don’t dare multitask.
  3. Prioritize based on danger and worth.
  4. Chill out and get some exercise. BRO, DO YOU EVEN LIFT?

Not exactly groundbreaking advice, but perhaps you need to hear it. I know I do. Preach it. Got any other tips and ideas? Any stories of failure and phoenix-like recovery from ashes? Let me know in the comments below or send me a guest post!

4 Comments

  1. andy

    September 8, 2013 at 6:55 pm

    I’d love to be able to break my days up like that. Unfortunately, it’s unrealistic working the way I do. I get lots of little short-term tasks, whereas you have larger project-based things.

    I like the idea though.. all weekend I tried to dedicate 4 hours to writing a roadmap.. never happened :(

    • Wesley David

      September 9, 2013 at 1:31 pm

      Yes, it appears that my few clients do afford me longer term project style work. Get back to your roadmap! NAO!! =)
