Archive for 'August, 2011'


What Commands are Available on my Linux Machine? (Bash Only)

Posted in: SysAdmin
  |  by: Wesley David
Tags: Linux

As I encounter and administer more and more Linux machines, there have been a number of occasions where I’ve been stymied by not having a bash command that I expected to be available. I wanted a way to see all available commands within bash. A search revealed that many people had many different ways of achieving that goal. Many of them were rather convoluted and required not insignificant shell scripts. However, one solution stood out above the others.

Thanks to a user over at StackOverflow, I was introduced to compgen. Allow me to introduce you. Compgen is a bash builtin that, in the words of bash’s man page:

compgen [option] [word]

Generate[s] possible completion matches for word according to the options, which may be any option accepted by the complete builtin

And what options does the complete builtin accept? Quite a few. The ones that are most pertinent to me are the -c and -a options. Compgen run with the -c option will show all commands available and the -a option shows all aliases. Piping the results to grep makes searching for a command a breeze. Recently I needed to see which filesystem tools were available on a machine so I simply used compgen -c | grep fsck

compgen with -c option and piped to grep
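
As a rough sketch of how this could be wrapped up for reuse, the function below filters the output of compgen -c through grep and removes duplicates. The name find_cmds is made up for illustration and is not part of bash. It assumes you are running bash, since compgen is a bash builtin.

```shell
#!/usr/bin/env bash
# find_cmds is a hypothetical helper name; compgen itself is a bash builtin.
# Usage: find_cmds PATTERN   (PATTERN is a grep regular expression)
find_cmds() {
    # -c lists every command name bash knows about: executables in $PATH,
    # builtins, keywords, functions and aliases. sort -u drops duplicates.
    compgen -c | sort -u | grep -- "$1"
}

# Related listings that compgen can also produce:
#   compgen -a             aliases
#   compgen -b             shell builtins
#   compgen -k             shell keywords
#   compgen -A function    shell functions
```

For example, find_cmds fsck lists whatever filesystem checkers are installed, and find_cmds '^mkfs' shows the available mkfs variants.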

Now when you’re sitting at a new Linux machine and want to know if you have any commands of a certain type, you’re just a one-liner away from finding out. Do you have any different methods? Let me know in the comments.



30AUG

When Viruses Seem More Reliable than Windows

Posted in: Humor, SysAdmin
  |  by: Wesley David

While researching a Windows update for potential problems, I stumbled upon a forum post that I was unusually amused by. This wasn’t a typical case of internet savagery or the ravings of an inebriated internet troll. This was the kind of prose that betrays deeper misunderstandings than are first apparent.

The backstory: A forum user posts a quick question asking how he can fix a certain update’s inability to install. Not an uncommon issue. Many Windows updates hang on installation and need some cajoling. Another forum user offers a quick link to a possible workaround. And then the fun begins. A third user kicks the door down, stomps into the thread and sets his coffee cup on the CAPS LOCK key. What follows are some notable misunderstandings of the basic capacities of operating systems and computer science in general. I’ll let you discover the post in its full glory if you so choose, but here are a few of the highlights.

As usual Microsoft has DONE nothng to provide a CANNOT FAIL fix for this error. [...] When is MS going wise up and start writing ALL update in NO FAIL MACHINE CODE. If a Virus can overrule Microsoft then Microsoft can surly write an UPDATE THAT CANNOT BE STOPPED OR FAIL the same way viruses are written. [...]

Certainly, Microsoft has some issues with their update system and smoothly recovering from failed updates is a bit of a pain, especially for someone who is just a casual computer user. Furthermore, I do not mean to completely marginalize the frustration that anyone, particularly this user, has felt concerning a Microsoft product. However, what strikes me is the misunderstanding that 1) Viruses cannot fail to install (and are more reliable than a major operating system) and 2) That it’s possible to make a “no fail system” of any kind, especially for one of the largest collections of code on the planet.

Perhaps it’s just a failure of the computer industry to communicate how things really are. Perhaps it’s the problem of so much emphasis being placed on user friendliness that when something inevitably goes awry, people are shocked at the complexity involved to recover from the error. Or perhaps it’s truly an oversight on the part of the vendor who hasn’t performed the necessary actions to make their software stretch the possibilities and make the difficult seem easy.

Before I veer off into too much speculation and philosophizing, let’s make a simple goal for ourselves as technology workers: Educate someone near you in the topic of computers. They don’t have to be taught subnetting, object oriented programming or ITIL. Just let them know a bit more about computers and their operation. Teach them safe browsing. Talk to them about complexities using basic analogies. Show them how to not be intimidated by their PC, but make them understand its limitations. Leave them better than when you found them.

Together we can lessen the abuse of caps lock on the internet and, more importantly, lessen someone’s frustration with the ubiquitous personal computer.



25AUG

Don’t Laugh at People who Place Battery Backups in Their Colocation Racks

Posted in: SysAdmin
  |  by: Wesley David

There was a time when I was unsure whether I should place battery backup devices in colocation racks. My first thought was that “you can never be too careful.” Then I became complacent. My basic idea was that the colocation facility would be vastly more capable of protecting the power system than I would. If there is a power outage that they can’t stop, then certainly there’s nothing I can do to stop the damage.

Do you see the error in thinking? Certainly the colocation has vast resources in both money and experience to maintain a top-notch power system. Of course, I’m only speaking of colocation environments that are top-notch themselves. I’m not speaking about Mom-N-Pop’s Huntsmans’ Mercantile and Datacenter Solutions. You must first choose a capable datacenter in order to be reasonably assured in placing faith in their infrastructure. However, major catastrophes can and do happen. Errors in engineering will never cease. Datacenters can and do lose power to their floor.

Rimuhosting, a New Zealand hosting provider of no mean reputation, recently had a total power outage in their Dallas datacenter. Their Dallas colocation center, Colo4, released information concerning the outage:

What Happened: On Wednesday, August 10, 2011 at 11:01AM CDT, the Colo4 facility at 3000 Irving Boulevard experienced an equipment failure with one of the automatic transfer switches (ATS) at service entrance #2, which supports some of our long-term customers. The ATS device was damaged and did not allow either commercial or generator power automatically — or through bypass mode. Thus, to restore the power connection, a temporary replacement ATS was required to be put into service.

Colo4’s standard redundant power offering has commercial power backed up by diesel generator and UPS.  Each of our six ATSs reports to its own generator and service entrance. The five other ATSs and service entrances at the facility were unaffected.

The ATS failure at service entrance #2 affected customers who had single circuit connectivity (one power supply). For customers who had redundant circuits (or A/B dual power supplies), they access two ATS switches, so the B circuit automatically handled the load. (A few customers with A/B power experienced initial downtime due to a separate switch that was connected to two PDUs and the same service entrance. Power was quickly restored.)

[...]

Assessment: As part of our after-action assessment, the Colo4 management team has debriefed with all on-site technical team and electrical contractors as well as the equipment manufacturer, UPS contractors and general contractors to provide assessments on the ATS failure. While an ATS failure is rare, it is even rarer for an ATS to fail and not allow it to go into bypass mode.

While the ATS could be repaired, we made the decision to order a new replacement ATS. This is certainly a more expensive option, but it is the option that provides the best solution for the long-term stability for our customers.

The Takeaway

Bad things happen in this world. Be prepared.

This does not mean that you should protect yourself from 67 hours of lost power, however. That would be… costly. In the event of a large power outage, you’re likely going to experience some network loss as well, so your priority will likely not be to keep your customers’ systems completely free from disruption. The goal is to make the recovery smoother. The sudden loss of power to your racks will likely end up in more corruption than a Chicago city council meeting. Only you can determine how long you should be able to sustain power loss at your colocation, but thirty minutes or less seems like a reasonable amount of time to decide if you need to shut down your systems or not.

As a result of this incident (which I was not directly affected by), my attitude toward customer-provided, battery-backed power in a datacenter has changed. Once I was cautious about it, and then I slacked off and ignored it. Now, I’m more a proponent of it than ever. Sure, you will likely not be able to remotely control your servers to perform graceful shutdowns if the power outage affects the datacenter’s network equipment. In that case, hopefully you’ll be given physical access if the building has its physical security on a battery backup (which reminds me, one needs to ask about those kinds of things before choosing a colo).

If it’s a worst case scenario where you have no remote or physical access, make sure that you have proper shutdown procedures and scripts put in place to gracefully shut down all of your systems once the batteries drain to a certain charge level. There’s no need to add data corruption issues to the problem of missed SLAs, business downtime and angry users/customers.
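
Here is one way such a script could look, as a sketch only. It assumes Network UPS Tools (NUT) is installed and that its upsc client can reach a UPS registered under the made-up name rackups@localhost. The 40 percent threshold is also just a placeholder. NUT’s own upsmon daemon can handle low-battery shutdowns itself, so treat this as an illustration of the logic rather than a recommendation.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes Network UPS Tools (NUT) and its "upsc" client.
UPS_NAME="${UPS_NAME:-rackups@localhost}"   # hypothetical UPS identifier
THRESHOLD="${THRESHOLD:-40}"                # hypothetical charge percentage

# Succeeds (returns 0) when the UPS is on battery ("OB" in ups.status)
# and the charge is at or below the threshold.
should_shutdown() {
    local status="$1" charge="$2" threshold="$3"
    # ${charge%.*} strips any decimal part so the integer comparison works.
    [[ "$status" == *OB* ]] && (( ${charge%.*} <= threshold ))
}

# Query the UPS once and halt the machine if the condition is met.
check_ups() {
    local status charge
    status="$(upsc "$UPS_NAME" ups.status 2>/dev/null)" || return 1
    charge="$(upsc "$UPS_NAME" battery.charge 2>/dev/null)" || return 1
    if should_shutdown "$status" "$charge" "$THRESHOLD"; then
        logger "UPS on battery at ${charge}%: shutting down"
        shutdown -h now
    fi
}
```

Calling check_ups from a cron job every minute or so would be enough for a small rack. Keeping the decision in should_shutdown, separate from the UPS query, makes the threshold logic easy to try out without a UPS attached.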

What do you do in your colocation space? Do you provide your own battery backed power or do you trust the colocation to not let you down?



22AUG

MegaPath’s Tech Talk Video Contest has Been Extended!

Posted in: SysAdmin
  |  by: Wesley David

Recently I mentioned a certain contest run by MegaPath called the Tech Talk Video Contest. It has just been announced that the contest is still running and has been extended to September 19, 2011. The prize pool has been raised to $37,500USD in total cash prizes and the first prize has been raised to $14,000USD.

And if you can’t actually make a video, you still have until October 3, 2011 to share your favorite videos for a chance to win one of three $500 Sharing Prizes.

Check out the videos that have already been submitted and get contest entry details here: http://contest.MegaPath.com/?mtag=SA_main.

If anyone enters, let me know and I’ll mention your entry on the blog!



19AUG

Behold a Buzzword is Born: High Recoverability

Posted in: SysAdmin
  |  by: Wesley David

In a recent blog post, I collated all of my writings on High Availability that Simple-Talk has published. I always try to solicit my readers’ advice since I know that my own understanding is likely well below many of yours. I’ve only been in this industry for a relatively short amount of time (I count 2004 to be when I first considered working with computers as a profession) and many of you were seasoned veterans while I still thought that TCP/IP was a Mac OS 7.6 extension that made dial-up better (or something like that).

My readership never disappoints, and several comments on my articles impugning high availability (and the misconceptions that surround it) caused me to think even deeper into the subject. Thanks in large part to Steve the Hedgehog, Greg “Tsykoduk” Nokes and Barry Morrison, I engaged Deep Thought Mode and considered the notion that high availability could be replaced by something that I termed “High Recoverability.” Furthermore, high recoverability would be more of a boon to virtually all IT systems and the organization as a whole than high availability would be. Check the article out over at Simple Talk.

High Recoverability Summarized

If you had a business that would lose $10,000 if its systems were down for one full business day, would you spend $20,000 to prevent it? $30,000? Perhaps a CEO would sign off on that. But what will $30,000 get you? Likely it will gain you less than you think. You may be able to implement high availability for one or two key components in the business. Certainly part of that pricetag will be in recurring support fees for whatever is implemented, unless you like to try and be a subject matter expert on those systems on top of your regular duties.

Once the HA systems are implemented though, what do you have? A service that sits silently and waits for the worst to happen. Likely there isn’t much of any value that’s been added to the business or the IT infrastructure. What if you could take the budget for a high availability project and spend it on making your key systems “highly recoverable”? Which brings me to the most obvious question: “What exactly is ‘high recoverability’?”

Simply put: High recoverability is implementing and annealing a series of services that will enable you to bring specific systems and their services back online and handling tasks in the shortest amount of time possible (preferably in under one business day).

It starts with considering the hardware (or virtual hardware), moving to the operating system, then configuration, application data and finally vetting. In some cases you could go from total hardware failure to having the service(s) on the server back online in mere minutes.

The real value lies in the reality that the systems used to achieve so-called “High Recoverability” will benefit the entire IT infrastructure well beyond any disaster recovery scenarios. Standardized images, automated software deployment, patch management, standardized hardware or virtualization, automated service testing and many, many more things serve to make everything from the CIO’s BlackBerry to the building crew’s ticketing system work better.
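
To make the “automated service testing” piece a little more concrete, here is a minimal sketch of a vetting script that could be run after a rebuild. The function name run_checks and the name|command input format are inventions for this example, not part of any particular tool.

```shell
#!/usr/bin/env bash
# run_checks is a hypothetical helper. It reads "name|command" lines on stdin,
# runs each command, prints OK or FAIL per line, and returns non-zero if any
# check failed.
run_checks() {
    local failed=0 name cmd
    while IFS='|' read -r name cmd; do
        [[ -z "$name" ]] && continue          # skip blank lines
        # </dev/null keeps the check from swallowing the remaining input lines.
        if bash -c "$cmd" </dev/null >/dev/null 2>&1; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}

# Example input (hosts and URLs are placeholders):
#   web|curl -fsS http://localhost/
#   ssh|timeout 3 bash -c 'exec 3<>/dev/tcp/127.0.0.1/22'
```

Feeding it a short checks file after every restore, for example run_checks < checks.txt, gives a quick yes or no on whether the recovered server is really back in service.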

Check out the article “High Availability or High Recoverability?” for a more detailed exposition of the concept. Discard the needless buzzword at will, but keep the idea firmly in your mind.

High Availability vs High Recoverability Article List

Here’s the updated list of articles that have been conjured up on the topic (in order):

  1. 7 reasons why High Availability will help you fail in even more spectacular ways than ever!
  2. 7 Career Pitfalls that High Availability Systems Will Not Help a SysAdmin Avoid
  3. 7 Things that High Availability is Not
  4. The One Way That High Availability Will Help You
  5. High Availability or High Recoverability?

Is there anything left to address in the series? Does the high availability horse finally deserve a proper state burial? Is the concept of high recoverability a boon or a bane? Let me know in the comments below if you think anything has been improperly addressed. Even better, let me know if you have what you would consider a “highly recoverable” environment. If so, I’d be thrilled to interview you or have you write about the topic.



18AUG

Check out the Talentopoly Podcast!

Posted in: SysAdmin
  |  by: Wesley David

An Introduction to Talentopoly

In case you haven’t heard of it, there is a new community dedicated to the IT trifecta: Developers, Administrators and Designers. If you don’t think designers are within the realm of IT, then you probably don’t work much with the interwebs. The community is Talentopoly.com and it has a few key points:

  • Share content with others by posting interesting links.
  • Ask burning questions about whatever IT topic is on your mind.
  • Get noticed by others by posting your resume.

The general goal of involvement at Talentopoly is to mingle and learn. It has a rather cozy atmosphere and membership is invite only as of this post. Incidentally, if you want an invite just send me your email address via my contact form.

The Talentopoly Podcast

Just recently Jared Brown, the founder of Talentopoly, along with Developer / Designer Super Hero Brandon Corbin and Designer par excellence Stephen Dixon, have started the Talentopoly Podcast. In Jared’s own words:

I’ve teamed up with @BrandonCorbin (developer & designer – wearer of many hats) and @StephenMDixon (designer extraordinaire). The three of us will be recording a 30 – 60 minute weekly podcast. We’ll be discussing some of the best programming, design, and IT related links posted on Talentopoly.com every week.

The show is all about having fun while talking tech so we’ll be drinking a few beers, wine, or whatever other refreshing beverages we can find.

As of this post, there have been two podcasts produced, but iTunes hasn’t published them yet. They’re working on that. Here’s what was talked about in the first two episodes:

Episode 1 – Adobe Loves to Bankrupt Us All

We Talk About

  • iCloud
  • Laravel – A Clean & Classy PHP Framework
  • MacOS X is an Unsuitable Platform for Web Development
  • HTML5 tools, Animation tools – Adobe Edge Preview from Adobe Labs
  • Natural Language Processing with Node JS
  • Injecting Personality in Your Web Designs

Episode 2 – Please Use WebKit Microsoft

We Talk About

  • Apple releases tool to create external Lion recovery drives
  • Apple to Lodsys: you’ll have to go through us to sue iOS devs
  • CSS – text-overflow property is interesting …
  • There Will Be No Files In The Cloud
  • How I Explained MapReduce to My Wife
  • Five Lessons From a Year of Tablet UX Research
  • The Node Beginner Book » A comprehensive Node.js tutorial
  • MongoDB GUI administration tool for PHP, built on Vork
  • Project Management Software, CRM, Sales, Intranet
  • Whitespace – Why is it important?

Head on over to podcast.Talentopoly.com and give it a listen! Who knows? Perhaps your Talentopoly post could be the next big discussion.



15AUG

Solving the Error “Unexpected Inconsistency: Run fsck Manually” on a Linux Machine

Posted in: SysAdmin
  |  by: Wesley David
Tags: Linux

My Problem:

Booting my Fedora 14 laptop after a clean shutdown resulted in the following boot-time error message:

/dev/mapper/vg_fedora1530-lv-home: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)

My Solution:

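Boot into a Linux Live CD, unmount all affected partitions (assuming they were automounted) and perform an e2fsck -f. For example, to unmount all partitions on your sda disk and check the first one:

# Unmount every partition on the disk (errors for unmounted ones are harmless)
umount /dev/sda*
# Force a full check of the first partition, even if it is marked clean
e2fsck -f /dev/sda1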

The -f switch forces the checking of the filesystem even if nothing appears to be wrong. Hey, you can’t be too careful. Optionally, you can add the -p or -y options. From the e2fsck man page:

-p Automatically repair (“preen”) the file system. This option will cause e2fsck to automatically fix any filesystem problems that can be safely fixed without human intervention. If e2fsck discovers a problem which may require the system administrator to take additional corrective action, e2fsck will print a description of the problem and then exit with the value 4 logically or’ed into the exit code. (See the EXIT CODE section.) This option is normally used by the system’s boot scripts. It may not be specified at the same time as the -n or -y options.

-y Assume an answer of `yes’ to all questions; allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -n or -p options.
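
Since the man page talks about values being or’ed into the exit code, a small decoder can help when running e2fsck from a script. This is a sketch: the function name describe_fsck_exit is made up, and the bit meanings are taken from the EXIT CODE section of the e2fsck man page as I understand it.

```shell
#!/usr/bin/env bash
# describe_fsck_exit is a hypothetical helper that decodes an e2fsck exit code.
# The exit code is a bitmask, so several conditions can be set at once.
describe_fsck_exit() {
    local code="$1"
    (( code == 0 )) && { echo "no errors"; return; }
    local -a parts=()
    (( code & 1 ))   && parts+=("errors corrected")
    (( code & 2 ))   && parts+=("errors corrected, reboot recommended")
    (( code & 4 ))   && parts+=("errors left uncorrected")
    (( code & 8 ))   && parts+=("operational error")
    (( code & 16 ))  && parts+=("usage or syntax error")
    (( code & 32 ))  && parts+=("canceled by user")
    (( code & 128 )) && parts+=("shared library error")
    local IFS=';'                 # join the collected messages with ';'
    echo "${parts[*]}"
}

# Example (the device name is a placeholder):
#   e2fsck -f -p /dev/sda1; describe_fsck_exit $?
```

With -p, a result that includes “errors left uncorrected” is the cue that e2fsck has given up on the safe fixes and wants a human to rerun it interactively or with -y.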

The Long Story:

Booting up my laptop for the morning, I walked away to grab some breakfast. When I came back, I noticed that it was not at the customary Fedora 14 login screen. Instead, it was a shell prompt blinking just underneath an ominous red “FAILED” warning. Something was wrong with one of my filesystems.

/dev/mapper/vg_fedora1530-lv-home: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)

Running fsck manually basically means that you have to accept each and every possible change to the filesystem that fsck recommends. The -a option is the same as the -p option and is only kept around for backwards compatibility. The -p option fixes only those things that are considered safe enough to fix without human intervention. I’m not sure what logic is set to determine what needs human intervention, so I’d love to hear from someone that knows. The -y option automatically selects “yes” to any and all requests for intervention from fsck.

The error above wants me to manually intervene for every possible error. I thought about it for a minute. I know virtually nothing about the grit and grime of a file system, so I wouldn’t know what I should and should not change.

You can run fsck -n on mounted filesystems as it does not perform any writes, though the results for a mounted filesystem may not be fully reliable. It basically opens the FS as read only. I did that and saw an avalanche of errors tumble down my screen. It ended with the ominously worded warning:

Error while iterating over blocks in inode 11027197: Illegal triply indirect block found
e2fsck aborted

I rebooted into a Fedora live CD (the same CD that I used to install Fedora about six months ago) so that I could operate on the unmounted filesystem. I ran fsck.ext4 (which is really just e2fsck – more on that whole fsck mess in a future post) on my lv_home partition with the -f flag to force the check even if the filesystem looked fine. I did not use -p or -y, even though I wanted to for time’s sake. I knew this check would take quite a few minutes.

I kicked off the fsck operation and, in fact, there were so many errors in the filesystem and I had so little clue which fixes I should and should not be accepting that I ended up wedging a pen between my monitor and the keyboard’s ‘y’ key. I’m ghetto like that. After minutes and minutes of errors whizzing by on the screen, the check was done. I nervously rebooted the machine and chose to boot from the troubled partition. Happily, everything worked. I was greeted by my old familiar login screen and all seemed well. My lost+found folder was totally empty.

But why was my filesystem corrupted in the first place? I have no idea. It hasn’t been hard rebooted. There have been no power issues. Perhaps the physical drive is going. I’ll be checking SMART data sometime soon.
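
For the SMART check, something along these lines might do as a first pass. It is a sketch that assumes the smartmontools package is installed. The helper name smart_health_ok is made up, and the strings it looks for are what smartctl -H typically prints for ATA and SCSI drives.

```shell
#!/usr/bin/env bash
# smart_health_ok is a hypothetical helper. It reads the text printed by
# "smartctl -H DEVICE" on stdin and succeeds only if the drive reports healthy.
smart_health_ok() {
    # ATA drives print "...test result: PASSED"; SCSI/SAS drives print
    # "SMART Health Status: OK". Anything else is treated as a failure.
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Example (needs root; /dev/sda is an assumption about the device name):
#   smartctl -H /dev/sda | smart_health_ok || echo "drive may be failing"
```

A passing health check does not prove the drive is fine, so the full attribute dump from smartctl -a is still worth reading for things like reallocated sectors.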

Let me know how you handle ext* corruption issues. Any way that you preempt corruption? Any way that you handle it in an automated fashion?



10AUG

Don’t Miss The ServerFault Scalability Conference This October!

Posted in: SysAdmin
  |  by: Wesley David
Tags: ServerFault

Just in case you’ve missed my many references to it, I have been struggling with an addiction. To a website. A website that has badges. No, it’s not Facebook. It’s ServerFault.com.

Like any true addict, I defend my addiction. However, unlike most addicts, my addiction truly is productive! Also, unlike addicts, I CAN QUIT ANYTIME I WANT!! What’s so productive about the site is that I get to rub elbows with people who are way, way smarter than me. I get to watch them as they engage Deep Thought Mode over particularly snarled problems… as well as occasionally smacking their foreheads at some choice examples of silliness. People like Chopper3, JoeQwerty, Warner, MrDenny and of course Evan Anderson have some stellar technological skills and they amazingly share their expertise with others on ServerFault… for free.

As great a resource as ServerFault is, every once in a while it’s productive to actually venture out into meatspace and talk with your colleagues face to face. There are plenty of great conferences already in existence for SysAdmins to attend; however, ServerFault has teamed with the High Scalability blog and created a new one called the “High Scalability Conference”.

From the website:

Scalability, brought to you by Server Fault and High Scalability, is a one-day educational conference this October 14, 2011 in San Francisco, CA. Attend and learn how to create scalable systems.

Don’t miss these awesome sessions!
Ganeti Virtualization Management: Improving the Utilization of Your Hardware and Your Time
Speaker:
Tom Limoncelli, Google

Three Scaling Directions, A Panel Discussion
Speakers:
Markus Frind, Plenty of Fish
Ben Kochie, Google
Jeff Atwood, Stack Exchange

Scalability in Network Visibility – Is your Network Too Complex to Know What’s Going On?
Speaker:
Loris Degioanni, Riverbed

The event is taking place for a single day on October 14, 2011 at The Concourse Exhibition Center in San Francisco and costs $299USD. However, if you use the promotional code ‘nubbyadmin’ you’ll receive $100 off! Thanks Kyle!

The Scalability Conference is part of the San Francisco leg of the larger series of worldwide StackOverflow DevDays 2011 Conferences. In fact, you can attend both days of the San Francisco DevDays conference and the single day Scalability conference with a SuperPass for $748USD.

The presenters are top notch. The venue is class. The organizers are choice. The price is amazing. I hope you’ll sign up. While you may not get to meet the ServerFault members that I mentioned above (or maybe you will!) I’m sure you’ll meet someone that you’ve seen on ServerFault. If you do make it, make sure to take plenty of pictures! Especially if you see a Paddington Bear ride… (shameless ServerFault inside joke).



8AUG

Announcing a New SysAdmin Tool Repository

Posted in: SysAdmin
  |  by: Wesley David
Tags: scripting git

How many of us have toiled away on a home-brewed script to solve some seemingly esoteric problem? Or perhaps it’s a completely mundane and repetitive task that we’re automating. After pounding out line after line you are left with one of two thoughts (or both, as is my case):

  1. Surely someone has had this problem and scripted the solution before me! I wish I had their script.
  2. Surely someone after me will wish they had the script that I’ve just now created. I wish I could distribute it easily.

For example, some of my most recent scripting projects include designing a RoboCopy log file parser (not quite like anything I’ve seen from anyone else so far), a script that runs as a scheduled task / cron job to make recurring SpiceWorks tasks, as well as a completely automated WordPress deployment script. The best I can do to share them is to blog about them and hope that there are good enough keywords in the post to make the script return when someone makes an educated search engine query. Until now.

George Beech, ServerFault Valued Associate #00002, has created a project on github called SysAdminTools. You can read more about the project at his blog Broken Haze. Here’s an excerpt from his announcement:

So in the spirit of sysadmin day, I’m announcing a new open source project that I’ve put up on github today. I’m calling the project “SysAdminTools” my vision is that it is a place where we can put all of those tools that we create out there and help our fellow admins by stopping the constant re-inventing of the wheel at all of the different places out there.

If you sign up to GitHub you can start contributing to the growing list of SysAdmin scripts. All scripts in the repository are released under the Apache 2.0 license (I was hoping for the two or three-clause BSD license, but alas =) ).

I look forward to seeing what the SysAdminTools GitHub Project turns into. I finally have a place where I can put utility scripts that seem like they should have a larger audience than just me. Heck, even scripts that seem to be useful to only one person or organization will probably be more useful to others than the author realizes. Join up and bless the community with your finest (or not so fine) scripts. Don’t worry, we’ll all pitch in to make them better.

Please, share this project with as many people as you can. Together I’m sure we can make this project take off and hopefully save some serious man-hours of collective script writing.



1AUG