My Problem:

Booting my Fedora 14 laptop after a clean shutdown resulted in the following boot-time error message:

/dev/mapper/vg_fedora1530-lv-home: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)

My Solution:

Boot from a Linux live CD, unmount all affected partitions (in case they were automounted), and run e2fsck -f on each of them. For example, to unmount every partition on your sda disk and then check the first one:

umount /dev/sda*      # unmount every automounted partition on sda
fsck -f /dev/sda1     # force a full check of the first partition

The -f switch forces a check of the filesystem even if nothing appears to be wrong. Hey, you can’t be too careful. Optionally, you can add the -p or -y option (but not both). From the e2fsck man page:

-p Automatically repair (“preen”) the file system. This option will cause e2fsck to automatically fix any filesystem problems that can be safely fixed without human intervention. If e2fsck discovers a problem which may require the system administrator to take additional corrective action, e2fsck will print a description of the problem and then exit with the value 4 logically or’ed into the exit code. (See the EXIT CODE section.) This option is normally used by the system’s boot scripts. It may not be specified at the same time as the -n or -y options.

-y Assume an answer of `yes’ to all questions; allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -n or -p options.
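The man page says that when -p runs into a problem needing an administrator, e2fsck exits with 4 “logically or’ed” into the exit code. Here is a minimal POSIX shell sketch of decoding that status, assuming the bit values listed in the EXIT CODE section of the e2fsck man page. The function names describe_e2fsck_status and needs_manual_fsck are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Sketch: translate an e2fsck exit status into the meanings from the
# EXIT CODE section of the e2fsck man page. The codes are bit flags that
# can be OR'ed together, so each bit is tested separately.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ] && echo "filesystem errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "errors corrected, system should be rebooted"
    [ $((status & 4)) -ne 0 ] && echo "filesystem errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "e2fsck canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# Succeeds (returns 0) when bit 4 is set, i.e. a preen run (-p) found a
# problem it would not fix unattended and a manual run is needed.
needs_manual_fsck() {
    [ $(($1 & 4)) -ne 0 ]
}
```

For example, right after running e2fsck -p /dev/sda1 on an unmounted partition, describe_e2fsck_status $? prints which bits were set, and needs_manual_fsck $? tells a script whether it is time to run fsck by hand.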

The Long Story:

After booting up my laptop in the morning, I walked away to grab some breakfast. When I came back, I noticed that it was not at the customary Fedora 14 login screen. Instead, a shell prompt was blinking just underneath an ominous red “FAILED” warning. Something was wrong with one of my filesystems.

/dev/mapper/vg_fedora1530-lv-home: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)

Running fsck manually means that fsck asks you to approve or decline each and every change it wants to make to the filesystem. The -a option is the same as the -p option and is kept only for backwards compatibility. The -p option fixes only those problems that are considered safe enough to fix without human intervention. I’m not sure what logic decides which problems need human intervention, so I’d love to hear from someone who knows. The -y option automatically answers “yes” to every request for intervention from fsck.

The error above wants me to intervene manually on every problem. I thought about it for a minute. I know virtually nothing about the grit and grime of a filesystem, so I wouldn’t know which changes to accept and which to reject.
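The workflow the error message implies (let -p fix the safe stuff, then go interactive only if it gives up) can be sketched like this. This is a minimal sketch, not what Fedora’s boot scripts actually run; check_fs and the E2FSCK override variable are hypothetical names, and the device must not be mounted.

```shell
#!/bin/sh
# Sketch: try a preen (-p) pass first, and only fall back to an interactive
# check when e2fsck reports errors left uncorrected (bit 4 of the status).
# E2FSCK is a hypothetical override, handy for a dry run with a stub.
E2FSCK=${E2FSCK:-e2fsck}

check_fs() {
    dev=$1
    "$E2FSCK" -f -p "$dev"
    status=$?
    if [ $((status & 4)) -ne 0 ]; then
        echo "preen pass could not fix everything; running interactively" >&2
        # Interactive mode: e2fsck asks before each fix.
        "$E2FSCK" -f "$dev"
        status=$?
    fi
    return "$status"
}
```

Usage would be something like check_fs /dev/sda1 from a live CD, as root.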

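You can run fsck -n on a mounted filesystem because it does not perform any writes: it opens the filesystem read-only and answers “no” to every question. (The e2fsck man page warns that the results aren’t reliable while the filesystem is mounted, but it gives you a first look.) I did that and watched an avalanche of errors tumble down my screen. It ended with the ominously worded warning: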

Error while iterating over blocks in inode 11027197: Illegal triply indirect block found
e2fsck aborted
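A read-only look like that could be wrapped up as follows. This is a sketch assuming a Linux system with /proc/mounts; is_mounted, readonly_check and the E2FSCK override are hypothetical names. It only warns when the device is mounted, since -n never writes.

```shell
#!/bin/sh
# Sketch: run a read-only e2fsck report. -n opens the filesystem read-only
# and answers "no" to every question; -f forces the check even if the
# filesystem is marked clean. E2FSCK is a hypothetical override.
E2FSCK=${E2FSCK:-e2fsck}

is_mounted() {
    # Field 1 of /proc/mounts is the mounted device.
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts
}

readonly_check() {
    dev=$1
    if is_mounted "$dev"; then
        echo "warning: $dev is mounted; results may include spurious errors" >&2
    fi
    "$E2FSCK" -f -n "$dev"
}
```

Note that /proc/mounts may list a logical volume under its /dev/mapper name, so pass the device the same way it appears there.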

I rebooted into a Fedora live CD (the same CD that I used to install Fedora about six months ago) so that I could operate on the unmounted filesystem. I ran fsck.ext4 (which is really just e2fsck; more on that whole fsck mess in a future post) on my lv_home logical volume with the -f flag to force the check even if the filesystem looked fine. I did not use -p or -y, even though I wanted to in order to save time. I knew this check would take quite a few minutes.
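Because lv_home lives on LVM rather than a plain partition, the live CD session looks a bit different from the sda example. Here is a rough sketch, assuming the volume group and logical volume are named vg_fedora1530 and lv_home (run lvs to confirm your names) and that you are root. The live CD may already have activated the volume group; mapper_path and check_lv are hypothetical helper names.

```shell
#!/bin/sh
# Sketch for a live CD session where the damaged filesystem is on LVM.
# Device-mapper names are "<vg>-<lv>", with any '-' inside the VG or LV
# name doubled so the separator stays unambiguous.
mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

check_lv() {
    dev=$(mapper_path "$1" "$2")
    vgchange -ay "$1" || return          # activate the VG so $dev exists
    umount "$dev" 2>/dev/null || true    # fine if it was never mounted
    fsck.ext4 -f "$dev"                  # forced, interactive check
}
```

Usage: check_lv vg_fedora1530 lv_home.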

I kicked off the fsck run. There were so many errors in the filesystem, and I had so little clue which fixes I should and should not accept, that I ended up wedging a pen between my monitor and the keyboard’s ‘y’ key. I’m ghetto like that. After minutes and minutes of errors whizzing by on the screen, the check was done. I nervously rebooted the machine and chose to boot from the troubled disk. Happily, everything worked. I was greeted by my old familiar login screen and all seemed well. My lost+found folder was totally empty.

But why was my filesystem corrupted in the first place? I have no idea. The laptop hasn’t been hard rebooted, and there have been no power issues. Perhaps the physical drive is failing. I’ll be checking its SMART data soon.
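For that SMART check, something like the following should do, assuming the smartmontools package (which provides smartctl) is installed, you are root, and the disk is /dev/sda. disk_health is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: quick SMART look at a disk with smartctl (smartmontools).
# -H prints the drive's overall health self-assessment; -A prints the
# vendor attribute table, where counters such as Reallocated_Sector_Ct
# and Current_Pending_Sector are worth watching for a dying drive.
disk_health() {
    disk=${1:-/dev/sda}
    smartctl -H "$disk"
    smartctl -A "$disk"
}
```

A passing -H result doesn’t prove the drive is healthy, so rising reallocated or pending sector counts in the -A output are worth keeping an eye on too.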

Let me know how you handle ext* corruption issues. Any way that you preempt corruption? Any way that you handle it in an automated fashion?