VCSA does not boot due to file system errors

Dec 7, 2020 | vCenter

Due to a power failure of the storage where the vCenter Server Appliance resides, the VCSA does not boot. Connecting to the console shows the following output:

Failed to check /dev/log_vg/log
vCenter console File System Check error

When you see this screen, none of the services are running because the appliance has not fully started. This means there is no way to connect to either the H5 client or the VAMI on port 5480.

Why does the VCSA not boot and where do I start troubleshooting?

There are two important things to note in the screenshot above, and this is where we start:

  • Failed to start File System Check on /dev/log_vg/log
  • journalctl -xb

First we take a look at ‘journalctl -xb’. To do this we need to supply the root password and launch the Bash shell:

launch BASH
Emergency mode bash shell

journalctl -xb

Now that we have shell access we can take a look at ‘journalctl -xb’:

journalctl -xb

Type G to go to the bottom of the log file:

G

Work upwards; the most relevant entries are near the bottom. For the sake of this blog post, I typed -S, which toggles line wrapping in the pager. In this case, I turned wrapping on.

Going up a little I find these entries:

There is a problem with a certain inode and File System Check (fsck) should be run manually.

journalctl showing more info about the failed volume
journalctl -xb
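As an alternative to scrolling through the pager, the same entries can be narrowed down non-interactively. The following is a minimal sketch, not the method used in this post: the function names boot_errors and fsck_hints are made up for illustration, and the search pattern assumes the relevant messages mention "fsck" or "inode", as they did here. The exact wording may differ on other systems.

```shell
# Sketch: show only error-level messages from the current boot.
# boot_errors and fsck_hints are illustrative names, not VCSA tooling.
boot_errors() {
    # -b: current boot only
    # -p err: priority "err" and more severe
    # --no-pager: print directly instead of opening the pager
    journalctl -b -p err --no-pager "$@"
}

# Keep only lines that mention fsck or an inode (case-insensitive).
# Reads text on stdin, so it works on any saved journal output as well.
fsck_hints() {
    grep -Ei 'fsck|inode'
}
```

From the emergency shell this could be used as: boot_errors | fsck_hints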

File System Check

Let’s see how we can do that. Is it as simple as running:

fsck /dev/mapper/log_vg-log

It seems like it. Running the above command finds some errors and offers to repair them. I confirmed.
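If you would rather see whether a volume has problems before letting fsck change anything, a read-only pass can be run first. The sketch below rests on several assumptions: the file system is ext3/ext4 (so that -n is handed to e2fsck), the volume is not mounted, and needs_repair and the FSCK variable are hypothetical names introduced here for illustration. It relies on fsck's documented exit codes, where bit 4 means errors were left uncorrected.

```shell
# Sketch: read-only check that only reports whether a repair is needed.
# FSCK is a hypothetical override hook (defaults to the real fsck).
FSCK="${FSCK:-fsck}"

# Returns 0 if uncorrected errors were found, 1 if the file system is clean,
# and 2 for any other fsck failure (for example an operational error).
needs_repair() {
    dev="$1"
    # -n: open the file system read-only and answer "no" to every prompt
    "$FSCK" -n "$dev" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        return 1
    elif [ $((rc & 4)) -ne 0 ]; then
        return 0
    else
        return 2
    fi
}
```

A possible use, keeping the actual repair interactive as in this post: needs_repair /dev/mapper/log_vg-log && fsck /dev/mapper/log_vg-log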

Other volumes

Let’s check the other logical volumes (LVM). First we run ‘lsblk’ to take a look at the drive layout:

lsblk

With lsblk we take a look at the drive layout
VCSA drive layout

Remark: In the TYPE column we see the disks, e.g. sda, sdb, etc. The difference between sda and the rest is that sda uses standard partitions, whereas LVM has been set up on the remaining disks.

I checked all other volumes and found none of them were having issues.
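Checking every volume by hand gets tedious, so the walk over the logical volumes could be scripted. This is only a sketch under stated assumptions, not necessarily how the check was done for this post: the function names are made up, mounted volumes are skipped because fsck results on a mounted file system are unreliable, and -n assumes ext3/ext4 so that nothing is modified.

```shell
# Sketch: list LVM logical volumes that are not mounted.
# Expects the output of: lsblk -rnpo NAME,TYPE,MOUNTPOINT
# (-r raw, -n no header, -p full device paths, -o chosen columns)
unmounted_lvs() {
    # Column 2 is the type; an empty column 3 means no mount point.
    # Swap volumes show [SWAP] as mount point and are skipped too.
    awk '$2 == "lvm" && $3 == "" { print $1 }'
}

# Run a read-only fsck on each unmounted logical volume.
check_unmounted_lvs() {
    lsblk -rnpo NAME,TYPE,MOUNTPOINT | unmounted_lvs | while read -r dev; do
        echo "== $dev"
        # -n: read-only, answer "no" to all prompts; stdin detached
        # so fsck cannot consume the device list
        fsck -n "$dev" </dev/null
    done
}
```

Calling check_unmounted_lvs from the emergency shell prints one section per volume. Any volume that reports errors can then be repaired with a plain fsck as shown earlier.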

Reboot

To reboot from the emergency mode shell:

reboot --force

After the reboot, I could connect to the H5 client and clear the relevant errors.

Remark

This blog post is very similar to this one here. The difference is that the issues in the older blog post were on a standard partition on a VCSA 6.5, whereas the issues described and addressed in this post are on a VCSA 7.0 LVM logical volume.