Failed to clear bootbank content /altbootbank: [Errno 9] Bad file descriptor: ‘/altbootbank/state.xxxxxxx’

In a VSAN project, the VMware Compatibility Guide listed a different driver version for the RAID controller than the one that was installed, so I tried to install a driver update through the CLI. This did not work as expected because /altbootbank was in a corrupted state. There were two ways forward: reinstall from scratch, or try to rebuild /altbootbank from the contents of /bootbank. This was not a production server, so I had the freedom to take a more experimental approach, and I therefore chose the unsupported, not recommended route of rebuilding /altbootbank from /bootbank.

I ran the following command to install the driver:

esxcli software vib install -d /vmfs/volumes/datastore/patch.zip

I got the following error message:

[InstallationError]

Failed to clear bootbank content /altbootbank: [Errno 9] Bad file descriptor: '/altbootbank/state.xxxxxxx'

Please refer to the log file for more details.

I found the following two links describing the issue.

https://communities.vmware.com/thread/413441?start=0&tstart=0
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033564

The VMware KB walks through steps to resolve this, but they did not work in my case. The supported fix is to repair or reinstall ESXi, which is a time-consuming task.

Since the KB steps did not help, I tried to delete the stale state entry directly:

rm /altbootbank/state.5824665/
rm -rf /altbootbank/state.5824665/

The ghost file/directory could not be deleted. The first command returned ‘This is not a file’, the second ‘This is not a directory’.
I repeated the same commands after a reboot, with the same results. Since the server was still booting fine, I knew /bootbank was still OK, so I decided to replace the contents of /altbootbank with those of the /bootbank partition.

THE FOLLOWING IS NEITHER RECOMMENDED NOR SUPPORTED! DO NOT EXECUTE IN A PRODUCTION ENVIRONMENT!

Identify the naaID and partition number of the /altbootbank:

vmkfstools -Ph /altbootbank

Wipe the partition by recreating the file system:

vmkfstools -C vfat /dev/disks/naaID:partitionNumber

Remove the /altbootbank folder:

rm -rf /altbootbank

Create a symlink named /altbootbank that points to the newly created VFAT volume:

ln -s /vmfs/volumes/volumeGUID /altbootbank

Copy all the contents from /bootbank to /altbootbank:

cp /bootbank/* /altbootbank
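Before moving on, it may be worth checking that the copy really matches the source. The following is only a minimal sketch: `verify_copy` is a hypothetical helper (not an ESXi tool), it assumes `cmp` is available in the shell (a checksum tool such as `md5sum` could be substituted), and the demo runs against throwaway directories with made-up file contents rather than the real /bootbank and /altbootbank.

```shell
#!/bin/sh
# verify_copy SRC DST: compare every regular file in SRC with its copy in DST.
# Hypothetical helper for illustration; assumes `cmp` exists in this shell.
verify_copy() {
    src="$1"
    dst="$2"
    rc=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue            # skip directories and unmatched globs
        name="$(basename "$f")"
        if ! cmp -s "$f" "$dst/$name" 2>/dev/null; then
            echo "MISMATCH: $name"         # missing or different file
            rc=1
        fi
    done
    return "$rc"
}

# Demo on temporary directories standing in for /bootbank and /altbootbank.
# The file names and contents below are invented sample data.
demo="$(mktemp -d)"
mkdir "$demo/bootbank" "$demo/altbootbank"
printf 'bootstate=0\n' > "$demo/bootbank/boot.cfg"
printf 'payload\n'     > "$demo/bootbank/s.v00"
cp "$demo/bootbank"/* "$demo/altbootbank"
verify_copy "$demo/bootbank" "$demo/altbootbank" && echo "all files match"
```

On the host itself the call would be `verify_copy /bootbank /altbootbank`. Any MISMATCH line means the copy should be repeated before continuing.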

Set bootstate=3 in /altbootbank/boot.cfg:

vi /altbootbank/boot.cfg
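If you prefer not to edit the file interactively, the same change can be sketched with sed. This is an illustration only: `set_bootstate` is a hypothetical helper name, the sample boot.cfg lines are invented for the demo, and the demo works on a temporary file rather than the real /altbootbank/boot.cfg.

```shell
#!/bin/sh
# set_bootstate FILE VALUE: rewrite the bootstate= line of a boot.cfg-style file.
# Hypothetical helper; writes to a temp file first so a failed sed
# does not leave a truncated config behind.
set_bootstate() {
    cfg="$1"
    state="$2"
    tmp="${cfg}.tmp.$$"
    sed "s/^bootstate=.*/bootstate=${state}/" "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Demo on a throwaway file with made-up sample content.
demo_dir="$(mktemp -d)"
printf 'bootstate=0\ntitle=Loading VMware ESXi\ntimeout=5\nupdated=1\n' > "$demo_dir/boot.cfg"
set_bootstate "$demo_dir/boot.cfg" 3
grep '^bootstate=' "$demo_dir/boot.cfg"
```

On the host, the equivalent call would be `set_bootstate /altbootbank/boot.cfg 3`. Only the bootstate line is touched; every other line is left as it was.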

Run the /sbin/auto-backup.sh script to save the changes:

/sbin/auto-backup.sh

 

Reconfigure diagnostic partition

Reconfigure diagnostic partition with PowerCLI using Get-EsxCli

The following Get-EsxCli commands unconfigure the diagnostic partition on every host and then reconfigure it using smart selection. This was needed because the UUID of the install partition had changed, caused by an option in the NetApp system during system testing.

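$server_list = Get-VMHost

foreach ($srv in $server_list)
{
    $esxcli = Get-EsxCli -VMHost $srv
    #$esxcli.system.coredump.file.add($null,"VMFS_log_partition","$srv.name",$null)
    # Arguments are positional: (enable, partition, smart, unconfigure)
    # 1. Unconfigure the current diagnostic partition
    $esxcli.system.coredump.partition.set($null,$null,$null,$true)
    # 2. Enable it again and let ESXi pick a partition (smart selection)
    $esxcli.system.coredump.partition.set($true,$null,$true,$null)
    # 3. Show the resulting configuration
    $esxcli.system.coredump.partition.get()
}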
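For a single host, the same three steps can also be sketched directly in the ESXi shell with esxcli. This is a hedged sketch, not a tested procedure: the `run` wrapper and the `DRY_RUN` variable are my own additions so that the script only prints the commands by default, and `reconfigure_coredump_partition` is a hypothetical function name.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 on an ESXi host to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reconfigure_coredump_partition() {
    # 1. Unconfigure the current diagnostic partition
    run esxcli system coredump partition set --unconfigure
    # 2. Enable it again and let ESXi choose a partition (smart selection)
    run esxcli system coredump partition set --enable true --smart
    # 3. Show the resulting configuration
    run esxcli system coredump partition get
}

reconfigure_coredump_partition
```

Reviewing the dry-run output first is a cheap way to confirm the order of operations before changing anything on the host.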

Many thanks to http://www.virten.net/2014/02/howto-use-esxcli-in-powercli/