Reconfigure diagnostic partition using Get-EsxCli -V2

The following PowerShell snippet unconfigures the diagnostic coredump partition using the esxcli version 2 cmdlet. The second part reconfigures the diagnostic partition with the ‘smart’ option so that an accessible partition is chosen.

If you want to configure a new diagnostic partition, you will find the necessary information in the following VMware knowledge base article: Configuring a diagnostic coredump partition on an ESXi 5.x/6.x host (2004299). There will be additional steps to supply the partition id.

$srv = Get-VMHost ESXiHost
$esxcli = Get-EsxCli -VMHost $srv -V2
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.unconfigure = "true"
$esxcli.system.coredump.partition.set.Invoke($arg)
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.unconfigure = "false"
$arg.enable = "true"
$arg.smart = "true"
$esxcli.system.coredump.partition.set.Invoke($arg)

First we retrieve the ESXi host and store it in the variable $srv:

$srv = Get-VMHost ESXiHost

Then we create an esxcli object $esxcli using the variable $srv we created previously:

$esxcli = Get-EsxCli -VMHost $srv -V2

Now we create a variable $arg to store the arguments we will provide later:

$arg = $esxcli.system.coredump.partition.set.CreateArgs()

Setting the $arg property ‘unconfigure’ to true will deactivate the diagnostic partition:

$arg.unconfigure = "true

The Invoke method runs the command remotely on the ESXi host. After execution the diagnostic partition is deactivated:

$esxcli.system.coredump.partition.set.Invoke($arg)

The second part starts with creating a new set of arguments:

$arg = $esxcli.system.coredump.partition.set.CreateArgs()

Set unconfigure back to false, since we unconfigured the partition before:

$arg.unconfigure = "false"

Enable the coredump partition:

$arg.enable = "true"

The ‘smart’ property will try to use an accessible partition:

$arg.smart = "true"

The last Invoke configures the diagnostic partition using the supplied arguments:

$esxcli.system.coredump.partition.set.Invoke($arg)
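
The ‘smart’ option above lets the host pick an accessible partition by itself. If you would rather supply a specific partition, as described in the KB article mentioned earlier, the calls look roughly like the sketch below. This is a minimal sketch and not taken verbatim from the KB; the partition name is a placeholder, list the real candidates first:

# List the available coredump partition candidates
$esxcli.system.coredump.partition.list.Invoke()

# Configure a specific partition (placeholder value, replace with a real deviceName:partitionNumber)
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.partition = "naa.xxxxxxxxxxxxxxxx:7"
$esxcli.system.coredump.partition.set.Invoke($arg)

# Activate the configured partition
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.enable = "true"
$esxcli.system.coredump.partition.set.Invoke($arg)

Afterwards, $esxcli.system.coredump.partition.get.Invoke() shows the active and configured partition.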

Configuring Tesla M60 cards for NVIDIA GRID vGPU

There are a couple of steps that need to be taken to configure Tesla M60 cards with NVIDIA GRID vGPU in a vSphere / Horizon environment. I have listed them here quick and dirty; they are an extract of the NVIDIA Virtual GPU Software User Guide.

  • On the host(s):
    • Install the vib
      • esxcli software vib install -v directory/NVIDIA-vGPUVMware_ESXi_6.0_Host_Driver_390.72-1OEM.600.0.0.2159203.vib
    • Reboot the host(s)
    • Check if the module is loaded
      • vmkload_mod -l | grep nvidia
    • Run the nvidia-smi command to verify correct communication with the device
    • Configuring Suspend and Resume for VMware vSphere
      • esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RMEnableVgpuMigration=1"
    • Reboot the host
    • Confirm that suspend and resume is configured (a PowerCLI check is sketched below, after this list)
      • dmesg | grep NVRM
    • Check that the default graphics type is set to shared direct
    • If the graphics type is not set to shared direct, execute the following commands to stop and start the xorg and nv-hostengine services
      • /etc/init.d/xorg stop
      • nv-hostengine -t
      • nv-hostengine -d
      • /etc/init.d/xorg start
  • On the VM / Parent VM:
    • Configure the VM. Beware that once the vGPU is configured, the console of the VM will no longer be visible/accessible through the vSphere Client, so an alternate access method should already be in place
    • Edit the VM configuration to add a shared PCI device and verify that NVIDIA GRID vGPU is selected
    • Choose the vGPU profile
      more info on the profiles can be found here under section ‘1.4.1 Virtual GPU Types’: https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html
    • Reserve all guest memory
  • On the Horizon pool
    • Configure the pool to use the NVIDIA GRID vGPU as 3D Renderer
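
This is not part of the NVIDIA guide, but a quick remote sanity check can be handy. The following is a minimal PowerCLI sketch, assuming an existing vCenter connection and that the driver vib name starts with ‘NVIDIA’; it lists the installed NVIDIA vib and the nvidia module parameters on every host:

foreach ($vmhost in Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    # Installed NVIDIA vib(s); the name filter is an assumption, adjust it if your vib is named differently
    $esxcli.software.vib.list.Invoke() |
        Where-Object { $_.Name -like "NVIDIA*" } |
        Select-Object @{N = "Host"; E = { $vmhost.Name }}, Name, Version
    # Parameters set on the nvidia module; NVreg_RegistryDwords should show up after the suspend/resume change
    $esxcli.system.module.parameters.list.Invoke(@{module = "nvidia"}) |
        Where-Object { $_.Value } |
        Select-Object @{N = "Host"; E = { $vmhost.Name }}, Name, Value
}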

vShield Endpoint SVM status vCenter alarm

vCenter is showing an alarm on the TrendMicro Deep Security Virtual Appliance (DSVA): ‘vShield Endpoint SVM status’.

Checking vShield for errors: the DSVA VA console window does not show the expected red/grey screen.

Let’s go for some log file analysis:
To get a login prompt: Alt + F2
Log in with user dsva and password dsva (this is the default)
less /var/log/messages (why less is more: you get almost all the vi commands)
G to go to the last line

For some reason the OVF environment is not what the appliance expects: it is unable to apply some OVF settings, in this case the network interfaces.
q (to exit the log file display)
sudo -s (to gain root privileges), then enter the dsva user password

test

(to create the dsva-ovf.env file; if necessary, delete the file first)
reboot (to reboot the appliance; once rebooted, give it 5 minutes and the alarm should clear automatically)
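
To confirm remotely that the alarm really cleared, a quick PowerCLI check can help. This is a minimal sketch and not part of the original procedure; the appliance VM name ‘dsva-01’ is an assumption, adjust it to your environment:

# List any alarms still triggered on the appliance VM; no output means the alarm has cleared
$vm = Get-VM dsva-01
foreach ($alarmState in $vm.ExtensionData.TriggeredAlarmState) {
    [PSCustomObject]@{
        Alarm  = (Get-View $alarmState.Alarm).Info.Name
        Status = $alarmState.OverallStatus
        Time   = $alarmState.Time
    }
}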


Failed to clear bootbank content /altbootbank: [Errno 9] Bad file descriptor: ‘/altbootbank/state.xxxxxxx’

In a VSAN project the VMware Compatibility Guide mentioned a different driver version for the RAID controller than the one that was installed, so I tried to install a driver update for the RAID controller through the CLI. This did not work out as expected because the /altbootbank was in a corrupted state. There were two ways to go ahead: either reinstall from scratch or try to rebuild the /altbootbank from the /bootbank contents. This was not a production server, so I had the freedom to apply a more experimental approach and therefore chose the unsupported, not recommended route of rebuilding the /altbootbank from the /bootbank contents.

I ran the following command to install the driver:

esxcli software vib install -d /vmfs/volumes/datastore/patch.zip

I got the following error message:

[InstallationError]

Failed to clear bootbank content /altbootbank: [Errno 9] Bad file descriptor: '/altbootbank/state.xxxxxxx'

Please refer to the log file for more details.

I found the following two links describing the issue.

https://communities.vmware.com/thread/413441?start=0&tstart=0
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033564

The VMware KB goes through the steps to solve this, which in this case didn’t work. The better solution is to repair or reinstall, but that is a time-consuming task.

The steps in the KB didn’t solve it, so I tried to delete the offending file with:

rm /altbootbank/state.5824665/
rm -rf /altbootbank/state.5824665/

The ghost file/directory would not delete. The first command returned ‘This is not a file’, the second ‘This is not a directory’.
I repeated the same commands after a reboot, with the same results. As the server was still booting fine, I knew the /bootbank was still OK, so I wanted to replace the /altbootbank with the contents of the /bootbank partition.

THE FOLLOWING IS NOT RECOMMENDED NOR SUPPORTED! DO NOT EXECUTE ON A PRODUCTION ENVIRONMENT !

Identify the naaID and partition number of the /altbootbank:

vmkfstools -Ph /altbootbank

Scratch the partition by recreating the file system:

vmkfstools -C vfat /dev/disks/naaID:partitionNumber

Remove the /altbootbank folder:

rm -rf /altbootbank

Create a symlink from /altbootbank to the newly created vFAT volume:

ln -s /vmfs/volumes/volumeGUID /altbootbank

Copy all the contents from /bootbank to /altbootbank:

cp /bootbank/* /altbootbank

Change bootstate to 3 in /altbootbank/boot.cfg:

vi /altbootbank/boot.cfg

Run the /sbin/autobackup.sh script to persist the changes:

/sbin/autobackup.sh
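
With the /altbootbank rebuilt, the driver installation can be retried with the same esxcli command as before, or remotely through PowerCLI. The following is a minimal sketch of the remote variant, assuming an existing connection and the same host and depot path used earlier:

$srv = Get-VMHost ESXiHost
$esxcli = Get-EsxCli -VMHost $srv -V2
# Retry the driver installation from the same offline bundle
$arg = $esxcli.software.vib.install.CreateArgs()
$arg.depot = "/vmfs/volumes/datastore/patch.zip"
$esxcli.software.vib.install.Invoke($arg)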