My first deploy with VMware Cloud Foundation

I have been working on a script to deploy environments on a regular basis in my homelab. While I have made great progress, I have not been able to complete it for lack of time. It did improve my PowerShell scripting skills.

A while ago I followed a webinar about VMware Cloud Foundation Lab Constructor (VLC for short). It deploys a VCF environment in a decent amount of time. With little effort I have been able to get this up and running multiple times, although there are some pitfalls I ran into. My goal is to learn more about VCF, NSX-T and K8s, all in a VMware Validated Design (VVD) setup.

You can get access too by completing the registration form at tiny.cc/getVLC.

The following files are included in the download:

  • Example_DNS_Entries.txt
  • VCF Lab Constructor Install Guide 39.pdf
  • VLCGui.ps1
  • add_3_hosts.json
  • add_3_hosts_bulk_commission VSAN.json
  • default-vcf-ems.json
  • default_mgmt_hosthw.json
  • maradns-2.0.16-1.x86_64.rpm
  • mkisofs.exe
  • plink.exe

As I already have a DNS infrastructure in place I used ‘Example_DNS_Entries.txt‘ as a reference to create all the necessary DNS entries.
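Since the DNS records have to exist before the deployment starts, it can help to verify them up front. The following is a minimal sketch, assuming a Linux box with `getent` and a plain list of fully qualified names, one per line. The function names `resolve` and `check_dns` and the file name in the usage line are hypothetical; the real names come from your own DNS setup, not from this sketch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VLC): report which names resolve.

# Default resolver: succeeds when the name resolves via the system resolver.
resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Read one FQDN per line on stdin, skipping blank lines and '#' comments.
# Prints "OK <name>" or "MISSING <name>" and returns 1 if any name is missing.
check_dns() {
    rc=0
    while IFS= read -r name; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        if resolve "$name"; then
            echo "OK $name"
        else
            echo "MISSING $name"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (my_vcf_hosts.txt is a placeholder file name):
#   check_dns < my_vcf_hosts.txt
```

Keeping the resolver in its own function makes it easy to swap `getent` for another lookup tool if your workstation uses a different resolver than the lab.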

Read the documentation PDF FIRST. It will give you good insight into what will be set up, what won’t be set up and how everything will be set up. I’m not planning to repeat information that is included in the documentation. The only thing that I have copied from this PDF is the disclaimer because I feel it is important:

Below I have included the various configuration files and split them to show the different parts and also to show where I deviated from the defaults. These are the configuration files that the VLC script will use:

  • Management domain:
    • default-vcf-ems.json → changed all IP addresses, gateways, hostnames, networks and licenses
    • default_mgmt_hosthw.json → changed the number of CPUs (8 → 12), the amount of RAM (32 and 64 → 80) and the disk sizes (50, 150 and 175 → 150)
  • Workload domain
    • add_3_hosts.json → changed the hostname, management IP and IP gateway

To deploy VCF and be able to deploy NSX-T you will need a good amount of resources. The minimum host resources needed to deploy NSX-T are 12 vCPUs (there is a workaround to lower the vCPU requirement for NSX-T) and 80 GB of RAM.

The configuration files

The first file is the ‘default_mgmt_hosthw.json’. This file describes the specs for the (virtual) hardware for the management domain hosts:

default management host hardware json

The second file is the ‘default-vcf-ems.json’. This file describes the configuration for all software components for the management domain:

default VCF EMS JSON

The last configuration file is ‘add_3_hosts.json’. This configuration file is optional and can be used to prepare three extra hosts for the first workload domain:

3 additional hosts json

Where did I change the defaults

There are some settings that I changed from the defaults aside from changing the names and network settings:

  • in the ‘default_mgmt_hosthw.json’ I have changed the CPU count to 12 to be able to deploy NSX-T
  • in the ‘default_mgmt_hosthw.json’ I have changed the RAM to 80 GB to be able to deploy NSX-T

How do we start

If you meet the prerequisites it is fairly simple. Fire up ‘VLCGui.ps1’. This presents the following GUI, which lets you supply all the necessary information and connect to your physical environment. It speaks for itself; just make sure the Cluster, Network Name and Datastore fields are highlighted blue like the following.

What’s next

I hope to expand this initial post with a couple of follow-up posts. These are the topics that I’m currently thinking about:

  • NSX-T
  • importing the upgrade and deployment bundles
  • K8s

… and maybe more …

Additional info

Support:

Slack VLC Support channel – http://tiny.cc/getVLCSlack

Some blogs:

https://blog.bertello.org/2019/08/building-nested-vcf-using-vcf-lab-constructor-vlc/ and https://blog.bertello.org/category/automation/

https://my-sddc.net/

https://vinfrastructure.it/2019/10/vmware-cloud-foundation-3-9/

https://blogs.vmware.com/cloud-foundation/

Cross vCenter vMotion Utility

Whilst upgrading the homelab I also decided to rebuild from scratch. There were some challenges to overcome because I have running VMs that I don’t want to shut down while migrating.

My current homelab setup and the target setup are documented here (work in progress). Basically it comes down to:

  • Original setup: three hosts backed with iSCSI storage for running the VMs
  • Temporary setup:
    • New vCenter with two of the three hosts configured for vSAN with connection to the iSCSI datastores
    • Old vCenter with one remaining host running all of the VMs
  • Destination setup: new vCenter with vSAN datastore

To migrate the virtual machines from the old environment (from the last remaining host to the two new hosts) I decided to take a look at the ‘Cross vCenter vMotion Utility‘. There is not a lot of documentation available at first sight, but it is straightforward to set up and configure. I did find some things that are worth noting, though.

Step 1 : Running the jar

To start the Cross vCenter vMotion Utility one must run a jar file: ‘java -jar xvm-2.6.jar’.

I am running Linux (Pop!_OS 18.04) as my OS. I have Java versions 8 and 11 installed, with version 11 as the default. Version 11 is not listed as supported on the fling site (Java Runtime Environment 1.8-10: See requirements). Running with version 11 (sudo java -jar xvm-2.6.jar) starts the local website on port 8080 (http://localhost:8080) but does not report anything back on the CLI.
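Because the fling page lists Java Runtime Environment 1.8-10 as supported, a quick check of the default Java version before starting the jar can save some guessing. The following is a minimal sketch and not part of the utility; the function names `java_major` and `xvm_supported` are my own, and the version strings in the comments are just examples.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the fling): decide whether a Java
# version string falls in the 1.8-10 range listed on the fling page.

# Print the major version: "1.8.0_232" -> 8, "11.0.5" -> 11.
java_major() {
    case "$1" in
        1.*) rest=${1#1.}; echo "${rest%%.*}" ;;
        *)   echo "${1%%.*}" ;;
    esac
}

# Succeed only for major versions 8, 9 and 10.
xvm_supported() {
    major=$(java_major "$1")
    [ "$major" -ge 8 ] && [ "$major" -le 10 ]
}

# Usage on a real machine (commented out so the sketch has no side effects):
#   version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   xvm_supported "$version" || echo "Java $version is outside 1.8-10"
```

On Debian or Ubuntu based systems such as Pop!_OS, `sudo update-alternatives --config java` is the usual way to switch the default to an installed Java 8.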

Under the assumption that the Java application had started and failed right away, I decided to run it on my Windows box, which has Java Runtime Environment 8 installed. The last line of feedback, ‘Initialized controller with empty state’, was the same as on my Linux machine. Navigating to localhost:8080 showed the Cross vCenter vMotion Utility web interface. I could now configure the application and run migrations.

It was only later, when I closed the running instance on my Linux box and restarted it, that it showed output on the CLI that the application had started successfully.

ps -df | grep -i java
kill -HUP 9159

Output after restart:

Step 2 : Configuration

  • Register connections
    1. Source vCenter
    2. Destination vCenter

Step 3 : Migration

  • Add migrations
    1. Source Site: source vCenter
    2. Target Site: destination vCenter
    3. Source Datacenter
    4. Virtual Machine(s): Select one or more virtual machines
    5. Placement Target: Cluster or Host
    6. Target Datastore
    7. Network Mapping(s): the utility will detect the source networks for all selected virtual machines and display a selection field for the target network

Issues

Storage vMotion?

Storage vMotion does not seem to be supported. I tried to svMotion my machines from their iSCSI based datastores to the newly created vSAN datastore but it failed.

Target Datastore: Shared datastore (same as source)

Choosing ‘Shared datastore (same as source)’ as Target Datastore fails and throws the following error:

I added the destination host and tried again but it also failed with several issues:

  • only a subset of the destination networks was listed, although all were added to the distributed vSwitch
  • matching datastore was not found on the destination host

I could migrate to the new environment but had to select a destination datastore. This was not much of a problem in my environment because the end goal was to get the virtual machines on the vSAN datastore.

Now, after migrating most of the virtual machines, it felt like I could take a step back if needed. Only two types of virtual machines were left to migrate: the vCenter VMs and the firewall VMs. The old vCenter is not needed anymore; the new vCenter and the firewall VMs are, and once those are migrated I can break down the last part of the old setup. The last host will be reset to default settings via the DCUI, after which it can be added to the vSAN cluster to complete the vSAN cluster setup. A tmp_vSAN_policy with no redundancy is not the way you (or I) want to run your environment, even if it is a lab environment.

Conclusion

I could not migrate from the old environment to the new environment while also doing a Storage vMotion, so I needed to go in steps.

Nevertheless I’m happy to have used the Cross vCenter vMotion Utility. It saved me a lot of work and required little setup and configuration. I didn’t need to change anything in the setup of either my old or my new environment.

Horizon Client Installer Failed

Adding the following symlinks made the failure message go away. I’m wondering though if the packages get updated in the repositories whether this will break the Multimedia Redirection (MMR). I guess I’ll notice some day.

Update (2019/12/20): Today I updated from version 5.2 to 5.3 and ran into the same issue again. I noticed that there are symbolic links present

Reconfigure diagnostic partition using Get-EsxCli -V2

The following PowerShell snippet unconfigures the diagnostic coredump partition using the esxcli version 2 cmdlet. The second part reconfigures the diagnostic partition with the ‘smart’ option so that an accessible partition is chosen.

If you want to configure a new diagnostic partition, then you will find the necessary information in the following VMware knowledge base article: Configuring a diagnostic coredump partition on an ESXi 5.x/6.x host (2004299). There will be additional steps to supply the partition id.

$srv = Get-VMHost ESXiHost
$esxcli = Get-EsxCli -VMHost $srv -V2
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.unconfigure = "true"
$esxcli.system.coredump.partition.set.Invoke($arg)
$arg = $esxcli.system.coredump.partition.set.CreateArgs()
$arg.unconfigure = "false"
$arg.enable = "true"
$arg.smart = "true"
$esxcli.system.coredump.partition.set.Invoke($arg)

First we retrieve the ESXi host object (this assumes you are already connected with Connect-VIServer) and store it in the variable $srv:

$srv = Get-VMHost ESXiHost

Then we create an esxcli object $esxcli using the variable $srv we created previously:

$esxcli = Get-EsxCli -VMHost $srv -V2

Now we create a variable $arg to store the arguments we will provide later:

$arg = $esxcli.system.coredump.partition.set.CreateArgs()

Setting the $arg property ‘unconfigure’ to true will deactivate the diagnostic partition:

$arg.unconfigure = "true"

The Invoke method runs the command remotely on the ESXi host. After execution the diagnostic partition is deactivated:

$esxcli.system.coredump.partition.set.Invoke($arg)

The second part starts with creating a new set of arguments:

$arg = $esxcli.system.coredump.partition.set.CreateArgs()

Reactivate the coredump, because we deactivated it before:

$arg.unconfigure = "false"

Enable the coredump partition:

$arg.enable = "true"

The ‘smart’ property will try to use an accessible partition:

$arg.smart = "true"

The last call configures the diagnostic partition using the supplied arguments:

$esxcli.system.coredump.partition.set.Invoke($arg)
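For comparison, the same two steps can be run from an SSH session on the ESXi host itself with plain esxcli. This is a sketch under assumptions: the wrapper function `reconfigure_coredump` and the `ESXCLI` override variable are hypothetical names added for illustration (setting `ESXCLI=echo` prints the commands instead of running them), and the esxcli options are the ones the PowerShell arguments above map to.

```shell
#!/bin/sh
# Hypothetical wrapper: unconfigure the coredump partition, re-enable it
# with the 'smart' option, then show the result.
# ESXCLI defaults to the real esxcli binary; ESXCLI=echo gives a dry run.
reconfigure_coredump() {
    esxcli_cmd=${ESXCLI:-esxcli}
    # Step 1: deactivate the current diagnostic partition.
    $esxcli_cmd system coredump partition set --unconfigure || return 1
    # Step 2: enable it again and let ESXi pick an accessible partition.
    $esxcli_cmd system coredump partition set --enable true --smart || return 1
    # Step 3: display the active and configured partition.
    $esxcli_cmd system coredump partition get
}

# Dry run on any machine:   ESXCLI=echo reconfigure_coredump
# Real run on the ESXi host: reconfigure_coredump
```

From PowerCLI, the equivalent verification should be `$esxcli.system.coredump.partition.get.Invoke()`, since the V2 object follows the esxcli namespace.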

Configuring Tesla M60 cards for NVIDIA GRID vGPU

There are a couple of steps which need to be taken to configure the Tesla M60 cards with NVIDIA GRID vGPU in a vSphere / Horizon environment. I have listed them here quick and dirty. They are extracted from the NVIDIA Virtual GPU Software User Guide.

  • On the host(s):
    • Install the vib
      • esxcli software vib install -v directory/NVIDIA-vGPUVMware_ESXi_6.0_Host_Driver_390.72-1OEM.600.0.0.2159203.vib
    • Reboot the host(s)
    • Check if the module is loaded
      • vmkload_mod -l | grep nvidia
    • Run the nvidia-smi command to verify correct communication with the device
    • Configuring Suspend and Resume for VMware vSphere
      • esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RMEnableVgpuMigration=1"
    • Reboot the host
    • Confirm that suspend and resume is configured
      • dmesg | grep NVRM
    • Check that the default graphics type is set to shared direct
    • If the graphics type was not set to shared direct, change it and then execute the following commands to stop and start the xorg and nv-hostengine services
      • /etc/init.d/xorg stop
      • nv-hostengine -t
      • nv-hostengine -d
      • /etc/init.d/xorg start
  • On the VM / Parent VM:
    • Configure the VM. Beware that once the vGPU is configured, the console of the VM will no longer be visible/accessible through the vSphere Client, so an alternate access method should already be in place
    • Edit the VM configuration to add a shared pci device, verify that NVIDIA GRID vGPU is selected
    • Choose the vGPU profile
      more info on the profiles can be found here under section ‘1.4.1 Virtual GPU Types’: https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html
    • Reserve all guest memory
  • On the Horizon pool
    • Configure the pool to use the NVIDIA GRID vGPU as 3D Renderer
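The default graphics type check in the host steps above can also be done from the ESXi shell. The following is a sketch under assumptions: it assumes ESXi 6.5 or later, where `esxcli graphics host get` is available, and that its output contains a line of the form `Default Graphics Type: SharedPassthru` when shared direct is active. The helper names `default_graphics_type` and `is_shared_direct` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: interpret the output of 'esxcli graphics host get'.

# Read the esxcli output on stdin and print the value of the
# "Default Graphics Type" line without surrounding whitespace.
default_graphics_type() {
    sed -n 's/^[[:space:]]*Default Graphics Type:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

# Succeed when the host is already set to shared direct (SharedPassthru).
is_shared_direct() {
    [ "$(default_graphics_type)" = "SharedPassthru" ]
}

# Usage on the host (commented out; needs an ESXi shell):
#   esxcli graphics host get | is_shared_direct || echo "not shared direct"
```

If the check fails, `esxcli graphics host set --default-type SharedPassthru` should switch the type from the shell, after which the xorg and nv-hostengine restart listed above applies.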

Unsupported upgrade of VCSA 6.5 U2 to 6.7

We will upgrade the vCenter Server Appliance from 6.5 U2 to 6.7 even though this is not supported. As this is not supported you will NOT want to go ahead with this in a production environment. Maybe I will have regrets later on too … but this is my lab environment, so the alternative is to redeploy a new VCSA.

I have applied the following knowledge base articles on the source VCSA

The first KB was applied because the installer was failing due to a lack of disk space on the source appliance. The installer gives the opportunity to supply a location on the source VCSA to export the necessary files that facilitate the upgrade.

The second KB was applied because the VMware Directory failed during the firstboot phase after the upgrade succeeded.

I downloaded the sources for VCSA 6.7.0 but had to go and download the sources for VCSA 6.7.0a. The VCSA 6.7.0 sources stalled at 5% on VMware Identity Management Service.

I also changed the root password expiration to ‘no’ and set the administrator@vsphere.local account password to include only alphabetic characters.

The installer will also fail after the first phase if the VAMI port is not reachable, even though the first phase itself finishes successfully. I had forgotten to add an exception to my firewall. Once the port is reachable, you can continue the installer by going to the VAMI interface on port 5480.
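A quick reachability test of port 5480 from the machine running the installer would have caught my firewall mistake before phase two. The following is a minimal sketch, assuming `bash` and the coreutils `timeout` command are available on that machine; the function name `port_open` and the host name in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a TCP connection to host:port can be
# opened within 3 seconds. Relies on bash's /dev/tcp and coreutils timeout.
port_open() {
    # Inside the bash -c script, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage before starting the upgrade (vcsa.lab.local is a placeholder name):
#   port_open vcsa.lab.local 5480 || echo "VAMI port 5480 is not reachable"
```

On a Windows box, `Test-NetConnection -ComputerName <vcsa> -Port 5480` in PowerShell gives a similar answer.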