How to request Let’s Encrypt certificates on the NSX Advanced Load Balancer

INTRO

Lately, I have been doing quite a lot of work on VMware vSphere with Tanzu. A prerequisite for configuring vSphere with Tanzu is a load balancer of some sort. Currently, the following are supported: HAProxy, the NSX-T integrated load balancer and the NSX Advanced Load Balancer (ALB). (Support for the NSX ALB was added with the release of vSphere 7 U2.)

The end goal of the setup is to host several websites in combination with a Horizon environment on a single IP. Because not all systems can handle Let’s Encrypt requests (e.g. the UAG), I want one system that handles the certificate requests and does the SSL offloading for the endpoints. So I was looking for a load balancer solution with Let’s Encrypt capability. The NSX Advanced Load Balancer (ALB) can request Let’s Encrypt certificates through ControlScripts.

I have already learned a lot about the NSX ALB, and having some experience with other load balancer brands certainly helped me get up to speed quickly.

The goal of this post is to set up a standard Virtual Service (VS) and request a Let’s Encrypt certificate for that VS. You will see that it is quite easy.

Prerequisites

I will not configure some of the required settings in this post. They are, however, needed to successfully execute the steps below, so I assume the following prerequisites are in place.

The following post shows how to deploy the NSX Advanced Load Balancer and how to configure a ‘VMware vCenter/vSphere ESX’ cloud.

https://www.virtualizationhowto.com/2021/06/avi-load-balancer-vmware-standalone-install/

  1. The NSX ALB registered with a cloud. I use a ‘VMware vCenter/vSphere ESX’ cloud.
  2. A public DNS entry for the Virtual Service. (Let’s Encrypt needs to be able to reach your Virtual Service.)
  3. Some way to reach the Virtual Service from the internet. I have set up a NAT rule on my firewall for this.
  4. A Server Pool. (Needed to create the Virtual Service, which needs an endpoint to send the requests to.)
  5. Network config for the VIP and SE. (Once you configure a ‘VMware vCenter/vSphere ESX’ cloud, you’ll have access to the networks known to vCenter. You will need to configure ‘Subnets’ and ‘IP Address Pools’ for the NSX ALB to use for the VSs.)
  6. An IPAM/DNS Profile. (You need to add the domain names for the Virtual Services here.)

I will cover these in a later post; for now, I have listed them as prerequisites.

What does the ControlScript do?

The ControlScript generates a challenge token for the Let’s Encrypt servers to check the service. It then searches for a Virtual Service with an FQDN matching the Common Name supplied in the certificate request. Once it finds that Virtual Service, it checks whether it is listening on port 80. If not, it configures the Virtual Service to handle the request on port 80. Then it adds the challenge token to the Virtual Service. Finally, after a successful certificate request, the changes are removed again.
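For background, this is the ACME HTTP-01 challenge (RFC 8555): Let’s Encrypt fetches http://<fqdn>/.well-known/acme-challenge/<token> on port 80 and expects the key authorization in return, i.e. the token, a dot, and the base64url-encoded SHA-256 thumbprint (RFC 7638) of the ACME account key. The Python sketch below computes that response from a token and a public JWK. It illustrates the protocol in general; I have not verified how letsencrypt_mgmt_profile.py implements it internally, and the sample key values are placeholders.

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME (JOSE) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over only the required JWK members,
    serialized with sorted keys and no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required},
                           sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_response(token: str, account_jwk: dict) -> Tuple[str, str]:
    """Return the (path, body) the CA fetches over plain HTTP on port 80."""
    path = "/.well-known/acme-challenge/" + token
    key_authorization = token + "." + jwk_thumbprint(account_jwk)
    return path, key_authorization


if __name__ == "__main__":
    # Placeholder public-key values for illustration, not a real account key.
    sample_jwk = {"kty": "EC", "crv": "P-256", "x": "abc", "y": "def"}
    path, body = http01_response("example-token", sample_jwk)
    print(path)  # /.well-known/acme-challenge/example-token
    print(body)  # example-token.<43-character thumbprint>
```

This is also why the script temporarily needs a port 80 listener on a VS that normally only serves 443: the validation request always arrives over plain HTTP.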

Download the ControlScript

Download the controlscript here (either copy the contents or download the file): https://github.com/avinetworks/devops/blob/master/cert_mgmt/letsencrypt_mgmt_profile.py

Add the Let’s Encrypt ControlScript to the NSX Advanced Load Balancer

Navigate to Templates > Scripts > ControlScripts and click CREATE

Create a ControlScript for Let's Encrypt on the NSX Advanced Load Balancer

Supply the script name, e.g. ControlScript_LetsEncrypt_VS, and choose either ‘Enter Text’ or ‘Upload File’. Here we choose the ‘Enter Text’ option and paste the contents of the Python script from GitHub.

Upload a text based ControlScript to NSX Advanced Load Balancer

Create a Certificate Management profile

Navigate to Templates > Security > Certificate Management and click CREATE

Create a Certificate Management profile

Enter the Name ‘CertMgmt_LetsEncrypt_VS’ and select the Control Script ‘ControlScript_LetsEncrypt_VS’

Configure the Certificate Management profile to use the Let's Encrypt ControlScript

Click ‘Enable Custom Parameters’ and add the following:

  • Name ‘user’, value ‘admin’
  • Name ‘password’, value <your NSX ALB controller password for the admin user> (toggle ‘Sensitive’)
  • Name ‘tenant’, value ‘admin’ (this is important, otherwise the script has no clue which tenant it should be applied to)

Add the Custom Parameter ‘tenant’ even if you only have one tenant, the default tenant (admin). I struggled for a long time with the script failing without any clue as to why. Ultimately, after a long search and following the log with tail, I found an entry in the logs that pointed me in this direction.

Create the Custom Parameters for the ControlScript on the Certificate Management profile

You can add a fourth parameter, ‘dryrun’, with the value true or false. When set to true, the script uses the Let’s Encrypt staging server.
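Since a missing parameter made the script fail silently for me, a pre-flight check can help. The Python sketch below is hypothetical (check_cert_mgmt_params is my own name, not part of the avinetworks script): it verifies that user, password and tenant are present and maps dryrun to the public Let’s Encrypt ACME v2 directory, staging or production, that such a request would be expected to use.

```python
from typing import Dict

PRODUCTION_DIRECTORY = "https://acme-v02.api.letsencrypt.org/directory"
STAGING_DIRECTORY = "https://acme-staging-v02.api.letsencrypt.org/directory"

# 'tenant' is treated as required because the script failed without it in my setup.
REQUIRED = ("user", "password", "tenant")


def check_cert_mgmt_params(params: Dict[str, str]) -> str:
    """Hypothetical pre-flight check of the Certificate Management profile's
    custom parameters. Returns the ACME directory a request would target."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing custom parameter(s): " + ", ".join(missing))

    dryrun = str(params.get("dryrun", "false")).strip().lower()
    if dryrun not in ("true", "false"):
        raise ValueError("dryrun must be 'true' or 'false'")
    # dryrun=true -> staging server, which issues untrusted test certificates.
    return STAGING_DIRECTORY if dryrun == "true" else PRODUCTION_DIRECTORY


if __name__ == "__main__":
    params = {"user": "admin", "password": "<secret>", "tenant": "admin", "dryrun": "true"}
    print(check_cert_mgmt_params(params))
```

Using the staging server while testing avoids hitting the production rate limits with repeated failed requests.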

Create the Virtual Service

Navigate to Applications > Virtual Services > CREATE VIRTUAL SERVICE and click ‘Advanced Setup’

Create a Virtual Service through advanced setup on NSX Advanced Load Balancer

Create the VS for the SNI; in this example I will create ‘vpn.vconsultants.be’. Configure the Settings page and leave the other tabs at their default settings.

  1. Supply the VS name (I use the FQDN/SNI for manageability).
  2. Leave the checkbox ‘Virtual Hosting VS’ unchecked (default). (We will set up a standard VS.)
  3. Leave the checkbox ‘Auto Allocate’ checked (default). (It takes an IP from the network pool.)
  4. Change the ‘Application Profile’ to ‘System-Secure-HTTP’.
  5. Supply a ‘Floating IPv4’. (I use a static one so that I can set up NAT to this IP on my firewall.)
  6. Select a ‘Network for VIP Address Allocation’. (The SE will create the VIP in this network.)
  7. Select an ‘IPv4 Subnet’. (Only the subnets created in the network config for the VIP and SE will be available.)
  8. Change the ‘Application Domain Name’ so that it matches the FQDN of the SNI. (It is filled in automatically based on the VS name.)
  9. Check SSL and verify that the port changes to 443.
  10. Select the correct Pool.
  11. Change the ‘SSL Profile’ to ‘System-Standard’.

Configure the NSX Advanced Load Balancer Virtual Service

Note: Item 7 is a bit awkward. The help text behind the question mark states that it only applies if the Virtual Service belongs to an OpenStack or AWS cloud, yet you cannot proceed without setting it. This confuses me somewhat, as I only use a vSphere cloud.
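Item 8 matters for the certificate request later on: as described earlier, the ControlScript locates the Virtual Service whose FQDN matches the certificate’s Common Name and adds port 80 handling if it is missing. The Python sketch below models that lookup over a simplified, hypothetical representation of Virtual Services (plain dicts with name, fqdns and ports keys); it is not the NSX ALB REST object model.

```python
from typing import List, Optional


def find_vs_for_common_name(virtual_services: List[dict], common_name: str) -> Optional[dict]:
    """Return the first VS whose application domain names include the CN.
    DNS names are case-insensitive, so compare in lower case."""
    cn = common_name.lower()
    for vs in virtual_services:
        if cn in (fqdn.lower() for fqdn in vs.get("fqdns", [])):
            return vs
    return None


def needs_temporary_port_80(vs: dict) -> bool:
    """HTTP-01 validation arrives on port 80, so a VS that only listens
    on 443 needs a temporary port 80 service."""
    return 80 not in vs.get("ports", [])


if __name__ == "__main__":
    services = [
        {"name": "vpn.vconsultants.be", "fqdns": ["vpn.vconsultants.be"], "ports": [443]},
    ]
    vs = find_vs_for_common_name(services, "VPN.vconsultants.be")
    print(vs["name"], needs_temporary_port_80(vs))  # vpn.vconsultants.be True
```

If the Application Domain Name does not match the Common Name, the lookup returns nothing and the request has no VS to validate against.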

Request a Let’s Encrypt certificate for the NSX ALB Virtual Service

Navigate to Templates > Security > SSL/TLS Certificates > CREATE and click Application Certificate

Initiate a Let's Encrypt certificate request from the NSX Advanced Load Balancer.

Fill in the details for the Certificate Signing Request (CSR) with the SNI for the certificate you want to request. The script runs when you click the SAVE button.

  1. Supply the certificate name (I use the FQDN/SNI for manageability).
  2. Select ‘Type’ ‘CSR’.
  3. Supply the certificate ‘Common Name’. This is the actual name of the certificate you want to request, in this case vpn.vconsultants.be.
  4. Supply the certificate ‘Common Name’ as ‘Subject Alternative Name’ as well.
  5. I have started using ‘EC’ rather than ‘RSA’ as the certificate ‘Algorithm’.
  6. Select a ‘Key Size’. Be aware that when choosing ‘EC’ as ‘Algorithm’, ‘SECP384R1’ is the largest curve that Let’s Encrypt supports for now.
  7. Select ‘Certificate Management Profile’ ‘CertMgmt_LetsEncrypt_VS’.
  8. Check ‘Enable OCSP Stapling’; this speeds up certificate validation, as clients receive the revocation status from the VS instead of querying the OCSP responder themselves.

Configure the Let's Encrypt certificate request.
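A mismatch between the Common Name, the Subject Alternative Name and the VS’s Application Domain Name is an easy mistake to make here. The Python sketch below is a hypothetical sanity check over the values entered in the form; the dictionary keys are my own naming, and the accepted EC curves reflect that Let’s Encrypt issues for P-256 (SECP256R1) and P-384 (SECP384R1) keys but not P-521.

```python
from typing import List

# Curves Let's Encrypt accepts for EC keys; SECP521R1 is not accepted.
SUPPORTED_EC_CURVES = {"SECP256R1", "SECP384R1"}


def check_csr_settings(settings: dict) -> List[str]:
    """Return a list of problems with the planned CSR; an empty list means OK."""
    problems = []
    cn = settings["common_name"].lower()
    sans = [san.lower() for san in settings.get("subject_alt_names", [])]
    if cn not in sans:
        problems.append("Common Name should also be listed as a Subject Alternative Name")
    if settings.get("algorithm") == "EC" and settings.get("key_size") not in SUPPORTED_EC_CURVES:
        problems.append("EC curve %s is not accepted by Let's Encrypt" % settings.get("key_size"))
    vs_domain = settings.get("vs_domain_name")
    if vs_domain is not None and vs_domain.lower() != cn:
        problems.append("Common Name does not match the VS Application Domain Name")
    return problems


if __name__ == "__main__":
    planned = {
        "common_name": "vpn.vconsultants.be",
        "subject_alt_names": ["vpn.vconsultants.be"],
        "algorithm": "EC",
        "key_size": "SECP384R1",
        "vs_domain_name": "vpn.vconsultants.be",
    }
    print(check_csr_settings(planned))  # []
```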

Now watch the magic.

The Let's Encrypt certificate request from the NSX Advanced Load Balancer succeeded.

Note: I added the root and intermediate certificates to the NSX ALB controller so it can validate the certificate. That is why the circle is green.

Add the Let’s Encrypt certificate to the NSX ALB Virtual Service

The final step in this setup is to apply the certificate to the VS.

Apply the Let's Encrypt certificate on the Virtual Service

In the end, you will have an NSX Advanced Load Balancer (ALB) Virtual Service configured with a Let’s Encrypt certificate.

Next POST

In the next post I’ll show the customized script that enables Let’s Encrypt certificate management for Enhanced Virtual Hosting (EVH), where the certificate is requested for an EVH child Virtual Service.

My Helm apps won’t deploy because of PVC issues

Today I was playing around with vSphere with Tanzu. To consume it, I tried to deploy an app from the Bitnami repository. This should be pretty easy to do. Well, I’m still in the learning phase, so bear with me if this is something obvious …

These are the steps I followed:

  • Install Helm
  • Add the Bitnami repo
  • Install an app from the Bitnami repo
  • Deploy the app on a Tanzu Kubernetes Grid (TKG) cluster (deployed on vSphere with Tanzu)

So I tried to deploy Redis to the TKG cluster. It needs a Persistent Volume (PV), so at deploy time a Persistent Volume Claim (PVC) is issued and a PV should be assigned. When I noticed it took a while to get my Redis app deployed, I looked at the namespace > Monitor > Events > Kubernetes and saw an error: ‘no persistent volumes available for this claim and no storage class is set’.

Persistent Volume Claim failure

OK, that is that, but what does it mean? I had no clue, so I googled and came across @anthonyspiteri’s blog post https://anthonyspiteri.net/no-persistent-volumes-available-claim-storage-class/, which shows that you can get around this either by specifying the storage class at helm install time or by patching the TKG cluster.
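The error follows from how Kubernetes handles a PVC without a storageClassName: it falls back to the StorageClass annotated with storageclass.kubernetes.io/is-default-class: "true", and if no class carries that annotation, the claim can only bind to a pre-created PV. The Python sketch below models that decision over plain dicts shaped like the Kubernetes objects, and builds the merge-patch body that the Kubernetes documentation uses with kubectl patch storageclass to mark a class as default. The class names in the usage are placeholders.

```python
import json
from typing import List, Optional

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_class(storage_classes: List[dict]) -> Optional[str]:
    """Return the name of the StorageClass marked as default, if any."""
    for sc in storage_classes:
        annotations = sc.get("metadata", {}).get("annotations", {})
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            return sc["metadata"]["name"]
    return None


def effective_storage_class(pvc: dict, storage_classes: List[dict]) -> Optional[str]:
    """Class a new PVC ends up with. None means no dynamic provisioning,
    which is the 'no storage class is set' situation."""
    name = pvc.get("spec", {}).get("storageClassName")
    if name is not None:
        # An explicit "" opts out of dynamic provisioning entirely.
        return name or None
    return default_storage_class(storage_classes)


def make_default_patch() -> str:
    """Merge-patch body for: kubectl patch storageclass <name> -p '<body>'."""
    return json.dumps({"metadata": {"annotations": {DEFAULT_ANNOTATION: "true"}}})


if __name__ == "__main__":
    classes = [{"metadata": {"name": "example-policy"}}]  # placeholder, no default set
    print(effective_storage_class({"spec": {}}, classes))  # None
```

Specifying the storage class at install time corresponds to the explicit storageClassName branch; setting a default class (as defaultClass does for a TKG cluster) corresponds to the fallback branch.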

In my case the issue was that I did not specify the defaultClass when I created the TKG cluster. I used the following YAML file to create the TKG cluster. The highlighted lines were not in the YAML file when I created the cluster; they specify which storage class is used by default.

So I executed (the k8s-01.yaml file has the above content)

and received the following error:

error: unable to recognize "k8s-01.yaml": no matches for kind "TanzuKubernetesCluster" in version "run.tanzu.vmware.com/v1alpha1"

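As I was still in the TKG cluster context, I could not change the TKG cluster spec: the TanzuKubernetesCluster kind is not known inside the TKG cluster itself, only in the Supervisor Cluster namespace. So I needed to change the context to the namespace ‘demo’ (where I deployed my TKG cluster).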

I reapplied the YAML file, switched the context back to the TKG cluster and issued the command:

Now we see that there is a default storage class for this cluster:

And when I launch the deploy again:

This time the deployment succeeds. Woohoo!
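The context switching above can be summarized as an ordered list of kubectl calls. The Python sketch below only builds the command lines and does not run them. It assumes the spec is applied with kubectl apply -f, that the vSphere Namespace context is called demo, and it uses k8s-01 as a hypothetical name for the TKG cluster context.

```python
import shlex
from typing import List


def tkc_update_steps(supervisor_ns: str, cluster_context: str, spec_file: str) -> List[List[str]]:
    """Return the kubectl invocations, in order, to update a TKG cluster spec.
    Assumption: both contexts already exist (created by 'kubectl vsphere login')."""
    return [
        # TanzuKubernetesCluster objects live in the Supervisor namespace context.
        ["kubectl", "config", "use-context", supervisor_ns],
        ["kubectl", "apply", "-f", spec_file],
        # Back to the workload cluster to check the result.
        ["kubectl", "config", "use-context", cluster_context],
        # The default class is flagged with "(default)" in this listing.
        ["kubectl", "get", "storageclass"],
    ]


if __name__ == "__main__":
    for cmd in tkc_update_steps("demo", "k8s-01", "k8s-01.yaml"):
        print(shlex.join(cmd))
```

Each entry could be passed to subprocess.run(cmd, check=True) to execute the sequence, stopping at the first failing step.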

UPDATE: @anthonyspiteri has come to the same conclusion in later blog posts

VCSA 7 U1 available updates error

Today I deployed a new VCSA 7 U1 and as U2 has GA’d recently I wanted to update the environment first. So I headed to the VAMI interface > Available Updates page. Immediately there was an error:

Error in method invocation ({'id': 'com.vmware.appliance.update.manifest_verification_failed', 'default_message': 'Manifest verification failed', 'args': []}, 'Verification Failure\n', '')

I found some blogs suggesting deleting the update status file ‘software_update_state.conf’ in /etc/applmgmt/appliance. I tested this by renaming the file to .old, but that did not resolve the error.

The file was recreated but held the same information. It is in JSON format and has the following content:

The state is reported as “UP_TO_DATE”, which it clearly is not. Then I found this KB article, which is also where I got the solution for my install. I compared the URL in the KB article with the one that is included by default on the update settings page.

The one on the KB page is the following:

https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8d167796-34d5-4899-be0a-6daade4005a3/6.7.0.31000.latest/

The one that is included in the VAMI interface is the following:

https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8d167796-34d5-4899-be0a-6daade4005a3/7.0.1.00200.latest/

In my case, when I removed ‘.latest’ from that URL, updates were detected and I could proceed.

As you can see in the screenshot below (well, not entirely, but you will need to take my word for it), I selected ‘Specified’ and supplied the following URL:

https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8d167796-34d5-4899-be0a-6daade4005a3/7.0.1.00200
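The change boils down to trimming the trailing slash and the .latest suffix from the default repository URL. A small Python sketch of that string manipulation, using the URLs from this post; whether the same trick works for other builds is an assumption I have not verified.

```python
def custom_repo_url(default_url: str) -> str:
    """Strip the trailing slash and the '.latest' suffix from a VAMI update URL."""
    url = default_url.rstrip("/")
    suffix = ".latest"
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    return url


if __name__ == "__main__":
    default = ("https://vapp-updates.vmware.com/vai-catalog/valm/vmw/"
               "8d167796-34d5-4899-be0a-6daade4005a3/7.0.1.00200.latest/")
    print(custom_repo_url(default))
    # https://vapp-updates.vmware.com/vai-catalog/valm/vmw/8d167796-34d5-4899-be0a-6daade4005a3/7.0.1.00200
```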

Update settings link

As soon as I clicked ‘SAVE’, the updates became available to install.

Available updates success

VCSA does not boot due to file system errors

Due to a power failure of the storage where the vCenter Server Appliance resides, the VCSA does not boot. Connecting to the console shows the following output:

Failed to check /dev/log_vg/log

When you see this screen, none of the services have started, as the appliance does not fully boot. This means there is no way to connect to either the H5 client or the VAMI interface on port 5480.

Why does the VCSA not boot and where do I start troubleshooting?

Two important things are mentioned in the screenshot above, and this is where we start:

  • Failed to start File System Check on /dev/log_vg/log
  • journalctl -xb

First, we take a look at ‘journalctl -xb’. To do this, we need to supply the root password and launch Bash:

launch BASH

Now that we have shell access, we can take a look at ‘journalctl -xb’:

Type G to go to the bottom of the log file:

journalctl -xb

Work upwards from there; the most relevant entries are at the bottom. For the sake of this blog post, I typed -S, which toggles between chopping and wrapping long lines; in this case, I turned on word wrap.

File System Check

Going up a little I find these entries:

journalctl showing more info about the failed volume

There is a problem with a certain inode, and a File System Check (fsck) should be run. Let’s see how we can do that. Is it as simple as running:

It seems so. Running the command finds some errors and offers to repair them. I confirmed.

Other volumes

Let’s check the other logical volumes (LVM). First, we run ‘lsblk’ to take a look at the drive layout:

With lsblk we take a look at the drive layout

Remark: In the TYPE column we see the disks, e.g. sda, sdb, etc. The difference between sda and the rest is that sda uses standard partitions, whereas LVM has been set up on the other disks.

I checked all other volumes and found none of them were having issues.
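Checking every logical volume by hand is tedious. The Python sketch below parses the output of lsblk -rno NAME,TYPE (raw format, no header, two columns) and lists the LVM volumes with an fsck command line for each; it only builds the commands. The sample output is illustrative, not taken from my appliance, and fsck should only be run against volumes that are not mounted.

```python
from typing import List


def lvm_volumes(lsblk_raw: str) -> List[str]:
    """Return /dev/mapper paths for every 'lvm' row of `lsblk -rno NAME,TYPE`."""
    volumes = []
    for line in lsblk_raw.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "lvm":
            # lsblk shows device-mapper names, e.g. log_vg-log for /dev/log_vg/log.
            volumes.append("/dev/mapper/" + fields[0])
    return volumes


def fsck_commands(volumes: List[str]) -> List[List[str]]:
    """One fsck invocation per volume; run only against unmounted volumes."""
    return [["fsck", vol] for vol in volumes]


if __name__ == "__main__":
    # Illustrative sample; on the appliance the text would come from:
    # subprocess.run(["lsblk", "-rno", "NAME,TYPE"], capture_output=True, text=True).stdout
    sample = "sda disk\nsda1 part\nsdc disk\ncore_vg-core lvm\nsdd disk\nlog_vg-log lvm\n"
    for cmd in fsck_commands(lvm_volumes(sample)):
        print(" ".join(cmd))
```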

Reboot

To reboot from this maintenance shell:

After the reboot, I could connect to the H5 client and clear the relevant errors.

Remark

This blog post is very similar to this one here. Although they are very much alike, the issues in the older post were on a standard partition on a VCSA 6.5, whereas the issues described and addressed in this post are on an LVM logical volume on a VCSA 7.0.

esxtop output is not displaying as it should

You connect to your ESXi host and launch esxtop, but the output is not displayed as it should be. Instead, it looks like the screenshot below:

esxtop displaying incorrect

Your esxtop output is displayed correctly if your terminal emulator sets the TERM environment variable to xterm. Some terminal emulators use a different value by default, e.g. xterm-256color. ESXi does not map xterm-256color to any of the values it knows, so it doesn’t know how to display the output.

There is a KB article that explains how to resolve this:

Output of esxtop defaults to non-interactive CSV with unknown TermInfo (2001448)

The value of the environment variable TERM is used by the server to control how input is recognized by the system, and what capabilities exist for output.

First, let’s look at what the TERM variable is set to in my case:

I am receiving the following output:

echo TERM output

My terminal emulator connects to the endpoint (ESXi) with TERM set to xterm-256color. Now let’s take a look at which values this endpoint does support:

terminfo_values

All of the above are valid values for TERM. The value my terminal emulator uses is not among the supported terminfo types, so the ESXi host cannot map it to any known type and thus does not know how to display the esxtop output correctly.

When we set the TERM environment variable to xterm and run esxtop again, the output is nicely formatted.
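The decision behind this fix fits in a few lines. As a sketch, assuming you already have the list of terminfo names the host supports (the list in the usage below is a hypothetical subset, not the one from my host), the following Python keeps TERM when the host knows it, falls back to xterm otherwise, and builds an environment for a child process.

```python
import os
from typing import Dict, Iterable, Optional


def pick_term(current: Optional[str], supported: Iterable[str], fallback: str = "xterm") -> str:
    """Keep the current TERM if the host knows it, otherwise fall back to xterm."""
    known = set(supported)
    if current in known:
        return current
    if fallback not in known:
        raise ValueError("fallback %r is not supported either" % fallback)
    return fallback


def esxtop_env(supported: Iterable[str]) -> Dict[str, str]:
    """Copy of the current environment with a TERM value the host understands."""
    env = dict(os.environ)
    env["TERM"] = pick_term(env.get("TERM"), supported)
    return env


if __name__ == "__main__":
    host_terminfo = ["ansi", "linux", "vt100", "xterm"]  # hypothetical subset
    print(pick_term("xterm-256color", host_terminfo))  # xterm
```

On the host, the resulting environment could be passed as subprocess.run(["esxtop"], env=esxtop_env(host_terminfo)); in an interactive shell, simply setting TERM to xterm before starting esxtop does the same.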

Let’s check esxtop again to make sure the outcome is as expected:

esxtop displaying correct

NSX-T password expiration alarms in the Home Lab

The challenge

I have a couple of NSX-T environments in my home lab. I logged on to one of them and saw a couple of open NSX-T password expiration alarms.

Password expiration alarms

CAUTION

Password expiration should be part of your password policy strategy. Disabling the password expiration on a production system is not a good strategy.

The solution

With my sharp googling skills, I found this reference in the NSX-T 3.0 docs:

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.0/installation/GUID-89E9BD91-6FD4-481A-A76F-7A20DB5B916C.html

So I changed the ‘password-expiration’ setting for the admin user, without even bothering to open the event details. I just assumed this was about the admin user.

Done.

Not true. Some time later that day I found that the alarms were still open. I figured this was some sort of timing issue and that the alarms had not been cleared automatically yet, so I set them to resolved manually. Almost the same minute, the alarms were triggered again, so it was not a timing issue. Had I counted the alarms the first time, I would have seen that there were more alarms than NSX-T components on which I cleared the password expiration for the admin user.

Password expiration, read the details

It was only when I read the alarm details that I noticed this alarm was not the same one I had seen before. It was not triggered by the password expiration of the admin user but by that of the audit user. The alarms are identical apart from the username, so this is easily overlooked.

So, doing the math: initially I had 8 open alarms, of which 3 were resolved automatically after changing the password expiration of the admin user (one on the NSX-T Manager and one on each of the 2 edge nodes). That left 5 open alarms to take care of. Checking all the alarms gave me the following actions:

  • clear the alarm for the root user on the NSX-T Manager
  • clear the alarms for the root user and the audit user on NSX-T Edge 1 and 2
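The bookkeeping is easy to get wrong when alarms differ only in the username. A small Python sketch of the tally in this post: the alarm list is reconstructed from the counts above as (node, user) pairs, and remaining() shows which alarms stay open after handling a given set of users on every node.

```python
from collections import Counter
from typing import List, Set, Tuple

# (node, user) per open alarm, reconstructed from the counts in this post.
ALARMS: List[Tuple[str, str]] = [
    ("NSX-T Manager", "admin"), ("Edge 1", "admin"), ("Edge 2", "admin"),
    ("NSX-T Manager", "root"),
    ("Edge 1", "root"), ("Edge 1", "audit"),
    ("Edge 2", "root"), ("Edge 2", "audit"),
]


def remaining(alarms: List[Tuple[str, str]], handled_users: Set[str]) -> List[Tuple[str, str]]:
    """Alarms still open after handling the given users on every node."""
    return [alarm for alarm in alarms if alarm[1] not in handled_users]


if __name__ == "__main__":
    left = remaining(ALARMS, {"admin"})
    print(len(ALARMS), len(left))  # 8 5
    print(Counter(user for _, user in left))  # users that still need attention
```

Grouping by user like this would have shown the audit and root alarms right away.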

CAUTION

Password expiration should be part of your password policy strategy. Disabling the password expiration on a production system is not a good strategy.