Harold Preyers, Author at vConsultants blog

Deleting the datastore where a content library is hosted is probably not the best idea

by Harold Preyers | Apr 14, 2022 | Content Library, home lab, vCenter, VMware

Deleting the datastore where a content library is hosted is probably not the best idea, but yes, it was a stupid error, and now what? If you are not faint of heart (and know how to take a snapshot), you can rectify this. You should contact GSS, as there is no documented solution and this might break things.

Take a snapshot and verify if the vCenter backups are in a healthy status. Yes? Ok go ahead.

Log on to the vCenter, create a new Content Library and name it ‘i-made-an-error’. Select the new datastore you want to use and keep the rest of the settings at their defaults, as these don’t really matter.

Open an SSH session to the vCenter and connect to the Postgres DB ‘VCDB’:

psql -d VCDB -U postgres

To show which tables are present within the database:

\d

Show an overview of the Content Libraries added (make sure to add the trailing ;):

SELECT id,name FROM cl_library;

Show the Content Library entries in the vCenter database

Now we have an overview of the Content Libraries, with the one that is throwing an error highlighted.

In the following overview we find the library id of the new Content Library we just added and the corresponding storage id.

SELECT * FROM cl_library_storage;

I will update the storage id of the faulty library, which we found in the previous overview, with the storage id we found for the new Content Library.

UPDATE cl_library_storage
SET storage_id = 'ae80d942-d7e2-4ab8-b0c1-93405ff4db38'
WHERE library_id = '6c7851ed-6166-44b9-b439-5bcd2f9742eb';

Update the storage id for the faulty Content Library
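Because this change is made directly in the vCenter database, it can help to wrap the UPDATE in a transaction so the result can be checked before it is committed. The following is only a minimal sketch, not a VMware-supported procedure: the helper name `build_storage_fix_sql` is hypothetical, and only the table and column names come from the queries above. It validates both ids as UUIDs and prints a script that can be pasted into psql.

```python
import uuid


def build_storage_fix_sql(faulty_library_id: str, new_storage_id: str) -> str:
    """Return a psql script that repoints one library at a new storage id.

    Hypothetical helper: it only builds text, it never touches the database.
    """
    # uuid.UUID raises ValueError on anything that is not a valid UUID,
    # which catches copy/paste mistakes before they reach the database.
    library = str(uuid.UUID(faulty_library_id))
    storage = str(uuid.UUID(new_storage_id))
    return "\n".join([
        "BEGIN;",
        "UPDATE cl_library_storage",
        f"SET storage_id = '{storage}'",
        f"WHERE library_id = '{library}';",
        "-- Check that the faulty library now points at the new storage id.",
        f"SELECT * FROM cl_library_storage WHERE library_id = '{library}';",
        "-- Type COMMIT; if the row looks right, or ROLLBACK; to undo.",
    ])


if __name__ == "__main__":
    print(build_storage_fix_sql(
        "6c7851ed-6166-44b9-b439-5bcd2f9742eb",  # library id used in the post
        "ae80d942-d7e2-4ab8-b0c1-93405ff4db38",  # storage id used in the post
    ))
```

The transaction stays open until COMMIT or ROLLBACK is typed, so a wrong id can still be undone at that point.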

There are a couple of places that helped me in solving this:

https://communities.vmware.com/t5/VMware-vCenter-Discussions/Content-library-item-delete-issue/td-p/2266050

https://tinkertry.com/how-to-remove-vmware-vsphere-zombie-datastore

https://vmninja.wordpress.com/2019/04/05/remove-inaccessible-datastore-from-inventory/

Enable Workload Management does not finish

by Harold Preyers | Mar 14, 2022 | AVI, NSX ALB, VMware, vSphereWithTanzu

Some time ago we were having issues in the Tanzu PoC class for partners we were teaching. One of the students had an environment where the Enable Workload Management process was unable to finish the creation of the Supervisor Cluster.

It was an interesting issue because when we verified all the settings, everything appeared to be configured correctly at the UI level. Nevertheless, when we went to the virtual service we saw that it was down because the servers in the pool were not up.

When Enable Workload Management is unable to finish, there are some usual suspects. Most of the time the details entered in the Enable Workload Management wizard are just not correct. Validation of the supplied values could be better, I believe. You only know that you need to start verifying the components when it takes too long. The following milestones can be checked:

Are the Supervisor Control Plane VMs created?
Do the Supervisor Control Plane VMs have the correct number of IPs?
Are the NSX ALB Service Engine VMs created?

During the troubleshooting, we verified these usual suspects. We also verified all values supplied in the different consoles: the Workload Management configuration page in the vSphere Client and the NSX ALB. It seemed that this student had done everything correctly. We started to rule out issues by pinging, running curl against the relevant IPs and checking the logs.

At some point we arrived at the Service Engines and went from there. At lunch time I stumbled onto this blog post from Nick Schmidt (a fellow vExpert), which gave the troubleshooting a jump start:

https://dev.to/ngschmidt/troubleshooting-with-vmware-nsx-alb-avi-vantage-23pc

This showed how to connect to the networking namespace on the Service Engine and this helped a lot.

If you do not connect to the networking namespace, you will see the configuration on an OS level. Within the networking namespace you troubleshoot within the correct context.

Although the web UI showed the correct values for the configured routes, they were not applied correctly on the NSX ALB SE.

Here are the steps that I executed when connected to one of the NSX ALB Service Engines:

ifconfig --> shows the network configuration of the NSX ALB SE

ip route --> shows the routes, only the management route was shown

ip netns show --> shows the network namespaces, only one was shown in this environment, namely avi_ns1, there was also only one tenant

ip netns exec avi_ns1 bash --> launches a shell within the avi_ns1 namespace

ip route --> shows the routes from the avi_ns1 namespace

Now we saw that there was a route missing within this namespace. We went back to the web UI, deleted the route and re-created it, et voilà: the servers in the pool came up and therefore the virtual service was alive.
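Comparing the routes inside the namespace with the routes configured in the web UI can also be done by script. The following is a rough sketch, assuming the output of `ip route` (run inside the avi_ns1 namespace) has been saved as text. The function name `missing_routes` and all addresses in the example are made up for illustration. It relies on the first field of each ordinary `ip route` line being the destination, so special route types such as blackhole entries are not handled.

```python
from typing import List


def missing_routes(ip_route_output: str, expected: List[str]) -> List[str]:
    """Return the expected destinations that do not appear in `ip route` output."""
    # The first field of an ordinary `ip route` line is the destination:
    # either "default" or a prefix such as 10.0.20.0/24.
    present = {
        line.split()[0]
        for line in ip_route_output.splitlines()
        if line.strip()
    }
    return [dest for dest in expected if dest not in present]


if __name__ == "__main__":
    # Hypothetical output: only the management network is routed.
    sample = (
        "default via 192.168.10.1 dev eth0\n"
        "192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.21\n"
    )
    # Destinations we expect because they are configured in the web UI.
    print(missing_routes(sample, ["default", "10.0.20.0/24"]))
    # prints ['10.0.20.0/24']
```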

vLCM fails to upgrade a firmware component

by Harold Preyers | Dec 9, 2021 | VMware

I recently experienced an issue in an HPE environment where vSphere Lifecycle Manager (vLCM) failed to upgrade the firmware on an HP FlexFabric 534FLR-SFP+ Adapter.

On HPE Gen10 servers it is possible to leverage vSphere Lifecycle Manager to manage not only the ESXi version but also the firmware and drivers of the different hardware components. vLCM leverages a vendor tool, in HPE’s case either HPE OneView or HPE iLO Amplifier Pack, to do the heavy lifting for the firmware.

Apparently it fails when there are multiple adapters present in the system that have firmware v7.15.97 or prior. The upgrade succeeds on one adapter but not on the subsequent adapter(s), see here. The KB specifically mentions HPE OneView, but in my experience it also affects HPE iLO Amplifier Pack, which makes sense.

The following screenshot shows two hosts out of compliance with the image, because of that specific firmware. Other hosts in that cluster upgraded the firmware on the adapter just fine. The failure really depends on the firmware version being upgraded from.

vLCM Cluster Image settings and Compliance
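To find out up front which adapters could hit this issue, the installed firmware version can be compared with the v7.15.97 threshold mentioned above. This is a small sketch under assumptions: the function names are hypothetical, the versions are assumed to be purely numeric dotted strings, and how the version is collected from each host is left out.

```python
AFFECTED_UP_TO = (7, 15, 97)  # from the post: firmware v7.15.97 or prior


def parse_version(text: str) -> tuple:
    """Turn '7.10.72' or 'v7.10.72' into (7, 10, 72) for comparison."""
    # int() raises ValueError on non-numeric parts, which is intentional here.
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_affected(firmware_version: str) -> bool:
    """True when the version is at or below the threshold from the post."""
    return parse_version(firmware_version) <= AFFECTED_UP_TO


if __name__ == "__main__":
    for version in ("7.10.72", "7.15.97", "7.18.80"):
        print(version, "affected" if is_affected(version) else "not affected")
```

Comparing tuples of integers avoids the classic string-comparison trap where "7.9.0" would sort after "7.15.97".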

Resolution

The article provides a link to a firmware upgrade utility, which is for ESXi 6.0 / 6.5. You can download the 7.0 version here.

Now that we have downloaded the firmware update utility, put the host into Maintenance Mode and copy the utility onto the ESXi host. Putting it in the /tmp directory has the (dis)advantage that it is removed when the machine is rebooted.

scp CP049023.zip root@host_fqdn:/tmp

SSH to the host and install the firmware update utility (Smart Component):

cd /tmp
mkdir CP049023
unzip CP049023.zip -d ./CP049023
esxcli software component apply -d /tmp/CP049023/CP049023_VMw.zip

This should be the output:

Installation Result
   Components Installed: Smart-Component-CP049023_1.28.50.6-7.0.0.15843807
   Components Removed:
   Components Skipped:
   Message: Operation finished successfully.
   Reboot Required: false

Now go to the directory where the firmware update utility is installed and run it:

cd /opt/Smart_Component/CP049023
./Execute_Component

This should be the output:

Command [ ./Execute_Component ]
Number of parameters passed in [ 0 ]
The parameters are [  ]
OS Version found  [7.0.2]
Process [7.0.2] with path [./ESXi_7.0]
Set Flash Engine files for path [./ESXi_7.0]
... leaving ./determine_which_OS.sh in /opt/Smart_Component/CP049023 ...
execute hpsetup with parameters [  ]

===============================================================
HPE QLogic NX2 Online Firmware Upgrade Utility for VMware
Version: 1.28.50

Performing Discovery operation......Please be patient..

Selecting HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter MAC: 1458D041DB10
Update MBI 7.10.72 to 7.18.80 y/n/q (y):y

Firmware update in progress......It will take a while....Please be patient..

Please reboot for the firmware flash to complete.

... END [ ./Execute_Component - Return value is 1 ] ...

If the return value is 1, that is a good sign. I had to rerun it a few times because of return value 0. I also had a return value of 106, which didn’t change after several runs. I rebooted that host, ran it again and then it went OK.
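When the utility is run on several hosts, the observations above can be written down as a small lookup. This sketch only encodes what I saw in this environment. The meanings are not taken from HPE documentation, and the function name `next_step` is made up.

```python
def next_step(return_value: int) -> str:
    """Map an Execute_Component return value to the action that worked here."""
    if return_value == 1:
        # Seen together with "Please reboot for the firmware flash to complete."
        return "reboot the host to complete the flash"
    if return_value == 0:
        # A rerun was needed before the update went through.
        return "rerun ./Execute_Component"
    if return_value == 106:
        # Did not change after several runs; a reboot followed by a rerun fixed it.
        return "reboot the host, then rerun ./Execute_Component"
    return "unknown return value, check the utility output"


if __name__ == "__main__":
    for value in (1, 0, 106):
        print(value, "->", next_step(value))
```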

As a final step, clean up by removing the firmware update utility:

esxcli software component remove -n Smart-Component-CP049023
Removal Result
   Components Installed:
   Components Removed: Smart-Component-CP049023_1.28.50.6-7.0.0.15843807
   Components Skipped:
   Message: Operation finished successfully.
   Reboot Required: false

Reboot. When the host is back, Check Compliance again and you should be good to go.

How to request Let’s Encrypt certificates on the NSX Advanced Load Balancer

by Harold Preyers | Jun 10, 2021 | AVI, Let's Encrypt, NSX ALB, VMware

This is post 1 of 1 in the series “NSX ALB from Zero to Hero”

Table Of Contents

INTRO
Prerequisites
What does the ControlScript do?
Download the ControlScript
Add the Let's Encrypt ControlScript to the NSX Advanced Load Balancer
Create a Certificate Management profile
Create the Virtual Service
Request a Let's Encrypt certificate for the NSX ALB Virtual Service
Add the Let's Encrypt certificate to the NSX ALB Virtual Service
Next POST

INTRO

Lately, I have been doing quite some work on VMware vSphere with Tanzu. A prerequisite to configure vSphere with Tanzu is a load balancer of some sort. Currently the following are supported: HAProxy, the NSX-T integrated load balancer or the NSX Advanced Load Balancer (ALB). (Support for the NSX ALB was added with the release of vSphere 7 U2.)

The end goal of the setup is to host several websites in combination with a Horizon environment on a single IP. Because not all systems can handle Let’s Encrypt requests, e.g. UAG, I want one system that handles the certificate requests and does the SSL offloading for the endpoints. So I was looking for a load balancer solution with Let’s Encrypt ability. The NSX Advanced Load Balancer (ALB) adds the ability to request Let’s Encrypt certificates through ControlScripts.

I already learned a lot on the NSX ALB and having some experience with other brands of load balancers certainly helped me to get up to speed quickly.

The goal of this post is to set up a standard Virtual Service (VS) and request a Let’s Encrypt certificate for that VS. You will see that it is quite easy.

Prerequisites

I will not walk through some of the necessary configuration settings here. They are, however, required to successfully execute the steps below, so I will assume the following prerequisites are in place.

The following post shows how to deploy the NSX Advanced Load Balancer and how to configure a ‘VMware vCenter/vSphere ESX’ cloud.

https://www.virtualizationhowto.com/2021/06/avi-load-balancer-vmware-standalone-install/

The NSX ALB registered with a cloud. I use a ‘VMware vCenter/vSphere ESX’ cloud
A public DNS entry for the Virtual Service. (Let’s Encrypt needs to be able to check your Virtual Service)
Some way to get to the virtual service from the internet. I have set up a NAT rule on my firewall for this.
Server Pool. (Needed to create the Virtual Service. It is obvious that the Virtual Service needs some endpoint to send the requests to.)
Network config for VIP and SE. (Once you configure a ‘VMware vCenter/vSphere ESX’ cloud, you’ll have access to the networks known to vCenter. You will need to configure ‘Subnets’ and ‘IP Address Pools’ for the NSX ALB to use for the VSs.)
IPAM/DNS Profile. (You need to add the Domain Names for the Virtual Services here.)

I will cover these in a later post but for now I added them as a prerequisite.

What does the ControlScript do?

The ControlScript generates a challenge token for the Let’s Encrypt servers to check the service. Next, it searches for a Virtual Service whose FQDN matches the Common Name supplied on the certificate request. Once it finds that Virtual Service, it checks whether it is listening on port 80. If not, it configures the Virtual Service to handle the request on port 80. Then it adds the challenge token to the Virtual Service. Finally, after a successful certificate request, the changes are cleared.
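To make the order of these steps a bit more tangible, here is a toy walk-through on plain Python dictionaries. It is not the code of the actual ControlScript and it does not call the NSX ALB API. The function name, the dictionary keys and the token value are all invented for illustration.

```python
def run_http01_flow(virtual_services: list, common_name: str, token: str) -> list:
    """Toy model of the steps described above; returns a log of what happened."""
    # 1. Find the Virtual Service whose FQDN matches the Common Name.
    vs = next((v for v in virtual_services if v["fqdn"] == common_name), None)
    if vs is None:
        raise LookupError(f"no Virtual Service found for {common_name}")

    # 2. Make sure it listens on port 80, remembering whether we added it.
    added_port_80 = 80 not in vs["ports"]
    if added_port_80:
        vs["ports"].append(80)

    # 3. Publish the challenge token so Let's Encrypt can fetch it.
    vs["challenge_token"] = token
    log = [f"token served on {vs['fqdn']}:80"]

    # (The real script would now wait for Let's Encrypt to validate.)

    # 4. Clear the temporary changes again.
    del vs["challenge_token"]
    if added_port_80:
        vs["ports"].remove(80)
    log.append("temporary changes cleared")
    return log


if __name__ == "__main__":
    services = [{"fqdn": "vpn.vconsultants.be", "ports": [443]}]
    for entry in run_http01_flow(services, "vpn.vconsultants.be", "example-token"):
        print(entry)
```

The point of the sketch is the bookkeeping: port 80 is only removed afterwards if the flow itself added it.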

Download the ControlScript

Download the controlscript here (either copy the contents or download the file): https://github.com/avinetworks/devops/blob/master/cert_mgmt/letsencrypt_mgmt_profile.py

Add the Let’s Encrypt ControlScript to the NSX Advanced Load Balancer

Navigate to Templates > Scripts > ControlScripts and click CREATE

Supply the script name, e.g. ControlScript_LetsEncrypt_VS, and choose either ‘Enter Text’ or ‘Upload File’. Here we choose the ‘Enter Text’ option and paste the contents of the Python script from GitHub.

Create a Certificate Management profile

Navigate to Templates > Security > Certificate Management and click CREATE

Enter the Name ‘CertMgmt_LetsEncrypt_VS’ and select the Control Script ‘ControlScript_LetsEncrypt_VS’

Click ‘Enable Custom Parameters’ and add the following:

Name	Value	Comment
user	admin
password	<enter your NSX ALB controller password for the admin user>	(toggle Sensitive)
tenant	admin	this is important, otherwise the script won’t have a clue which tenant it should be applied to

Add the Custom Parameter ‘tenant’ even if you only have one tenant, the default tenant (admin). I have struggled a lot with the script failing without having a clue why that was. Ultimately, after a long search and monitoring the log through tail, there was something in the logs that pointed me in this direction.

There is a possibility to add a fourth parameter ‘dryrun’, with value true or false. This will toggle the script to use the Let’s Encrypt staging server.
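Since a missing ‘tenant’ parameter cost me a lot of time, a quick sanity check of the parameter list before saving the profile might help. This is only a sketch: `check_custom_params` is a hypothetical helper, and treating user, password and tenant as required is an assumption based on this setup, not something the script enforces this way.

```python
REQUIRED = ("user", "password", "tenant")  # assumption: as used in this post


def check_custom_params(params: dict) -> list:
    """Return a list of problems found in the custom parameters (empty = fine)."""
    problems = [
        f"missing parameter: {name}" for name in REQUIRED if not params.get(name)
    ]
    # dryrun is optional, but when present it should be true or false.
    dryrun = params.get("dryrun")
    if dryrun is not None and str(dryrun).lower() not in ("true", "false"):
        problems.append("dryrun must be true or false")
    return problems


if __name__ == "__main__":
    # The password value is a placeholder.
    print(check_custom_params({"user": "admin", "password": "changeme"}))
    # prints ['missing parameter: tenant']
```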

Create the Virtual Service

Navigate to Applications > Virtual Services > CREATE VIRTUAL SERVICE and click ‘Advanced Setup’

Create the VS with the SNI, in this example I will create ‘vpn.vconsultants.be’. Configure the settings page and leave the other tabs with the default settings.

1. Supply the VS name (I use the fqdn/SNI just for manageability)
2. Leave the checkbox ‘Virtual Hosting VS’ unchecked (default). (We will set up a standard VS.)
3. Leave the checkbox ‘Auto Allocate’ checked (default). (It takes an IP from the Network pool.)
4. Change the ‘Application Profile’ to ‘System-Secure-HTTP’.
5. Supply a ‘Floating IPv4’. (I use a static one so that I’m able to set up NAT to this IP on my firewall.)
6. Select a ‘Network for VIP Address Allocation’. (The SE will create the VIP in this network.)
7. Select an ‘IPv4 Subnet’. (Only the ones created in the Network config for VIP and SE will be available.)
8. Change the ‘Application Domain Name’ so that it matches the fqdn of the SNI. (This will fill automatically based on the VS Name.)
9. Check SSL and verify that the port changes to 443
10. Select the correct Pool
11. Change the ‘SSL Profile’ to ‘System-Standard’

Note: Item 7 is a bit awkward. Hovering over the question mark for help, it states that it is only applicable if the VirtualService belongs to an OpenStack or AWS cloud. When you don’t set this option, you cannot go forward. This confuses me somewhat, as I only use a vSphere cloud.

Request a Let’s Encrypt certificate for the NSX ALB Virtual Service

Navigate to Templates > Security > SSL/TLS Certificates > CREATE and click Application Certificate

Fill in the details for the Certificate Request (CSR) with the SNI for the certificate you want to request. The script will run when the SAVE button is clicked.

Supply the Certificate name (I use the fqdn/SNI just for manageability)
Select ‘Type’ ‘CSR’.
Supply the certificate ‘Common Name’. This is where you supply the actual name of the certificate you want to request, in this case vpn.vconsultants.be.
Supply the certificate ‘Common Name’ as ‘Subject Alternative Name’.
I started to use ‘EC’ as the certificate ‘Algorithm’ over ‘RSA’
Select a ‘Key Size’. Be aware that when choosing ‘EC’ as ‘Algorithm’, ‘SECP384R1’ is the latest that Let’s Encrypt supports for now.
Select ‘Certificate Management Profile’ ‘CertMgmt_LetsEncrypt_VS’.
Check ‘Enable OCSP Stapling’, this will speed up the certificate validation process.

Now watch the magic.

Note: I added the Root and Intermediate certificates to the NSX ALB controller to validate the certificate. That is why the color of the circle is green.

Add the Let’s Encrypt certificate to the NSX ALB Virtual Service

A final step to do in this setup is to apply the certificate on the VS.

Apply the Let's Encrypt certificate on the Virtual Service

In the end, you will have an NSX Advanced Load Balancer (ALB) Virtual Service configured with a Let’s Encrypt certificate.

In the next post I’ll show the customized script that enables Let’s Encrypt Certificate Management for Enhanced Virtual Hosting (EVH) where the certificate will be requested for an EVH child Virtual Service.

my helm apps won’t deploy because of pvc issues

by Harold Preyers | Mar 22, 2021 | VMware, vSphereWithTanzu

Today I was playing around with vSphere with Tanzu. I want to consume vSphere with Tanzu and therefore I try to deploy an app from the bitnami repository. This should be pretty easy to do. Well I’m still in the learning phase so bear with me if this is something obvious …

These are the steps I’m doing

Install helm
Add bitnami repo
Install app from the bitnami repo
Deploy an app from the bitnami repo on a Tanzu Kubernetes Grid (TKG) cluster (deployed on vSphere with Tanzu)

So I tried to deploy redis to the TKG cluster. It needs a Persistent Volume (PV) so at deploy time a Persistent Volume Claim (PVC) would be issued and a PV should be assigned. When I saw it took a while to get my redis app deployed I looked at the namespace – Monitor – Events – Kubernetes and saw that there was an error: ‘no persistent volumes available for this claim and no storage class is set’.

OK, that is that, but what does it mean? I had no clue, so I googled and came across @anthonyspiteri’s blog post https://anthonyspiteri.net/no-persistent-volumes-available-claim-storage-class/ which shows that you can get around this by either specifying the storage class at helm install time or patching the TKG cluster.

In my case the issue was that I did not specify the defaultClass when I created the TKG cluster. I used the following yaml file to create the TKG cluster. The settings.storage.defaultClass lines were not in the yaml file when I created the TKG cluster; they specify which storage class should be used by default.

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: k8s-01
  namespace: demo
spec:
  topology:
    controlPlane:
      count: 1
      class: guaranteed-small
      storageClass: storage-policy-tanzu
    workers:
      count: 3
      class: guaranteed-small
      storageClass: storage-policy-tanzu
  settings:
    storage:
      defaultClass: storage-policy-tanzu
  distribution:
    version: v1.18

So I executed (the k8s-01.yaml file has the above content)

kubectl apply -f k8s-01.yaml

and received the following error:

As I was still in the TKG cluster context I could not change the TKG cluster spec. So I needed to change the context to the namespace ‘demo’ (where I deployed my TKG cluster):

kubectl config use-context demo

I reapplied the yaml file, changed the context again to the TKG cluster and issued the command:

kubectl describe storageclass

Now we see that there is a default storage class for this cluster:

And when I launch the deploy again:

helm install redis bitnami/redis

I see that the deploy is succeeding. Woohoo
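Looking back, the missing setting could have been caught before the cluster was created. Below is a minimal sketch that checks a cluster manifest for spec.settings.storage.defaultClass. It assumes the yaml file has already been loaded into a Python dictionary (for example with a YAML parser such as PyYAML, which is not part of the sketch), and the function name is made up.

```python
def default_storage_class(manifest: dict):
    """Return spec.settings.storage.defaultClass, or None when it is not set."""
    node = manifest
    for key in ("spec", "settings", "storage", "defaultClass"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


if __name__ == "__main__":
    # Mirrors the relevant part of k8s-01.yaml, without the settings block.
    original = {
        "kind": "TanzuKubernetesCluster",
        "spec": {"topology": {"workers": {"storageClass": "storage-policy-tanzu"}}},
    }
    print(default_storage_class(original))  # prints None
```

A storageClass on the control plane or worker nodes is not enough for the check to pass, which matches what went wrong here: only the defaultClass setting gives claims without an explicit class something to bind to.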

UPDATE: @anthonyspiteri has come to the same conclusion in later blog posts
