vSphereWithTanzu Archives - vConsultants blog

Enable Workload Management does not finish

by Harold Preyers | Mar 14, 2022 | AVI, NSX ALB, VMware, vSphereWithTanzu

Some time ago we were having issues in the Tanzu PoC class for partners we were teaching. One of the students had an environment where the Enable Workload Management process was unable to finish the creation of the Supervisor Cluster.

It was an interesting issue because when we verified all the settings, everything looked correctly configured at the UI level. Nevertheless, when we went to the virtual service we saw that it was down because the servers in the pool were not up.

When Enable Workload Management is unable to finish, there are some usual suspects. Most of the time the details entered in the Enable Workload Management wizard are simply not correct. Validation of the supplied values could be better, I believe. You only find out that you need to start verifying the components when the process takes too long. The following milestones can be checked:

Are the Supervisor Control Plane VMs created?
Do the Supervisor Control Plane VMs have the correct number of IPs?
Are the NSX ALB Service Engine VMs created?

During the troubleshooting, we verified these usual suspects. We also verified all values supplied in the different consoles, both the Workload Management configuration page in the vSphere Client and the NSX ALB UI. It seemed that this student had done everything correctly. We started to rule out issues by pinging, running curl against the relevant IPs and checking the logs.

At some point we arrived at the Service Engines and went from there. At lunch time I stumbled onto this blog post from Nick Schmidt (a fellow vExpert), which gave our troubleshooting a jump start:

https://dev.to/ngschmidt/troubleshooting-with-vmware-nsx-alb-avi-vantage-23pc

This showed how to connect to the networking namespace on the Service Engine and this helped a lot.

If you do not connect to the networking namespace, you will see the configuration on an OS level. Within the networking namespace you troubleshoot within the correct context.

Although the web UI showed the correct values for the configured routes, they were not applied correctly on the NSX ALB SE.

Here are the steps that I executed when connected to one of the NSX ALB Service Engines:

ifconfig --> shows the network configuration of the NSX ALB SE

ip route --> shows the routes, only the management route was shown

ip netns show --> shows the network namespaces, only one was shown in this environment, namely avi_ns1, there was also only one tenant

ip netns exec avi_ns1 bash --> launches a shell within the avi_ns1 namespace

ip route --> shows the routes from the avi_ns1 namespace

Now we saw that there was a route missing within this namespace. We went back to the web UI, deleted the route and re-created it. Et voilà, the servers in the pool came up and therefore the virtual service was alive.
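
The namespace check that exposed the problem can also be scripted. What follows is only a sketch: `check_route` is a hypothetical helper (not an NSX ALB tool) that reads `ip route` output on standard input and reports whether a given destination is listed. The network `10.0.20.0/24` in the usage comment is a placeholder, not a value from this environment.

```shell
# check_route DEST
# Reads `ip route` output on stdin and prints "present" if a route whose
# destination field equals DEST is listed, otherwise "missing".
check_route() {
  awk -v dest="$1" '
    $1 == dest { found = 1 }
    END { print (found ? "present" : "missing") }
  '
}

# Usage on the Service Engine (as root). avi_ns1 is the namespace from this
# post; 10.0.20.0/24 is a placeholder for the network of the pool servers.
#   ip netns exec avi_ns1 ip route | check_route 10.0.20.0/24
```

Running the same check against plain `ip route` (outside the namespace) and against the namespaced output makes the difference between the OS-level view and the data path view visible at a glance.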

my helm apps won’t deploy because of pvc issues

by Harold Preyers | Mar 22, 2021 | VMware, vSphereWithTanzu

Today I was playing around with vSphere with Tanzu. I wanted to consume vSphere with Tanzu, so I tried to deploy an app from the bitnami repository. This should be pretty easy to do. Well, I’m still in the learning phase, so bear with me if this is something obvious …

These are the steps I followed:

Install helm
Add bitnami repo
Install app from the bitnami repo
Deploy an app from the bitnami repo on a Tanzu Kubernetes Grid (TKG) cluster (deployed on vSphere with Tanzu)

So I tried to deploy redis to the TKG cluster. It needs a Persistent Volume (PV) so at deploy time a Persistent Volume Claim (PVC) would be issued and a PV should be assigned. When I saw it took a while to get my redis app deployed I looked at the namespace – Monitor – Events – Kubernetes and saw that there was an error: ‘no persistent volumes available for this claim and no storage class is set’.

Ok that is that, but what does that mean? I had no clue, so I just googled and came to @anthonyspiteri’s blog post https://anthonyspiteri.net/no-persistent-volumes-available-claim-storage-class/ which shows that you can get around this by either specifying the storage class at helm install time or patching the TKG cluster.
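
For the first workaround, specifying the storage class at install time, a minimal sketch could look like the one below. It assumes the chart accepts a `global.storageClass` value, which Bitnami charts have commonly exposed; the exact parameter name depends on the chart version, so check `helm show values bitnami/redis` first. The function name `redis_install` and the `RUN` dry-run switch are made up for this illustration.

```shell
# Storage class to request. storage-policy-tanzu is the policy name used in
# this post; override it through the STORAGE_CLASS environment variable.
STORAGE_CLASS="${STORAGE_CLASS:-storage-policy-tanzu}"

# redis_install prints the helm command by default and only runs it when
# RUN=1 is set. The dry-run guard is an addition for this sketch.
redis_install() {
  # Assumption: the chart reads global.storageClass (verify per chart version).
  set -- helm install redis bitnami/redis \
    --set "global.storageClass=${STORAGE_CLASS}"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}
```

Calling `redis_install` prints the command for review; `RUN=1 redis_install` executes it against the current kubectl context.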

In my case the issue was that I did not specify the defaultClass when I created the TKG cluster. I used the following yaml file to create the TKG cluster. The settings, storage and defaultClass lines were not in the yaml file when I created the TKG cluster; they specify which storage class should be used by default.

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: k8s-01
  namespace: demo
spec:
  topology:
    controlPlane:
      count: 1
      class: guaranteed-small
      storageClass: storage-policy-tanzu
    workers:
      count: 3
      class: guaranteed-small
      storageClass: storage-policy-tanzu
  settings:
    storage:
      defaultClass: storage-policy-tanzu
  distribution:
    version: v1.18

So I executed (the k8s-01.yaml file has the above content)

kubectl apply -f k8s-01.yaml

and received the following error:

As I was still in the TKG cluster context, I could not change the TKG cluster spec. So I needed to change the context to the namespace ‘demo’ (where I deployed my TKG cluster):

kubectl config use-context demo

I reapplied the yaml file, changed the context again to the TKG cluster and issued the command:

kubectl describe storageclass

Now we see that there is a default storage class for this cluster:
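
To confirm this from a script rather than by reading the describe output, a small filter can pick out the default class. This is a sketch with a hypothetical helper name, `default_storageclass`; it relies on `kubectl get storageclass` marking the default class with `(default)` after its name.

```shell
# default_storageclass
# Reads the table printed by `kubectl get storageclass` on stdin and prints
# the name of every class that kubectl marks with "(default)".
default_storageclass() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage, from the TKG cluster context:
#   kubectl get storageclass | default_storageclass
```

An empty result means no default class is set, which is the situation that produced the PVC error earlier.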

And when I launch the deploy again:

helm install redis bitnami/redis

I see that the deploy is succeeding. Woohoo

UPDATE: @anthonyspiteri has come to the same conclusion in later blog posts
