NSX Host Transport Nodes upgrade fails

While going through the latest lab upgrade round, I found myself running into an error when upgrading NSX. The NSX Edge Transport Nodes (ETN) upgrade successfully, however, the NSX Host Transport Nodes (HTN) portion fails.

Not that the solutions is so special but it had me running around a bit, therefore I wanted to share.

The upgrade returns the following error:

A general system error occurred: Image is not valid. Component NSX LCP Bundle(NSX LCP Bundle(4.1.0.2.0-8.0.21761693)) has unmet dependency nsx-python-greenlet-esxio because providing component(s) NSX LCP Bundle(NSX LCP Bundle(4.1.0.2.0-8.0.21761693)) are obsoleted.

At the same time the same error is listed on vCenter:

NSX LCP upgrade error on vCenter
NSX LCP upgrade error on vCenter

When analysing the vLCM configuration, there was nothing that pointed to the fact that the NSX LCP Bundle was causing an issue.

Through a reddit post, I landed on the following Broadcom KB: NSX Host Cluster Upgrade on SDDC Manager fails with “Set desired state operation failed. Image is not valid”

The procedure instructs to download the JSON export of the vLCM configuration:

I removed the highlighted nsx-lcp-bundle line, saved and imported the JSON again to vLCM.

I retried the upgrade again on NSX and could progress now.

NSX-T password expiration alarms in the Home Lab

The challenge

I have a couple of NSX-T environments in my home lab. I logged on to one of them and saw a couple of open NSX-T password expiration alarms.

Password expiration alarms

CAUTION

Password expiration should be part of your password policy strategy. Disabling the password expiration on a production system is not a good strategy.

The solution

With my sharp googling skills, I found this reference in the NSX-T 3.0 docs:

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.0/installation/GUID-89E9BD91-6FD4-481A-A76F-7A20DB5B916C.html

So I changed the admin password ‘password-expiration’, not even bothering to open the event details. I just assumed this is about the admin user.

Done.

Not true. Some time later that day I found that the alarms were still open. I figured that this is some sort of timing issue, that the alarms were not automatically cleared yet. So I set them to resolved manually. Almost the same minute the alarms are triggered again, so no timing issue. If I only would have counted the alarms the first time it would have showed me that there more alarms than NSX-T components where I cleared the password expiration for the admin user.

Password expiration, read the details

It was only when I read the alarm in detail that I noticed the alarm is not the same one I saw before. This alarm was not triggered about the password expiration of the admin user but showed that it was for the audit user. The alarms are very much the same only the username is different, so easily overlooked.

So doing the math. Initially I had 8 open alarms, of which 3 were put to resolved automatically after changing the password expiration of the admin user. One on the NSX-T Manager and one on each of the 2 edge nodes. Which left 5 open alarms to take care of. Checking all the alarms gave me the following actions:

  • clear alarm for the root user on NSX-T Manager
  • clear alarms for the root user and the audit user on the NSX-T Edge 1 and 2

CAUTION

Password expiration should be part of your password policy strategy. Disabling the password expiration on a production system is not a good strategy.