Due to a power failure of the storage where the vCenter Server Appliance resides, the VCSA does not boot. Connecting to the console shows the following output:
When you see this screen, none of the services are started as the appliance does not fully start. This implies that there is no means of connecting to the H5 client nor the VAMI interface on port 5480.
Why does the VCSA not boot and where do I start troubleshooting?
There are two important things to mentioned on the screenshot above, this is where we start:
Failed to start File System Check on /dev/log_vg/log
First we take a look at ‘journalctl -xb’. To do this we need to supply the root password and launch the BASH:
Now that have shell access we can take a look at ‘journalctl -xb’:
Type G to go to the bottom of the log file:
Work upwards, the most relevant logs will be at the bottom. For the sake of this blog post, I have type -S. This will turn on/off word wrap, in this case, I turned on word wrap.
File System Check
Going up a little I find these entries:
There is a problem with a certain inode and File System Check (fsck) should be run. Let’s see how we can do that. Is it as simple as running:
It seems like it. Running the above command finds some errors and suggests to repair. I confirmed.
Let’s check the other logical volumes (lvm). First we will run ‘lsblk’ to take a look at the drive layout:
Remark: When we take a look at the type, we see the disks, eg. sda, sdb, etc… The difference between sda and the rest is that sda is partitioned with standard partitions and on the rest the disks an LVM has been created.
I checked all other volumes and found none of them were having issues.
To reboot while you are in maintenance boot:
After the reboot, I could connect to the H5 client and clear the relevant errors.
This blog post is very similar to this one here. Although they are very much alike, the issues in the older blog post were on a standard partition on a VCSA 6.5 whereas the issues described and addressed in this post are on a VCSA 7.0 LVM physical volume.
So I changed the admin password ‘password-expiration’, not even bothering to open the event details. I just assumed this is about the admin user.
clearuser admin password-expiration
Not true. Some time later that day I found that the alarms were still open. I figured that this is some sort of timing issue, that the alarms were not automatically cleared yet. So I set them to resolved manually. Almost the same minute the alarms are triggered again, so no timing issue. If I only would have counted the alarms the first time it would have showed me that there more alarms than NSX-T components where I cleared the password expiration for the admin user.
It was only when I read the alarm in detail that I noticed the alarm is not the same one I saw before. This alarm was not triggered about the password expiration of the admin user but showed that it was for the audit user. The alarms are very much the same only the username is different, so easily overlooked.
So doing the math. Initially I had 8 open alarms, of which 3 were put to resolved automatically after changing the password expiration of the admin user. One on the NSX-T Manager and one on each of the 2 edge nodes. Which left 5 open alarms to take care of. Checking all the alarms gave me the following actions:
clear alarm for the root user on NSX-T Manager
clear alarms for the root user and the audit user on the NSX-T Edge 1 and 2
Password expiration should be part of your password policy strategy. Disabling the password expiration on a production system is not a good strategy.
I have been working on a script to deploy environments on a regular basis in my homelab. While I have made great progress I have not been able to get this completed due to the lack of time. It did up my powershell script writing skills.
A while ago I followed a webinar about VMware Cloud Foundation Lab Constructor (VLC in short). This will deploy a VCF environment in a decent amount of time. With little effort I have been able to get this up and running multiple times. There are some pitfalls I ran into. My goal is to get to learn more about VCF, NSX-T and K8s all in a VMware Validated Design (VVD) setup.
You can get access too by completing the registration form at tiny.cc/getVLC.
The following files are included in the download:
VCF Lab Constructor Install Guide 39.pdf
As I already have a DNS infrastructure in place I used ‘Example_DNS_Entries.txt‘ as a reference to create all the necessary DNS entries.
Read the documentation pdf FIRST. It will give you a good insight in what will be set up, won’t be set up and how everything will be set up. I’m not planning to repeat info that is included in the documentation. The only thing that I have copied from this pdf is the disclaimer because I feel it is important:
Below I have included the various configuration files and split them to show the different parts and also show where I deferred from the default. There are the configuration files that the VLC script will use:
default-vcf-ems.json → changed all ip addresses, gateways, hostnames, networks and licenses
default_mgmt_hosthw.json → changed the amount of CPUs (8 → 12), the amount of RAM (32 and 64 → 80) and the disk sizes(50,150 and 175 → 150)
add_3_hosts.json → changed the hostname, management IP and IP gateway
To deploy VCF and be able to deploy NSX-T you will need a good amount of resources. The mimimum of host resources to be able to deploy NSX-T is 12vCPUs (There is a workaround to lower the vCPU requirements for NSX-T) and 80GB of RAM due to the NSX-T requirements.
The configuration files
The first file is the ‘default_mgmt_hosthw.json’. This file describes the specs for the (virtual) hardware for the management domain hosts:
default management host hardware json
The second file is the ‘default-vcf-ems.json’. This file describes the configuration for all software components for the management domain:
default VCF EMS JSON
The last configuration file is ‘add_3_hosts.json’. This configuration file is optional and can be used to prepare three extra hosts for the first workload domain:
Add 3 host JSON
Where did I change the defaults
There are some settings that I changed from the defaults aside from changing the names and network settings:
in the ‘default_mgmt_hosthw.json’ I have changed the CPU to 12 to be able to deploy NSX-T
in the ‘default_mgmt_hosthw.json’ I have changed the RAM 80 to be able to deploy NSX-T
How do we start
If you are meeting the prerequirements it is fairly simple. Fire up the ‘VLCGui.ps1’. This will present the following gui which will give the ability to supply all the necessary information and to connect to your physical environment. It speaks for itself, just make sure the Cluster, Network Name and Datastore field are higlighted blue like the following.
I hope to expand this inital post with a couple of follow-up posts. These are the topics that I’m currently thinking about:
Whilst upgrading the home lab I also decided to rebuild from scratch. There were some challenges to overcome because I have running VMs I don’t want to shut while migrating.
My current home lab setup and the go to setup is documented here (work in progress). Basically it comes down to:
Original setup: three hosts backed with iSCSI storage for running the VMs
New vCenter with two of the three hosts configured for vSAN with connection to the iSCSI datastores
Old vCenter with one remaining host running all of the VMs
Destination setup: new vCenter with vSAN datastore
To migrate the virtual machines from the old environment (from the last remaining host to the two new hosts) I decided to take a look at the ‘Cross vCenter vMotion Utility‘. There is not a lot of documentation available at first sight but it is straightforward to set up and configure. Although I did find some things that are worth noting.
Step 1 : Running the jar
To start the Cross vCenter vMotion Utility one must run a jar file: ‘java -jar xvm-2.6.jar’.
I am running linux (Pop!_OS 18.04) as my OS. I have java version 8 and 11 installed with version 11 as default. Version 11 is not listed on the fling site as supported (Java Runtime Environment 1.8-10: See requirements). Running with version 11 (sudo java -jar xvm-2.6.jar) starts the local website on port 8080 (http://localhost:8080) but does not report back on the CLI.
Under the assumption that the java application started and failed right away, I decided to run it on my windows box which has Java Runtime environment 8 installed. The last line of feedback ‘Initialized controller with empty state’ was the same as on my linux machine. Navigating to localhost:8080 showed the Cross vCenter vMotion Utility web interface. I could now configure the application and run migrations.
It is only later when I closed the running instance on my linux box and restarting it that it showed me output on the CLI that the application started successfully.
Output after restart:
Step 2 : Configuration
Step 3 : Migration
Source Site: source vCenter
Target Site: destination vCenter
Virtual Machine(s): Select one or more virtual machines
Placement Target: Cluster or Host
Network Mapping(s): the utility will detect the source networks for all selected virtual machines and display a selection field for the target network
Storage vMotion does not seem to be supported. I tried to svMotion my machines from their iSCSI based datastores to the newly created vSAN datastore but it failed.
Target Datastore: Shared datastore (same as source)
Choosing ‘Shared datastore (same as source)’ as Target Datastore fails and throws the following error:
I added the destination host and tried again but it also failed with several issues:
destination networks were not listed, only a subset were – although all were added to the distributed vSwitch
matching datastore was not found on the destination host
I could migrate to the new environment but had to select a destination datastore. This posed not much of a problem in my environment because the end goal was to get the virtual machine on the vSAN datastore.
After migrating most of the virtual machines, only two types of virtual machines were left, it felt like I could take a step back if needed. The following types were left to migrate, the vCenter VMs and the firewall VMs. The old vCenter is not needed anymore, the new vCenter and the firewall VMs are and once those are migrated I can go break down the last part of the old setup. The last host will be reset to default settings via the DCUI after which it can be added to the vSAN cluster and I can make the vSAN cluster setup complete. A tmp_vSAN_policy with no redundancy is not the way you (or me) want to run your environment, even if it is a lab environment.
I could not migrate from the old environment to the new environment while also doing a Storage vMotion, I needed to go in steps.
Nevertheless I’m happy to have used the Cross vCenter vMotion Utility. It did save me a lot of work, required little setup and configuration. I didn’t need to change anything to the setup of my old nor my new environment.