vShield Endpoint SVM status vCenter alarm

vCenter is showing an alarm on the TrendMicro Deep Security Virtual Appliance (DSVA): ‘vShield Endpoint SVM status’

vShield Endpoint SVM status alarm

Checking vShield for errors:

The DSVA console window shows the following, whereas it should normally show a red/grey status screen:

Let’s go for some log file analysis

To get a login prompt: Alt + F2

Login with user dsva and password dsva (this is the default)

The log file we are going to check is the messages log file at /var/log/messages

(Why less is more: you get almost all of the vi navigation commands.)

To go to the last line, press Shift+G:

For some reason the OVF environment is not as expected. The appliance is not able to read some OVF settings, in this case the network interface configuration.

To exit the log file viewer, press q.

To gain root privileges, use sudo (for example sudo -s):

Enter the dsva user password when prompted

Navigate to the /var/opt/ds_agent/slowpath directory

Create the dsva-ovf.env file (if the file exists, delete the existing file first):
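A sketch of that step. On the appliance the directory is /var/opt/ds_agent/slowpath; a temp directory stands in here so the sketch can run anywhere, and the assumption (marked in the comments) is that an empty file is sufficient, since the appliance repopulates the OVF environment at boot:

```shell
# On the DSVA the directory is /var/opt/ds_agent/slowpath; a temp dir stands in here
SLOWPATH="${SLOWPATH:-$(mktemp -d)}"
rm -f "$SLOWPATH/dsva-ovf.env"     # delete the existing file first, if present
touch "$SLOWPATH/dsva-ovf.env"     # assumption: an empty file is sufficient
ls -l "$SLOWPATH/dsva-ovf.env"     # verify the (empty) file exists
```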

Reboot the appliance, once rebooted give it 5 minutes and the alarm should clear automatically:

Start or stop ESXi services using PowerCLI

Start the ssh service on all hosts:
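A sketch of the one-liner, assuming an active Connect-VIServer session; the SSH service is selected by its service key, TSM-SSH:

```powershell
Get-VMHost | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Start-VMHostService
```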

Thanks to Alan Renouf at virtu-al.net, where I found this snippet: https://www.virtu-al.net/2010/11/23/enabling-esx-ssh-via-powercli/

If you want to start the ssh service on a single host, change ESXiHostName to your ESXi FQDN:
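The same pipeline, filtered to one host (ESXiHostName is a placeholder):

```powershell
Get-VMHost -Name ESXiHostName | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Start-VMHostService
```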

If you want to stop the ssh service on all hosts:
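A sketch of the stop variant; -Confirm:$false suppresses the per-host confirmation prompt:

```powershell
Get-VMHost | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Stop-VMHostService -Confirm:$false
```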

If you have multiple clusters in vCenter, or are connected to multiple vCenters, be sure to run the command only against the necessary hosts:

  • Get-Cluster -Name ClusterName will filter to the specified Cluster
  • Get-VMHost -Name ESXiHostName will filter to the specified ESXi
  • Get-VMHost -Server vCenterServerName will filter to the specified vCenter server

These are other services I frequently use:

  • DCUI (Direct Console UI)
  • lwsmd (Active Directory Service)
  • ntpd (NTP Daemon)
  • sfcbd-watchdog (CIM Server)
  • snmpd (SNMP Server)
  • TSM (ESXi Shell)
  • TSM-SSH (SSH)
  • vmsyslogd (Syslog Server)
  • vmware-fdm (vSphere High Availability Agent)
  • vpxa (VMware vCenter Agent)
  • xorg (X.Org Server)

There are other services available but I have never used them in this context (yet):

  • lbtd (Load-Based Teaming Daemon)
  • pcscd (PC/SC Smart Card Daemon)
  • vprobed (VProbe Daemon)

Change the startup policy for a service:

  • Automatic: Start automatically if any ports are open, and stop when all ports are closed
  • On: Start and stop with host
  • Off: Start and stop manually
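Changing the policy is done with Set-VMHostService; a sketch that sets the SSH service to start and stop with the host (ESXiHostName is a placeholder):

```powershell
Get-VMHost -Name ESXiHostName | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Set-VMHostService -Policy "on"
```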

From IOmeter to VMware I/O Analyzer fling

VMware I/O Analyzer is a tool, available from the VMware flings website, to launch orchestrated tests against a storage solution. It can be used as a single appliance, where both the worker process and the analytics run, and additional appliances can be deployed to act as Worker VMs. The Analyzer VM launches IOmeter tests on the Worker VMs and collects the data after test completion. All configuration is done from a web interface on the Analyzer VM.

This post describes how I deployed VMware I/O Analyzer and how I worked up to a test with maximized I/O. The first tests, conducted by launching IOmeter from within a single virtual machine on the vSAN datastore, generated roughly 300 IOPS. In the end, 18 Worker VMs with 8 disks each on a 6-host vSAN cluster generated 340K+ IOPS. The purpose was to create a baseline for the vSAN datastore's maximum IOPS.

Hardware used

6 hosts, each with 1 disk group:
  • 1 × 800 GB SSD
  • 5 × 1.2 TB 10K SAS drives
vSphere 5.5 U3

General

The VM OS disks should not be placed on the vSAN datastore you want to test; otherwise the IOPS they generate will become part of your report. To keep the Analyser VM IOPS out of the performance graphs, put it on a different datastore.

Deploy one Analyser VM. Deploy a Worker VM per ESXi host. You should end up with as many Worker VMs as you have hosts in your cluster.

I changed the IP of all VMs to static as there was no DHCP server available in the subnet. This means that no DNS entries were required.

Preferably you will want to give the Analyser VM a static IP, as you will manage the solution from a web browser. The Worker VMs can be left as is if a DHCP server is available, but then you will need DNS entries and will have to adapt the configuration used here.

To work easily, set the Worker VMs to static IPs or create DNS aliases, as you will be doing a lot of work on them. I prefer static IPs because they add no complexity from name resolution.

Prerequisites

Download ova from: https://labs.vmware.com/flings/i-o-analyzer

Deploy

Deploying the Analyser VM:

Deploy the OVF template, choosing your settings in line with the recommendations above.

Delete the 100MB disk (second disk) from the virtual machine.

Start the Analyser VM via the vSphere client and then open the console

Log in with user root and password vmware

A terminal window will be opened upon login

To configure static IP:

Change /etc/sysconfig/network/ifcfg-eth0 with your preferred text editor.

Assuming the subnet you’re deploying the VM in is 192.168.1.0/24

Change the following lines to your needs:
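A sketch of what those lines would look like in the SLES-style ifcfg-eth0 file; the address values are examples, not the ones from my environment:

```
BOOTPROTO='static'
IPADDR='192.168.1.10'
NETMASK='255.255.255.0'
```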

Leave the other lines as is.

Save and close the file (:wq)

Now we will configure the default gateway

Assuming your default gateway is 192.168.1.1

Add / Change the following line:
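Assuming the file being edited is /etc/sysconfig/network/routes (the SLES default-route file), the line would look like this, with the example gateway from above:

```
default 192.168.1.1 - -
```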

Save and close the file (:wq)

Restart the network service:
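On this SLES-based appliance that would be (run as root; rcnetwork restart is the equivalent):

```shell
service network restart
```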

Check if the VM is reachable.

Now shutdown the VM.

Deploying the Worker VM:

Clone the Analyser VM.

Add a Hard Disk of 1GB.

Choose advanced and put the 1GB disk on the VSAN datastore.

I needed to configure static IPs on the Worker VMs, so I started each VM and changed its IP address. After changing the network settings, shut down the VM before creating the next clone. If you don’t change the IPs, you will end up with duplicate IPs.

Ease of access configuration

Two ease-of-access configurations were applied. The first makes it easy to copy files from the Analyzer VM to the Worker VMs. The second is needed because all appliances must be logged on for the VMware I/O Analyzer solution to work. All commands are executed on the Analyzer VM and then copied to the Worker VMs.

Setup ssh keyless authentication

Generate a key pair

ssh-copy-id will copy your public key to the target machine
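A sketch of both steps, run on the Analyzer VM; the worker IPs are example values and need a live environment, so this is not runnable as-is:

```shell
# Generate a key pair (accept the defaults; an empty passphrase keeps it hands-off)
ssh-keygen -t rsa
# Copy the public key to each Worker VM (example IPs -- use your own)
for ip in 192.168.1.11 192.168.1.12 192.168.1.13; do
  ssh-copy-id "root@$ip"
done
```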

The root account password of the destination will need to be supplied for each of the above lines.

BE AWARE: This has the following security downside: if the root account on the Analyzer VM is compromised, all Worker VMs should be considered compromised too.

Autologon

Change autologon=”” to autologon=”root” in the displaymanager (/etc/sysconfig/displaymanager) file with the following command:
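A sketch of such a sed command. On the SLES-based appliance the variable is typically DISPLAYMANAGER_AUTOLOGIN; a temp file stands in for /etc/sysconfig/displaymanager so the sketch can run anywhere:

```shell
# Stand-in for /etc/sysconfig/displaymanager so the sketch is runnable anywhere
cfg=$(mktemp)
echo 'DISPLAYMANAGER_AUTOLOGIN=""' > "$cfg"
# The actual change: set the autologin user to root
sed -i 's/DISPLAYMANAGER_AUTOLOGIN=""/DISPLAYMANAGER_AUTOLOGIN="root"/' "$cfg"
cat "$cfg"
```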

This will force the machine to login with root after boot.

Copy the file to all workers:
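A sketch of the copy, relying on the keyless ssh set up above; the worker IPs are example values:

```shell
# Copy the modified displaymanager file to each Worker VM (example IPs)
for ip in 192.168.1.11 192.168.1.12 192.168.1.13; do
  scp /etc/sysconfig/displaymanager "root@$ip":/etc/sysconfig/displaymanager
done
```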

Affinity rules

TIP: Create affinity rules in vCenter to keep the Worker VMs on dedicated hosts, otherwise the configuration on the VMware I/O Analyzer dashboard will soon be outdated. The consequence is that certain Worker VMs will not launch their IOmeter profiles and therefore the reports will not be correct.

Configuration

Prerequisites

Enable the SSH service on the ESXi hosts via the vSphere (Web) Client or through Powershell.

The PowerShell way (be sure to filter your hosts if needed). There is a dedicated post about starting and stopping ESXi services through PowerShell here.
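A sketch filtered to a single cluster, assuming an active Connect-VIServer session (ClusterName is a placeholder):

```powershell
Get-Cluster -Name ClusterName | Get-VMHost | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Start-VMHostService
```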

Dashboard

Add the hosts to the host list.

Search for the Worker VMs in the list and add preferred IO test.

There are a lot of standard tests included in the appliance. The one that should generate the most IOPS is 4k, 100% read and 0% random.

Optimized setup

To reach an optimized setup, three Worker VMs per host were deployed and 7 additional disks were added.

Adding the extra disks via PowerCLI:
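A sketch of the PowerCLI loop; the Worker VM name pattern and datastore name are assumptions, not the values from my environment:

```powershell
# Add seven extra 1 GB data disks to every Worker VM (names are example values)
foreach ($vm in Get-VM -Name "IOAnalyzer-Worker*") {
    1..7 | ForEach-Object {
        New-HardDisk -VM $vm -CapacityGB 1 -Datastore "vsanDatastore"
    }
}
```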

The following specification was created on the Analyzer VM…

… and copied over to the Worker VMs

Troubleshooting

I found that looking at the console of the Worker VMs is interesting for troubleshooting. You can see the IOmeter tests being launched. This was very useful while creating the IOmeter profile: you don’t need to wait until the test is finished to see that it has failed. Stopping IOmeter tests from the console gives the opportunity to look at, edit and save the launched profile.

VMware SRM 5.8.1 Embedded Database refuses to uninstall

VMware SRM 5.8.1 Embedded Database refuses to uninstall. Clicking uninstall in ‘Control Panel – Programs and Features’ showed a progress bar going forward and then rolling back. Afterwards ‘Control Panel – Programs and Features’ showed that the embedded PostgreSQL was still installed.

So I tried it through the command line with the purpose of generating a log file:
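An msiexec uninstall with verbose logging looks like the sketch below; the product code GUID and the log path are placeholders (the GUID can be found under HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall):

```
msiexec /x {PRODUCT-CODE-GUID} /l*v C:\Temp\srm_embedded_db_uninstall.log
```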

The log file shows a 1603 error code at the end, but msiexec error 1603 is a very generic failure code that does not give a direction to search in.

Microsoft msiexec error codes:

ERROR_INSTALL_FAILURE 1603 A fatal error occurred during installation.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa376931(v=vs.85).aspx

Going up in the log file, somewhere halfway there is a remark that the “C:\ProgramData\VMware\VMware vCenter Site Recovery Manager Embedded Database\data\postgresql.conf” file cannot be found.

I created the postgresql.conf file in the “C:\ProgramData\VMware\VMware vCenter Site Recovery Manager Embedded Database\data\” directory and tried the msiexec uninstall again. Now the uninstall succeeded.
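From an elevated command prompt, creating the file could look like the sketch below; the assumption is that an empty file is enough to satisfy the uninstaller’s file check:

```
type nul > "C:\ProgramData\VMware\VMware vCenter Site Recovery Manager Embedded Database\data\postgresql.conf"
```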

UCS Manager errors due to Firmware Packages removal

UCS Manager is showing 309 errors because Firmware Packages have been deleted while the references to them in the Host Firmware Packages (HFP) still exist.

ucs_errors

It appears that all of the errors show a cause of ‘image-deleted’. The ‘Affected object’ shows the path where the error originates. The first error shows ‘org-root/fw-host-pack-HFP-2.2.7/pack-image-Cisco Systems|R200-1120402W|blade-controller’. The first portion, ‘org-root/fw-host-pack-HFP-2.2.7’, is important because this is the path. The second part, ‘pack-image-Cisco Systems|R200-1120402W|blade-controller’, is the component image which is missing.

An HFP resides in the ‘Servers’ tab. The referenced one can be found under ‘Servers – Policies – root – Host Firmware Packages – HFP-2.2.7’

ucs_faults_summary

Going to the referenced Host Firmware Package, some of the components have a presence status of ‘Missing’

hfp_227_detail

Below is a screenshot of the existing ‘Firmware Packages’. You can see that the ‘Firmware Package’ 2.2.6f exists for the ‘B Series’ and for the ‘Infrastructure’ but not for the ‘C Series’.

Important to notice is that ‘Rack Package’ 2.2(7b) is not present for the ‘C Series’, as you can see in the next screenshot.

fp_overview

Going to the Host Firmware Package general page and looking at the assigned versions, you can see that ‘Rack Package’ 2.2(7b)C is assigned. In the earlier screenshot we saw that this package is no longer in UCS Manager.

hfp_227_selected

The Rack Package field is empty. It was set to ‘Rack Package’ 2.2(7b)C, but because that Firmware Package was removed from UCS Manager it now shows blank.

hfp_227_modify_package_versions

Use ‘Show Policy Usage’ to check whether the Host Firmware Package is used anywhere.

hfp_227_show_policy_usage

The Host Firmware Package is used in Service Profile Template ‘HP_FW_TEST_Cisco_Support_Case’

hfp_227_policy_usage_detail_in_use

Navigate to the Service Profile Template

spt_overview

Verify the Policy Usage

spt_hp_fw_test_cisco_support_case_show_policy_usage

It is not in use, so it is safe to delete the Service Profile Template

spt_hp_fw_test_cisco_support_case_policy_usage_detail

Delete the Service Profile Template

spt_delete

Going back to the Host Firmware Package and looking again at the policy usage

hfp_227_show_policy_usage

 

It is not in use anymore

hfp_227_policy_usage_detail_empty_detail

Delete the Host Firmware Package

hfp_227_delete

The previous actions made UCS Manager go down to 193 errors. The next ones are about the Host Firmware Package ‘default’. I don’t want to delete the ‘default’ Host Firmware Package, so I will adapt it so it no longer throws errors.

The following screenshot is not entirely correct as the package was already changed to the correct one (‘2.2(5d)C’ was ‘2.2(6f)C’ as you will see in a later screenshot) but I still wanted to show that I checked the Policy Usage first:

hfp_default_show_policy_usage

The Host Firmware Package ‘default’ is not used in a Service Profile Template, so it is safe to change the assigned ‘Rack Package’.

hfp_default_policy_usage

Modify the Package version from the Rack package that was deleted

hfp_default_modify_package_versions

Set it to one that still exists

hfp_default_modify_package_versions_detail

Going back to the errors led me to the next Host Firmware Package, which was at a sub-organisation level. Looking at the components in this Host Firmware Package, I again see components with presence ‘Missing’

hfp_default_modify_package_versions

I was going to modify the package version straight away, but first checked whether it was in use.

hfp_dca_hyp_227d_show_policy_usage_in_use

It was in use by a Service Profile Template, so I went to see whether that Service Profile Template itself was in use

spt_hp_update_ssd_to_fw_dm0t_show_policy_usage

It was not, so I deleted the Service Profile Template and went back up the chain. The Host Firmware Package was no longer in use, so I deleted the Host Firmware Package as well

hfp_dca_hyp_227d_show_policy_usage_empty

All references to Firmware Packages were corrected

As a result all faults are cleared:

ucs_errors_empty