Monday, 26 October 2020

Troubleshooting HCX Connectivity and Performance Issues into VMware Cloud on AWS

When working with customers on VMware Cloud on AWS POCs or pilots, a lot of the success criteria typically include using Hybrid Cloud Extension (HCX) to migrate workloads from on-premises into VMware Cloud on AWS, using either bulk migration or live vMotion. This sometimes involves troubleshooting connectivity and performance issues, so I figured I would show the typical process I follow to narrow down and identify the problem.

HCX Service Mesh Appliances Tunnel Status

The first check I always perform is to ensure the Service Mesh tunnels are all UP for both the Interconnect Appliance and the Network Extension Appliance. This can be found within the HCX Plugin under interconnects -> Service Mesh -> Appliances:


Expand the Interconnect and Network Extension Appliance (if used) and verify that the tunnel status is UP.

Diagnostics

The next test I usually perform is to run Diagnostics. This usually takes about 3 minutes and verifies that the required connectivity is in place between all of the components, including HCX Manager, the Interconnect and Network Extension Appliances, vCenter, ESXi hosts etc.

Click the Run Diagnostics button and, once it completes, click the "here" link to view the results:


From the results view you can see green tick icons around all the interfaces that have been tested. Please note that your view might be different depending on the number of interfaces per appliance you have:


Verify that everything is green. If needed, you can click a green tick icon to see which tests were actually performed and their results.
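If you want a quick sanity check from your own workstation before or after running Diagnostics, a basic TCP reachability test can help rule out obvious firewall or routing problems. The following is only a minimal sketch: the function names, the host names in the usage note and port 443 are my assumptions for illustration, it only covers TCP, and it is not a substitute for the built-in Diagnostics, which tests from the appliances themselves.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_all(targets):
    """Check a list of (host, port) pairs and return a dict of results."""
    return {f"{host}:{port}": tcp_reachable(host, port) for host, port in targets}
```

For example, `check_all([("hcx-manager.example.local", 443), ("vcenter.example.local", 443)])` returns a dictionary mapping each endpoint to True or False. The host names here are placeholders, so substitute the components from your own environment.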


HCX Central CLI (CCLI)

If I need to connect to the various appliances to run performance or troubleshooting tests such as ping, traceroute, iperf3 etc, then I use the HCX Central CLI, or CCLI for short. To access the CCLI you need to SSH into HCX Manager and then run the ccli command:


Once you are in CCLI you can type list to view the various appliances and then type go with the ID of the appliance you wish to connect to:



Once you have connected to an appliance, press the TAB key to get a list of all the commands that are available to you:


My go-to troubleshooting commands are:

HC - The HC command stands for Health Check and performs a quick health check to ensure the appliance is not resource constrained and all the services are working as expected:


SSH - This opens a shell prompt on the appliance, where I can perform ping and traceroute tests as well as use iPerf3 if I need to test performance between the various components on-premises. You can also view specific log files if needed:


Perftest All - This command runs various performance tests, which include uplink and tunnel connectivity and performance. The command can take around 12 minutes to complete, and once it has finished I would typically copy/paste the output into a text document to review it properly:


Once it completes you will get a summary of the results which can be quickly used to ensure the appliance has suitable throughput into VMC:


You can also run the various performance tests separately if you want to focus on a specific test rather than waiting for them all to complete. Just type perftest and then press TAB to view all the options:


All the commands above can be run on both the Interconnect and Network Extension Appliances, so depending on what issue you are facing, you might want to run them on both appliances.
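For the iPerf3 tests mentioned under SSH above, it can be handy to capture the results in a form that is easy to compare between runs. The sketch below assumes you ran a TCP test with an iperf3 build that supports JSON output (the -J flag) and saved the result to a file. The function names and the file name in the usage note are hypothetical, and this is just one way to summarise the numbers rather than anything HCX provides.

```python
import json


def summarize_iperf3(report: dict) -> dict:
    """Pull the headline throughput figures out of an iperf3 TCP JSON report."""
    end = report["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sent_mbps": sent["bits_per_second"] / 1e6,
        "received_mbps": received["bits_per_second"] / 1e6,
        # Retransmit counts are only reported on some platforms, so this may be None.
        "retransmits": sent.get("retransmits"),
    }


def load_report(path: str) -> dict:
    """Load a JSON report previously saved from an iperf3 run with -J."""
    with open(path) as handle:
        return json.load(handle)
```

For example, `summarize_iperf3(load_report("iperf3-run.json"))` returns the sent and received throughput in Mbit/s. A high retransmit count alongside low throughput is usually worth a closer look at the underlying network path.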

Please review the official HCX Troubleshooting pages for more information.
