Thursday 31 August 2023

VMware Cloud on AWS and native AWS route tables

During my time working with customers who use VMware Cloud on AWS and wish to integrate with native AWS services, I frequently see issues with network connectivity when multiple route tables are in use. Some customers have an automated way of deploying VPCs that use multiple route tables, perhaps for public and private subnet architectures. A prerequisite of deploying a VMware Cloud on AWS SDDC is to connect it to a customer-owned VPC, sometimes referred to as the connected VPC:

At deployment time we read the connected VPC's primary (default) CIDR range and add a static route to the NSX T0 router that sends all traffic destined for that CIDR across the Elastic Network Interface (ENI) connecting the SDDC to the connected VPC:

It's worth noting at this point that if you add additional CIDRs to the connected VPC, these will NOT be accessible across the ENI because there will be NO route to those CIDRs in the T0 route table. We only learn the default CIDR at deployment time.
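To illustrate the point, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR values and the function name `reachable_over_eni` are hypothetical and only model the behaviour described above: a destination is covered by the T0 static route only if it falls inside the CIDR learned at deployment time.

```python
import ipaddress


def reachable_over_eni(destination_ip, learned_cidr):
    """Return True if destination_ip is inside the CIDR covered by the T0 static route."""
    return ipaddress.ip_address(destination_ip) in ipaddress.ip_network(learned_cidr)


# Hypothetical addressing, for illustration only.
LEARNED_CIDR = "172.31.0.0/16"   # CIDR read from the connected VPC at deployment time
ADDED_LATER = "10.50.1.20"       # address in a CIDR added to the VPC afterwards

print(reachable_over_eni("172.31.12.7", LEARNED_CIDR))  # inside the learned CIDR
print(reachable_over_eni(ADDED_LATER, LEARNED_CIDR))    # outside it, so no T0 route
```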

The issue I see with customers is that, sometimes after a host failure or routine SDDC maintenance, there are connectivity issues to and from certain workloads. The first question I always ask is whether the customer has multiple route tables in the connected VPC, and most of the time the answer is yes.

You can have multiple route tables associated with a VPC but only one can be the main route table:

When you create a new segment in the SDDC, we update the main route table in the connected VPC with a route for that segment's CIDR whose next hop is the ENI. This way, any traffic destined for that CIDR is always directed to the SDDC host running the active NSX edge VM, which is effectively the T0 router:

We don't update any additional route tables:

Traffic from a VM to a native EC2 instance will travel through the NSX T1 and T0 and then across the ENI into the connected VPC route table, subnet and then to the EC2 instance. The return traffic follows the same path:

What I have seen with customers who have VPCs with multiple route tables is that they typically add static routes manually to the additional route tables, pointing the same CIDRs at the same ENI as in the main route table:

Now this will work: because the active NSX edge is on the host that owns the target ENI, traffic (if permitted) will flow between VMs in the SDDC and native AWS services in the connected VPC.

Problems start occurring after SDDC maintenance or sometimes if the customer experiences a host failure. During SDDC maintenance the NSX Active Edge VM is migrated to a new host with a different ENI whilst the original host is patched. If the host running the NSX Active Edge VM fails and is removed, the NSX Active Edge VM is either migrated to or powered on on a different host, depending on the type of failure. In both scenarios the NSX Active Edge VM is now on a different host which has a different ENI. Our backend service will pick up on this change and update the main route table to point to the new ENI of the host the NSX Active Edge VM is now running on. We will NOT update additional route tables where the routes were manually added, so those routes keep pointing at the old ENI.
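If you want to check whether you are affected, a rough sketch along these lines can help. It assumes the input has the shape of the `RouteTables` list returned by the EC2 `describe_route_tables` API; the function name `find_stale_eni_routes` and the sample IDs are hypothetical. It flags routes in additional route tables whose ENI target no longer matches the main route table's target for the same CIDR.

```python
def find_stale_eni_routes(route_tables):
    """Return (route_table_id, cidr, eni) for routes in non-main route tables
    whose ENI differs from the main route table's ENI for the same CIDR."""

    def is_main(table):
        return any(assoc.get("Main") for assoc in table.get("Associations", []))

    # Collect CIDR -> ENI from the main route table (the one VMware keeps updated).
    main_targets = {}
    for table in route_tables:
        if is_main(table):
            for route in table.get("Routes", []):
                cidr = route.get("DestinationCidrBlock")
                eni = route.get("NetworkInterfaceId")
                if cidr and eni:
                    main_targets[cidr] = eni

    # Compare every manually maintained table against the main table.
    stale = []
    for table in route_tables:
        if is_main(table):
            continue
        for route in table.get("Routes", []):
            cidr = route.get("DestinationCidrBlock")
            eni = route.get("NetworkInterfaceId")
            if cidr in main_targets and eni and eni != main_targets[cidr]:
                stale.append((table["RouteTableId"], cidr, eni))
    return stale


# Hypothetical data; in practice this could come from
# boto3.client("ec2").describe_route_tables(...)["RouteTables"].
sample = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}],
     "Routes": [{"DestinationCidrBlock": "192.168.10.0/24", "NetworkInterfaceId": "eni-new"}]},
    {"RouteTableId": "rtb-private", "Associations": [{"Main": False}],
     "Routes": [{"DestinationCidrBlock": "192.168.10.0/24", "NetworkInterfaceId": "eni-old"}]},
]
print(find_stale_eni_routes(sample))
```

Any route this reports is one that was added by hand and was left behind when the active edge moved.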

To solve this problem we introduced Shared Prefix Lists in SDDC v1.20, which allow you to create a VMware-managed prefix list in the customer AWS account:

This allows customers to use the prefix list in a route table instead of static entries. VMware learns which route tables are using the prefix list, and any time the active NSX edge VM moves to a different host the target ENI is updated in all of those route tables. To enable managed prefix list mode, log in to the NSX Manager via the Cloud Service Portal or directly via the private IP address, browse to Connected VPC, and then toggle the AWS Managed Prefix List Mode:

Click Enable:

Once the process starts you will need to log into the AWS account associated with your SDDC and access Resource Access Manager. Inside there you will see a Resource Share request:

Click on the request and then accept the resource share:
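If you prefer to accept the share programmatically rather than through the console, something like the following sketch could work. It assumes a boto3 RAM client (`boto3.client("ram")`) and uses its `get_resource_share_invitations` and `accept_resource_share_invitation` calls; the helper name and the idea of matching on part of the share name are my own assumptions, and pagination is not handled.

```python
def accept_pending_share(ram_client, name_contains):
    """Accept the first PENDING resource share invitation whose name contains
    name_contains. Returns the invitation ARN, or None if nothing matched."""
    response = ram_client.get_resource_share_invitations()
    for invitation in response.get("resourceShareInvitations", []):
        pending = invitation.get("status") == "PENDING"
        matches = name_contains in invitation.get("resourceShareName", "")
        if pending and matches:
            arn = invitation["resourceShareInvitationArn"]
            ram_client.accept_resource_share_invitation(resourceShareInvitationArn=arn)
            return arn
    return None


# Usage (assumes boto3 is installed and credentials for the connected account):
#   import boto3
#   accept_pending_share(boto3.client("ram"), "<part of the share name shown in RAM>")
```

Check the name of the share in Resource Access Manager first, since the value to match on is not something this sketch can know.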

The process will continue and might take around 5 to 10 minutes to fully complete. You can keep checking in the NSX Manager UI for it to finish.

Once finished you will see additional information about the prefix list including which route tables are currently using it. By default, this will just be the main route table:

If we check the main route table we will see the prefix list being used instead of the individual CIDRs with the target being the active ENI of the host running the active NSX edge VM:

We can also look at the prefix list and see all the CIDRs that are currently advertised from the SDDC. Anytime a new segment is added to the SDDC the prefix list will be updated with the new CIDR:

Now we can use the prefix list in the additional route table instead of the manual entries. Go to the additional route table within the AWS console and edit the routes. You should now be able to select the prefix list and point it to the ENI of the host running the active NSX edge VM:
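The same change can be scripted. The sketch below assumes a boto3 EC2 client and uses its `create_route` call with `DestinationPrefixListId` and `NetworkInterfaceId`; the helper names and all IDs are hypothetical. The first helper reads the active ENI from the prefix list route that is already present in the main route table, so you don't have to look it up by hand.

```python
def active_eni_from_main_table(route_tables, prefix_list_id):
    """Find the ENI targeted by the prefix list route in the main route table.
    route_tables has the shape of describe_route_tables()["RouteTables"]."""
    for table in route_tables:
        if not any(assoc.get("Main") for assoc in table.get("Associations", [])):
            continue
        for route in table.get("Routes", []):
            if route.get("DestinationPrefixListId") == prefix_list_id:
                return route.get("NetworkInterfaceId")
    return None


def add_prefix_list_route(ec2_client, route_table_id, prefix_list_id, eni_id):
    """Add a route for the managed prefix list to an additional route table."""
    ec2_client.create_route(
        RouteTableId=route_table_id,
        DestinationPrefixListId=prefix_list_id,
        NetworkInterfaceId=eni_id,
    )


# Usage (assumes boto3 is installed; the IDs below are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   tables = ec2.describe_route_tables(
#       Filters=[{"Name": "vpc-id", "Values": ["vpc-EXAMPLE"]}])["RouteTables"]
#   eni = active_eni_from_main_table(tables, "pl-EXAMPLE")
#   add_prefix_list_route(ec2, "rtb-EXAMPLE", "pl-EXAMPLE", eni)
```

Remember to remove the old static CIDR entries from the additional route table once the prefix list route is in place.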

If you go back to NSX and refresh the Connected VPC page, you will see that two route tables are now programmed with the prefix list. VMware now knows that both route table entries must be updated with the new host ENI of the active NSX edge VM in the event of an SDDC upgrade or host failure.
